Parsing Polynomials in C

This autumn, I worked as one of the teaching assistants in EDAA25, the C programming course at Lund University’s Computer Science department. This mainly involves grading the programming assignments the students are required to hand in during the course. After grading a lot of solutions, I’m amazed by how many students have difficulties structuring their code in a reasonable and readable way. Many solutions barely work, or rely on undefined behaviour to work, but even the correct ones are extremely difficult to read and understand. A lot of students write functions that span multiple printed pages. The students are mostly fourth or fifth year students pursuing a master of computer science and engineering degree. I believe they ought to be able to do better.

One could probably draw some conclusions from this regarding how much programming the students have actually learned during their first 3–4 years of studying. However, that’s for another post. In this post, I’m presenting a solution for part of the third programming assignment which I think is ideal and easy to understand.

Before continuing, allow me to state the obvious: If you are taking the course, stop reading now and solve the assignment on your own! You will learn nothing from copying this solution and it will only hurt you during the exam (which is usually pretty excruciating). Also, the parsing function that I’m presenting here is not the must difficult part of the assignment.

Problem Description

The assignment consists of implementing a couple of function that can parse, multiply and print polynomials. The format of a polynomial is fairly simple:

x^2 - 2x + 1

I will only present an implementation of the parsing function.

Structure of a Polynomial

Below is a simplified (but hopefully complete) EBNF notation of a polynomial (optional whitespace is left out).

polynomial  = term , { sign, term } ;
term        = coefficient
            | [ coefficient ] , variable
            | [ coefficient ] , variable , exponent ;
sign        = "+" | "-" ;
coefficient = number ;
variable    = "x" ;
exponent    = "^" , number ;
number      = digit , { digit } ;
digit       = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
            | "8" | "9" ;

That is, a polynomial consists of a number of terms separated by a sign. A term consists of three cases:

Only a coefficient, e.g. 2
An optional coefficient and a variable, e.g. 2x or x
An optional coefficient, followed by a variable and an exponent, e.g. 2x^2 or x^100.

Here are some example polynomials:

 1
 x^2 + 3
 x^1000 - 6x + 2

Implementation

The students are given some handout code that specifies which functions and data types should be implemented:

typedef struct poly_t poly_t;

/* Parse the input string and return a new poly_t. */
poly_t* new_poly_from_string(const char*);
/* Free the input poly_t. */
void free_poly(poly_t*);
/* Multiply the two input poly_t's. */ 
poly_t* mul(poly_t*, poly_t*);
/* Print the input poly_t in the same format we parsed above. */ 
void print_poly(poly_t*);

Thus, the students need to implement one data type (poly_t) and four functions (new_poly_from_string(), free_poly(), mul(), print_poly()). The only error checking the students are required to do is to detect the error, print a message about it and call exit(EXIT_FAILURE).

Since the students aren’t given any limit on the number of terms their implementations should handle, it is expected that their solution should build on a linked list (I believe this is mentioned during one of the lectures). I.e. each term in a polynomial is represented by a node in the linked list. The data structure looks like this:

/* Represents a term in a polynomial. */
struct poly_t {
	int	e;	/* Exponent of this term. */
	int	c;	/* Coefficient of this term.*/
	poly_t*	next;	/* The next term. */
};

This seems like a fairly straight forward and simple, yet still flexible, way to represent a polynomial. However, many students try to define the data structure in much more creative ways which often involve multiple arrays and a guinea pig. (By the way, do a Google Image search for guinea pigs, they are absolutely adorable!)

Next, we’ll get to the point of this post: The implementation of the new_poly_from_string function that parses a string containing a polynomial. This solution is split into a couple of small and easily understood helper functions. The students seem to have forgotten that you don’t need to implement all functionality in a single function.

The legibility is especially important since the students are required to print out their solutions and we (as teaching assistants) have to grade these printed versions¹. After realizing that basically all implementations of this function span at least 1.5 pages, I found myself wishing that the students would think about readability for even one second before handing in their solutions.

The idea of my implementation is to do a simple one pass parse of the string and implement an accept_* function for each part it parses. We’ll use a top-down approach so let’s start with the finished function that parses the polynomial:

/* Struct to keep track of the position in the string. */
typedef struct parser_ctx_t {
	const char*	str;
} parser_ctx_t;

poly_t* new_poly_from_string(const char* str)
{
	poly_t*		poly;
	poly_t*		term;
	parser_ctx_t	ctx;

	poly	= NULL;
	term	= NULL;
	ctx.str	= str;

	while (*ctx.str) {
		if (term == NULL) {
			/* Parse the first term. */
			term = accept_term(&ctx);
			/* Keep a pointer to the first term. */
			poly = term;
		} else {
			term->next = accept_term(&ctx);
			/* Advance to the new term. */
			term = term->next;
		}
	}

	return poly;
}

Look how clean and easy to understand that function is (and it would print nicely)!

OK, let’s go through this code a bit. The first thing we do is to define a type called parser_ctx_t that holds a pointer to the current position in the string. An instance of this type will be passed around to the accept_* functions in order for them to advance the string pointer. This could be done by passing around a pointer to the pointer to the string directly, but frankly I like this better. Mostly because it looks nicer.

The rest of the function is fairly straight forward. As long as there is anything left to parse from the string, we parse another term by calling accept_term(). The loop simply builds up the linked list one term at a time and keeps a reference to the first term. The poly pointer will point to the first term (and thus be returned at the end) while term will point to the last parsed term. Next, let’s look at accept_term():

static poly_t* new_term(int c, int e)
{
	poly_t*	poly;

	poly	= malloc(sizeof(poly_t));
	if (poly == NULL)
		error(__FILE__, __LINE__, __func__,
		      "Failed to allocate poly.");

	poly->c		= c;
	poly->e		= e;
	poly->next	= NULL;

	return poly;
}

static poly_t* accept_term(parser_ctx_t* ctx)
{
	poly_t*	term;
	int	c;
	int	e;

	c	= accept_coefficient(ctx);
	e	= accept_exponent(ctx);
	term	= new_term(c, e);

	return term;
}

Well, that’s simple. We have a helper function to allocate a new term and perform error checking on the memory allocation. The error() function is provided to the students in the handout code. It simply prints the message in a nicely formatted way and exits the program with exit code EXIT_FAILURE.

The accept_term() function is not very interesting. It uses two helper functions to parse the coefficient and exponent of a term and then creates and returns a new term. Let’s continue to the two helper functions accept_term() uses:

static int accept_coefficient(parser_ctx_t* ctx)
{
	int	c;	/* The coefficient to parse. */
	int	sign;	/* The sign of that coefficient. */

	sign	= accept_sign(ctx);
	c	= accept_number(ctx);

	return c * sign;
}

static int accept_exponent(parser_ctx_t* ctx)
{
	int	e;	/* The exponent to parse. */

	e	= 0;	/* Default exponent to zero. */

	accept_space(ctx);
	if (*ctx->str == 'x') {
		e = 1;
		ctx->str++;

		if (*ctx->str == '^') {
			ctx->str++;
			e = accept_number(ctx);
		}
	}

	return e;
}

accept_coefficient() is fairly uninteresting as well, we accept a sign (i.e. a plus or a minus) and then a number. accept_exponent() on the other hand is a bit more interesting. For good measure, we use accept_space() to discard any whitespace before the actual exponent. Now we have a couple of cases:

If we have no variable (i.e. an x), we default the exponent to zero.
If we have a variable but no caret, we set the exponent to one.
If we also have a caret, we parse the number succeeding the caret.

I should probably point out that the students are not required to handle negative exponents although it would be trivial to add in my solution.

The last helper functions are accept_space(), accept_sign() and accept_number():

static void accept_space(parser_ctx_t* ctx)
{
	while (isspace(*ctx->str))
		ctx->str++;
}

static int accept_sign(parser_ctx_t* ctx)
{
	int	sign;	/* The sign to parse. */

	sign	= 1;	/* Default to positive sign. */

	accept_space(ctx);

	if (*ctx->str == '-') {
		sign = -1;
		ctx->str++;
	} else if (*ctx->str == '+') {
		ctx->str++;
	}

	return sign;
}

static int accept_number(parser_ctx_t* ctx)
{
	int	num;	/* The coefficient to parse. */
	char*	end;	/* The end pointer returned by strtol. */

	num	= strtol(ctx->str, &end, 10);
	if (ctx->str == end) {
		/* No number to parse, default to 1. */
		num	= 1;
	}
	ctx->str	= end;

	return num;
}

accept_space() simply discards any whitespace in the string. accept_sign() parses a sign and returns +/- 1 which allows us to multiply it with a number when parsing the coefficient.

accept_number() is a bit more interesting and I’m pretty satisfied with the implementation. It leans heavily on strtol() but using functions in the C standard library is obviously OK in the course. strtol() discards any whitespace and then parses a number from the string, stopping at the first non-digit. It stores the location at which it stops in the pointer provided as the second argument. If the pointer it set is the same as the one we provided, it didn’t parse any number. If this is the case, we default the parsed number to 1. This works in both our cases when we need to parse numbers:

If we try to parse a coefficient and doesn’t have a number to parse, we have a term with only a variable (e.g. x or x^23) which should have the coefficient 1.
If we try to parse an exponent and there’s no caret, then we know we should skip calling accept_number() since there’s no number to parse. If we do have a caret in the string, there will be a number to parse and accept_number() will succeed in parsing it.

And that’s it! A simple implementation which is easy to understand and follow.

Finishing thoughts

While there are most likely some minor problems with this solution, it is simple to understand and parses well-formatted polynomials which is what the assignment requires. The main point I wanted to make here was that it is possible to implement this function in a way that doesn’t require complicated lookahead/lookbehind or multiple passes through the string while making it readable at the same time.

I have my reservations about this system as well but it’s the one we’re stuck with for various reasons ^[return]

Erik Jansson

$ cat /dev/random

Parsing Polynomials in C

Problem Description

Structure of a Polynomial

Implementation

Finishing thoughts