This autumn, I worked as one of the teaching assistants in EDAA25, the C programming course at Lund University’s Computer Science department. This mainly involves grading the programming assignments the students are required to hand in during the course. After grading a lot of solutions, I’m amazed by how many students have difficulties structuring their code in a reasonable and readable way. Many solutions barely work, or rely on undefined behaviour to work, but even the correct ones are extremely difficult to read and understand. A lot of students write functions that span multiple printed pages. The students are mostly fourth or fifth year students pursuing a master of computer science and engineering degree. I believe they ought to be able to do better.
One could probably draw some conclusions from this regarding how much programming the students have actually learned during their first 3–4 years of studying. However, that’s for another post. In this post, I’m presenting a solution for part of the third programming assignment which I think is ideal and easy to understand.
Before continuing, allow me to state the obvious: If you are taking the course, stop reading now and solve the assignment on your own! You will learn nothing from copying this solution and it will only hurt you during the exam (which is usually pretty excruciating). Also, the parsing function that I’m presenting here is not the must difficult part of the assignment.
Problem Description
The assignment consists of implementing a couple of function that can parse, multiply and print polynomials. The format of a polynomial is fairly simple:
x^2 - 2x + 1
I will only present an implementation of the parsing function.
Structure of a Polynomial
Below is a simplified (but hopefully complete) EBNF notation of a polynomial (optional whitespace is left out).
polynomial = term , { sign, term } ;
term = coefficient
| [ coefficient ] , variable
| [ coefficient ] , variable , exponent ;
sign = "+" | "-" ;
coefficient = number ;
variable = "x" ;
exponent = "^" , number ;
number = digit , { digit } ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
| "8" | "9" ;
That is, a polynomial consists of a number of terms separated by a sign. A term consists of three cases:
- Only a coefficient, e.g.
2
- An optional coefficient and a variable, e.g.
2x
orx
- An optional coefficient, followed by a variable and an exponent, e.g.
2x^2
orx^100
.
Here are some example polynomials:
1
x^2 + 3
x^1000 - 6x + 2
Implementation
The students are given some handout code that specifies which functions and data types should be implemented:
typedef struct poly_t poly_t;
/* Parse the input string and return a new poly_t. */
poly_t* new_poly_from_string(const char*);
/* Free the input poly_t. */
void free_poly(poly_t*);
/* Multiply the two input poly_t's. */
poly_t* mul(poly_t*, poly_t*);
/* Print the input poly_t in the same format we parsed above. */
void print_poly(poly_t*);
Thus, the students need to implement one data type (poly_t
) and four
functions (new_poly_from_string()
, free_poly()
, mul()
,
print_poly()
). The only error checking the students are required to do
is to detect the error, print a message about it and call
exit(EXIT_FAILURE)
.
Since the students aren’t given any limit on the number of terms their implementations should handle, it is expected that their solution should build on a linked list (I believe this is mentioned during one of the lectures). I.e. each term in a polynomial is represented by a node in the linked list. The data structure looks like this:
/* Represents a term in a polynomial. */
struct poly_t {
int e; /* Exponent of this term. */
int c; /* Coefficient of this term.*/
poly_t* next; /* The next term. */
};
This seems like a fairly straight forward and simple, yet still flexible, way to represent a polynomial. However, many students try to define the data structure in much more creative ways which often involve multiple arrays and a guinea pig. (By the way, do a Google Image search for guinea pigs, they are absolutely adorable!)
Next, we’ll get to the point of this post: The implementation of the
new_poly_from_string
function that parses a string containing a
polynomial. This solution is split into a couple of small and easily
understood helper functions. The students seem to have forgotten that
you don’t need to implement all functionality in a single function.
The legibility is especially important since the students are required to print out their solutions and we (as teaching assistants) have to grade these printed versions1. After realizing that basically all implementations of this function span at least 1.5 pages, I found myself wishing that the students would think about readability for even one second before handing in their solutions.
The idea of my implementation is to do a simple one pass parse of the
string and implement an accept_*
function for each part it parses.
We’ll use a top-down approach so let’s start with the finished function
that parses the polynomial:
/* Struct to keep track of the position in the string. */
typedef struct parser_ctx_t {
const char* str;
} parser_ctx_t;
poly_t* new_poly_from_string(const char* str)
{
poly_t* poly;
poly_t* term;
parser_ctx_t ctx;
poly = NULL;
term = NULL;
ctx.str = str;
while (*ctx.str) {
if (term == NULL) {
/* Parse the first term. */
term = accept_term(&ctx);
/* Keep a pointer to the first term. */
poly = term;
} else {
term->next = accept_term(&ctx);
/* Advance to the new term. */
term = term->next;
}
}
return poly;
}
Look how clean and easy to understand that function is (and it would print nicely)!
OK, let’s go through this code a bit. The first thing we do is to define
a type called parser_ctx_t
that holds a pointer to the current
position in the string. An instance of this type will be passed around
to the accept_*
functions in order for them to advance the string
pointer. This could be done by passing around a pointer to the pointer
to the string directly, but frankly I like this better. Mostly because
it looks nicer.
The rest of the function is fairly straight forward. As long as there is
anything left to parse from the string, we parse another term by calling
accept_term()
. The loop simply builds up the linked list one term at a
time and keeps a reference to the first term. The poly
pointer will
point to the first term (and thus be returned at the end) while term
will point to the last parsed term. Next, let’s look at accept_term()
:
static poly_t* new_term(int c, int e)
{
poly_t* poly;
poly = malloc(sizeof(poly_t));
if (poly == NULL)
error(__FILE__, __LINE__, __func__,
"Failed to allocate poly.");
poly->c = c;
poly->e = e;
poly->next = NULL;
return poly;
}
static poly_t* accept_term(parser_ctx_t* ctx)
{
poly_t* term;
int c;
int e;
c = accept_coefficient(ctx);
e = accept_exponent(ctx);
term = new_term(c, e);
return term;
}
Well, that’s simple. We have a helper function to allocate a new term
and perform error checking on the memory allocation. The error()
function is provided to the students in the handout code. It simply
prints the message in a nicely formatted way and exits the program with
exit code EXIT_FAILURE
.
The accept_term()
function is not very interesting. It uses two helper
functions to parse the coefficient and exponent of a term and then
creates and returns a new term. Let’s continue to the two helper
functions accept_term()
uses:
static int accept_coefficient(parser_ctx_t* ctx)
{
int c; /* The coefficient to parse. */
int sign; /* The sign of that coefficient. */
sign = accept_sign(ctx);
c = accept_number(ctx);
return c * sign;
}
static int accept_exponent(parser_ctx_t* ctx)
{
int e; /* The exponent to parse. */
e = 0; /* Default exponent to zero. */
accept_space(ctx);
if (*ctx->str == 'x') {
e = 1;
ctx->str++;
if (*ctx->str == '^') {
ctx->str++;
e = accept_number(ctx);
}
}
return e;
}
accept_coefficient()
is fairly uninteresting as well, we accept a sign
(i.e. a plus or a minus) and then a number. accept_exponent()
on the
other hand is a bit more interesting. For good measure, we use
accept_space()
to discard any whitespace before the actual exponent.
Now we have a couple of cases:
- If we have no variable (i.e. an x), we default the exponent to zero.
- If we have a variable but no caret, we set the exponent to one.
- If we also have a caret, we parse the number succeeding the caret.
I should probably point out that the students are not required to handle negative exponents although it would be trivial to add in my solution.
The last helper functions are accept_space()
, accept_sign()
and
accept_number()
:
static void accept_space(parser_ctx_t* ctx)
{
while (isspace(*ctx->str))
ctx->str++;
}
static int accept_sign(parser_ctx_t* ctx)
{
int sign; /* The sign to parse. */
sign = 1; /* Default to positive sign. */
accept_space(ctx);
if (*ctx->str == '-') {
sign = -1;
ctx->str++;
} else if (*ctx->str == '+') {
ctx->str++;
}
return sign;
}
static int accept_number(parser_ctx_t* ctx)
{
int num; /* The coefficient to parse. */
char* end; /* The end pointer returned by strtol. */
num = strtol(ctx->str, &end, 10);
if (ctx->str == end) {
/* No number to parse, default to 1. */
num = 1;
}
ctx->str = end;
return num;
}
accept_space()
simply discards any whitespace in the string.
accept_sign()
parses a sign and returns +/- 1 which allows us to
multiply it with a number when parsing the coefficient.
accept_number()
is a bit more interesting and I’m pretty satisfied
with the implementation. It leans heavily on strtol()
but using
functions in the C standard library is obviously OK in the course.
strtol()
discards any whitespace and then parses a number from the
string, stopping at the first non-digit. It stores the location at which
it stops in the pointer provided as the second argument. If the pointer
it set is the same as the one we provided, it didn’t parse any number.
If this is the case, we default the parsed number to 1. This works in
both our cases when we need to parse numbers:
- If we try to parse a coefficient and doesn’t have a number to parse,
we have a term with only a variable (e.g.
x
orx^23
) which should have the coefficient 1. - If we try to parse an exponent and there’s no caret, then we know we
should skip calling
accept_number()
since there’s no number to parse. If we do have a caret in the string, there will be a number to parse andaccept_number()
will succeed in parsing it.
And that’s it! A simple implementation which is easy to understand and follow.
Finishing thoughts
While there are most likely some minor problems with this solution, it is simple to understand and parses well-formatted polynomials which is what the assignment requires. The main point I wanted to make here was that it is possible to implement this function in a way that doesn’t require complicated lookahead/lookbehind or multiple passes through the string while making it readable at the same time.
- I have my reservations about this system as well but it’s the one we’re stuck with for various reasons [return]