Shunting yard algorithm (C)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C | Perl

In this article, we describe an implementation of the Shunting yard algorithm in C. The algorithm is a simple way of parsing expressions in infix notation.

In this implementation we evaluate the parsed expression making a very basic calculator. The implementation could easily be modified to generate output in RPN.

This program expects an infix expression as the first argument. The expression is split into tokens, which can be either numbers, operators or parentheses.

Contents

[edit] The algorithm

The algorithm works by reading each token in order, and processing them by the following rules:

  • If it is a number token, it is pushed on a number stack (or the output queue, if we were generating RPN)
  • If it is an operator, pop operators from the operator stack and evaluate them with values from the number stack, until we reach one that has lower precedence. Then we push the original operator.
    • In the case of a right associative operator, we also stop if we reach an operator of the same precedence.
  • If it is a left parenthesis, just push it.
  • If it is a right parenthesis, we pop all operators (and evaluate them) until the matching left parenthesis. The parentheses are thrown away.
  • After all tokens are read, the remaining operators on stack are popped and evaluated.

[edit] Tokens

The main function reads the expression, generating the tokens using a hand-written tokenizer. For more complex examples, we might use a tool like Lex. To keep things simple, we only allow integer numbers and single-character operators. Whitespace is ignored.

<<main>>=

int main(int argc, char *argv[])
{
	char *expr;
	char *tstart=NULL;
	struct op_s startop={'X', 0, ASSOC_NONE, 0, NULL};	/* Dummy operator to mark start */
	struct op_s *op=NULL;
	int n1, n2;
	struct op_s *lastop=&startop;

	if(argc<2) {
		fprintf(stderr, "Usage: %s <expression>\n", argv[0]);
		return EXIT_FAILURE;
	}

	for(expr=argv[1]; *expr; ++expr) {
		if(!tstart) {

We use the getop() function to check if this is actually an operator.

<<main>>=
			if((op=getop(*expr))) {

That function returns a pointer to the right operator in the ops table, which contain information like:

  • Precedence
  • Association
  • If it is an unary operator
  • An evaluation function
<<ops>>=
enum {ASSOC_NONE=0, ASSOC_LEFT, ASSOC_RIGHT};

struct op_s {
	char op;
	int prec;
	int assoc;
	int unary;
	int (*eval)(int a1, int a2);
} ops[]={
	{'_', 10, ASSOC_RIGHT, 1, eval_uminus},
	{'^', 9, ASSOC_RIGHT, 0, eval_exp},
	{'*', 8, ASSOC_LEFT, 0, eval_mul},
	{'/', 8, ASSOC_LEFT, 0, eval_div},
	{'%', 8, ASSOC_LEFT, 0, eval_mod},
	{'+', 5, ASSOC_LEFT, 0, eval_add},
	{'-', 5, ASSOC_LEFT, 0, eval_sub},
	{'(', 0, ASSOC_NONE, 0, NULL},
	{')', 0, ASSOC_NONE, 0, NULL}
};

struct op_s *getop(char ch)
{
	int i;
	for(i=0; i<sizeof ops/sizeof ops[0]; ++i) {
		if(ops[i].op==ch) return ops+i;
	}
	return NULL;
}

The '-' character can actually mean either the binary subtract operator or the unary negate operator. To find out which it is, we use the fact that a binary operator can not show up right after another operator or a left parenthesis.

To separate them, we use the '_' character for the unary variant.

<<main>>=
				if(lastop && (lastop==&startop || lastop->op!=')')) {
					if(op->op=='-') op=getop('_');
					else if(op->op!='(') {
						fprintf(stderr, "ERROR: Illegal use of binary operator (%c)\n", op->op);
						exit(EXIT_FAILURE);
					}
				}
				shunt_op(op);
				lastop=op;
			} else if(isdigit(*expr)) tstart=expr;
			else if(!isspace(*expr)) {
				fprintf(stderr, "ERROR: Syntax error\n");
				return EXIT_FAILURE;
			}
		} else {
			if(isspace(*expr)) {
				push_numstack(atoi(tstart));
				tstart=NULL;
				lastop=NULL;
			} else if((op=getop(*expr))) {
				push_numstack(atoi(tstart));
				tstart=NULL;
				shunt_op(op);
				lastop=op;
			} else if(!isdigit(*expr)) {
				fprintf(stderr, "ERROR: Syntax error\n");
				return EXIT_FAILURE;
			}
		}
	}
	if(tstart) push_numstack(atoi(tstart));

After all tokens are handled, we use a simple loop to evaluate all remaining tokens on top of the operator stack.

<<main>>=

	while(nopstack) {
		op=pop_opstack();
		n1=pop_numstack();
		if(op->unary) push_numstack(op->eval(n1, 0));
		else {
			n2=pop_numstack();
			push_numstack(op->eval(n2, n1));
		}
	}

When we are done, there better be exactly one value left on the number stack. This is the result of our expression, and is printed.

<<main>>=
	if(nnumstack!=1) {
		fprintf(stderr, "ERROR: Number stack has %d elements after evaluation. Should be 1.\n", nnumstack);
		return EXIT_FAILURE;
	}
	printf("%d\n", numstack[0]);

	return EXIT_SUCCESS;
}

[edit] Operators

Handling number tokens is simple, we just use the push_numstack() helper function. Operators are not that simple, so we handle it in a separate function.

<<shunt_op>>=
void shunt_op(struct op_s *op)
{
	struct op_s *pop;
	int n1, n2;

As mentioned earlier, we need to handle parentheses specially. A left parenthesis is simply pushed on the stack.

<<shunt_op>>=
	if(op->op=='(') {
		push_opstack(op);
		return;

When we get a right parenthesis, we need to evaluate everything until the matching left-parenthesis. The result of this evaluation is just pushed on the number stack.

<<shunt_op>>=
	} else if(op->op==')') {
		while(nopstack>0 && opstack[nopstack-1]->op!='(') {
			pop=pop_opstack();
			n1=pop_numstack();

When evaluating an unary operator, we only use one value from the number stack.

<<shunt_op>>=
			if(pop->unary) push_numstack(pop->eval(n1, 0));
			else {
				n2=pop_numstack();
				push_numstack(pop->eval(n2, n1));
			}
		}

Having a right-parenthesis without a match is obviously an error.

<<shunt_op>>=
		if(!(pop=pop_opstack()) || pop->op!='(') {
			fprintf(stderr, "ERROR: Stack error. No matching \'(\'\n");
			exit(EXIT_FAILURE);
		}
		return;
	}

	if(op->assoc==ASSOC_RIGHT) {
		while(nopstack && op->prec<opstack[nopstack-1]->prec) {
			pop=pop_opstack();
			n1=pop_numstack();
			if(pop->unary) push_numstack(pop->eval(n1, 0));
			else {
				n2=pop_numstack();
				push_numstack(pop->eval(n2, n1));
			}
		}
	} else {
		while(nopstack && op->prec<=opstack[nopstack-1]->prec) {
			pop=pop_opstack();
			n1=pop_numstack();
			if(pop->unary) push_numstack(pop->eval(n1, 0));
			else {
				n2=pop_numstack();
				push_numstack(pop->eval(n2, n1));
			}
		}
	}
	push_opstack(op);
}

[edit] Support functions

We provide evaluation functions for all our operators. The unary minus operator is special in that it just ignores it's second argument.

<<eval functions>>=
int eval_uminus(int a1, int a2) 
{
	return -a1;
}
int eval_exp(int a1, int a2)
{
	return a2<0 ? 0 : (a2==0?1:a1*eval_exp(a1, a2-1));
}
int eval_mul(int a1, int a2) 
{
	return a1*a2;
}
int eval_div(int a1, int a2) 
{
	if(!a2) {
		fprintf(stderr, "ERROR: Division by zero\n");
		exit(EXIT_FAILURE);
	}
	return a1/a2;
}
int eval_mod(int a1, int a2) 
{
	if(!a2) {
		fprintf(stderr, "ERROR: Division by zero\n");
		exit(EXIT_FAILURE);
	}
	return a1%a2;
}
int eval_add(int a1, int a2) 
{
	return a1+a2;
}
int eval_sub(int a1, int a2) 
{
	return a1-a2;
}

We also provide special functions for operating our two stacks.

<<stack>>=
struct op_s *opstack[MAXOPSTACK];
int nopstack=0;

int numstack[MAXNUMSTACK];
int nnumstack=0;

void push_opstack(struct op_s *op)
{
	if(nopstack>MAXOPSTACK-1) {
		fprintf(stderr, "ERROR: Operator stack overflow\n");
		exit(EXIT_FAILURE);
	}
	opstack[nopstack++]=op;
}

struct op_s *pop_opstack()
{
	if(!nopstack) {
		fprintf(stderr, "ERROR: Operator stack empty\n");
		exit(EXIT_FAILURE);
	}
	return opstack[--nopstack];
}

void push_numstack(int num)
{
	if(nnumstack>MAXNUMSTACK-1) {
		fprintf(stderr, "ERROR: Number stack overflow\n");
		exit(EXIT_FAILURE);
	}
	numstack[nnumstack++]=num;
}

int pop_numstack()
{
	if(!nnumstack) {
		fprintf(stderr, "ERROR: Number stack empty\n");
		exit(EXIT_FAILURE);
	}
	return numstack[--nnumstack];
}

[edit] Building and running

This program should build on most C compilers, but is only tested on gcc on NetBSD (using gcc -o shunt shunt.c). It is invoked like this: ./shunt '1+2*3' on UNIX-like systems.

<<shunt.c>>=

#include<stdlib.h>
#include<stdio.h>
#include<ctype.h>

#define MAXOPSTACK 64
#define MAXNUMSTACK 64

eval functions

ops

stack

shunt_op

main

Test code for this program is available.
When making changes, please make sure it passes all tests in the test procedure.

Download code
hijacker
hijacker
hijacker
hijacker