# Shunting yard algorithm (Perl)

In this article, we describe an implementation of the Shunting yard algorithm in Perl. The algorithm is a simple way of parsing expressions in infix notation.

In this implementation we generate a form of RPN which is readable by the dc UNIX tool.

## Contents |

## [edit] Tokens

The first step is to split the input string containing the expression into tokens. The only token types we support are numbers and operators (+, -, *, /) and parentheses. This is easily accomplished with the *split* function.

<<tokens>>=my@tokens=split(/\ *([\+\-\*\/\(\)]|\d+\.\d+|\d+) */, $expr);

## [edit] Operators

We use a hash to store the operator precedences. The unary *-* operator is called *-u* to distinguish it from the binary operator with the same name.

<<operators>>=my%prec=('-u'=>5, '*'=>4, '/'=>3, '+'=>2, '-'=>1, '('=>0, ''=>9);

Right-associative operators must be treated specially. In this example, we only have one (the unary *-*), and store it in another hash.

<<operators>>=my%right=('-u'=>1);

We provide a function to easily find the operator precedence.

<<operators>>=sub getprec {my($op)=@_;returnexists$prec{$op}?$prec{$op}:-1;}

The way we used the *split* function, will leave some empty tokens. We just ignore them.

A unary operator is the first token or any operator that is preceded by another operator (not parentheses).

<<tokens-preprocessing>>=!$tokenandnext;if($tokeneq'-'andgetprec($last)>=0){$token='-u';}

## [edit] The parser

The shunting yard algorithm is quite simple. All numbers are added to the output stream (here represented by *@rpn*). Operators are pushed on a stack. Each time we reach a new operator, we pop operators from the stack until we reach one that has lower precedence. In the case of a right associative operator, we also stop if we reach an operator of the same precedence.

All popped operators are appended to the output stream.

When we reach a right parenthesis, we pop all operators until the matching left parenthesis. The parentheses are thrown away.

<<parsing>>=# Parsingmy@op_stack;my@rpn;my$last="";foreachmy$token(@tokens){tokens-preprocessingif($token=~/^\d+$|^\d+\.\d+$/){if($last=~/^\d+$|^\d+\.\d+$/ || $lasteq")"){die"Value tokens must be separated by an operator";}push(@rpn, $token);}elsif($tokeneq'('){push(@op_stack, $token);}elsif($tokeneq')'){while($op_stack[-1]ne'('){my$t=pop(@op_stack);push(@rpn, $t);}pop(@op_stack)eq'('ordie"No matching (";}elsif((my$pr=getprec($token))>0){if(exists$right{$token}){while(scalar@op_stack>0 && $pr<getprec($op_stack[-1])){push(@rpn,pop(@op_stack));}}else{while(scalar@op_stack>0 && $pr<=getprec($op_stack[-1])){push(@rpn,pop(@op_stack));}}push(@op_stack, $token);}else{die"Unknown token: \"$token\"";}$last=$token;}

When we have reached the end of the input stream, all remaining operators are popped and appended to the output stream.

<<parsing>>=while(scalar@op_stack>0){push(@rpn,pop(@op_stack));}

## [edit] Output

As we want to generate code for *dc*, we have to do some small adjustments in the output stream.

The unary *-* operator doesn't exist in *dc*, so we have to fake it by multiplying with *-1*, which is encoded as *_1* in *dc*.

We also want the result to be printed, so we append the *p* command.

<<output>>=foreachmy$token(@rpn){if($tokeneq'-u'){}else{}}

## [edit] The program

This program expects an expression on the command line. The output can be piped to *dc* like this: `./shunting-yard.perl '1+2*3' | dc`

.

<<shunting-yard.perl>>=#!/usr/bin/env perlusestrict;usewarnings;my($expr)=@ARGV;if(!$expr){exit1;}tokens operators parsing output

Download code |