Shunting yard algorithm (Perl)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C | Perl | Python

In this article, we describe an implementation of the Shunting yard algorithm in Perl. The algorithm is a simple way of parsing expressions in infix notation.

In this implementation we generate a form of RPN which is readable by the dc UNIX tool.

Contents

[edit] Tokens

The first step is to split the input string containing the expression into tokens. The only token types we support are numbers and operators (+, -, *, /) and parentheses. This is easily accomplished with the split function.

<<tokens>>=
my @tokens=split(/\ *([\+\-\*\/\(\)]|\d+\.\d+|\d+) */, $expr);

[edit] Operators

We use a hash to store the operator precedences. The unary - operator is called -u to distinguish it from the binary operator with the same name.

<<operators>>=
my %prec=('-u'=>5, '*'=>4, '/'=>3, '+'=>2, '-'=>1, '('=>0, ''=>9);

Right-associative operators must be treated specially. In this example, we only have one (the unary -), and store it in another hash.

<<operators>>=
my %right=('-u'=>1);

We provide a function to easily find the operator precedence.

<<operators>>=
sub getprec {
	my ($op)=@_;
	return exists $prec{$op}?$prec{$op}:-1;
}

The way we used the split function, will leave some empty tokens. We just ignore them.

A unary operator is the first token or any operator that is preceded by another operator (not parentheses).

<<tokens-preprocessing>>=
	!$token and next;

	if($token eq '-' and getprec($last)>=0) {$token='-u';}

[edit] The parser

The shunting yard algorithm is quite simple. All numbers are added to the output stream (here represented by @rpn). Operators are pushed on a stack. Each time we reach a new operator, we pop operators from the stack until we reach one that has lower precedence. In the case of a right associative operator, we also stop if we reach an operator of the same precedence.

All popped operators are appended to the output stream.

When we reach a right parenthesis, we pop all operators until the matching left parenthesis. The parentheses are thrown away.

<<parsing>>=
# Parsing

my @op_stack;
my @rpn;

my $last="";
foreach my $token (@tokens) {
tokens-preprocessing

	if($token=~/^\d+$|^\d+\.\d+$/) {
		if($last=~/^\d+$|^\d+\.\d+$/ || $last eq ")") {
			die "Value tokens must be separated by an operator";
		}
		push(@rpn, $token);
	} elsif($token eq '(') {
		push(@op_stack, $token);
	} elsif($token eq ')') {
		while($op_stack[-1] ne '(') {
			my $t=pop(@op_stack);
			push(@rpn, $t);
		}
		pop(@op_stack) eq '(' or die "No matching (";
	} elsif((my $pr=getprec($token))>0) {
		if(exists $right{$token}) {
			while(scalar @op_stack>0 && $pr<getprec($op_stack[-1])) {
				push(@rpn, pop(@op_stack));
			}
		} else {
			while(scalar @op_stack>0 && $pr<=getprec($op_stack[-1])) {
				push(@rpn, pop(@op_stack));
			}
		}
		push(@op_stack, $token);
	} else {
		die "Unknown token: \"$token\"";
	}
	$last=$token;
}

When we have reached the end of the input stream, all remaining operators are popped and appended to the output stream.

<<parsing>>=

while(scalar @op_stack>0) {
	push(@rpn, pop(@op_stack));
}


[edit] Output

As we want to generate code for dc, we have to do some small adjustments in the output stream.

The unary - operator doesn't exist in dc, so we have to fake it by multiplying with -1, which is encoded as _1 in dc.

We also want the result to be printed, so we append the p command.

<<output>>=

foreach my $token (@rpn) {
	if($token eq '-u') {print '_1* ';}
	else {print "$token ";}
}
print "p\n";

[edit] The program

This program expects an expression on the command line. The output can be piped to dc like this: ./shunting-yard.perl '1+2*3' | dc.

<<shunting-yard.perl>>=
#!/usr/bin/env perl

use strict;
use warnings;

my ($expr)=@ARGV;

if(!$expr) {
	print "Usage $0 expression\n";
	exit 1;
}

tokens

operators

parsing

output

Download code
hijacker
hijacker
hijacker
hijacker