Desk calculator (Python)

From LiteratePrograms
Jump to: navigation, search

An interpreter for a dc dialect, written in Python. It lies somewhere between a modern dc implementation and the Bell Labs 1971 version. It lacks support for input and output radix conversion and for floating point; however, it can produce arbitrary binary output, and has been confirmed to work with the following LiteratePrograms articles:

Contents

[edit] theory

An interpeter is basically a left fold over the source code, taking an initial state into the final state by applying each command in turn.

As dc programs are self-tokenizing — the source is already a byte-code — the interpeter itself is very simple.

The complications arise in providing the semantics of the two major dc datatypes

[edit] practice

[edit] data structures

There are three data structures in the interpreter:

  • a stack — to store already-evaluated values
  • a continuation — to store yet-to-be-evaluated code
  • a dictionary — to provide the abstract registers

The decisions to implement the stack and dictionary with a list and dict are rather obvious. As dc commands are (mostly) single bytes, the continuation is a list of bytes, for which we use a string.

<<data structures>>=
#############################################

s,c,d = [],"",{}

stack elements
continuation utilities

[edit] the inner loop

The interpreter itself is equally simple: while there are source bytecodes left in the continuation, we look up the corresponding commands, and execute them.

#############################################
<<interpreter>>=
auxiliary commands
cmds = {
command table
}
def interp(str):	
	cpush(str)
	while len(c):
		cmds[cpop()]()

[edit] printing (I)

For example, the codes p and f correspond to commands to print the top of the stack and the entire stack, respectively.

<<command table>>=
'p':	lambda: debugpr(s[-1:]),
'f':	lambda: debugpr(s),
<<auxiliary commands>>=
def debugpr(s):
	for v in s:
		print v

[edit] arithmetic

Arithmetic operations are a little more complicated. To implement each operation in the target language, we perform the corresponding operation in the host language, along with some explicit bookkeeping: stack manipulation and ensuring that the results are injected into our Numeric Value objects.

<<command table>>=
'+':	lambda: s.append(NVal(s.pop(-2) + s.pop())),
'-':	lambda: s.append(NVal(s.pop(-2) - s.pop())),
'*':	lambda: s.append(NVal(s.pop(-2) * s.pop())),
'/':	lambda: s.append(NVal(s.pop(-2) //s.pop())),
'%':	lambda: s.append(NVal(s.pop(-2) % s.pop())),
'^':	lambda: s.append(NVal(s.pop(-2) **s.pop())),
'~':	lambda: s.extend(map(NVal,divmod(s.pop(-2),s.pop()))),

Question: what should be done to make this code more robust to changes in execution order?

[edit] stack operations

The stack operations would be relatively straightforward, if it were not for the need to avoid statements in lambda expressions. clear and reverse/swap would be clearer as statements:

c → s[:]=[]
r → s[-3:]=s[-1:-3:-1] 
<<command table>>=
'd':	lambda: s.append(s[-1]),
'c':	lambda: s.__setitem__(slice( 0,None),[]),
'r':	lambda: s.__setitem__(slice(-3,None),s[-1:-3:-1]),

[edit] printing (II)

The command n to print a value, as well as z, to push the current size of the stack, are straightforward. P, to print, and Z, to determine the length of a value, have different behaviors depending upon whether or not the top of the stack is a number or a string.

<<command table>>=
'n':	lambda: write("%s" % s.pop()),
'P':	lambda: s.pop().prval(),
'Z':	lambda: s.append(NVal(s.pop().sizeof())),
'z':	lambda: s.append(len(s)),

We model this polymorphism by providing two distinct classes for stack values.

As long as we have the two classes, we can do some partial evaluation: as the result of adding a number to the continuation will simply be that it is shifted to the stack, we avoid the intermediate steps and place it there to begin with.

<<stack elements>>=
class SVal(str):
	def execute(self):	cpush(self + '\0')
	def sizeof(self):	len(self)
	def prval(self):	write(self)

class NVal(long):
	def execute(self):	s.append(self)
	def sizeof(self):
		if self==0:	return 1
		else:		return int(math.log10(self))+1
	def prval(self):
		q,str = self,""	
		while q > 0:
			q,r = divmod(q,0x100)
			str = chr(r) + str
		write(str)

Question: why is our task made easier by using long integers in Python, rather than C longs?

Question: if we did not append the extra NULL byte when executing a string, this implemenation would be properly tail-recursive. Why do we not do so?'

[edit] execution control

Supra, we saw that executing a macro was as simple as pushing it onto the continuation, "scheduling" it for evaluation by the interpreter.

"Pushing" a function onto the continuation is implemented by prepending it to the string.

Likewise, popping off the code for a command and trimming a continuation (to provide an early exit or to skip over comments) are implemented with easy string manipulations.

<<continuation utilities>>=
def cpop():
	global c
	r,c=c[0],c[1:]
	return r
def cpush(b):
	global c
	c = b+c
def ctrim(upto):
	global c
	try:	c = c[c.index(upto)+1:]
	except:	sys.exit(0)
<<command table>>=
'#':	lambda: ctrim('\n'),
'q':	lambda: (ctrim('\0'),ctrim('\0')),

Note that we have used null bytes to delimit levels of macro invocation in the continuation.

Question: how is the double exit behavior of q related to the end-entire-method behavior of ^ in Smalltalk?

Unfortunately, to support this behavior, we cannot delete the nesting information to support tail-call optimization.

Still, each tail call only adds 1 byte, so Generating all rationals (dc) runs, if not to \infty, at least for much longer than it does in the Unix implementation.

[edit] registers

Continuations were somewhat involved; the dictionary is trivial. load and store manipulate the interpreter dict in the manner than one expects.

<<command table>>=
'l':	lambda: s.append(d[cpop()]),
's':	lambda: d.update([(cpop(),s.pop())]),

[edit] conditional branches

Taken conditional branches require that the macro be looked up in the dictionary, and its value executed.

<<command table>>=
'>':	lambda: compare('>'),
'<':	lambda: compare('<'),
'=':	lambda: compare('='),
'!':	lambda: compare('!'+cpop()),
<<auxiliary commands>>=
def compare(how):
	cmpfunc = {
	'>':	lambda a,b: b > a,
	'!>':	lambda a,b: b <= a,
	'<':	lambda a,b: b < a,
	'!<':	lambda a,b: b >= a,
	'=':	lambda a,b: b == a,
	'!=':	lambda a,b: b != a
	}
	if cmpfunc[how](s.pop(-2),s.pop()):
		d[cpop()].execute()
	else:
		cpop()

[edit] other control flow

In contrast, executing from the stack, or using ? to execute user input, are both straightforward.

<<command table>>=
'x':	lambda: s.pop().execute(),
'?':	lambda: cpush(sys.stdin.readline()),

Question: what are the advantages of evaluating user input? the disadvantages?

[edit] literals

Of course, in order to operate on values, we must have some way to specify them in the first place.

Here we process numeric and string literals. One might wish to take advantage of library routines to do the lexing and value conversion, but perhaps the character-by-character implementation here better demonstrates that the two forms of literals are their own little domain specific languages, and so we interpret them with the same strategy as the outer interpreter: a left fold.

<<command table>>=
'[':	dostr,
'_':	lambda: donum(-1,0),
'0':	lambda: donum( 1,0),
'1':	lambda: donum( 1,1),
'2':	lambda: donum( 1,2),
'3':	lambda: donum( 1,3),
'4':	lambda: donum( 1,4),
'5':	lambda: donum( 1,5),
'6':	lambda: donum( 1,6),
'7':	lambda: donum( 1,7),
'8':	lambda: donum( 1,8),
'9':	lambda: donum( 1,9),
<<auxiliary commands>>=
def donum(sign,acc):
	while c[0].isdigit():
		acc = 10*acc + int(cpop())
	acc *= sign
	s.append(NVal(acc))

def dostr():
	nest,acc = 1,[]
	while nest:
		c = cpop()
		if   c == ']': nest-=1
		elif c == '[': nest+=1
		if not nest:	break
		acc.append(c)
	s.append(SVal("".join(acc)))

Question: in theory, is it valid to say that the semantics of an algebraic construct (such as a literal value or an entire program) are given by means of a function folding over its syntax?

[edit] the remainder

All that remains is to declare whitespace to code for the "identity" operation — ie, it does nothing.

<<command table>>=
' ':	nop,
'\t':	nop,
'\r':	nop,
'\n':	nop,
'\0':	nop
<<auxiliary commands>>=
def nop(): pass

Strictly speaking, this is not necessary, but giving the programmer the choice of format allows dc code to be (a little) less opaque.

Compare [sa d la + sr la lr p d ls >m] with [sadla+srlalrpdls>m] — in a way, free whitespace is the poor man's semantic hilighting.

[edit] wrapping up

Finally, we wrap everything up in a module

<<dc.py>>=
#############################################
import sys, math
write=sys.stdout.write
data structures
interpreter
command line interface

and provide a command line interface allowing for either interactive or file input.

<<command line interface>>=
#############################################

if __name__=="__main__":
	arg = sys.argv[1:]
	if not len(arg):
		while True:
			interp(sys.stdin.readline())
	while len(arg):
		if arg[0] == '-e':
			str = arg[1]
			arg = arg[2:]
		elif arg[0] == '-f':
			arg = arg[1:]
			continue	
		else:
			f = open(arg[0],"rb")
			str = f.read()
			f.close()
			arg = arg[1:]
		interp(str)

so, for example, one may say

dc.py -e 6581840dnP

or, having visited Merge sort (dc),

dc.py -f mergesort.dc
Download code
hijacker
hijacker
hijacker
hijacker