Wiki (sh)

From LiteratePrograms

A rudimentary wiki, written as a Bourne shell CGI program.

theory

A wiki is a server: it accepts requests and generates replies. Some servers must keep session state with their clients; this one does not: the program runs to completion, providing a single reply for each request, and the filesystem holds the only durable state.

Question: What is the difference between a single-reply server and a procedure call?

practice

We map WikiPages directly onto files in the filesystem, so this wiki is simple enough to be straight-line code.

<<serve WikiPage>>=
<<parse request>>
<<synchronize with filesystem>>
<<reply to user>>

During a request, there are only two minor complications:

  • distinguishing WikiPages (which have content in the filesystem) from WikiWords (which have yet to be created)
  • tracking the listing state (to turn the linear lists of wikitext into the hierarchical lists of HTML)

(in fact, of the handful of program variables: request, wikipage, allpages, and listing, only the last actually varies after it has been initialized)

Question: What about the filesystem? How often does it vary during a program run?

parse request

We expect the request to be packaged in the QUERY_STRING as request+wikipage. After url-decoding, we set the positional arguments directly, then assign them to the more mnemonic request (the verb) and wikipage (the direct object).

<<parse request>>=
set -- $(echo $QUERY_STRING | urldecode)
request=$1
wikipage=$2

If there is no request, we redirect the user to the WelcomePage, and need do nothing more.

<<parse request>>=
[ $request ] || { reply "Location: $me?read+$homepage"; exit; }
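The splitting step can be exercised on its own. A minimal sketch, assuming the query string has already been url-decoded (so the `+` has become a space):

```shell
# Sketch: word-splitting a decoded query string into positional
# arguments, exactly as the wiki does after urldecode.
QUERY_STRING="read WelcomePage"
set -- $QUERY_STRING    # unquoted on purpose: split on whitespace
request=$1              # the verb: read, write, edit, create, or link
wikipage=$2             # the direct object: a WikiPage name
echo "$request/$wikipage"   # prints: read/WelcomePage
```

Leaving `$QUERY_STRING` unquoted is what makes `set --` split it into two words; with quotes, `$1` would hold the whole string.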

synchronize with filesystem

If we have been asked to perform a write, we update the filesystem. (The data to be written arrives url-encoded on standard input.)

Because there will be no further mutations, we generate the list of currently valid WikiPages now.

<<synchronize with filesystem>>=
[ $request = write ] && sed 's/wikitext=//' | urldecode > $wikipage
allpages=$(ls [A-Z][a-z]*[A-Z][a-z]*)
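The glob deserves a second look: `[A-Z][a-z]*[A-Z][a-z]*` matches names with at least two capitalized runs, which is what makes a WikiWord. A sketch, using a throwaway directory and hypothetical page names:

```shell
# Sketch: only CamelCase names survive the WikiWord glob;
# ordinary file names are ignored.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch WelcomePage SandBox readme notes.txt
allpages=$(ls [A-Z][a-z]*[A-Z][a-z]*)
echo $allpages    # prints: SandBox WelcomePage
```

(`echo $allpages` is unquoted, so the newlines from ls collapse to spaces, which is also how `$allpages` is later handed to grep and awk.)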

reply to user

[screenshot: Wiki.png]

Depending upon the particular request, we respond differently.

  • for read or write we mark up the source to display the wiki page
  • for edit or create we provide an edit field with the source wikitext of the page
  • for link we generate a list of pages containing references to the current one
<<reply to user>>=
reply 'Content-Type: text/html'
case $request in
    read|write) cat <<-EOF
	<html><body><h1>$wikipage</h1>
	(<a href=$me?edit+$wikipage>edit</a>)
	(<a href=$me?link+$wikipage>links</a>)
	<hr>
	$(markup < "$wikipage")
	</body></html>
	EOF
	;;
    edit|create) cat <<-EOF
	<html><body><h1>editing: $wikipage</h1>
	(<a href=$me?read+$wikipage>page</a>)<hr>
	<form action=$me?write+$wikipage method=POST>
	<textarea name=wikitext rows=20 cols=60>
	$(escapehtml < "$wikipage")
	</textarea><br><input type=submit>
	</form></body></html>
	EOF
	;;
    link) cat <<-EOF
	<html><body><h1>pages linking to: $wikipage</h1>
	(<a href=$me?read+$wikipage>page</a>)<hr>
	$(grep -l "$wikipage" $allpages | sed -e 's/^/* /' | markup)
	</body></html>
	EOF
	;;
    *) cat <<-EOF
	<html><body><h1>unknown request</h1>
	(<a href=$me?read+$homepage>home</a>)<hr>
	didn't grok <code>$QUERY_STRING</code>
	</body></html>
	EOF
	;;
esac

Question: This server, being a toy demonstration, is not intended to be secure enough for production environments. What changes are necessary to ensure that a malicious request can't be used to execute shell commands?

Exercise: Implement RecentChanges (hint: try using ls -t).

marking up

Most of the work of markup is handled by an AWK program, which has the list of current WikiPages passed to it in an environment variable, PAGE.

<<define markup functions>>=
wikify() {
    PAGE="$allpages" awk '
    <<awk patterns>>'
}
markup() { escapehtml | tr -d "\r" | wikify; }

Unfortunately, the split function generates the reverse of the relation we want, so we invert it into the page dictionary.

<<awk patterns>>=
    BEGIN { split(ENVIRON["PAGE"],rev)
            for(r in rev) { page[rev[r]] = r } }
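The inversion can be checked in isolation. A sketch with two hypothetical page names: split fills rev[1]="WelcomePage", rev[2]="SandBox", and the loop turns that into a name-to-index dictionary, so page[name] is non-zero exactly when the page exists:

```shell
# Sketch: invert split()'s index->name relation into a
# name->index dictionary usable for membership tests.
PAGE="WelcomePage SandBox" awk '
BEGIN { split(ENVIRON["PAGE"], rev)
        for (r in rev) page[rev[r]] = r
        print (page["SandBox"] ? 1 : 0), (page["NoSuchPage"] ? 1 : 0) }'
# prints: 1 0
```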

By keeping a listing variable (holding the proper termination tag during open lists) we can handle at most one level of list elements.

<<awk patterns>>=
    listing && $0 !~ /^[#\*]/  { print listing; listing = "" }
    $0 ~ /^# /   { if(!listing) { print "<ol>"; listing = "</ol>" } }
    $0 ~ /^\* /  { if(!listing) { print "<ul>"; listing = "</ul>" } }
    listing      { sub(/./, "<li>") }
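The listing logic can be tried outside the wiki. A sketch that feeds a two-item list followed by plain text through just these patterns:

```shell
# Sketch: the listing variable opens <ul> on the first bullet,
# rewrites each bullet's leading "*" to <li>, and emits the saved
# closing tag when a non-list line arrives.
printf '* one\n* two\nplain\n' | awk '
  listing && $0 !~ /^[#\*]/ { print listing; listing = "" }
  $0 ~ /^\* /  { if (!listing) { print "<ul>"; listing = "</ul>" } }
  listing      { sub(/./, "<li>") }
  { print }'
```

Note that a list which runs to end of input is never closed (there is no END rule), and because only listing's truth is tested, a numbered list immediately followed by a bulleted one keeps the first list's tags: that is the bug the exercise below refers to.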

Some wikitext markup is straightforward,

<<awk patterns>>=
    $0 == ""     { $0 = "<p>" }
    $0 == "----" { $0 = "<hr>" }

but for the WikiWords themselves, we check each field of each line, and when we find a WikiWord, we make the proper substitution depending upon whether or not it exists in the filesystem. External URLs are similar, but not handled so carefully.

<<awk patterns>>=
    { for(i=1; i<=NF; ++i) {
        if(match($i, /^[A-Z][a-z]+[A-Z][a-z]+/))	{
          wikiname = substr($i, RSTART, RLENGTH)
          if(page[wikiname]) l = "<a href='$me'?read+"wikiname">"wikiname"</a>"
          else               l = wikiname"<a href='$me'?create+"wikiname">?</a>"
          $i = l substr($i, RSTART+RLENGTH)
        } else if(match($i, /http:\/\//)) {
          $i = "<a href=\""$i"\">"$i"</a>"
        }
      }
      print }
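The substitution can be watched on a single line. A sketch with "wiki.sh" standing in for $me and a one-page PAGE list:

```shell
# Sketch: match() locates a WikiWord at the start of a field;
# substr/RSTART/RLENGTH splice a link around it, keeping any
# trailing punctuation that followed the word.
echo "see WelcomePage today" | PAGE="WelcomePage" awk '
  BEGIN { split(ENVIRON["PAGE"], rev); for (r in rev) page[rev[r]] = r }
  { for (i = 1; i <= NF; ++i)
      if (match($i, /^[A-Z][a-z]+[A-Z][a-z]+/)) {
        w = substr($i, RSTART, RLENGTH)
        if (page[w]) $i = "<a href=wiki.sh?read+" w ">" w "</a>" substr($i, RSTART+RLENGTH)
        else         $i = w "<a href=wiki.sh?create+" w ">?</a>" substr($i, RSTART+RLENGTH)
      }
    print }'
# prints: see <a href=wiki.sh?read+WelcomePage>WelcomePage</a> today
```

An unknown WikiWord would instead take the create branch and render as the word followed by a "?" link.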

Question: How could the WikiWord/URL substitution be made in a less procedural, more declarative manner? Would a subprocess help?

Exercise: Extend the implementation to multiple listing levels (and fix the bug relating to immediately consecutive lists).

ancillary

We must also provide a few definitions that, in more mainstream programming languages, would be library functions.

  • reply generates a header for the response to the browser
  • urldecode decodes text received from the browser
  • escapehtml encodes text to be sent to the browser
<<define CGI functions>>=
reply()      { printf '%s\r\n\r\n' "$1"; }
urldecode()  { echo "16i[$(sed -e 's/+/ /g;s/\%\(..\)/]P\1P[/g')]P" | dc; }
escapehtml() { sed -e 's/&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g'; }
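The dc-based decoder rewards a closer look: sed turns each %XX escape into dc code that prints the byte with that hex value (16i sets dc's input radix to 16, and P prints), while the literal stretches ride along as bracketed dc strings. A sketch exercising both helpers:

```shell
# Sketch: sed rewrites "a%20b" into the dc program "16i[a]P20P[b]P",
# which prints "a", the byte 0x20 (space), then "b".
urldecode()  { echo "16i[$(sed -e 's/+/ /g;s/\%\(..\)/]P\1P[/g')]P" | dc; }
escapehtml() { sed -e 's/&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g'; }

echo 'hello%20world+%26+more' | urldecode; echo
# prints: hello world & more
echo '<b>&</b>' | escapehtml
# prints: &lt;b&gt;&amp;&lt;/b&gt;
```

One limitation worth noting: dc reads hex digits as uppercase, so a lowercase escape such as %2f would not survive this decoder.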

wrapping up

Finally, we provide a configuration variable me which should be set to reflect the URL under which the webserver runs this program,

<<wiki.sh>>=
#!/bin/sh
me="wiki.sh"
homepage="WelcomePage"
<<define CGI functions>>
<<define markup functions>>

<<serve WikiPage>>

and provide the initial Wiki content:

<<WelcomePage>>=
Welcome to this bare-bones Wiki.
* click on WikiWords to enter new definitions,
* or to navigate to existing pages, like this WelcomePage.

a (very) few other features:
# numbered lists
# URLs such as http://en.literateprograms.org/ are automatically linked.
# horizontal rules
----