Word count (J)

From LiteratePrograms
Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx

This is not an implementation of the UNIX wc tool. Instead, it responds to a paper by Gibbons, discussed on Lambda the Ultimate, which claims that three different word-count programs might all have arisen from the same high-level design, namely the composition length ∘ words. Since this is a "Word count" program, we prefer to call the composition count ∘ words, and we implement it in J, a terse but powerful array programming language.


theory

By placing an ordering on character classes (nonblank < blank), we avoid Algol-style folds and use whole-array operations.

Some things to be aware of when reading J code:

  • the . and : do not occur alone, but are diacritic marks that modify the interpretation of the character which they follow.
  • { and } (as well as the brackets and a few others) have their own, independent meanings and usually occur unpaired.
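
As a small illustration of the first point, here is a sketch of how a trailing . or : changes the meaning of { in an interactive session (input lines are indented, results are flush left). The sample string is made up for the example.

```j
   1 { 'abc'      NB. { (from) selects the item at index 1
b
   {. 'abc'       NB. {. (head) is a different primitive: the first item
a
   {: 'abc'       NB. {: (tail) is yet another: the last item
c
```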

implementation

locating drops

Within a word and the blanks that follow it, the classification never decreases, so the crux of the program is to flag every spot where it does decrease: where a nonblank character follows a blank, that is, where a new word begins. We avoid iteration by comparing the entire boolean array with a shifted copy of itself.

In J, we need not even make up a temporary name, such as bs, but can instead leave the array argument implicit.

  • }., or behead, produces the array without its first element
  • }:, or curtail, produces the array without its last element
  • <, or less than, compares its arguments item by item
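
Judging from the final one-line script, the drops chunk is the three-verb train }.<}:, which J reads as (}. y) < (}: y). The session below is a sketch of how it behaves; the name b and its sample values are hypothetical and only serve the illustration.

```j
   b =: 1 0 0 1 1 0     NB. hypothetical flags: 1 = blank, 0 = nonblank
   }. b                 NB. behead: drop the first item
0 0 1 1 0
   }: b                 NB. curtail: drop the last item
1 0 0 1 1
   (}. b) < (}: b)      NB. 1 wherever a nonblank follows a blank
1 0 0 0 1
   (}.<}:) b            NB. the same comparison as a train, with no named argument
1 0 0 0 1
```

The result has one item fewer than b, because it describes the gaps between neighbouring characters rather than the characters themselves.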

flagging blanks

Blanks are easily found: the expression blank = space ∨ tab ∨ linefeed turns into a membership check.

  • {, or from, selects items from an array
  • a., or alphabet, contains the system character set (so we will select ASCII space, tab, and linefeed)
  • e., or member (in), checks whether the items of its left argument occur somewhere in its right argument
  • but ~, or passive, swaps the arguments, so now we check which characters of the right argument occur in the whitespace array given on the left.
(32 9 10{a.)e.~
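
To see why the arguments are swapped, it may help to try the membership test both ways round. In this sketch the name ws and the sample string are assumptions made for the example.

```j
   ws =: 32 9 10 { a.     NB. hypothetical name: space, tab, linefeed
   'ab cd' e. ws          NB. which characters of the text are whitespace?
0 0 1 0 0
   ws e.~ 'ab cd'         NB. same question with the whitespace list on the left
0 0 1 0 0
   ws e. 'ab cd'          NB. without ~: which whitespace characters occur in the text?
1 0 0
```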

indicating words

Now we have a straight-line definition for words, drops ∘ blank:
(}.<}:)(32 9 10{a.)e.~
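
A sketch of the composition at work on a made-up input, which starts with a blank just as the final script arranges:

```j
   (32 9 10{a.)e.~ ' ab  cd'           NB. blank flags for each character
1 0 0 1 1 0 0
   (}.<}:)(32 9 10{a.)e.~ ' ab  cd'    NB. drops: one 1 per word start
1 0 0 0 1 0
```

The doubled blank between ab and cd yields only one drop, since only the second blank is followed by a nonblank.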


wrapping up

We still need a definition for count, but this is a traditional idiom in both APL and J.

  • +/, or insert plus, sums up its argument
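
Summing a boolean array counts its 1s, so applying +/ to the word indicator gives the word count. A short sketch, reusing a made-up input string:

```j
   +/ 1 0 0 0 1 0
2
   +/ (}.<}:)(32 9 10{a.)e.~ ' ab  cd'    NB. two words: ab and cd
2
```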

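Finally, we include the boilerplate for a jconsole script. Here countwords stands for the count ∘ words expression assembled above, and the leading ' ', prepends a blank to the input so that a word at the very start of the input also produces a drop: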

echo countwords' ',stdin '' NB. word count script (use jconsole)

which will result in a single-line script:

echo +/(}.<}:)(32 9 10{a.)e.~' ',stdin '' NB. word count script (use jconsole)

that can be run as follows:

$ jconsole wc.ijs < wc.ijs
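
For experimenting in an interactive session, the same pipeline can be wrapped in a named verb. This is only a sketch: the explicit definition below is not part of the original script, and the verb simply borrows the name countwords for illustration.

```j
   countwords =: 3 : 0
+/ (}.<}:) (32 9 10{a.) e.~ ' ' , y
)
   countwords 'the quick  brown fox'         NB. runs of blanks count once
4
   countwords 'a',(9{a.),'b',(10{a.),'c'     NB. tab and linefeed also separate words
3
   countwords ''                             NB. empty input has no words
0
```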
