Word count (J)
From LiteratePrograms
This is not an implementation
of the UNIX wc tool.
Instead, it is a response to a paper of Gibbons,
as discussed on Lambda the Ultimate,
which makes the claim
that three different wordcount programs
might all have arisen from the same high-level design,
namely the composition
length ∘ words.
This being a "Word count" program,
we favor renaming the composition to
count ∘ words,
and implement it in
the terse, but powerful,
J array programming language.
theory
By placing an ordering on character classes (nonblank < blank), we avoid Algol-style folds and use whole-array operations.
Some things to be aware of when reading J code:
- the . and : do not occur alone, but are diacritic marks that modify the interpretation of the character which they follow.
- { and } (as well as the brackets and a few others) have their own, independent meanings and usually occur unpaired.
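As a small illustration of how the trailing . and : change the meaning of a character, the following sketch applies { and } in several inflected forms. The sample list y is our own assumption, not part of the original program.

```j
y =: 10 20 30 40     NB. hypothetical sample list
1 { y                NB. from: selects item 1, giving 20
{. y                 NB. head: 10
{: y                 NB. tail: 40
}. y                 NB. behead: 20 30 40
}: y                 NB. curtail: 10 20 30
```

Note that none of these uses pairs an opening brace with a closing one.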
implementation
locating drops
Within a word and the blanks that follow it, the classification never decreases,
so the crux of the program
is to flag all the spots
where the classification decreases,
that is, where a nonblank character follows a blank.
We avoid iteration by comparing the entire
boolean array, call it bs, element by element with a shifted version of itself.
In J, we need not even make up a temporary name, such as bs, but can instead leave the array argument implicit.
- }., or behead, produces the array without its first element
- }:, or curtail, produces the array without its last element
- < performs the obvious comparison
<<drops>>= (}.<}:)
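To see the comparison at work, here is a minimal sketch that names the chunk and applies it to a hand-made boolean array. The names drops and bs and the sample flags are assumptions for illustration; the program itself leaves the argument implicit.

```j
drops =: (}. < }:)      NB. a fork: (}. y) < (}: y)
bs =: 1 0 0 1 1 0 1     NB. hypothetical flags: 1 = blank, 0 = nonblank
}. bs                   NB. 0 0 1 1 0 1  (each item's successor)
}: bs                   NB. 1 0 0 1 1 0  (each item that has a successor)
drops bs                NB. 1 0 0 0 1 0  (1 where a nonblank follows a blank)
```

The two flagged positions correspond to the two places in bs where a blank is immediately followed by a nonblank.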
flagging blanks
Blanks are easily found:
testing whether a character is whitespace
turns into a membership check.
- {, or from, selects items from an array
- a., or alphabet, contains the system character set (so we will select ASCII space, tab, and linefeed)
- e., or member (in), checks whether the elements of its left argument occur somewhere in its right
- but ~, or passive, reverses the arguments, so now we check which characters of the right argument occur in the whitespace array given on the left.
<<blank>>= (32 9 10{a.)e.~
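A quick way to check the membership test is to give the chunk a right argument directly. The sample string below is our own; it contains single and double spaces.

```j
NB. the blank chunk applied to a hypothetical sample string
(32 9 10{a.) e.~ 'a b  c'     NB. 0 1 0 1 1 0
NB. the same test without passive, arguments in the usual order
'a b  c' e. 32 9 10{a.        NB. 0 1 0 1 1 0
```

Both forms give one flag per character of the text, with 1 marking a blank.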
indicating words
Now we have a straight-line definition for words,
namely drops ∘ blank:
<<words>>= <<drops>><<blank>>
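The composition can also be tried with named verbs, as in the sketch below. The names, the sample string, and the use of @: are our assumptions; the article pastes the chunks together textually instead. The trailing ] is added so that the blank chunk forms a complete verb that can be given a name.

```j
blank =: (32 9 10{a.) e.~ ]   NB. fork: (whitespace) e.~ (] y), i.e. y e. whitespace
drops =: }. < }:              NB. fork: (}. y) < (}: y)
words =: drops @: blank       NB. apply blank first, then drops
blank ' a b  c'               NB. 1 0 1 0 1 1 0
words ' a b  c'               NB. 1 0 1 0 0 1  (one flag per word start)
```

The sample string begins with a blank, so the first word is also preceded by one and gets flagged.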
wrapping up
We still need a definition for count, but this is a traditional idiom in both APL and J.
- +/, or insert plus, sums up its argument
<<count>>= +/
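Putting the pieces together on a literal string instead of standard input gives a minimal end-to-end check. The names count and text and the sample sentence are assumptions for illustration. A blank is prepended, as in the final script, so that a word at the very start of the text still follows a blank.

```j
count =: +/                                  NB. insert plus: sum of the flags
text =: 'the quick  brown fox'               NB. hypothetical sample input
count 1 0 1 0 0 1                            NB. 3
count (}.<}:) (32 9 10{a.) e.~ ' ',text      NB. 4
```

The doubled space between "quick" and "brown" produces only one flag, since only the second blank is followed by a nonblank.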
Finally, we include the boilerplate for a jconsole script:
<<wc.ijs>>= echo <<count>><<words>>' ',stdin '' NB. word count script (use jconsole)
which will result in a single-line script:
echo +/(}.<}:)(32 9 10{a.)e.~' ',stdin '' NB. word count script (use jconsole)
that can be run as follows:
$ jconsole wc.ijs < wc.ijs
12
