# Word count (J)


This is not an implementation of the UNIX wc tool. Instead, it is a response to a paper of Gibbons, as discussed on Lambda the Ultimate, which makes the claim that three different wordcount programs might all have arisen from the same high-level design, namely the composition $length \circ words$. This being a "Word count" program, we favor renaming the composition to $count \circ words$, and implement it in the terse, but powerful, J array programming language.

## theory

By placing an ordering on character classes (nonblank < blank), we avoid Algol-style folds and use whole-array operations.

Some things to be aware of when reading J code:

• the . and : do not occur alone, but are diacritic marks that modify the interpretation of the character which they follow.
• { and } (as well as the brackets and a few others) have their own, independent meanings and usually occur unpaired.

## implementation

### locating drops

Within a word and the blanks that follow it, the classification never decreases, so the crux of the program is to flag all the spots where the classification does decrease: where a nonblank character follows a blank. We avoid iteration by comparing the entire boolean array, call it bs, with a shifted version of itself.

In J, we need not even make up a temporary name, such as bs, but can instead leave the array argument implicit.

• }., or behead, produces the array without its first element
• }:, or curtail, produces the array without its last element
• < performs the obvious comparison
```<<drops>>=
(}.<}:)```
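To see the comparison at work, here is a small sketch on a made-up flag array. The name bs and its values are illustrative assumptions, with 1 marking a blank and 0 a nonblank. It spells out the shifted comparison with an explicit name first, then repeats it with the tacit chunk.

```j
bs =: 1 0 0 1 1 0        NB. hypothetical flags: 1 = blank, 0 = nonblank
echo }. bs               NB. behead:  0 0 1 1 0
echo }: bs               NB. curtail: 1 0 0 1 1
echo (}. bs) < (}: bs)   NB. explicit comparison: 1 0 0 0 1
echo (}.<}:) bs          NB. same result with the argument left implicit
```

Each 1 in the result marks a blank that is immediately followed by a nonblank, that is, the start of a word. The result is one item shorter than bs.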

### flagging blanks

Blanks are easily found: the expression $blank = space \lor tab \lor linefeed$ turns into a membership check.

• {, or from, selects items from an array
• a., or alphabet, contains the system character set (so we will select ASCII space, tab, and linefeed)
• e., or member (in), checks whether the elements of its left argument occur somewhere in its right argument
• ~, or passive, swaps the two arguments, so we instead check which characters of the right argument occur in the whitespace array given on the left.
```<<blank>>=
(32 9 10{a.)e.~```
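As a quick illustration of what ~ changes, the sketch below applies the membership check both ways. The name text and its contents are made up for this example.

```j
NB. hypothetical sample: a, space, b, tab, c, linefeed
text =: 'a b',(9{a.),'c',(10{a.)
echo (32 9 10{a.) e. text    NB. without ~: does text contain each whitespace character?  1 1 1
echo (32 9 10{a.) e.~ text   NB. with ~: is each character of text whitespace?  0 1 0 1 0 1
```

Only the second form yields one flag per input character, which is the shape the drop detection needs.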

### indicating words

Now we have a straight-line definition for words, namely $drops \circ blank$:

```<<words>>=
<<drops>><<blank>>```
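A minimal sketch of the two stages applied in sequence, using a made-up input string that already starts with a blank:

```j
NB. hypothetical input with a leading blank
echo (32 9 10{a.)e.~ ' two words'          NB. blank flags: 1 0 0 0 1 0 0 0 0 0
echo (}.<}:)(32 9 10{a.)e.~ ' two words'   NB. word starts: 1 0 0 0 1 0 0 0 0
```

J evaluates the sentence from right to left, so the blank flags are computed first and the drop detection is then applied to them.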

## wrapping up

We still need a definition for count, but this is a traditional idiom in both APL and J.

• +/, or insert plus, sums up its argument
```<<count>>=
+/```
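For instance, summing a flag array counts its 1s, and prefixing the earlier stages with +/ gives the number of words. The input strings here are illustrative only.

```j
echo +/ 1 0 0 0 1 0 0 0 0                           NB. 2
echo +/(}.<}:)(32 9 10{a.)e.~ ' hello  world foo'   NB. 3 (runs of blanks count once)
```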

Finally, we include the boilerplate for a `jconsole` script:

```<<wc.ijs>>=
echo <<count>><<words>>' ',stdin '' NB. word count script (use jconsole)
```

which will result in a single-line script:

```echo +/(}.<}:)(32 9 10{a.)e.~' ',stdin '' NB. word count script (use jconsole)
```

that can be run as follows:

```$ jconsole wc.ijs < wc.ijs
12

```
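The script prepends a blank to its input before counting. The sketch below suggests why, using a hypothetical helper wc that wraps the same chunks so they can be tried on literal strings instead of stdin.

```j
wc =: 3 : '+/(}.<}:)(32 9 10{a.)e.~ y'   NB. hypothetical helper, not part of wc.ijs
echo wc 'hello world'                    NB. 1: the first word has no blank before it
echo wc ' ','hello world'                NB. 2: the prepended blank lets the first word count
```

Since a word is detected by the blank that precedes it, text that begins with a nonblank would otherwise lose its first word.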