# Word count (Python, functional)

Other implementations: Assembly Intel x86 Linux | C | C++ | Haskell | J | Lua | OCaml | Perl | Python, functional | Rexx

This is not an implementation of the UNIX wc tool, in Python, for which see Word count (Python).

Instead, it is a response to a paper of Gibbons, as discussed on Lambda the Ultimate, which makes the claim that three different wordcount programs might all have arisen from the same high-level design, namely the composition $length \circ words$. This being a "Word count" program, we favor renaming the composition to $count \circ words$.

```<<wc>>=
wc=lambda s: count(words(s))
```

## theory

The solutions given by Gibbons follow the normal Algol style (as represented here by the C, Haskell, Perl, and Lua examples) of iterating over an input stream while incrementing a running count. Even the Python example, although it treats whole lines as units, has a for lurking in the comprehension defining the word count for the entire input.

In this case, we make yet another design decision: by placing an ordering on character classes (nonblank < blank), we can replace Algol-style folds with APL-style whole-array operations.

## implementation

Within a word, the classification increases monotonically, so the crux of the program is to flag all the spots where the classification decreases — where a nonblank character follows a blank. We avoid iteration by comparing the entire boolean array with a shifted version of itself:

```<<defns>>=
# Gibbons' unlengths are nondecreasing runs, eg 00001111
drops=lambda bs: bs[1:] < bs[:-1]
```

Blanks are easily found — the expression $blank = space \or tab \or linefeed$ translates straightforwardly:

```<<defns>>=
blank=lambda cs: (cs==' ')+(cs=='\t')+(cs=='\n')
```

Now we have a straight-line definition for words: $drops \circ blank$:

(Question: how do the spaces prepended to the argument help avoid conditional branching? at which arguments do they make a difference?)

```<<defns>>=
chars=lambda cs: array(tuple(cs))
words=lambda cs: drops(blank(chars('  '+cs)))
```

## wrapping up

We still need a definition for count, but numpy.sum provides a suitable implementation:

```<<defns>>=
count=sum
```

Finally, we include the boilerplate:

```<<wcfa.py>>=
from numpy import array, sum
defns
wc
if __name__=="__main__":
import sys