Talk:Word count (Haskell)

From LiteratePrograms

Monads

Presumably there's a nifty monadic way to thread the options (and other state) through the program, but since I'm not well-versed in creating new monads I haven't attempted it yet. --Allan McInnes (talk) 02:07, 9 May 2006 (PDT)
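One possible shape for this, as a minimal sketch only: the function monad `(->) r` from base already behaves like a reader, so options can be passed implicitly through do-notation without defining a new monad. The `Options` record, its field names, and `formatCounts` are hypothetical and are not taken from the article's code.

```haskell
-- Hypothetical options record; the article's real options may differ.
data Options = Options
  { showLines :: Bool
  , showWords :: Bool
  , showChars :: Bool
  }

-- (lines, words, characters)
type Counts = (Int, Int, Int)

-- Render one count only if its option is switched on.
-- The do-block runs in the function monad ((->) Options),
-- so 'enabled' is applied to the implicit Options argument.
field :: (Options -> Bool) -> Int -> Options -> [String]
field enabled n = do
  on <- enabled
  return [show n | on]

-- The Options value is threaded through every 'field' call
-- without being named explicitly.
formatCounts :: Counts -> Options -> String
formatCounts (l, w, c) = do
  ls <- field showLines l
  ws <- field showWords w
  cs <- field showChars c
  return (unwords (ls ++ ws ++ cs))
```

For example, `formatCounts (3, 5, 20) (Options True True False)` yields `"3 5"`. Code that also needs IO or mutable state would probably want a monad transformer instead, which this sketch does not attempt.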

Help welcome

I am by no means a Haskell expert. In fact, this is probably the most substantial Haskell program I've ever written. So I have no doubt that it could be improved substantially. Any and all suggestions for improvement are most welcome. Or just dive on in and make improvements yourself! --Allan McInnes (talk) 19:52, 18 May 2006 (PDT)

The first (small) thing that comes to mind is to separate program logic from IO: the function that does the counting shouldn't have an 'IO' return type or be passed a handle. It could take just a string, so that it can also be reused to count the lines, words, and characters of strings.
I personally would also avoid requiring the caller to pass an initial WordCount with all components set to 0 in the getCount call. I would also try not to mix reading files, calculating and printing per-file counts, and calculating the total count. I may make some more concrete suggestions this evening if I have the time. Ruediger Hanke 00:21, 21 June 2007 (PDT)
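A rough sketch of what that separation might look like, under several assumptions: the names `Counts`, `countString`, `report`, and `countFiles` are hypothetical and do not come from the article. A Monoid instance supplies the all-zero value, so callers never pass an initial count, and the total is just `mconcat` over the per-file results.

```haskell
-- Hypothetical result type: characters, words, lines.
data Counts = Counts !Int !Int !Int
  deriving (Eq, Show)

-- Combining counts is component-wise addition.
instance Semigroup Counts where
  Counts c1 w1 l1 <> Counts c2 w2 l2 = Counts (c1 + c2) (w1 + w2) (l1 + l2)

-- The all-zero count lives here, not at the call site.
instance Monoid Counts where
  mempty = Counts 0 0 0

-- Pure counting: no IO, no handle. This simple version traverses the
-- string three times and so keeps it in memory; it is not tuned for
-- large inputs.
countString :: String -> Counts
countString s = Counts (length s) (length (words s)) (length (lines s))

-- Pure formatting: lines, words, characters, then a label.
report :: String -> Counts -> String
report name (Counts c w l) = unwords [show l, show w, show c, name]

-- The only IO: read each file, print its counts, then print the total.
countFiles :: [FilePath] -> IO ()
countFiles paths = do
  results <- mapM countOne paths
  putStrLn (report "total" (mconcat results))
  where
    countOne p = do
      s <- readFile p
      let r = countString s
      putStrLn (report p r)
      return r
```

Because `countString` takes a plain string, it can be tried directly in GHCi, for example `countString "a b\nc\n"`.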
Regarding your first point, I was trying to avoid having to load an entire file into memory before performing the count, and couldn't see a good way of doing that without staying in the IO monad. I'm open to suggestions though, and would be interested in seeing solutions that provide the kind of separation you've indicated would be good to see. --Allan McInnes (talk) 11:02, 23 June 2007 (PDT)
Sorry I didn't reply sooner. My last message was posted shortly before I went to CEFP2007; I didn't manage to write a reply before leaving and forgot afterwards. If you read a file with hGetContents or readFile, the file is read as a lazy stream, so you don't load the entire file at once. But have you tested your version with a large file? (I created one with a million lines of "The quick brown fox ..." for testing.) It quickly eats up more than a gigabyte of memory on my machine. I guess the program is too lazy and needs some explicit strictness. I just quickly tried this:
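-- Uses bang patterns: build with -fbang-patterns, or enable the
-- BangPatterns language extension in later GHC releases.
import Data.List (foldl')

-- (characters, words, lines); the character count excludes newlines
type WordCount = (Int, Int, Int)

-- Pure counting function: a strict left fold over the lines of the input.
countWords :: String -> WordCount
countWords xs = foldl' (\(!c, !w, !l) x -> (c+length x, w+length (words x), l+1))
                       (0, 0, 0)
                       (lines xs)

-- Reads the file lazily and prints its counts.
test :: FilePath -> IO ()
test fn = do cs <- readFile fn
             print (countWords cs)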
This runs in constant space on my machine, and countWords is not in the IO monad. The code needs the -fbang-patterns flag, though, so it's probably not suitable for the article. 'seq' or a datatype with strictness annotations should work as well. Anyway, that was my basic idea. Ruediger Hanke 16:03, 1 August 2007 (PDT)
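The two flag-free alternatives mentioned above could look roughly like this. It is a sketch rather than a tested replacement for the article's code: the names `StrictCount`, `countStrict`, and `countSeq` are made up for illustration, and no memory measurements were taken.

```haskell
import Data.List (foldl')

-- Variant 1: a datatype with strict fields. foldl' forces the
-- accumulator to its constructor at each step, and the strictness
-- annotations then force the three Int fields.
data StrictCount = StrictCount !Int !Int !Int
  deriving (Eq, Show)

countStrict :: String -> StrictCount
countStrict = foldl' step (StrictCount 0 0 0) . lines
  where
    step (StrictCount c w l) x =
      StrictCount (c + length x) (w + length (words x)) (l + 1)

-- Variant 2: a plain tuple, with each component forced by 'seq'
-- before the new tuple is returned.
countSeq :: String -> (Int, Int, Int)
countSeq = foldl' step (0, 0, 0) . lines
  where
    step (c, w, l) x =
      let c' = c + length x
          w' = w + length (words x)
          l' = l + 1
      in c' `seq` w' `seq` l' `seq` (c', w', l')
```

Both variants keep the same counting rule as the snippet above: characters are counted per line, so newline characters are not included.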
Good point! I'm still getting used to thinking lazily. I don't see why code that requires flags is necessarily inappropriate for the article, although if you're worried about it you could always create a Word count (GHC Haskell) article. --Allan McInnes (talk) 20:08, 4 August 2007 (PDT)