Sieve of Eratosthenes (Haskell)
The Sieve of Eratosthenes is an algorithm for rapidly locating all the prime numbers in a certain range of integers. It operates by marking as composite all nontrivial multiples of each prime in sequence until only primes remain unmarked. It is most wellsuited to implementation using arrays, which are compact and feature random access.
This program is under development.
Please help to debug it. When debugging
is complete, remove the {{develop}} tag.
Contents 
[edit] A first look
This "Sieve of Eratosthenes" is not the genuine algorithm, but is instead a naive version of the original as (incorrectly) taught in most courses on algorithms. A more comprehensive (and far betterperforming!) Sieve will be provided later as infrastructure code (specifically a priority queue) is provided elsewhere.
The core of this naive implementation is the primes
function, detailed below. primes
is just a convenient wrapper that hides implementation detail behind a facade. Called by itself it provides an infinite list of primes which may be used in other circumstances as desired. It accomplishes this by wrapping a tailrecursive function, sieve
, tucked away in a where
clause to keep from polluting the namespace.
<<primes_naive>>= primes :: [Integer] primes = sieve [2..] where sieve (p:xs) = p : sieve [xx < xs, x `mod` p > 0]
The sieve
function in the where
clause takes a single list of integers as its argument and assumes that the first list member is a prime. Since primes
passes in the infinite list [2..], it is guaranteed in this code that the first number will always be prime. It returns with the initial prime p
glued onto the output of a tailrecursive call to itself with a list comprehension that returns only those integers which are not evenly divisible by the provided prime. To more clearly explain:

sieve
is first passed a list of all integers greater than or equal to 2 ([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...]
);  it uses the 2 as the first prime and then passes itself a list of all remaining integers not divisible by 2 (
[3, 5, 7, 9, 11, 13, 15, 17, ...]
);  it uses the 3 as the second prime and then passes itself a list of all remaining integers not divisible by 3 (
[5, 7, 11, 13, 17, ...]
);  this proceeds recursively ad infinitum.
To those unfamiliar with Haskell's lazy semantics, the above can seem like a mystery. It is best to consider the lists involved not as lists in the traditional data structure sense, but instead as list generators  functions which will calculate on demand the next item in the "list". It is these lazy semantics which permit elegant, readable code like the above to be generated without the fussing around that's usually needed when coding this pareddown "Sieve of Eratosthenes".
Some notes on the provided code:
 Having the pairing of
primes
andsieve
is a safety issue used to guarantee correctness of the ensuing list at the possible expense of some performance. (Using the GHC compiler there is absolutely no performance degradation, but naive implementations could incur some overhead.)  The lines above each function (e.g.
primes :: [Integer]
) are type declarations and are not necessary for this code to compile and work. Haskell can infer the type of these expressions without declarations. The declarations have been provided as a courtesy to the reader to enhance comprehension.
[edit] Test code
This code, as stated earlier, is a naive implementation and not an authentic Sieve of Eratosthenes at all. To prove this by comparing it to later versions, we'll write a simple little test framework that finds the 10,000th prime so we can run this with some form of tool that measures time used at the command line. The code is dead simple:
<<prime_sieve_naive_test.hs>>= module Main where primes_naive main :: IO () main = do print $ show (primes !! 10000)
Executing this on a machine with timing information could look like this:
$ ghc make prime_sieve_naive_test.hs $ time ./prime_sieve_naive_test "104743" real 0m14.817s user 0m14.277s sys 0m0.020s
[edit] A better sieve
A better "sieve", also not the authentic one, can be constructed to provide far better performance. As with the naive version, this version is not the authentic Sieve of Eratosthenes, but it does perform much more quickly.
[edit] Building blocks
As is so often the case in Haskell, the simple building blocks of this solution to finding primes occupies the greatest amount of space, yet requires the least explanation to understand. Two functions, merge
and diff
here occupy over twothirds of the (meaningful) line count, but are trivial to understand.
[edit] Merging infinite lists
<<merge_infinite>>= merge :: (Ord a) => [a] > [a] > [a] merge xs@(x:xt) ys@(y:yt) = case compare x y of LT > x : (merge xt ys) EQ > x : (merge xt yt) GT > y : (merge xs yt)
The merge
function is a highlyspecialized function that merges two infinite, sorted lists. This is not a generalpurpose function usable by a wide variety of clients. It is a very specific function used solely as a helper function. Among the tools that it is missing are the ability to deal with lists that end, lists with duplicate items within them and the ability to deal with unsorted lists.
merge
is passed two lists. It tags them as xs
and ys
using Haskell's @
notation, but also breaks each of them apart into their constituent heads and tails (e.g. (x:xt)
) in the patterns. When entered, it compares the heads of both lists and:
 If
x
is less thany
it constructs a list fromx
and the merging of the tail ofxs
and all of theys
.  If the two heads are equal it constructs a list from one of them (
x
in this case) and the tails of both lists. In this manner it eliminates any entries which may exist in both lists.  If
x
is greater thany
it constructs a list fromy
and the merging of all of thexs
and the tail ofys
.
Note carefully again:
 This function cannot cope with finite lists as it has no terminating condition for an empty list.
 The elements of each list must be sorted when the function is called.
 This system will eliminate duplicates across the lists, but will not be able to cope with lists which have duplicates internally.
[edit] Diffing infinite lists
<<diff_infinite>>= diff :: (Ord a) => [a] > [a] > [a] diff xs@(x:xt) ys@(y:yt) = case compare x y of LT > x : (diff xt ys) EQ > diff xt yt GT > diff xs yt
As with merge
, diff
is not a generalpurpose function. It is a tool with specific characteristics used solely as a building block for calculating primes. This means, as with merge
, the following constraints apply:
 This function will only work on infinite lists.
 The supplied lists must be ordered.
 The interaction between the two lists is not what would be generally desired from a more generalpurpose function.
diff
, in short, removes all of the elements of the second list passed to it from the first. Some examples:
Example 1
take 10 (diff [1..] [2,4..])
The result of this invocation is [1,3,5,7,9,11,13,15,17,19]
. The first list is all natural numbers. The second is all even natural numbers. The diff
call removes all even numbers from the set of natural numbers.
Example 2
take 10 (diff [10..] [2,4..])
The result here is [11,13,15,17,19,21,23,25,27,29]
. Notice how all even numbers have been removed from the list of all natural numbers starting at 10. The ones that are smaller than 10 are ignored.
As with the merge
function, the work is done in the case
statement. Indeed the first case of the statement is exactly the same as for merge
: if the head of the first list is smaller than the head of the second, construct a list formed of the first head and a recursive call to diff
with the first tail and the second list in its entirety. The second says that if the heads match, drop both heads and call diff
recursively on both tails. The final case says that if the second head is lower than the first, make a recursive call keeping the whole first list, but only using the tail of the second list.
Again, as a reminder, neither merge
nor diff
are generalpurpose functions. They are building blocks for use in this primes generator only.
[edit] Core functionality
This is the heart of this prime number generator. It is deceptively short—4 lines of meaningful code—but contains a lot of heavilyinterlocked snippets of sophisticated functionality.
<<primes_better>>= merge_infinite diff_infinite primes, nonprimes :: [Integer] primes = [2, 3, 5] ++ (diff [7, 9 ..] nonprimes) nonprimes = foldr1 f $ map g $ tail primes where f (x:xt) ys = x : (merge xt ys) g p = [ n * p  n < [p, p + 2 ..]]
[edit] Prime numbers
The first function to look at is actually quite simple: primes
. Repeating that code below, it is simply:
primes = [2, 3, 5] ++ (diff [7, 9 ..] nonprimes)
Reading this is simple: primes
is a list consisting of the first three primes ([2,3,5]
) with all odd numbers greater than that ([7,9..]
), after the list of all nonprime numbers (nonprimes
) is removed, appended to it.
Intuitively this is a simple operation. It all hinges, however, on the ability to calculate a list of nonprimes. That is where the hairy part of the code is.
[edit] Nonprime numbers
nonprimes = foldr1 f $ map g $ tail primes
For those unused to some of Haskell's rich functional operators, this line of code looks opaque. What it really translates to, however, is the following code where the function application operator is replaced by regular function application (plus the relevant parentheses to ensure proper order of operations):
nonprimes = foldr1 f (map g (tail primes))
(The reason the $
operator is used, typically, is to avoid complicated nestings of parentheses. Once you're used to it, the first version is much simpler to read than the second.)
Being rightassociative, the $
operator forces the last call – tail primes
– to be evaluated first. This means that the function g
will be applied to the list of prime numbers sans 2
. (Note that the list of prime numbers is needed to calculate the list of prime numbers. With Haskell's lazy evaluation semantics this is all very sensible and possible.)
[edit] Multiples of primes
The next step in decoding this operation is to look at the local function g
. It is some kind of transformation, obviously, which takes a list of primes and converts them to something else. The code for it is:
g p = [ n * p  n < [p, p + 2 ..]]
The naming of the parameter strongly implies that it is a prime. The fact that it's applied to a list of known primes confirms this. This transformation is difficult to read at first. It helps to see the type of it. The type of this transformation is:
g :: Integer > [Integer]
So the code for nonprimes
involves taking a list of primes and for each prime generating a list of nonprimes from a list comprehension that multiplies the prime first by itself, then by the next odd number up, and so on. This is easier to comprehend if you work through some trial numbers.
First, the tail of primes
is 3, 5, 7, 11, 13, 17, etc. The 2 is explicitly dropped. Thus the first number that gets g
applied to it is 3, the second is 5, and so on. The three coming in is converted into an infinite list starting at 9 (3*3), followed by 15 (3*5), then 21, 27, etc. through the list comprehension. So an input list (partial) of primes of [3,5,7,11]
will get converted instead into:
[ [9,15..]  [3*3,3*5,3*7,3*9,...] [25,35..]  [5*5,5*7,5*9,5*12,...] [49,63..]  [7*7,7*9,7*11,7*13,...] [121,143..]  [11*11,11*13,11*15,11*17,...] ]
The beauty (and the mindmangling part) of the scheme is that all of these infinite, circularlydefined lists are calculated only at need, saving both execution time and memory requirements for only those points where they are needed.
[edit] Merging the multiples
The problem we face now is that we have lists of known nonprimes created by squaring primes and calculating multiples from there, but this leaves us with the difficult problem of actually accessing said lists in an efficient manner. This is where the earlier helper merge
enters the picture. It is instructive to look at three things:
 The local helper function's definition:
f (x:xt) ys = x : (merge xt ys)
 The type signature for the
merge
function:merge :: (Ord a) => [a] > [a] > [a]
 The type signature for
foldr1
:foldr1 :: (a > a > a) > [a] > a
Now recall that the local helper is used in a call to foldr1, so it is an accumulator of some form. In this case it folds over a list of lists and accumulates a list. Modifying the type signature of foldr1 with the actual types involved we get foldr1 :: ([Integer] > [Integer] > [Integer]) > [[Integer]] > [Integer]
. In other words we are going to be taking our set of lists of known nonprimes from the local helper g
mapped over the list of known primes and using merge
to give us a single, merged, ordered list of odd nonprimes beginning at 9. First the lists based on 3 and on 5 will get merged. This resulting list is then merged with the list based on 7. This is then further merged with the list based on 11, 13, 17, and so on. Any duplicates calculated will be dropped and what comes out of the process is a neverending stream of nonprimes in increasing order.
[edit] Putting it together
We've now inspected the whole code in sufficient detail that the "big picture" is likely lost. Further some of the interactions in this code are going to be highly inobvious, especially to those coming from a language with more common semantics. So keeping in mind the detailed explanations, here is the overview.
Calculating the primes starts with a short list of known primes ([2,3,5]
). To this list is appended the list of odd numbers starting at 7, which, through the application of diff
, has had all known nonprimes removed from it. This list of nonprimes is generated by taking the list of primes and making a list of lists of multiples of said primes, beginning at the square of the prime and extending upward by multiples of two. This list of lists is then merge
d into a single cohesive list.
The circularity of this definition is of particular note. To make the list of primes you need to make a list of nonprimes. To make a list of nonprimes you need to make a list of primes. In a more conventional programming environment this would be an intractable problem. In Haskell, however, with its lazy evaluation strategy, it is a trivial affair.
Keep in mind that lazy evaluation means that the lists in question (infinite in length) are not processed in their entirety. They are generated on demand as they are needed and not an instant before that. This allows us our bizarre circular definition because we've seeded just enough of the primes list to move us forward in our list of lists of nonprimes. Too, that list of lists which then gets merged is also a bizarre little affair seeing as we are, hypothetically speaking, merging an infinite number of infinitelylong lists. Again the lazy semantics of Haskell permits us to do this as the lists in question, along with their elements, are only produced on need and not a moment sooner.
[edit] Test code
Proof that this version of a primes finder is quicker can be quickly constructed by merely building a test harness similar to the one we gave for the naive sieve.
<<prime_sieve_better_test.hs>>= module Main where primes_better main :: IO () main = do print $ show (primes !! 10000)
Building and timing the resulting executable can be an eyeopener.
$ ghc make prime_sieve_better_test.hs $ time ./prime_sieve_better_test "104743" real 0m0.125s user 0m0.080s sys 0m0.000s
There is more than two orders of magnitude difference between the naive implementation's performance and the improved implementation's when picking only the ten thousandth prime! Further, this disparity increases the farther into the list of prime numbers you delve. What's going on?
[edit] Analysis
To understand the performance differences above you have to understand how the code you see is being processed by Haskell's runtime system behind the scenes. It can be a shock sometimes to see just how much is being done for you in what appears to be a simple line of code.
[edit] Naive prime calculation
One of the things that's not immediately obvious when looking at the naive implementation is the sheer number of calculations it has to make. Let's take another look at the code in question:
primes :: [Integer] primes = sieve [2..] where sieve (p:xs) = p : sieve [xx < xs, x `mod` p > 0]
sieve
is first called with an infinite list of integers starting from 2. This is, in reality, a hidden generator function that generates the next number in the list on demand. The calculation involved in this is trivial and on the order of navigating a real list, so this is not too bad a problem. The problems begin afterwards.
sieve
takes the head of the list (read: is given "2" from the generator) and then passes to itself a list comprehension formed from its tail ([3..]) and a guard function that filters out all numbers divisible by the head. This new infinite list is, like the first one, a generator function  one that performs a modulo division on every subsequent integer!
sieve
again takes the head of that list and passes to itself another list comprehension formed from the tail ([3,5..]) and a guard function that filters based on modulo division. By this point half of the integers greater than 3 are having two modulo divisions calculated and the rest have one. The more primes are found, the greater the number of operations being performed with each forward "tick" of the algorithm. By the time we reach the ten thousandth prime, we have cases where some integers are getting as many as 9999 modulo operations performed on them while each and every one is getting at least 1. To call this wasteful of CPU is a gross understatement!
[edit] References
 The naive implementation was adapted from Melissa O'Neill's paper on the topic. Further the whole entry here was inspired by the contents of that article.
 The better implementation was adapted from the Haskell Wiki and cleaned up for presentation and explanation here.
Download code 