Sieve of Eratosthenes (Haskell)

From LiteratePrograms
Jump to: navigation, search
Other implementations: Bash | C++ | Forth | Haskell | Python | Python, arrays | Scala

The Sieve of Eratosthenes is an algorithm for rapidly locating all the prime numbers in a certain range of integers. It operates by marking as composite all nontrivial multiples of each prime in sequence until only primes remain unmarked. It is most well-suited to implementation using arrays, which are compact and feature random access.


This program is under development.
Please help to debug it. When debugging
is complete, remove the {{develop}} tag.


Contents

[edit] A first look

This "Sieve of Eratosthenes" is not the genuine algorithm, but is instead a naive version of the original as (incorrectly) taught in most courses on algorithms. A more comprehensive (and far better-performing!) Sieve will be provided later as infrastructure code (specifically a priority queue) is provided elsewhere.

The core of this naive implementation is the primes function, detailed below. primes is just a convenient wrapper that hides implementation detail behind a facade. Called by itself it provides an infinite list of primes which may be used in other circumstances as desired. It accomplishes this by wrapping a tail-recursive function, sieve, tucked away in a where clause to keep from polluting the namespace.

<<primes_naive>>=

primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]

The sieve function in the where-clause takes a single list of integers as its argument and assumes that the first list member is a prime. Since primes passes in the infinite list [2..], it is guaranteed in this code that the first number will always be prime. It returns with the initial prime p glued onto the output of a tail-recursive call to itself with a list comprehension that returns only those integers which are not evenly divisible by the provided prime. To more clearly explain:

  1. sieve is first passed a list of all integers greater than or equal to 2 ([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...]);
  2. it uses the 2 as the first prime and then passes itself a list of all remaining integers not divisible by 2 ([3, 5, 7, 9, 11, 13, 15, 17, ...]);
  3. it uses the 3 as the second prime and then passes itself a list of all remaining integers not divisible by 3 ([5, 7, 11, 13, 17, ...]);
  4. this proceeds recursively ad infinitum.

To those unfamiliar with Haskell's lazy semantics, the above can seem like a mystery. It is best to consider the lists involved not as lists in the traditional data structure sense, but instead as list generators -- functions which will calculate on demand the next item in the "list". It is these lazy semantics which permit elegant, readable code like the above to be generated without the fussing around that's usually needed when coding this pared-down "Sieve of Eratosthenes".

Some notes on the provided code:

  • Having the pairing of primes and sieve is a safety issue used to guarantee correctness of the ensuing list at the possible expense of some performance. (Using the GHC compiler there is absolutely no performance degradation, but naive implementations could incur some overhead.)
  • The lines above each function (e.g. primes :: [Integer]) are type declarations and are not necessary for this code to compile and work. Haskell can infer the type of these expressions without declarations. The declarations have been provided as a courtesy to the reader to enhance comprehension.

[edit] Test code

This code, as stated earlier, is a naive implementation and not an authentic Sieve of Eratosthenes at all. To prove this by comparing it to later versions, we'll write a simple little test framework that finds the 10,000th prime so we can run this with some form of tool that measures time used at the command line. The code is dead simple:

<<prime_sieve_naive_test.hs>>=
module Main
  where

primes_naive

main :: IO ()
main = do
  print $ show (primes !! 10000)

Executing this on a machine with timing information could look like this:

$ ghc --make prime_sieve_naive_test.hs
$ time ./prime_sieve_naive_test
"104743"

real    0m14.817s
user    0m14.277s
sys     0m0.020s

[edit] A better sieve

A better "sieve", also not the authentic one, can be constructed to provide far better performance. As with the naive version, this version is not the authentic Sieve of Eratosthenes, but it does perform much more quickly.

[edit] Building blocks

As is so often the case in Haskell, the simple building blocks of this solution to finding primes occupies the greatest amount of space, yet requires the least explanation to understand. Two functions, merge and diff here occupy over two-thirds of the (meaningful) line count, but are trivial to understand.

[edit] Merging infinite lists

<<merge_infinite>>=
merge :: (Ord a) => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = 
  case compare x y of
    LT -> x : (merge xt ys)
    EQ -> x : (merge xt yt)
    GT -> y : (merge xs yt)

The merge function is a highly-specialized function that merges two infinite, sorted lists. This is not a general-purpose function usable by a wide variety of clients. It is a very specific function used solely as a helper function. Among the tools that it is missing are the ability to deal with lists that end, lists with duplicate items within them and the ability to deal with unsorted lists.

merge is passed two lists. It tags them as xs and ys using Haskell's @-notation, but also breaks each of them apart into their constituent heads and tails (e.g. (x:xt)) in the patterns. When entered, it compares the heads of both lists and:

  • If x is less than y it constructs a list from x and the merging of the tail of xs and all of the ys.
  • If the two heads are equal it constructs a list from one of them (x in this case) and the tails of both lists. In this manner it eliminates any entries which may exist in both lists.
  • If x is greater than y it constructs a list from y and the merging of all of the xs and the tail of ys.

Note carefully again:

  1. This function cannot cope with finite lists as it has no terminating condition for an empty list.
  2. The elements of each list must be sorted when the function is called.
  3. This system will eliminate duplicates across the lists, but will not be able to cope with lists which have duplicates internally.

[edit] Diffing infinite lists

<<diff_infinite>>=
diff :: (Ord a) => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = 
  case compare x y of
    LT -> x : (diff xt ys)
    EQ -> diff xt yt
    GT -> diff xs yt

As with merge, diff is not a general-purpose function. It is a tool with specific characteristics used solely as a building block for calculating primes. This means, as with merge, the following constraints apply:

  1. This function will only work on infinite lists.
  2. The supplied lists must be ordered.
  3. The interaction between the two lists is not what would be generally desired from a more general-purpose function.

diff, in short, removes all of the elements of the second list passed to it from the first. Some examples:

Example 1

take 10 (diff [1..] [2,4..])

The result of this invocation is [1,3,5,7,9,11,13,15,17,19]. The first list is all natural numbers. The second is all even natural numbers. The diff call removes all even numbers from the set of natural numbers.

Example 2

take 10 (diff [10..] [2,4..])

The result here is [11,13,15,17,19,21,23,25,27,29]. Notice how all even numbers have been removed from the list of all natural numbers starting at 10. The ones that are smaller than 10 are ignored.

As with the merge function, the work is done in the case statement. Indeed the first case of the statement is exactly the same as for merge: if the head of the first list is smaller than the head of the second, construct a list formed of the first head and a recursive call to diff with the first tail and the second list in its entirety. The second says that if the heads match, drop both heads and call diff recursively on both tails. The final case says that if the second head is lower than the first, make a recursive call keeping the whole first list, but only using the tail of the second list.

Again, as a reminder, neither merge nor diff are general-purpose functions. They are building blocks for use in this primes generator only.

[edit] Core functionality

This is the heart of this prime number generator. It is deceptively short—4 lines of meaningful code—but contains a lot of heavily-interlocked snippets of sophisticated functionality.

<<primes_better>>=

merge_infinite

diff_infinite

primes, nonprimes :: [Integer]
primes    = [2, 3, 5] ++ (diff [7, 9 ..] nonprimes) 
nonprimes = foldr1 f $ map g $ tail primes
  where 
    f (x:xt) ys = x : (merge xt ys)
    g p         = [ n * p | n <- [p, p + 2 ..]]

[edit] Prime numbers

The first function to look at is actually quite simple: primes. Repeating that code below, it is simply:

primes    = [2, 3, 5] ++ (diff [7, 9 ..] nonprimes)

Reading this is simple: primes is a list consisting of the first three primes ([2,3,5]) with all odd numbers greater than that ([7,9..]), after the list of all non-prime numbers (nonprimes) is removed, appended to it.

Intuitively this is a simple operation. It all hinges, however, on the ability to calculate a list of non-primes. That is where the hairy part of the code is.

[edit] Non-prime numbers

nonprimes = foldr1 f $ map g $ tail primes

For those unused to some of Haskell's rich functional operators, this line of code looks opaque. What it really translates to, however, is the following code where the function application operator is replaced by regular function application (plus the relevant parentheses to ensure proper order of operations):

nonprimes = foldr1 f (map g (tail primes))

(The reason the $ operator is used, typically, is to avoid complicated nestings of parentheses. Once you're used to it, the first version is much simpler to read than the second.)

Being right-associative, the $ operator forces the last call – tail primes – to be evaluated first. This means that the function g will be applied to the list of prime numbers sans 2. (Note that the list of prime numbers is needed to calculate the list of prime numbers. With Haskell's lazy evaluation semantics this is all very sensible and possible.)

[edit] Multiples of primes

The next step in decoding this operation is to look at the local function g. It is some kind of transformation, obviously, which takes a list of primes and converts them to something else. The code for it is:

g p         = [ n * p | n <- [p, p + 2 ..]]

The naming of the parameter strongly implies that it is a prime. The fact that it's applied to a list of known primes confirms this. This transformation is difficult to read at first. It helps to see the type of it. The type of this transformation is:

g :: Integer -> [Integer]

So the code for nonprimes involves taking a list of primes and for each prime generating a list of non-primes from a list comprehension that multiplies the prime first by itself, then by the next odd number up, and so on. This is easier to comprehend if you work through some trial numbers.

First, the tail of primes is 3, 5, 7, 11, 13, 17, etc. The 2 is explicitly dropped. Thus the first number that gets g applied to it is 3, the second is 5, and so on. The three coming in is converted into an infinite list starting at 9 (3*3), followed by 15 (3*5), then 21, 27, etc. through the list comprehension. So an input list (partial) of primes of [3,5,7,11] will get converted instead into:

[
  [9,15..]     -- [3*3,3*5,3*7,3*9,...]
  [25,35..]    -- [5*5,5*7,5*9,5*12,...]
  [49,63..]    -- [7*7,7*9,7*11,7*13,...]
  [121,143..]  -- [11*11,11*13,11*15,11*17,...]
]

The beauty (and the mind-mangling part) of the scheme is that all of these infinite, circularly-defined lists are calculated only at need, saving both execution time and memory requirements for only those points where they are needed.

[edit] Merging the multiples

The problem we face now is that we have lists of known non-primes created by squaring primes and calculating multiples from there, but this leaves us with the difficult problem of actually accessing said lists in an efficient manner. This is where the earlier helper merge enters the picture. It is instructive to look at three things:

  1. The local helper function's definition:
    f (x:xt) ys = x : (merge xt ys)
  2. The type signature for the merge function:
    merge :: (Ord a) => [a] -> [a] -> [a]
  3. The type signature for foldr1:
    foldr1 :: (a -> a -> a) -> [a] -> a

Now recall that the local helper is used in a call to foldr1, so it is an accumulator of some form. In this case it folds over a list of lists and accumulates a list. Modifying the type signature of foldr1 with the actual types involved we get foldr1 :: ([Integer] -> [Integer] -> [Integer]) -> [[Integer]] -> [Integer]. In other words we are going to be taking our set of lists of known non-primes from the local helper g mapped over the list of known primes and using merge to give us a single, merged, ordered list of odd non-primes beginning at 9. First the lists based on 3 and on 5 will get merged. This resulting list is then merged with the list based on 7. This is then further merged with the list based on 11, 13, 17, and so on. Any duplicates calculated will be dropped and what comes out of the process is a never-ending stream of non-primes in increasing order.

[edit] Putting it together

We've now inspected the whole code in sufficient detail that the "big picture" is likely lost. Further some of the interactions in this code are going to be highly inobvious, especially to those coming from a language with more common semantics. So keeping in mind the detailed explanations, here is the overview.

Calculating the primes starts with a short list of known primes ([2,3,5]). To this list is appended the list of odd numbers starting at 7, which, through the application of diff, has had all known non-primes removed from it. This list of non-primes is generated by taking the list of primes and making a list of lists of multiples of said primes, beginning at the square of the prime and extending upward by multiples of two. This list of lists is then merged into a single cohesive list.

The circularity of this definition is of particular note. To make the list of primes you need to make a list of non-primes. To make a list of non-primes you need to make a list of primes. In a more conventional programming environment this would be an intractable problem. In Haskell, however, with its lazy evaluation strategy, it is a trivial affair.

Keep in mind that lazy evaluation means that the lists in question (infinite in length) are not processed in their entirety. They are generated on demand as they are needed and not an instant before that. This allows us our bizarre circular definition because we've seeded just enough of the primes list to move us forward in our list of lists of non-primes. Too, that list of lists which then gets merged is also a bizarre little affair seeing as we are, hypothetically speaking, merging an infinite number of infinitely-long lists. Again the lazy semantics of Haskell permits us to do this as the lists in question, along with their elements, are only produced on need and not a moment sooner.

[edit] Test code

Proof that this version of a primes finder is quicker can be quickly constructed by merely building a test harness similar to the one we gave for the naive sieve.

<<prime_sieve_better_test.hs>>=
module Main
  where

primes_better

main :: IO ()
main = do
  print $ show (primes !! 10000)

Building and timing the resulting executable can be an eye-opener.

$ ghc --make prime_sieve_better_test.hs
$ time ./prime_sieve_better_test
"104743"

real    0m0.125s
user    0m0.080s
sys     0m0.000s

There is more than two orders of magnitude difference between the naive implementation's performance and the improved implementation's when picking only the ten thousandth prime! Further, this disparity increases the farther into the list of prime numbers you delve. What's going on?

[edit] Analysis

To understand the performance differences above you have to understand how the code you see is being processed by Haskell's runtime system behind the scenes. It can be a shock sometimes to see just how much is being done for you in what appears to be a simple line of code.

[edit] Naive prime calculation

One of the things that's not immediately obvious when looking at the naive implementation is the sheer number of calculations it has to make. Let's take another look at the code in question:

primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]

sieve is first called with an infinite list of integers starting from 2. This is, in reality, a hidden generator function that generates the next number in the list on demand. The calculation involved in this is trivial and on the order of navigating a real list, so this is not too bad a problem. The problems begin afterwards.

sieve takes the head of the list (read: is given "2" from the generator) and then passes to itself a list comprehension formed from its tail ([3..]) and a guard function that filters out all numbers divisible by the head. This new infinite list is, like the first one, a generator function -- one that performs a modulo division on every subsequent integer!

sieve again takes the head of that list and passes to itself another list comprehension formed from the tail ([3,5..]) and a guard function that filters based on modulo division. By this point half of the integers greater than 3 are having two modulo divisions calculated and the rest have one. The more primes are found, the greater the number of operations being performed with each forward "tick" of the algorithm. By the time we reach the ten thousandth prime, we have cases where some integers are getting as many as 9999 modulo operations performed on them while each and every one is getting at least 1. To call this wasteful of CPU is a gross understatement!

[edit] References

  • The naive implementation was adapted from Melissa O'Neill's paper on the topic. Further the whole entry here was inspired by the contents of that article.
  • The better implementation was adapted from the Haskell Wiki and cleaned up for presentation and explanation here.
Download code
hijacker
hijacker
hijacker
hijacker