Priority Queue (Haskell)

From LiteratePrograms
Jump to: navigation, search

The following code implements a priority search queue in the Haskell programming language. It is chiefly intended to support a better Sieve of Eratosthenes implementation and will be focused on the functionality required for that project. It may later be expanded for more complete functionality.

Please note: this code is not a complete implementation as it is intended to act as a component to another article. As it is worked on, it will be improved, but at this time do not assume this is complete and general-purpose.

A priority search queue is a queue structure that attaches a "priority" to a value. We can treat the "priority" as a key and the whole priority queue as an ordered mapping. It is an extremely useful data type used in a wide variety of circumstances (and, indeed, the motive for this article is to provide infrastructure for an efficient Sieve of Eratosthenes implementation).

Contents

[edit] Data representation

Before we can start to code the functions, we should put some thought into how to represent the data structure. A few moments' reflection gives us this:

<<data type declaration>>=
data Ord k => PriorityQueue k a = Nil | Branch k a (PriorityQueue k a) (PriorityQueue k a)

A PriorityQueue is a recursive data structure built as a tree. The root of the tree is the highest-priority item in the queue (or Nil) and the branches are a traditional binary tree structure for the rest of the leaves with Nil marking the ends. This structure gives us a reasonably efficient ordering of our priority search queue with easy semantics for pedagogical purposes. Other, more efficient, representation schemes (like balanced trees) can of course be imagined.

Note that only the key is limited to being a member of the Ord class. This means that the ordering is strictly by the keys. The data can be of any type and data items of the same priority will be extracted in an arbitrary order.

[edit] Operations

The operations on a normal priority queue are insert, remove and often inspect. These, in order, insert a new key/value pair into a queue; remove (and return) the current highest key/value pairing from a queue; return the current highest key/value pairing of a queue without removing it. This particular priority queue implementation is built for a purpose and has a slightly different interface:

<<interface declarations>>=
-- Return an empty priority queue.
empty

-- Return the highest-priority key.
minKey

-- Return the highest-priority key plus its associated value.
minKeyValue

-- Insert a key/value pair into a queue.
insert

-- Delete the highest-priority key/value pair and insert a new key/value pair into the queue.
deleteMinAndInsert

While this non-standard interface renders the data type less than universally usable, quick inspection of the interface's implementation shows that adding the missing functionality is a trivial exercise at best.

[edit] empty

The empty function provides an empty tree as a basis, thus removing the need to expose the Nil constructor (read: implementation detail) to prying eyes:

<<empty>>=
empty :: Ord k => PriorityQueue k a
empty = Nil

It is, in effect a mere alias for the Nil type constructor.

[edit] minKeyValue

The minKeyValue function is the equivalent of the standard inspect interface. Since the priority queue is a tree rooted with the highest priority, it simply returns the key/value pair of the provided tree:

<<minKeyValue>>=
minKeyValue :: Ord k => PriorityQueue k a -> (k, a)
minKeyValue Nil              = error "empty queue"
minKeyValue (Branch k a _ _) = (k, a)

Note some of the key elements. The first line is a mere type declaration. It is largely unnecessary (the types can be easily inferred by the Haskell compiler) but generally the top-level functions of an interface are best fully-typed as a courtesy to the reader. The second line uses pattern matching on an empty root node and handles the only error condition: taking the highest-priority key/value pair of an empty queue is an undefined operation. The undefined keyword could be used here, but a more friendly error message is likely indicated. (Remember that undefined is a special case of error.)

The final line is the meat of the function. It takes as its pattern a PriorityQueue instance that is not Nil and breaks it apart into its constituents. The left and right branches are not needed and therefore ignored (_). The key and the value are bundled together into a tuple and returned, the queue left untouched.

[edit] minKey

The minKey function can be implemented in a similar fashion:

minKey :: Ord k => PriorityQueue k a -> k
minKey Nil              = error "empty queue"
minKey (Branch k _ _ _) = k

This would work, but is not a good implementation. It replicates much of the minKeyValue functionality, which means that should the implementation of the PriorityQueue type change, this function's modifications would have to be synchronized with the changes to minKeyValue—a highly error-prone operation. (Consider that a very complete implementation of a priority queue could have a dozen functions similar to this with only minor tweaks here and there.) A better solution is this:

<<minKey>>=
minKey :: Ord k => PriorityQueue k a -> k
minKey = fst . minKeyValue

Here we leverage Haskell's nigh-unparallelled ability to combine functions. Consider that the return type for minKeyValue is (k, a) and that the return type for minKey is k alone. If there were only a function we could use to convert a tuple into just its first member!

There is such a function. It is defined in the Haskell Prelude and is thus a standard function available in all Haskell implementations. It is defined thus:

fst :: (a, b) -> a
fst (x, y) = x

So instead of cutting and pasting the implementation of another function and making minor tweaks to it we can instead use function application (and the "points-free style") to call minKeyValue and extract the only part we're interested in. This way even if the implementation of minKeyValue ever changes dramatically, the minValue function will never have to be modified. (Of course if the interface to the function changes, all bets are off. This is why we think about our interfaces before we write our implementations.)

[edit] Helpers

Before continuing with the interface implementations, it would be best to look at and understand the functionality of three helper functions used extensively by the remaining two interface functions:

<<helper declarations>>=
-- Join two queues in sorted order.
union

-- Join two queues without regard to order.
-- (This is a helper to the union helper.)
link

-- Return a queue with a single item from a key/value pair.
singleton

[edit] singleton

The simplest of these is singleton. All this function does is build a PriorityQueue root node with one key/value pair and two Nil branches:

<<singleton>>=
singleton :: Ord k => k -> a -> PriorityQueue k a
singleton k a = Branch k a Nil Nil

It is, again, simply a wrapper around a type constructor (Branch in this case) that hides nasty implementation details from prying eyes. There is room for argument that this function belongs in the interface (to avoid slightly ugly calls like insert k a empty), but at this point it will remain down here in the helpers. As this priority queue is expanded to go beyond its original requirements, singleton could well be elevated into the published interface.

[edit] link

The next function we'll look at—link—is definitely not suited to a published interface. Improper use of it can badly damage a tree. Here's what it looks like:

<<link>>=
link (Branch k a Nil m) r = Branch k a r m
link (Branch k a ll lr) r = Branch k a lr (union ll r)

This code links two queues together. The first pattern matches a queue entry that has no left branch along with a generic queue root. The empty left branch is filled with the passed-in queue root. The second pattern matches a filled queue entry with both branches full. It, as before, ignores the right branch and unites the second queue root with the first's left branch. This function is dangerous to expose because it imposes no ordering whatsoever on the merging. Since a priority queue is an ordered data structure, misuse of this function can lead to severe operational problems.

[edit] union

Notice that, as defined, link uses another helper function in its implementation: union. This is what union looks like:

<<union>>=
union :: Ord k => PriorityQueue k a -> PriorityQueue k a -> PriorityQueue k a
union l Nil = l
union Nil r = r
union l@(Branch kl _ _ _) r@(Branch kr _ _ _)
    | kl <= kr  = link l r
    | otherwise = link r l

It's quite a mouthful, but still easy enough to work out. The first line is its type: it takes two PriorityQueue instances and returns a PriorityQueue. This, combined with its name, is ample clue of what it is supposed to do.

The next two lines are pattern matches and can be read as "a union of any queue with Nil is that queue". It's the remaining three lines that are a problem.

First, let's look at the pattern matching. A representative example is l@(Branch kl _ _ _). What this is doing is matching an expanded Branch node and extracting the key value from it, calling it kl (key, left). What it is also doing, however, is calling the entire node l as a convenience. This way we won't have to keep tossing around code like (Branch kl al ll rl) when we're building things. We can instead just use l and not lose the conceptual forest for the trees.

Anyway, the net effect of the pattern is that the leftmost node of the two passed in is called l and its key kl. The rightmost node is called r and its key kr. This prepares us for the next two lines.

The next two lines contain guards -- a special kind of pattern matching expression used in function declarations, case statements and list comprehensions. In the context here you can view them as a variant of conditional expressions for case expressions; for example this function could be declared thus:

union l@(Branch kl _ _ _) r@(Branch kr _ _ _) =
    if kl <= kr
      then
        link l r
      else
        link r l

The guards are, in this case, more readable than a conditional expression or a case expression would be, however, so they are the most appropriate form.

The first guarded line is a pattern match on a boolean condition. It simply says "if the left key is less than or equal to the right key, link the right node into the left node". Keep in mind the implementation of link above. If the left branch of the left node is Nil, the right node is just inserted in place. If, however, the left branch of the left node is not empty, union is recursively called on it until the larger node is inserted in the right place.

The second guarded line uses the otherwise keyword (like the conditional else) as a catch-all. If the right key is greater than the left key, the left node is linked into the right node's left branch.

[edit] Operations (cont.)

With the helper functions out of the way, we can finally address the remaining two interface functions.

[edit] insert

The first of these is insert:

<<insert>>=
insert :: Ord k => k -> a -> PriorityQueue k a -> PriorityQueue k a
insert k a q = union (singleton k a) q

It is a simple function that just builds a singleton from the passed-in key/value pair and immediately calls union to insert it. There's nothing surprising here.

[edit] deleteMinAndInsert

The next function, deleteMinAndInsert, is only a little bit more involved, this mostly because it's a compound function built to combine operations into a single event. (This is, again, provided for purposes of building a proper Sieve of Eratosthenes implementation. This will later be expanded for more general purposes.)

<<deleteMinAndInsert>>=
deleteMinAndInsert :: Ord k => k -> a -> PriorityQueue k a -> PriorityQueue k a
deleteMinAndInsert k a Nil              = singleton k a
deleteMinAndInsert k a (Branch _ _ l r) = union (insert k a l) r

The first pattern matched is performing this operation on an empty queue. It could be reasoned that this is an undefined condition and, therefore, warrants an error call. We've chosen instead, however, to assume that the deletion is optional and that an empty queue, which by definition can have nothing deleted from it, will just get the new key/value pair inserted.

The second pattern matched is to a node with actual content. We're tossing away that node's key/value pair, so we first insert our new key/value pair into its left tree and then call union on the resulting new tree and the right tree.

[edit] Boilerplate

The rest of the code in the file is book-keeping only.

[edit] Module declaration

First, since we want this code to be usable in other projects, we declare it as a module:

<<module declaration>>=
module PriorityQueue (
    PriorityQueue,
    empty,
    minKey,
    minKeyValue,
    insert,
    deleteMinAndInsert
) where

Note that none of the helper functions are visible in this module's export list. This is a deliberate choice and allows us to change the implementation without the interface being noticeably changed (with the possible exception, of course, of additionally-exposed functions).

[edit] Imports

Of course we're using a few functions from the Prelude in here, so we need to import it. Were we to re-implement this structure in terms of other data structures (lists, let's say), we would, of course, have to import them here.

<<import declarations>>=
import Prelude

[edit] Putting it all together

The overall file is structured like this:

<<PriorityQueue.hs>>=
module declaration
 
import declarations

-- Declare the data type constructors.
data type declaration 

-- Declare the exported interface functions.
interface declarations

-- Declare the private helper functions.
helper declarations

[edit] Background and References

The inspiration for this article came from investigating a high-performance Sieve of Eratosthenes (SoE) for Haskell. While the usual form of this algorithm, as taught in algorithm books and introductions to Haskell, is elegant and simple and highlights some strengths of the language, it fails miserably in performance because it isn't really a proper SoE. Doing some digging, I stumbled across this argument on the Haskell-cafe mailing list. This, in turn, led me to Melissa O'Neill's excellent paper which both highlights why the traditional "SoE" implementation isn't really an SoE and provides a set of ever-more-efficient implementations itself.

Sadly, Ms. O'Neill's paper assumes the presence of a priority queue and, further, a priority queue with some unusual operations. No Haskell environment comes out of the box with a priority queue, so one had to be provided to use her solution. A quick web search revealed this implementation, ready to be adapted to the needs of the SoE. It was quickly stripped down, reorganized, altered to suit the required interface and documented here. A key bug of the implementation (one that would lead to serious performance problems down the road) was repaired in the process.

This document has no formal proof of the qualities of a priority search queue. There is a readable introduction to this topic, however, available online.

Download code
hijacker
hijacker
hijacker
hijacker