Merge sort (Haskell)

From LiteratePrograms

Jump to: navigation, search
Other implementations: ACL2 | C | C++ | dc | Eiffel | Erlang | Haskell | Java | JavaScript | Lisp | OCaml | Oz | Perl | Prolog | Python | Ruby | Scheme

This is an implementation of the merge sort algorithm in Haskell. The sorting predicate is user-specified; use <= to provide the usual stable sorting of numbers. For pedagogical reasons, this implementation is fairly verbose. More concise versions, which make better use of Haskell's capabilities, are also possible.

Contents

Implementation

mergesort

The mergesort function applies a predicate to a list of items that can be compared using that predicate. For a simple list (one element or empty), we just return a duplicate of the current list. For longer lists, we split the list into two halves, recurse on each half, then merge the two halves according to the predicate:

<<mergesort.hs>>=
split
merge

mergesort :: (a -> a -> Bool) -> [a] -> [a]
mergesort pred []   = []
mergesort pred [x]  = [x]
mergesort pred xs = merge pred (mergesort pred xs1) (mergesort pred xs2)
  where
    (xs1,xs2) = split xs

split

The workhorse for splitting is the splitrec recursive function. This method takes two lists together with a work-in-progress result. In each recursion, two elements are eliminated off the first list, and one element transferred from the second list to the result list. Once the first list is exhausted, we return from the recursion. At this point, we must be n / 2 recursions deep -- where n is the length of the original list. As such, the second list will contain the last nn / 2 elements from the original list and the working list will contain the first n / 2 elements that were eliminated from the second list. This pair of lists is the ultimate return value. Due to the way the working list is built, the elements it contains are in reverse order. To ensure a stable sort, the order of the working list is reversed (to restore the original list order) before it is returned.

The split function simply kicks off the recursive chain:

<<split>>=
split :: [a] -> ([a],[a])
split xs = splitrec xs xs []

splitrec :: [a] -> [a] -> [a] -> ([a],[a])
splitrec [] ys zs             = (reverse zs, ys)
splitrec [x] ys zs            = (reverse zs, ys)
splitrec (x1:x2:xs) (y:ys) zs = splitrec xs ys (y:zs) 

A more intuitive way of splitting the list might be to put all odd-positioned elements in one sublist, and the rest in another. For example, we might have written split as

split :: [a] -> ([a],[a])
split []       = ([],[])
split [x]      = ([x],[])
split (x:y:zs) = (x:xs,y:ys)
  where 
    (xs,ys) = split zs

This, however, is not stable: a list that is already partially-ordered according to the predicate might be rearranged. For example, given a binary predicate string-length<= that returns True iff the first argument is not longer than the second, and the list ["aaa", "bbb", "ccc"], you get ["aaa", "ccc", "bbb"] back. The splitrec function avoids this problem by using a copy of the list, which is consumed two elements at a time, as a counter to find the midpoint of the list. A somewhat simpler alternative would have been to use the standard Haskell Prelude function splitAt to break the list in two (but then we would have had to explain how splitAt works).

merge

Merge is the heart of the algorithm and operates by interleaving the elements of two ordered lists in such a way that the combined list is ordered. This process takes only linear (O(n)) time. The merge function takes two lists. If either list is empty, we return a duplicate of the other as the merged result. Otherwise, the lead elements of each list are compared. A true result from the predicate indicates that the first element should precede the second in sorting order. The "winning" element is removed from its list and we recurse on the remainder of the lists. The result, prepended by the removed element, is returned:

<<merge>>=
merge :: (a -> a -> Bool) -> [a] -> [a] -> [a]
merge pred xs []         = xs
merge pred [] ys         = ys
merge pred (x:xs) (y:ys) =
  case pred x y of
    True  -> x: merge pred xs (y:ys)
    False -> y: merge pred (x:xs) ys

Testing

The mergesort implementation can easily be tested in an interactive Haskell environment such as Hugs or the Glasgow Haskell Compiler's GHCi. For example, after loading mergesort.hs into GHCi, typing

mergesort (<=) [1, 5, 6, 4, 3]

at the prompt produces the sorted list

[1,3,4,5,6]
Download code
Personal tools