Merge sort (Haskell)
This is an implementation of the merge sort algorithm in Haskell. The sorting predicate is user-specified; use
<= to provide the usual stable sorting of numbers. For pedagogical reasons, this implementation is fairly verbose. More concise versions, which make better use of Haskell's capabilities, are also possible.
mergesort function applies a predicate to a list of items that can be compared using that predicate. For a simple list (one element or empty), we just return a duplicate of the current list. For longer lists, we split the list into two halves, recurse on each half, then merge the two halves according to the predicate:
<<mergesort.hs>>= mergesort :: (a -> a -> Bool) -> [a] -> [a] mergesort pred  =  mergesort pred [x] = [x] mergesort pred xs = merge pred (mergesort pred xs1) (mergesort pred xs2) where (xs1,xs2) = split xs
To break the list into two halves without having to first measure its length (an extra traversal) we count in twos over it, and use another pointer into the list to advance in steps of one to get the two halves, keeping the original order to ensure a stable sort:
<<split>>= split :: [a] -> ([a],[a]) split xs = go xs xs where go (x:xs) (_:_:zs) = (x:us,vs) where (us,vs)=go xs zs go xs _ = (,xs)
Another way of splitting the list might be to put all odd-positioned elements in one sublist, and the rest in another. For example, we might have written
split :: [a] -> ([a],[a]) split (x:y:zs) = (x:xs,y:ys) where (xs,ys) = split zs split xs = (xs,)
This, however, is not stable: a list that is already partially-ordered according to the predicate might get rearranged. For example, given a binary predicate
string-length<= that returns
True iff the first argument is not longer than the second, and the list
["aaa", "bbb", "ccc"], you get
["aaa", "ccc", "bbb"] back. The first function avoids this problem. A somewhat simpler alternative would have been to use the standard Haskell Prelude function
splitAt to break the list in two (but then we would have had to explain how
splitAt works – and more importantly, to traverse the list in full, in order to find out its length).
Merge is the heart of the algorithm and operates by interleaving the elements of two ordered lists in such a way that the combined list is ordered. This process takes only linear (O(n)) time. The
merge function takes two lists. If either list is empty, we return
a duplicate of the other as the merged result. Otherwise, the lead elements of each list are compared. A true result from the predicate indicates that the first element should precede the second in sorting order (assuming it works like ≤ predicate; the more usual way of defining this would be to assume it were like < and compare
pred y x instead – the extra caution is to preserve the original order of elements where possible). The "winning" element is then removed from its list and prepended to the rest of the result, which we get from merging the remainder of the lists. This is your regular tail recursion modulo cons right there, especially that Haskell is lazy and the result is consumed on demand – head first, tail later – triggering the actual recursive call in truly a tail position, after the lazy cons (
:) data constructor has been consumed / destructed / traversed over.
<<merge>>= merge :: (a -> a -> Bool) -> [a] -> [a] -> [a] merge pred xs  = xs merge pred  ys = ys merge pred (x:xs) (y:ys) | pred x y = x: merge pred xs (y:ys) | otherwise = y: merge pred (x:xs) ys
mergesort (<=) [1, 5, 6, 4, 3]
at the prompt produces the sorted list
 Iterative implementation
mergesort is doubly-recursive. Another way it can be implemented is in the bottom-up, iterative way,
mergesort pred  =  mergesort pred xs = go [[x] | x <- xs] where go xs@(_:_:_) = go (pairs xs) go [xs] = xs pairs (x:y:xs) = merge pred x y : pairs xs pairs xs = xs
using the same
merge function as above, turning each element of the argument list initially into a singleton list using a list-comprehension expression, and then combining these lists in a pair-wise manner with the
merge function, resulting in fewer and fewer lists until only one is left, which is the sorted list – the result we seek.