Quicksort (Python)

From LiteratePrograms
Jump to: navigation, search
Other implementations: AWK | C | C++ | Eiffel | Erlang | Forth | Haskell | Java | JavaScript | Mathematica | Mercury | Oz | Python | Python, arrays | Scala | Sed | Visual Basic .NET

We describe several different implementations of the quicksort algorithm. All of these implementations operate on lists. Therefore, they avoid use of the randomized pivot techniques found in, for example, the C implementation of quicksort, since access to a random element of a list takes O(n) time. This leads to poor behavior on sorted or nearly-sorted input; for a better implementation, see Merge sort (Python) or Quicksort (Python, arrays), which uses Python's array standard library module.


[edit] Implementation

[edit] Using list comprehensions

The most straightforward way to implement quicksort in Python is to use list comprehensions. This approach generates two lists, one of elements greater than or equal to the "pivot" element (in this case the first element of the list), and one of elements less than the pivot. These two lists are then recursively sorted, before being concatenated around the pivot to form the final sorted list.

def qsort1(list):
    """Quicksort using list comprehensions"""
    if list == []: 
        return []
        pivot = list[0]
        lesser = qsort1([x for x in list[1:] if x < pivot])
        greater = qsort1([x for x in list[1:] if x >= pivot])
        return lesser + [pivot] + greater

This implementation has the advantage of being clear and easy to understand, yet quite compact. As it turns out, it is also faster than other list-based quicksorts in Python.

[edit] Using a partitioning function

An alternative to using list comprehensions is to partition the list in a single pass using a partitioning function that accumulates list items that are equal to the pivot, as well as those that are lesser and greater than the pivot. This also eliminates the need to process more than once any items that are equal to the pivot. In theory, this should result in a faster sort than the list comprehension version. However in practice it appears that the list comprehension implementation is not only faster than a single-pass partitioning approach, but approaches the performance of an in-place sort (see Quicksort (Python, arrays)).

The new implementation looks essentially the same as the earlier list comprehension implementation, except that the lesser and greater lists (along with equal) are now generated by the partition function.

partition function
def qsort2(list):
    """Quicksort using a partitioning function"""
    if list == []: 
        return []
        pivot = list[0]
        lesser, equal, greater = partition(list[1:], [], [pivot], [])
        return qsort2(lesser) + equal + qsort2(greater)

The partitioning function can be thought of as a straightforward recursion. The base case has an empty list in its first argument, which corresponds to the list to be partitioned. When the list being partitioned is non-empty, we compare the head of the list to the pivot, and add the head to the appropriate list (lesser, equal, or greater) based on the outcome of these comparisons.

<<partition function (recursive)>>= 
def partition(list, l, e, g):
    if list == []:
        return (l, e, g)
        head = list[0]
        if head < e[0]:
            return partition(list[1:], l + [head], e, g)
        elif head > e[0]:
            return partition(list[1:], l, e, g + [head])
            return partition(list[1:], l, e + [head], g)

Although this is an intuitively appealing way to perform list partitioning, Python's lack of support for recursion elimination causes the recursive implementation to suffer from linear stack usage, which is not acceptable for large lists. An iterative implementation (i.e. a manual tail call optimization) solves this problem:

<<partition function>>= 
def partition(list, l, e, g):
    while list != []:
        head = list.pop(0)
        if head < e[0]:
            l = [head] + l
        elif head > e[0]:
            g = [head] + g
            e = [head] + e
    return (l, e, g)

[edit] Testing

We provide a simple self-test for the quicksort implementations. The heart of this test harness is the testQSort procedure. This procedure accepts a sorting function, a stimulus, an expected response, and a testname as parameters. It applies the sorting function to the stimulus, and checks to see whether the actual result matches the expected one. The results of the test are then printed to the console.

def testQSort(qsort, stimulus, response, test):
    result = qsort(stimulus)
    if response == result:
        print qsort.__name__ + " - passed - " + test
        print qsort.__name__ + " - failed - " + test
        print "Expected: ",
        print response
        print "Got: ",
        print result

The test cases provided as part of the test harness are fairly straightforward. One of the tests checks that the sorting function operates correctly on lists of numbers. The other checks that lists of strings are correctly sorted.

<<test cases>>=
def testNumericSort(f):
    stimulus = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
    response = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
    testQSort(f, stimulus, response, "numeric sort")
def testStringSort(f):       
    stimulus = ["bob","alice","barry","zoe","charlotte","fred","marvin"]
    response = ["alice","barry","bob","charlotte","fred","marvin","zoe"]
    testQSort(f, stimulus, response, "string sort")

The complete qsort.py consists of the quicksort implementations, the test cases, and the test driver:

test cases

if __name__ == "__main__":
    print "Testing quicksort implementations..."
    qsorts = [qsort1, qsort2, qsort3]
    for q in qsorts:

[edit] Building and running

If you have installed Python, then the quicksort implementations can be tested at the command line by issuing the following command:

python qsort.py

Running the tests should produce the following output:

Testing quicksort implementations...
qsort1 - passed - numeric sort
qsort1 - passed - string sort
qsort2 - passed - numeric sort
qsort2 - passed - string sort

[edit] Performance

Although both quicksort implementations apparently work correctly the question of performance remains. To determine the performance of the quicksorts we use a simple test driver which generates randomly populated integer lists that have a length defined via command-line arguments. The quicksort implementations are then applied to the random lists, using the timing module to obtain execution-time information.

import sys
import random
import timing
import qsort

def checkSorted(a):
    for i in xrange(1, len(a) - 1):
        if a[i] < a[i-1]:
            return False
    return True
numElements = int(sys.argv[1])    
testlist = random.sample(xrange(999999999), numElements)
print "# elements: %d"%numElements
for q in qsort.qsort1, qsort.qsort2:
    print "%s"%q.__name__,
    list = testlist[:]
    result = q(list)
    print "- %.3f secs"%(float(timing.micro()) / 1000000),
    if checkSorted(result):
        print "- passed"
        print "- failed"

The table below compares the performance of both version of quicksort for various sizes of list. Times were taken on a 1.5 GhZ PowerPC G4machine under Mac OS X using the standard Python package; we measured both with and without the -O flag, but the difference was not appreciable.

List size 100 500 1000 5000 10000 50000
qsort1 (s) 0.002 0.009 0.019 0.142 0.270 1.853
qsort2 (s) 0.004 0.029 0.097 0.846 4.161 278.242

The qsort1 list comprehension implementation significantly outperforms the qsort2 single-pass partitioning implementation for all list sizes. This is probably the result of Python's lack of tail-call optimization, and possibly also a result of optimizations for list comprehensions within the Python runtime.

Download code