Factorials with prime factorization (Python)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C | Python

The nth factorial, n!, can be computed straightforwardly by multiplying together the range of integers 1, 2, ... n from the bottom up. However, if n is large, this becomes inefficient. The k:th subproduct has roughly k log k digits, so multiplying by k+1 (assumed to fit in a single machine word) takes k log k time. This leads to a total time of O(n2 log n) for calculating n!.

Many algorithms exist for splitting a product of many factors into smaller pieces to balance the size of each subproduct. However, there is a trick to factorials: we can find the prime factorization of n! quickly, much more quickly than we can compute n! itself. And we can do this without stepping through the list 1, 2, ..., n and factorizing each of these numbers. The crucial observation is that n! must necessarily contain each prime number up to n, and that the power of the prime p in n! is given by

\sum_{i=1}^{\infty} \, \lfloor n/p^i \rfloor.

Notice that this is actually a finite sum, since the floor is zero whenever pi > n. This implementation is based on the following references:


In Python, the sum can be implemented as follows:

<<prime_factorial.py>>=
def multiplicity(n, p):
    """Return the power of the prime number p in the
    factorization of n!"""
    if p > n: return 0
    if p > n//2: return 1
    q, m = n, 0
    while q >= p:
        q //= p
        m += q
    return m

We now need a way to obtain the prime numbers up to n, for which purpose we implement a light-weight sieve of Eratosthenes:

<<prime_factorial.py>>=
def primes(n):
    """Generate a list of the prime numbers [2, 3, ... m] where
    m is the largest prime <= n."""
    n = n + 1
    sieve = range(n)
    sieve[:2] = [0, 0]
    for i in xrange(2, int(n**0.5)+1):
        if sieve[i]:
            for j in xrange(i**2, n, i):
                sieve[j] = 0
    # Filter out the composites, which have been replaced by 0's
    return [p for p in sieve if p]

Knowing the prime factors of n!, and the exponent for each factor, all that needs to be done is taking powers. This is also the most important step of the calculation: if we just multiply all the primes together without further thought, no time will be saved. What we can do instead is perform recursive exponentiation by squaring. For exponentiation of a single integer, this is based on the fact that squaring the number cuts the remaining exponent in half. The technique can easily be applied to a list of numbers as well:

<<prime_factorial.py>>=
def powproduct(ns):
    """Compute the explicit value of a factored integer
    given as a list of (base, exponent) pairs."""
    if not ns:
        return 1
    units = 1
    multi = []
    for base, exp in ns:
        if exp == 0:
            continue
        elif exp == 1:
            units *= base
        else:
            if exp % 2:
                units *= base
            multi.append((base, exp//2))
    return units * powproduct(multi)**2

Wrapping things together, we have:

<<prime_factorial.py>>=
def factorial(n):
    return powproduct((p, multiplicity(n, p)) for p in primes(n))

In practice, this program will perform worse than one based on direct multiplication when n is small. A practical implementation should use an empirically determined cutoff to enable prime factorization only for very large n, say, larger than a few hundred.

Many optimizations are also possible. The most obvious is to compute a large table of primes in advance instead of creating a list for each call to factorial, another is to handle powers of two using binary shifting.

Download code
hijacker
hijacker
hijacker
hijacker