# Parallel prefix (Python)

A quick sketch of monoidal and segmented monoidal parallel prefix operations. cf.

##  theory

Given an associative operator, $\oplus$, and an array of data a0,a1,...,an, we can calculate the prefix sum $a_0, a_0\oplus a_1, ..., a_0 \oplus a_1 \oplus ... \oplus a_n$ logarithmically, as $((a_0 \oplus a_1) \oplus ...) \oplus (... \oplus (a_(n-1) \oplus a_n))$, instead of linearly, as $(((a_0 \oplus a_1) \oplus ...) \oplus a_n)$.

##  practice

Lacking a vectorized python implementation, we provide a virtual vectorization:

<<vector op>>=
from itertools import imap
vect = imap

and then code the parallel prefix scan. Note the interplay between the pairwise application of the operators and the recursion. Instead of tackling the problem linearly, element by element, we are tackling it logarithmically, tackling pairs of subproblems with each recursive step.

<<scan>>=
def scan(op,a):
print a
if len(a) > 1:
a[1::2] = vect(op,a[0::2],a[1::2])
a[1::2] = scan(op,a[1::2])
a[2::2] = vect(op,a[1::2],a[2::2])
print a
return a

##  wrapping up

Note that this technique works for arbitrary associative operators: addition, multiplication, string concatenation, etc.

<<parprefix.py>>=
vector op
scan
segmented
if __name__ == "__main__":
import operator
scan(operator.mul,range(1,17))

and the pairwise operation of the scan is clear in the output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[1, 5, 9, 13, 17, 21, 25, 29]
[6, 22, 38, 54]
[28, 92]
[120]
[120]
[28, 120]
[6, 28, 66, 120]
[1, 6, 15, 28, 45, 66, 91, 120]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
[2, 12, 30, 56, 90, 132, 182, 240]
[24, 1680, 11880, 43680]
[40320, 518918400]
[20922789888000L]
[20922789888000L]
[40320, 20922789888000L]
[24, 40320, 479001600, 20922789888000L]
[2, 24, 720, 40320, 3628800, 479001600, 87178291200L, 20922789888000L]
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800L, 87178291200L, 1307674368000L, 20922789888000L]
['h', 'i', ' ', 'w', 'o', 'r', 'l', 'd']
['hi', ' w', 'or', 'ld']
['hi w', 'orld']
['hi world']
['hi world']
['hi w', 'hi world']
['hi', 'hi w', 'hi wor', 'hi world']
['h', 'hi', 'hi ', 'hi w', 'hi wo', 'hi wor', 'hi worl', 'hi world']

##  but wait, there's more

Prefix scans also support nested parallelism; adding the capability to "reset" the prefix doesn't affect the underlying associativity of the operator. By adding a boolean flag indicating the start of a subsequence:

<<segmented>>=
def segmented_sum(xps,yps):
x,xf = xps
y,yf = yps
if yf:  return (  y,yf)
else:   return (x+y,xf)

we can sum in parallel over partial prefixes, thus performing a parallel parallel prefix:

<<parprefix.py>>=
print [x for (x,f) in
scan(segmented_sum,zip([7,6,5,4,3,2,1,0],
[1,0,1,0,0,1,0,0]))]

Essentially, the flag provides a barrier preventing the scan from propagating:

[(7, 1), (6, 0), (5, 1), (4, 0), (3, 0), (2, 1), (1, 0), (0, 0)]
[(13, 1), (9, 1), (2, 1), (1, 0)]
[(9, 1), (3, 1)]
[(3, 1)]
[(3, 1)]
[(9, 1), (3, 1)]
[(13, 1), (9, 1), (2, 1), (3, 1)]
[(7, 1), (13, 1), (5, 1), (9, 1), (12, 1), (2, 1), (3, 1), (3, 1)]
[7, 13, 5, 9, 12, 2, 3, 3]

Question: How does the nested parallelism of a segmented scan compare to the vectorized Sieve of Eratosthenes (Bash)?

hijacker
hijacker
hijacker
hijacker