R250/521 (C)

From LiteratePrograms
Jump to: navigation, search
Other implementations: C | Java

This program implements the R250/521 combined generalized feedback shift register algorithm for generation of pseudorandom numbers. The original code was written by Michael Brundage and has been placed in the public domain. [1]

R250 is a simple but efficient algorithm where we keep a buffer of the last 250 words generated. To generate a new word, we XOR the words at indexes 0 and 103. The new word is added to the end of the buffer, pushing all other words down by 1 index and removing the word at index zero. This is most efficiently implemented using a circular buffer, like this:

static unsigned int x[250];
static unsigned int i = 0;
unsigned int simple_rand_R250() {
    unsigned int result = x[i] = x[i] ^ x[(i + 103) % 250];
    return result;

R521 is similar except that it keeps a buffer of the last 521 numbers and XORs words 0 and 168.

[edit] Main implementation

We begin our implementation by defining and initializing the two buffers. To initialize the buffers, we must initialize each entry using another random number generator, such as rand():

#define R250_LEN     250
#define R521_LEN     521
#define ULONG_BITS   (8*sizeof(unsigned long))

unsigned long r250_buffer[R250_LEN];
unsigned long r521_buffer[R521_LEN];

void r250_521_init() {
    r250_521_init variables
    int i = R521_LEN;
    while (i-- > R250_LEN) {
        r521_buffer[i] = rand();
    while (i-- > ULONG_BITS) {
        r250_buffer[i] = rand();
        r521_buffer[i] = rand();
    initialize first ULONG_BITS elements with linearly independent values
    initialize offsets

Loops are reversed and fused for speed; this is probably unnecessary. The rand() calls require stdlib.h:

<<header files>>=
#include <stdlib.h>

Note that we don't simply compute every value using rand(). Suppose we view our buffer as a bit matrix where aij is the ith bit of the jth word. If the columns of this matrix are not linearly independent, in other words if one can be formed by XORing a subset of the others, then this will affect the period of the generator. This is highly improbable if we initialize the buffer with random data, but with pseudorandom data it's difficult to be sure. In any case, it's easy to force the columns to be linearly independent by simply setting setting the ith bit of the ith row to 1 and all bits to the right of that bit to zero, where i goes up to the word size, as in this miniature example:

2 \\
7 \\
4 \\
3 \\
0 & 0 & 0 \\
1 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 1 \\
1 & 1 & 0
1 & 0 & 0 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 0

In the code, we accomplish this by applying a sliding bitmask to the initializers for the first ULONG_BITS elements while assigning them:

<<r250_521_init variables>>=
unsigned long mask1 = 1;
unsigned long mask2 = (unsigned long)(-1L);

<<initialize first ULONG_BITS elements with linearly independent values>>=
while (i-- > 0) {
    r250_buffer[i] = (rand() | mask1) & mask2;
    r521_buffer[i] = (rand() | mask1) & mask2;
    mask2 ^= mask1;
    mask1 >>= 1;
r250_buffer[0] = mask1;
r521_buffer[0] = mask2;

In our first code block we found the index to XOR with using (i + 103) % 250. Instead of performing an expensive modulus operation, we can accomplish the same thing by either adding 103 to i or subtracting 250−103, depending on which yields a nonnegative value (since 0 ≤ i < 250, one of them must). Also, the use of this expression as an array index on many platforms involves an implicit address calculation where it is multiplied by the size of an array element. We precompute these values instead:

<<offset constants>>=
#define R250_IA  (sizeof(unsigned long)*103)
#define R250_IB  (sizeof(unsigned long)*R250_LEN - R250_IA)
#define R521_IA  (sizeof(unsigned long)*168)
#define R521_IB  (sizeof(unsigned long)*R521_LEN - R521_IA)

According to Brundage, "[t]his minor optimization increased perf by about 12%."

In the following code, we will combine R250 and R521 to obtain a generator with a much larger period and better statistical properties. To do this, each time a random number is asked for we simply give the XOR of the next random number from R250 and the next random number from R521. In outline the procedure looks like this:

<<main number generator>>=
unsigned long r250_521_random() {
    r250_521_random variables
    compute the addresses of the words we XOR with next in each buffer
    compute and store the next words r and s in each sequence
    update offset pointers
    return r ^ s;

In the first step, we take advantage of our precomputed offset values to perform arithmetic directly with byte offsets into the buffers, avoiding any implicit calculations. Two file-level variables store the current byte offset into each buffer:

<<file-level variables>>=
int r250_index;
int r521_index;

<<initialize offsets>>=
r250_index = 0;
r521_index = 0;

<<compute the addresses of the words we XOR with next in each buffer>>=
int i1 = r250_index;
int i2 = r521_index;
int j1, j2;
j1 = i1 - R250_IB;
if (j1 < 0)
    j1 = i1 + R250_IA;
j2 = i2 - R521_IB;
if (j2 < 0)
    j2 = i2 + R521_IA;

We can now extract the desired word and XOR the current word with it to get our next random integers r and s:

<<r250_521_random variables>>=
unsigned long r, s;
unsigned char * b1 = (unsigned char*)r250_buffer;
unsigned char * b2 = (unsigned char*)r521_buffer;
unsigned long * tmp1, * tmp2;

<<compute and store the next words r and s in each sequence>>=
tmp1 = (unsigned long *)(b1 + i1);
r = (*(unsigned long *)(b1 + j1)) ^ (*tmp1);
*tmp1 = r;
tmp2 = (unsigned long *)(b2 + i2);
s = (*(unsigned long *)(b2 + j2)) ^ (*tmp2);
*tmp2 = s;

Finally, we increment the offset pointers to the next word, wrapping around if necessary:

<<update offset pointers>>=
i1 = (i1 != sizeof(unsigned long)*(R250_LEN-1)) ? (i1 + sizeof(unsigned long)) : 0;
r250_index = i1;
i2 = (i2 != sizeof(unsigned long)*(R521_LEN-1)) ? (i2 + sizeof(unsigned long)) : 0;
r521_index = i2;

Throughout this code, Brundage performed some careful instruction reordering by hand to improve scheduling. He says: "I also carefully arrange index increments and comparisons to minimize instructions. gcc 3.3 seems a bit weak on instruction reordering. The j1/j2 branches are mispredicted, but nevertheless these optimizations increased perf by another 10%."

In summary, here is our source file, ready to use:

header files
file-level variables

offset constants
main number generator

[edit] Performance

This code performs very quickly. Below is an ad hoc comparison with glibc's standard rand() call on a 2.8 GhZ Pentium 4 machine, which is also a carefully optimized implementation. Compiled with gcc 3.3 on Linux with the "-O3" flag.

Numbers rand (sec) R250/521 (sec)
100 0.001 0.001
10,000 0.001 0.001
1,000,000 0.027 0.011
100,000,000 4.46 1.08
1,000,000,000 52.7 17.7

As one might expect, R250/521 shows little difference at small counts of numbers, but rapidly dominates rand() for even moderate numbers of values, without sacrifice to statistical quality.

Download code