# R250/521 (C)

**Other implementations**:**C**| Java

This program implements the R250/521 combined generalized feedback shift register algorithm for generation of pseudorandom numbers. The original code was written by Michael Brundage and has been placed in the public domain. [1]

R250 is a simple but efficient algorithm where we keep a buffer of the last 250 words generated. To generate a new word, we XOR the words at indexes 0 and 103. The new word is added to the end of the buffer, pushing all other words down by 1 index and removing the word at index zero. This is most efficiently implemented using a circular buffer, like this:

staticunsignedintx[250];staticunsignedinti = 0;unsignedintsimple_rand_R250(){unsignedintresult = x[i] = x[i] ^ x[(i + 103) % 250]; i++;returnresult;}

R521 is similar except that it keeps a buffer of the last 521 numbers and XORs words 0 and 168.

## [edit] Main implementation

We begin our implementation by defining and initializing the two buffers. To initialize the buffers, we must initialize each entry using another random number generator, such as rand():

<<buffers>>=#define R250_LEN 250#define R521_LEN 521#define ULONG_BITS (8*sizeof(unsigned long))unsignedlongr250_buffer[R250_LEN];unsignedlongr521_buffer[R521_LEN];<<initialization>>=voidr250_521_init(){r250_521_init variablesinti = R521_LEN;while(i-- > R250_LEN){r521_buffer[i] = rand();}while(i-- > ULONG_BITS){r250_buffer[i] = rand(); r521_buffer[i] = rand();}initialize first ULONG_BITS elements with linearly independent values initialize offsets}

Loops are reversed and fused for speed; this is probably unnecessary. The `rand()`

calls require stdlib.h:

<<header files>>=#include <stdlib.h>

Note that we don't simply compute every value using `rand()`

. Suppose we view our buffer as a bit matrix where *a*_{ij} is the *i*th bit of the *j*th word. If the columns of this matrix are not linearly independent, in other words if one can be formed by XORing a subset of the others, then this will affect the period of the generator. This is highly improbable if we initialize the buffer with random data, but with pseudorandom data it's difficult to be sure. In any case, it's easy to force the columns to be linearly independent by simply setting setting the *i*th bit of the *i*th row to 1 and all bits to the right of that bit to zero, where *i* goes up to the word size, as in this miniature example:

In the code, we accomplish this by applying a sliding bitmask to the initializers for the first `ULONG_BITS`

elements while assigning them:

<<r250_521_init variables>>=unsignedlongmask1 = 1;unsignedlongmask2 = (unsignedlong)(-1L);<<initialize first ULONG_BITS elements with linearly independent values>>=while(i-- > 0){r250_buffer[i] = (rand() | mask1) & mask2; r521_buffer[i] = (rand() | mask1) & mask2; mask2 ^= mask1; mask1 >>= 1;}r250_buffer[0] = mask1; r521_buffer[0] = mask2;

In our first code block we found the index to XOR with using `(i + 103) % 250`

. Instead of performing an expensive modulus operation, we can accomplish the same thing by either adding 103 to *i* or subtracting 250−103, depending on which yields a nonnegative value (since 0 ≤ *i* < 250, one of them must). Also, the use of this expression as an array index on many platforms involves an implicit address calculation where it is multiplied by the size of an array element. We precompute these values instead:

<<offset constants>>=#define R250_IA (sizeof(unsigned long)*103)#define R250_IB (sizeof(unsigned long)*R250_LEN - R250_IA)#define R521_IA (sizeof(unsigned long)*168)#define R521_IB (sizeof(unsigned long)*R521_LEN - R521_IA)

According to Brundage, "[t]his minor optimization increased perf by about 12%."

In the following code, we will combine R250 and R521 to obtain a generator with a much larger period and better statistical properties. To do this, each time a random number is asked for we simply give the XOR of the next random number from R250 and the next random number from R521. In outline the procedure looks like this:

<<main number generator>>=unsignedlongr250_521_random(){r250_521_random variables compute the addresses of the words we XOR with next in each buffer compute and store the next words r and s in each sequence update offset pointersreturnr ^ s;}

In the first step, we take advantage of our precomputed offset values to perform arithmetic directly with byte offsets into the buffers, avoiding any implicit calculations. Two file-level variables store the current byte offset into each buffer:

<<file-level variables>>=intr250_index;intr521_index;<<initialize offsets>>=r250_index = 0; r521_index = 0;<<compute the addresses of the words we XOR with next in each buffer>>=inti1 = r250_index;inti2 = r521_index;intj1, j2; j1 = i1 - R250_IB;if(j1 < 0) j1 = i1 + R250_IA; j2 = i2 - R521_IB;if(j2 < 0) j2 = i2 + R521_IA;

We can now extract the desired word and XOR the current word with it to get our next random integers *r* and *s*:

<<r250_521_random variables>>=unsignedlongr, s;unsignedchar* b1 = (unsignedchar*)r250_buffer;unsignedchar* b2 = (unsignedchar*)r521_buffer;unsignedlong* tmp1, * tmp2;<<compute and store the next words r and s in each sequence>>=tmp1 = (unsignedlong*)(b1 + i1); r = (*(unsignedlong*)(b1 + j1)) ^ (*tmp1); *tmp1 = r; tmp2 = (unsignedlong*)(b2 + i2); s = (*(unsignedlong*)(b2 + j2)) ^ (*tmp2); *tmp2 = s;

Finally, we increment the offset pointers to the next word, wrapping around if necessary:

<<update offset pointers>>=i1 = (i1 !=sizeof(unsignedlong)*(R250_LEN-1)) ? (i1 +sizeof(unsignedlong)) : 0; r250_index = i1; i2 = (i2 !=sizeof(unsignedlong)*(R521_LEN-1)) ? (i2 +sizeof(unsignedlong)) : 0; r521_index = i2;

Throughout this code, Brundage performed some careful instruction reordering by hand to improve scheduling. He says: "I also carefully arrange index increments and comparisons to minimize instructions. gcc 3.3 seems a bit weak on instruction reordering. The j1/j2 branches are mispredicted, but nevertheless these optimizations increased perf by another 10%."

In summary, here is our source file, ready to use:

<<r250_521.c>>=header files buffers file-level variables initialization offset constants main number generator

## [edit] Performance

This code performs very quickly. Below is an ad hoc comparison with glibc's standard `rand()`

call on a 2.8 GhZ Pentium 4 machine, which is also a carefully optimized implementation. Compiled with gcc 3.3 on Linux with the "-O3" flag.

Numbers | rand (sec) | R250/521 (sec) |
---|---|---|

100 | 0.001 | 0.001 |

10,000 | 0.001 | 0.001 |

1,000,000 | 0.027 | 0.011 |

100,000,000 | 4.46 | 1.08 |

1,000,000,000 | 52.7 | 17.7 |

10,000,000,000 | 38.6 |

As one might expect, R250/521 shows little difference at small counts of numbers, but rapidly dominates `rand()`

for even moderate numbers of values, without sacrifice to statistical quality.

Download code |