## A fast 24 bit PRNG algorithm for the 6502 processor

Popular choices for pseudo random number generators (or PRNG) are linear feedback shift registers (LFSR) and Marsaglia’s xorshift. Both are based on non-zero elements $g$ of maximal order $2^n-1$ in $\mathrm{GL}(n, \mathbb{F}_2)$. Such PRNG algorithms keep a state vector $v \in \mathbb F_2^n$ from which a number of bits is used to generate a “random” number (for example the first $8$ or $16$ coordinates, to get a random byte or word). The operator $g$ is then applied to $v$ to get the next state $\displaystyle v \leftarrow g(v)$ before extracting the next random number, and so forth.

However, both LFSR and xorshift have some drawbacks when trying to implement them on an 8-bit architecture such as the 6502. Both methods rely on bit shifting of the state vector $v$ that consists of several bytes. However, the 6502 only has instructions to shift the accumulator (an 8 bit register) one bit in either direction. On top of that, the 6502 has only two additional registers (X and Y, both 8 bits), that cannot be shifted at all. So implementations need multiple shift instructions and careful use of auxiliary memory locations to store intermediate results. The more instructions and memoy accesses, the more bytes and cycles an algorithm needs. (To be fair, there is a nice implementation of Marsaglia’s algorithm for a 16 bit state and shifts $\ll 7$, $\gg 9$, and $\ll 8$ respectively.)

So let’s turn this situation around. Instead of trying to implement some well-known method, let’s start with some primitive operations that are easy and fast on a 6502 and then try to build a operator $g$ with maximal order from that simple set. I wanted to use a 24 bit (three byte) random state vector and so we’re looking for an operator $g$ of order $2^{24}-1 = 16{,}777{,}215$. Let the three bytes in the random state be a, b, c. We will use the following 18 basic operators as building blocks:

These basic operations can be combined by performing one after another. Such a combination can be denoted as a tuple. For example the tuple (1, 15) represents the following sequence:

1. Clear carry
2. Apply operation 1 to the random state
3. Apply operation 15 to the random state

Note that instead of clearing the carry it is also possible to replace the first occurrence of ROL or ROR in such a sequence with ASL or LSR respectively. It turns out that at least five basic operations must be combined to get an operator of maximum order on the three byte state. In fact, there are no less than 2904 5-tuples that result in a maximal order. However, note that the three bytes a, b, c and the shifts ROL, ROR can be permuted to give equivalent operators. So these 2904 tuples come in 242 groups of 12 equivalent tuples each.

Apart from having maximum order, those operators $g$ are preferred that have a minimal polynomial (of degree 24 in $\mathbb F_2[x]$) with about half of its coefficients equal to 1. For example, both tuples (7, 9, 5, 15, 6) and (7, 7, 4, 6, 8) produce operators with maximal order but the minimal polynomial for the first tuple is $1+x^2+x^3+x^6+x^{11}+x^{12}+x^{15}+x^{16}+x^{17}+x^{19}+x^{20}+x^{23}+x^{24}$

while for the second tuple it is $1+x^7+x^9+x^{16}+x^{24}$.

The following code implements the tuple (7, 9, 5, 15, 6) in 27 bytes and 42 cycles. For any initial state other than (0, 0, 0) this code produces –by repeated application– a sequence of states with period $2^{24}-1$.

                lda a           ; Operation 7 (with carry clear).
asl
eor b
sta b
rol             ; Operation 9.
eor c
sta c
eor a           ; Operation 5.
sta a
lda b           ; Operation 15.
ror
eor c
sta c
eor b           ; Operation 6.
sta b


This entry was posted in Uncategorized. Bookmark the permalink.