# One Clock Cycle Polynomial Math

Error correction codes and cryptographic computations are most easily performed working with $GF(2^n)$ polynomials. By using very special values of $n$ we can build circuits which multiply and square in one clock cycle on an FPGA. These circuits come about by flipping back and forth between a standard polynomial basis and a normal basis representation of elements in $GF(2^n)$.

A normal basis is yet another form of polynomial but instead of adding powers of $\beta$ we add squares of powers: $$a_{n-1}\beta^{2^{n-1}}+a_{n-2}\beta^{2^{n-2}}+...+ a_2\beta^{2^2}+a_1\beta^2+a_0\beta$$ Note that the last term is $\beta^{2^0}$. The first piece of magic to take notice of is that $\beta^{2^n}=\beta$. This comes from Fermat's "little" theorem. The advantage of this representation is that the operation of squaring becomes a simple left rotation - a single clock cycle on an FPGA.

To see why this works, we first notice that when squaring a polynomial in $GF(2^n)$ all the cross terms cancel. The only terms that don't cancel are $a_j\beta^{2^j} \ast a_j\beta^{2^j}=a_j\beta^{2^{j+1}}$. The coefficient $a_j$ is now in front of the term $\beta^{2^{j+1}}$ which is the next term to the left. The term $a_{n-1}$ goes down to $a_0$ due to Fermat's "little" theorem. We have just computed the square of a polynomial using a left rotation by one bit.

A normal basis is "optimal" when it takes the fewest number of terms to compute a multiply. For specific values of $n$, a lot of magic happens which makes the circuits to implement the math exceptionally simple. A "type 1 normal basis" has the following rules: $n+1$ is prime and 2 is a generator for $Z_{n+1}$. The values of $n<500$ for which this is true are: 2, 4, 10, 12,18, 28, 36, 52, 58, 60, 66, 82, 100, 106, 130, 138, 148, 162, 172, 178, 180, 196, 210, 226, 268, 292, 316, 346, 348, 372, 378, 388, 418, 420, 429, 442, 460, 466 and 491.

There is one other optimal normal basis form but the type 1 form has a lot of advantages in an FPGA as well as microcontrollers. The fundamental advantage of type 1 is the ability to flip between a normal basis and a polynomial basis representation for the same value. The second rule is what allows this. Since $2$ is a generator, $2^j$ will be one of the values between $1$ and $n$. For an example, let's look at $n=10$:

j |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |

$2^j \mod 11$ |
1 | 2 | 4 | 8 | 5 | 10 | 9 | 7 | 3 | 6 | 1 |

A polynomial basis (or standard basis) requires we work modulo an irreducible polynomial. By choosing the irreducible polynomial of $\sum_{i=0}^{i=n}\beta^i$ the transform between normal basis and polynomial basis is a simple permutation. Why this choice works is way beyond what I want to discuss, for now we'll just say the mathematicians figured it out a long time ago and we will run with it.

To understand what the above table means let's look at several terms in both bases. $$\begin{array}{c}\beta^{2^0}\rightarrow\beta\\ \beta^{2^1}\rightarrow\beta^2\\ \beta^{2^8}\rightarrow\beta^3\end{array}$$ But something strange happens at $j=5$:

$$\beta^{2^5}\rightarrow\beta^{10}=\beta^9+\beta^8+\beta^7+\beta^6+\beta^5+\beta^4+\beta^3+\beta^2+\beta+1$$ modulo the irreducible polynomial. To map $1 = \beta^0$ into $\beta^{2^j}$ we need to sum over all the $\beta^{2^j}$ terms because only $\beta^{2^5}$ carries the $1$ with it. So we have $$1\rightarrow\sum_{j=0}^{j=n-1}\beta^{2^j}$$

Our polynomial in standard basis can be transformed to normal basis for squaring, then we can use the permutation table to transform back. To multiply, we follow the form described by J. K. Wolf in Discrete Mathematics 106/107 (1992) pg 497: $$\sum_{i=0}^{i=n-1} a_i\beta^i \ast \sum_{j=0}^{j=n-1} b_j\beta^j = \sum_{k=0}^{k=n-1} c_k\beta^k$$ $$ c'_k = \sum_{j=0}^{j=n-1} b_j a_{k-j\mod n+1} \text{ for } k=0\dotso n$$ The fundamental trick here is the coefficient $a_n = 0$ and the term $c_n$ is used to modify the result with $$c_k = c_n + c'_k$$ Setting $a_n=0$ effectively "knocks out" one term in each sum. The extra term $c_n$ actually comes from the use of an extension into $\beta^{n+1} + 1$ as the basis polynomial which is $(\beta+1)$ times the "all ones" irreducible polynomial.

So what does all that mean? The multiplication is just an AND gate, the sums are XOR gates. The last step is a complement of the resulting $c'_k$ if $c_n$ is set, otherwise nothing happens and $c_k = c'_k$. We can clock in the $a_i$ coefficients and $b_j$ coefficients and the result falls out $c_k$. Obviously the clock timing depends on the cascade of gates.

Another approach is to use n clocks to perform the multiply. This drastically reduces the area consumed by the logic. On each clock $j$, if $b_j$ is set, XOR the $a$ register with the $c$ register, then rotate $a$ left. On clock n, use $c_n$ to complement the output if $c_n=1$. Note that $a$ and $c$ are $n+1$ bits wide and the bit $a_n$ is set to zero. This $0$ rotates around and "takes out" the correct bit each time. On a microcontroller, this is the most sensible way to do the computation.

The point of all this is speed. By using the right size field, polynomial multiplies can be computed in a single clock cycle on an FPGA. The same field which allows this also can be represented in a normal basis which makes squaring a polynomial possible in one clock cycle as well. Switching between the two representations is a simple permutation (plus complement if a certain bit is set). Everything else we need to do can be done with these basic functions. If you need to go fast, then one of the above listed field sizes is the best place to start.

- Comments
- Write a Comment Select to add a comment

The point is you are going way, way too deep. This is trivially simple. As you point out $(a+b)^2$ has $2ab$ as a cross term. But $2\mod 2 = 0, so the cross term is 0. The other way to describe it is that you get a cross term from each group and when you add them together you get 0. That?s why XOR works as the sum of terms. I?ve always seen it written as a "field over $GF(2^n)$" , and they always have coefficients in $GF(2)$. I suppose to be correct they should say "an extension field over $GF(2^n)$". Otherwise, XOR doesn't work as the summation operator.

Primitive Normal Bases for Finite Fields

H. W. Lenstra, Jr. and R. J. Schoof

Mathematics of Computation

Vol. 48, No. 177 (Jan., 1987), pp. 217-231

which proves "any finite extension of a finite field has a normal basis consisting of primitive roots." So there _is_ a normal basis over GF(8), but I don't know how you map it back to a standard basis (without reading that reference!).

I like your order, the square of each element goes to the symmetric position.

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: