FPGARelated.com
Blogs

Polynomial Math

Mike November 3, 20152 comments

Elliptic Curve Cryptography is used as a public key infrastructure to secure credit cards, phones and communications links. All these devices use either FPGA's or embedded microprocessors to compute the algorithms that make the mathematics work. While the math is not hard, it can be confusing the first time you see it.  This blog is an introduction to the operations of squaring and computing an inverse over a finite field which are used in computing Elliptic Curve arithmetic.  Since squaring is simple let's start with that.

To keep the field size in range of something we can do by hand, let's look at a 5 bit field using the primitive polynomial $\beta^5 + \beta^2 + 1$.  This has 32 elements (including 0) so we can find $\beta^i$ for $0 < i < 30$ and each will be unique.  In general, when we represent an element in the field, we use the form $$b_4\beta^4 + b_3\beta^3 + b_2\beta^2 + b_1\beta + b_0$$  What happens when we square this?

We get a form with a combination of all coefficients $\sum_{i=0}^{i=8}c_i\beta_i$. Let's look at the cross terms: $b_0b_1\beta + b_1\beta b_0 = 0$!  This must be true - either one of $b_0$ or $b_1$ is zero, in which case both terms are zero, or both are 1 and the rule of addition over GF(2) cancels them out.  Thus all cross terms cancel out and the result is $b_4\beta^8+b_3\beta^6+b_2\beta^4+b_1\beta^2+b_0$.  We only need the elements $\beta^6$ and $\beta^8$ in our lookup table to compute a square over our primitive polynomial.  For larger field sizes, the lookup table is larger, but we only need half the table, so squaring is faster than multiplication if we need it as an operation. 

From the primitive polynomial $\beta^5=\beta^2+1$, $\beta^6=\beta^3+\beta$, $\beta^7=\beta^4+\beta^2$, and $\beta^8=\beta^3+\beta^2+1$. For squaring, our lookup table only needs two of these, but for general multiplication we can save them all.  We don't need all 31 elements since any general multiplication of two 5 bit fields will always have a maximum power of $\beta^8$.

Since multiplication is a fast lookup, we can compute an inverse using multiplication along with Fermat's "little" theorem: $\alpha^{q-1}=1$ with $\alpha$ being an element in a finite field of order $q$.  In our present example, $q = 32$ so we have $\beta^{31}=1$ (which we would find if we built a table of all elements).  But this is true in general for any element.  Take $$\alpha=\sum_{i=0}^{i=4}b_i\beta^i$$ then Fermat's theorem still applies.  To compute the inverse of an element, we multiply both sides by $\alpha^{-1}$ so that $\frac{1}{\alpha}=\alpha^{q-2}=\alpha^{30}$.

The integer $30$ is $2^4+2^3+2^2+2$.  So if we compute $u=\alpha^2$ and then $v=u^2=\alpha^4$, $w=v^2=\alpha^8$, and $x=w^2=\alpha^{16}$ then the inverse of $\alpha$ is $x \cdot w \cdot v \cdot u$. We created the exponent 30 by adding the squared components.  So we only need a very small lookup table to compute the squares and a general small lookup table to compute the multiplications.  Doing them one step at a time, we never exceed the size of the lookup table, which is much smaller than the full field size.

Another way to compute an inverse is with the Extended Euclid Algorithm.  We want to find $t \cdot s = 1$ modulo the primitive polynomial where $t$ is what we know and $s$ is what we are looking for.  The basic idea is to use long division and keep the remainder.  To take an example, let's divide our primitive polynomial by $\beta^2 + 1$

$$\require{enclose} \begin{array}{r} \beta^3+{\phantom{\beta^2+}}\beta+1\\ \beta^2+1 \enclose{longdiv}{\beta^5+{\phantom{\beta^4+\beta^3+}}\beta^2+{\phantom{\beta+}}1}\\ \underline{\beta^5+{\phantom{\beta^4+}}\beta^3{\phantom{+\beta^2+\beta+1}}}\\  {\phantom{\beta^5+\beta^4+}}\beta^3+\beta^2{\phantom{+\beta+1}}\\ \underline{{\phantom{\beta^5+\beta^4+}}\beta^3+{\phantom{\beta^2+}}\beta{\phantom{+1}}}\\ {\phantom{\beta^5+ \beta^4 +\beta^3+}}\beta^2+\beta{\phantom{+1}}\\ \underline{\phantom{\beta^5+\beta^4+\beta^3+}\beta^2{\phantom{+\beta}}+1}\\ {\phantom{\beta^5+\beta^4+\beta^3+\beta^2+}}\beta{\phantom{+1}} \end{array}$$

This gives us a quotient $( \beta^3+\beta+1)$ and a remainder $\beta$. The Extended Euclid Algorithm uses the quotient and remainder from long division to determine the inverse modulo a primitive polynomial.  The algorithm starts by setting up the loop 2 steps below zero and then iterates from step zero up until the remainder is zero.  Let's call the primitive polynomial $M$.  The reminder is $r_k$, the quotient is $q_k$ and each step of the solution is $s_k$.

The initialization begins with  $$\begin{array}{c}r_{-2}=M \\r_{-1}=t\\ s_{-2}=0\\ s_{-1}=1 \end{array}$$

The iterative loop is then $$\begin{array}{l}r_k=r_{k-2} \mod r_{k-1}\\ q_k=r_{k-2}/r_{k-1}\\ s_k=q_k s_{k-1} + s_{k-2}\end{array}$$

You might say why not just use division if you are going to use the Euclid algorithm which includes division anyway?  The reason to compute an inverse and then multiply is that all terms are modulo the prime polynomial so all powers are less than that of the prime polynomial and each division would only be a remainder.  Computing the inverse and then multiplying gets us the right answer.

So let's look at an example of both methods.  Let $\alpha = \beta^2+1$.  Then the first method is computed by $$\begin{array}{c}u=\alpha^2=\beta^4+1\\ v=u^2=\beta^8+1=\beta^3+\beta^2\\ w=v^2=\beta^6+\beta^4=\beta^4+\beta^3+\beta\\ x=w^2=\beta^8+\beta^6+\beta^2=\beta+1 \end{array}$$

Once we have the powers of $\alpha$ we can compute the combination: $$\begin{array}{c}u \cdot v=(\beta^4+1)(\beta^3+\beta^3)=\beta^7+\beta^3+\beta^6+\beta^2=\beta^4+\beta\\ u \cdot v \cdot w=(\beta^4+\beta)(\beta^4+\beta^3+\beta)=\beta^8+\beta^7+\beta^5+\beta^5+\beta^4+\beta^2=\beta^3+\beta^2+1\\ \alpha^{-1}=u \cdot v \cdot w \cdot x=(\beta^3+\beta^2+1)(\beta+1)=\beta^4+\beta^3+\beta+\beta^3+\beta^2+1\\ \alpha^{-1}=\beta^4+\beta^2+\beta+1 \end{array}$$

Now let's check: $$\begin{array}{c}\alpha \cdot \alpha^{-1}=(\beta^2+1)(\beta^4+\beta^2+\beta+1)\\ =\beta^6+\beta^4+\beta^3+\beta^2+\beta^4+\beta^2+\beta+1\\ = 1 \end{array}$$

Compare this with Euclid's Algorithm.  Initialize with $$\begin{array}{c}r_{-2}=M=\beta^5+\beta^2+1\\ r_{-1}=t=\beta^2+1\\ s_{-2}=0\\ s_{-1}=1 \end{array}$$

We did the first division already above so we know $$\begin{array}{c}q_0=\beta^3+\beta+1\\ r_0=\beta\\ s_0=q_0 s_{-1}+s{-2} =\beta^3+\beta+1 \end{array}$$

The next iteration gives $$\begin{array}{c}q_1=\frac{\beta^2+1}{\beta}=\beta\\ r_1 = 1\\ s_1=q_1 s_0+s_{-1}=1+\beta(\beta^3+\beta+1)=\beta^4+\beta^2+\beta+1 \end{array}$$

Now we see we have the right answer, but the remainder is not zero.  This is a special case - if we continue the algorithm we get both a zero remainder and zero for $s_2$.  That's because $\beta^2+1$ is the modulus of the prime polynomial.  In an efficient program we keep only the last two steps, so we still have the answer if this happens.  $0$ is obviously never a possible answer for an inverse!

The algorithm you choose to use for a particular application depends on what kind of hardware you have.  In an embedded system multiplication of large fields is more complicated than in an FPGA.  There are many words which must be manipulated in a processor, but in an FPGA the whole field can be operated on at once.  Using lookup tables and shifting can be really fast in an FPGA, so even if there are more multiplies, the time it takes to do them is fairly small compared to the division algorithm  On an embedded microprocessor the opposite is usually true and Euclid's algorithm is the method of choice.  But processors change and include more hardware all the time, so it is worth while knowing the details both of your devices and different algorithms so you can pick the best one for the job at hand.



[ - ]
Comment by jms_nhNovember 4, 2015
interesting! i was going to get to Galois fields someday in an article, never sure how to bring it up. I have a Python library for doing math in GF(2): https://bitbucket.org/jason_s/libgf2
[ - ]
Comment by drmikeNovember 4, 2015
For testing things I use Pari/gp, and then write code to see if I know what I'm doing. Usually I don't :) at least for the first few iterations. It's fun to play with!

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: