On 07/10/15 05:02, rickman wrote:
> On 10/5/2015 1:29 PM, Kevin Neilson wrote:
>>>
>>> We are talking modulo 2 multiplies at every bit, otherwise known as
>>> AND gates with no carry? I'm a bit fuzzy on this.
>>>
>>> Now I'm confused by Kevin's description. If the vector is
>>> multiplied by a scalar, what parts are common and what parts are
>>> unique? What parts of this are fixed vs. variable? The only parts
>>> a tool can optimize are the fixed operands. Or I am totally
>>> missing the concept.
>>>
>> Say you're multiplying a by a vector [b c d]. Let's say we're using
>> the field GF(8) so a is 3 bits. Now a can be thought of as (
>> a0*alpha^0 + a1*alpha^1 + a2*alpha^2 ), where a0 is bit 0 of a, and
>> alpha is the primitive element of the field. Then a*b or a*c or a*d
>> is just a sum of some combination of those 3 values in the
>> parentheses, depending upon the locations of the 1s in b, c, or d.
>> So you can premultiply the three values in the parentheses (the
>> common part) and then take sums of subsets of those three (the
>> individual parts). It's all a bunch of XORs at the end. This is
>> just a complicated way of saying that by writing the HDL at a more
>> explicit level, the synthesizer is better able to find common factors
>> and use a lot fewer gates..
>
> Ok, I'm not at all familiar with GFs. I see now a bit of what you are
> saying. But to be honest, I don't know the tools would have any trouble
> with the example you give. The tools are pretty durn good at
> optimizing... *but*... there are two things to optimize for, size and
> performance. They are sometimes mutually exclusive, sometimes not. If
> you ask the tool to give you the optimum size, I don't think you will do
> better if you code it differently, while describing *exactly* the same
> behavior.
>
> If you ask the tool to optimize for speed, the tool will feel free to
> duplicate logic if it allows higher performance, for example, by
> combining terms in different ways. Or less logic may require a longer
> chain of LUTs which will be slower. LUT sizes in FPGAs don't always
> match the logical breakdown so that speed or size can vary a lot
> depending on the partitioning.
>
GF(8) is a Galios Field with 8 elements. That means that there are 8
different elements. These can be written in different ways. Sometimes
people like to be completely abstract and write 0, 1, a, b, c, d, e, f.
<http://www.wolframalpha.com/input/?i=GF%288%29>
Others will want to use the polynomial forms (which are used in the
definition of GF(8)) and write: 0, 1, x, x+1, x�, x�+1, x�+x, x�+x+1.
<http://www.math.uic.edu/~leon/mcs425-s08/handouts/field.pdf>
That form is easily written in binary: 000, 001, 010, 011, 100, etc.
One key point about a GF (or any field) is that you can combine elements
through addition, subtraction, multiplication, and division (except by
0) and get another element of the field. To make this work, however,
addition and multiplication (and therefore their inverses) are very
different from how you normally define them.
A GF is always of size p^n - in this case, p is 2 and n is 3. Your
elements are a set of n elements from Z/p (the integers modulo p).
Addition is just pairwise addition of elements, modulo p. For the case
p = 2 (which is commonly used for practical applications, because it
suits computing so neatly), this means you can hold your elements as a
simple binary string of n digits, and addition is just xor.
Multiplication is a little more complex. Treat your elements as
polynomials (as shown earlier), and multiply them. Any factors of x^i
can be reduced modulo p. Then the whole thing is reduced modulo the
field's defining irreducible degree n polynomial (in this case, x� + x +
1). This always gets you back to another element in the field.
The real magic here is that the multiplication table (excluding 0) forms
a Latin square - every element appears exactly once in each row and
column. This means that division "works".
It's easy to get a finite field like this for size p, where p is prime -
it's just the integers modulo p, and you can use normal addition and
multiplication modulo p. But the beauty of GF's is that they give you
fields of different sizes - in particular, size 2^n.
Back to the task in hand - implementing GF(8) on an FPGA. Addition is
just xor - the tools should not have trouble optimising that!
Multiplication in GF is usually done with lookup tables. For a field as
small as GF(8), you would have a table with all the elements. This
could be handled in a 6x3 bit "ROM", or the tools could generate logic
and then reduce it to simple gates (for example, the LSB bit of the
product of non-zero elements is the xnor of the LSB bits of the
operands). For larger fields, such as the very useful GF(2^8), you will
probably use lookup tables for the logs and antilogs, and use them for
multiplication and division.
A key area of practical application for GF's is in error detection and
correction. A good example is for RAID6 (two redundant disks in a RAID
array) - there is an excellent paper on the subject by the guy who
implemented RAID 6 in Linux, using GF(2^8). It gives a good
introduction to Galois fields.
<https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>