We have been working on a project on high-speed AES using sub-pipelining. We would be happy to find some code that may help us. If anyone in this group has access to any, please help us.
FPGA implementation of AES
Started on March 8, 2006
Reply on March 8, 2006
Take a look at the following cores; they might help you:
http://www.opencores.org/browse.cgi/filter/category_crypto

Mordehay
Reply on March 9, 2006
I saw it, but it's not of much help, as we are basing our design on sub-pipelining and composite field arithmetic. If you find something of that sort, please do help us. Thanks in advance.
Reply on March 9, 2006
manjunath.rg@gmail.com wrote:
> i saw it
> its not of much help..as we are doing it based on subpipelining
> concepts and composite field arithmetic if you find something of such
> sort please do help us

Do you have a C-based implementation somewhere as an example?
Reply on March 9, 2006
I've made an implementation of the AES core in an FPGA that works with pipelining, i.e. I use only 4 S-boxes and iterate each of them 5 times for every round. I cannot give you the code or spec due to IP issues. The design depends on the speed (i.e. clock cycles) you need for each round and on how much memory you can spare (one dual-port block RAM = 2 S-boxes).

Hope it helps,
Mordehay
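A rough way to see where "4 S-boxes, 5 iterations per round" comes from: assuming AES-128 with the key schedule computed on the fly (an assumption; Mordehay doesn't say), each round needs 16 S-box lookups for SubBytes plus 4 for the key expansion, i.e. 20 lookups. Time-sharing N physical S-boxes then costs ceil(20/N) cycles per round, and if one dual-port block RAM serves two S-boxes, the memory cost is ceil(N/2) block RAMs. The minimal C sketch that follows only encodes that arithmetic; the function names are hypothetical.

```c
#include <assert.h>

/* Assumption: AES-128 with on-the-fly key expansion, so each round needs
 * 16 S-box lookups for SubBytes plus 4 for the key schedule (SubWord). */
enum { SBOX_LOOKUPS_PER_ROUND = 16 + 4 };

/* Clock cycles per round when n_sboxes physical S-boxes are time-shared. */
int cycles_per_round(int n_sboxes)
{
    assert(n_sboxes > 0);
    return (SBOX_LOOKUPS_PER_ROUND + n_sboxes - 1) / n_sboxes; /* ceiling division */
}

/* Block RAMs needed if each dual-port block RAM provides two S-boxes. */
int bram_count(int n_sboxes)
{
    assert(n_sboxes > 0);
    return (n_sboxes + 1) / 2; /* ceiling division by 2 */
}
```

With N = 4 this gives 5 cycles per round and 2 block RAMs; with N = 20 (one S-box per lookup) a round fits in a single cycle but needs 10 block RAMs.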
Reply on March 9, 2006
See http://www.ht-lab.com/freecores/AES/aes.html
There is no pipelining, but perhaps the testbench can save you some time.

Hans
www.ht-lab.com

<manjunath.rg@gmail.com> wrote in message news:1141827140.488659.53020@u72g2000cwu.googlegroups.com...
> We have been doing a project on high speed aes using subpippelining
> concepts we would be happy if we find some code which may help us.. if
> anyone in this group has any access pls help us
Reply on March 10, 2006
On 9 Mar 2006 04:31:18 -0800, fpga_toys@yahoo.com wrote:
>
>manjunath.rg@gmail.com wrote:
>> i saw it
>> its not of much help..as we are doing it based on subpipelining
>> concepts and composite field arithmetic if you find something of such
>> sort please do help us
>
>do you have a C based implemention somewhere as an example?

You want to try it in your C -> hardware compiler? I'd be interested in the results.

AES is a public algorithm, and implementations are widely available. The reference code for the original proposal (RIJNDAEL) was written in C, and the algorithm is designed to give good performance on machines that can manipulate 8-bit chunks o' data (e.g. most modern CPUs), so it is a good match to C.

http://www.google.com.au/search?q=AES+C
http://www.google.com.au/search?q=RIJNDAEL+source+code
http://en.wikipedia.org/wiki/Advanced_Encryption_Standard

Note that AES is a block cypher. Block cyphers can be used with or without feedback around the outside. Without feedback, the latency isn't so important, which allows the use of subpipelining to increase the clock rate. Unfortunately, many of the interesting crypto applications use block cyphers with feedback (e.g. CBC, CFB), so the latency limits the throughput, and subpipelining doesn't help.

http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation

Google shows that there are many papers claiming rather fast AES in FPGAs (with some fine print saying they're using a non-feedback mode). I've never seen a feedback-mode cypher in a real-world application get more than a few Gb/s.

Regards,
Allan
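To make the feedback point concrete, here is a small C sketch contrasting CBC (feedback) with counter mode, CTR (a standard non-feedback mode not named above). It uses a hypothetical 64-bit toy permutation, toy_encrypt(), in place of AES purely to show the data dependencies between blocks; it is neither secure nor AES, and the 64-bit block size is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AES: a 64-bit toy permutation, NOT secure and
 * NOT AES. Every step (xor, rotate, add) is invertible, so it is a bijection. */
uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    for (int r = 0; r < 4; r++) {
        block ^= key;
        block = (block << 13) | (block >> 51); /* rotate left by 13 */
        block += 0x9E3779B97F4A7C15ULL;
    }
    return block;
}

/* CBC: C[i] = E(P[i] ^ C[i-1]). Block i cannot enter the cipher until
 * block i-1 has come out, so one stream sees the full cipher latency per block. */
void cbc_encrypt(uint64_t key, uint64_t iv, const uint64_t *pt, uint64_t *ct, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        prev = toy_encrypt(key, pt[i] ^ prev);
        ct[i] = prev;
    }
}

/* CTR: C[i] = P[i] ^ E(nonce + i). The E() calls are independent of each
 * other, so a pipelined core could accept a new counter value every clock. */
void ctr_encrypt(uint64_t key, uint64_t nonce, const uint64_t *pt, uint64_t *ct, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ct[i] = pt[i] ^ toy_encrypt(key, nonce + i);
}
```

Changing the first plaintext block changes every later CBC ciphertext block, whereas in CTR it changes only the first one; that independence is what lets a deep (sub)pipeline stay full.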
Reply on March 10, 2006
Allan Herriman wrote:
> You want to try it in your C -> hardware compiler? I'd be interested
> in the results.

Me too :)

I'll take a look at it this weekend, as it might make another interesting example for the next FpgaC release. I have a pipelined RSA-72 design from two years ago, when I was looking at building dnet engines, that is a monster because of the barrel shifters and LUT RAMs required for retiming. A first glance at the referenced materials suggests the problem with AES is going to be 80 or more block RAMs for S-box lookup tables to get any reasonable parallelism. It's not clear there is an easy way to avoid using S-box tables, as the algorithm for creating the table is iterative. The rest of the requirements per round seem pretty modest.

I have a couple of ideas to ponder first. The feedback chaining clearly limits performance unless you have a fair number of independent concurrent streams that can be muxed into the pipeline, like an 11-port mux/switch used to break out a very fast connection.
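For reference, FIPS-197 defines each S-box entry directly: take the multiplicative inverse in GF(2^8) (modulo x^8 + x^4 + x^3 + x + 1, with 0 mapped to 0) and apply a fixed affine transform. The minimal C sketch that follows builds the table that way; computing the inverse as a^254 by square-and-multiply is just one convenient choice, not the only option. As for the block RAM estimate, one plausible reading (an assumption) is 10 rounds * 16 S-boxes = 160 lookups, at 2 per dual-port block RAM, giving 80 block RAMs for the data path alone.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        /* "xtime": multiply a by x and reduce if bit 7 overflowed */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return r;
}

/* Multiplicative inverse as a^254 (since a^255 = 1); returns 0 for a = 0. */
uint8_t gf256_inv(uint8_t a)
{
    uint8_t result = 1, base = a;
    for (unsigned e = 254; e != 0; e >>= 1) {
        if (e & 1) result = gf256_mul(result, base);
        base = gf256_mul(base, base);
    }
    return result;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box entry: inverse followed by the FIPS-197 affine transform. */
uint8_t aes_sbox(uint8_t x)
{
    uint8_t b = gf256_inv(x);
    return (uint8_t)(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63);
}

/* Fill a 256-entry table, e.g. to generate a block RAM init file. */
void build_sbox(uint8_t sbox[256])
{
    for (int i = 0; i < 256; i++)
        sbox[i] = aes_sbox((uint8_t)i);
}
```

aes_sbox(0x00) gives 0x63, aes_sbox(0x01) gives 0x7c and aes_sbox(0x53) gives 0xed, matching the FIPS-197 table.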
Reply on March 10, 2006
>
> Google shows that there are many papers claiming rather fast AES in
> FPGAs (with some fine print saying they're using a non-feedback mode).
> I've never seen a feedback mode cypher in a real world application get
> anything over some Gb/s.
>
> Regards,
> Allan

Hi Allan,
interesting point, but have you thought about what the reasons may be?

Let's do some (approximate) calculations.
Assume you have a single AES round that runs with a 100 MHz clock. This round needs at least 10 clocks to produce one AES ciphertext block. With a 128-bit data width that gives:

128 * 100e6 / 10 = 1.28e9 bits per second

So that is the limit for the assumed circuit.

Adding a feedback path for block cipher modes will extend the number of clocks needed to produce a ciphertext block. Let's assume 14 clocks for a CBC block. Now we have:

128 * 100e6 / 14 = 914.3e6 bits per second

That's all that's possible with the assumed circuit.

How can we increase the throughput?
1) Wait for better silicon that allows higher clock rates.
2) Use more chip area to implement additional rounds and decrease the number of times the round logic has to be iterated. But that may be rather expensive!
3) Improve the round's latency. Make it as fast as possible. (The limit is at about 500 MHz, as some vendors claim for their products ;-) )

Now let's assume our circuit still runs at 100 MHz, but the improved round runs at 500 MHz. Then the ten round iterations take only 2 cycles of the 100 MHz clock, and adding the 4 cycles of feedback overhead from before, a CBC block takes 6 cycles. Now we have:

128 * 100e6 / 6 = 2.1e9 bits per second

So that's the theoretical limit for the assumed circuit. You can exceed it by investing in additional or better (ASIC) silicon, if you have the money.

As I understand the original posting, these guys want to spend some work on solution 3 somehow.

My tip to manjunath & co.: Have a look at the standard implementations and the book "The Design of Rijndael", ISBN 3540425802. Identify the modules and start optimizing the designs toward whatever your goal is.

Have a nice synthesis,
Eilert
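Eilert's arithmetic can be written out as two small C helpers. The function names are hypothetical, and cbc_cycles() assumes, as in his example, a fixed feedback overhead of 14 - 10 = 4 cycles of the 100 MHz clock on top of the time spent in the ten round iterations.

```c
/* Throughput in bit/s for a core that delivers one block_bits-wide result
 * every cycles_per_block cycles of a clock running at f_clk_hz. */
double throughput_bps(double block_bits, double f_clk_hz, double cycles_per_block)
{
    return block_bits * f_clk_hz / cycles_per_block;
}

/* System-clock cycles per CBC block (assumption taken from the example):
 * all rounds run on round logic clocked at round_clk_hz, converted to cycles
 * of sys_clk_hz, plus a fixed feedback overhead counted in system cycles. */
double cbc_cycles(int rounds, double round_clk_hz, double sys_clk_hz,
                  double overhead_cycles)
{
    return rounds * (sys_clk_hz / round_clk_hz) + overhead_cycles;
}
```

throughput_bps(128, 100e6, 10) is 1.28e9 and throughput_bps(128, 100e6, 14) is about 914.3e6; cbc_cycles(10, 500e6, 100e6, 4) evaluates to 6, and passing that to throughput_bps() gives about 2.13e9 bit/s.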
Reply on March 10, 2006
In article <1141986452.587421.137510@i40g2000cwc.googlegroups.com>,
<fpga_toys@yahoo.com> wrote:
>
>Allan Herriman wrote:
>> You want to try it in your C -> hardware compiler? I'd be interested
>> in the results.
>
>Me too :)
>
>I'll take a look at it this weekend, as it might make another
>interesting example for the next FpgaC release. I have a pipelined
>RSA-72 I did two years ago when looking at building dnet engines that
>is a monster because of the barrel shifters and LUT RAMS required for
>retiming. First glance at the referenced materials suggests the problem
>with AES is going to be 80 or more block rams for S box lookups tables
>to get any reasonable parallelism. It's not clear there is an easy way
>to avoid using sbox tables, as the algorithm for creating the table is
>itterative.

There has been a lot of research into efficient implementations of the S-boxes without lookup tables; http://www.st.com/stonline/press/magazine/stjournal/vol00/pdf/art08.pdf might be an example. I went to a conference in August where http://class.ee.iastate.edu/tyagi/cpre681/papers/AESCHES05.pdf was presented; that design runs AES at 25 Gbit/s on an XC3S2000, with the round function pipelined into seven stages of three LUT levels each.

Tom
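Since the original poster mentioned composite field arithmetic, here is a minimal C sketch of a table-free S-box computed in GF((2^4)^2). The choices are mine, not taken from the papers linked above: GF(2^4) uses x^4 + x + 1, the extension is y^2 + y + lambda with lambda picked at start-up, and the isomorphism to and from the AES field is derived by searching for a root of the AES polynomial instead of using hard-coded matrices. All function names are hypothetical. In hardware the two field maps become fixed 8x8 XOR matrices and the GF(2^4) inverse a small table or a few gates.

```c
#include <stdint.h>

/* GF(2^4) multiply, polynomial basis, modulus x^4 + x + 1 (assumed choice). */
static uint8_t gf16_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10) a ^= 0x13; /* reduce by x^4 + x + 1 */
    }
    return r;
}

/* Brute-force GF(2^4) inverse (software convenience); 0 maps to 0. */
static uint8_t gf16_inv(uint8_t a)
{
    for (uint8_t b = 1; b < 16; b++)
        if (gf16_mul(a, b) == 1) return b;
    return 0;
}

/* GF((2^4)^2) element: (hi << 4) | lo means hi*y + lo, with y^2 = y + lambda. */
static uint8_t lambda;

static uint8_t gf16sq_mul(uint8_t a, uint8_t b)
{
    uint8_t a1 = a >> 4, a0 = a & 0xF, b1 = b >> 4, b0 = b & 0xF;
    uint8_t hh = gf16_mul(a1, b1);
    uint8_t hi = hh ^ gf16_mul(a1, b0) ^ gf16_mul(a0, b1);
    uint8_t lo = gf16_mul(hh, lambda) ^ gf16_mul(a0, b0);
    return (uint8_t)((hi << 4) | lo);
}

/* (a1*y + a0)^-1 = (a1*y + a0 + a1) / delta, delta = a1^2*lambda + a1*a0 + a0^2. */
static uint8_t gf16sq_inv(uint8_t a)
{
    uint8_t a1 = a >> 4, a0 = a & 0xF;
    uint8_t delta = gf16_mul(gf16_mul(a1, a1), lambda) ^ gf16_mul(a1, a0) ^ gf16_mul(a0, a0);
    uint8_t d = gf16_inv(delta);
    return (uint8_t)((gf16_mul(a1, d) << 4) | gf16_mul(a0 ^ a1, d));
}

static uint8_t to_comp[8];   /* image of x^i (AES basis) in the composite field */
static uint8_t from_comp[8]; /* preimage of composite basis bit j in the AES field */

/* GF(2)-linear map defined by the images of the 8 basis bits. */
static uint8_t map_bits(const uint8_t basis[8], uint8_t v)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        if (v & (1u << i)) r ^= basis[i];
    return r;
}

/* Call once before sbox_composite(). */
void composite_init(void)
{
    /* 1. lambda such that y^2 + y + lambda has no root in GF(2^4), i.e. is irreducible. */
    for (lambda = 1; lambda < 16; lambda++) {
        int has_root = 0;
        for (uint8_t y = 0; y < 16; y++)
            if ((gf16_mul(y, y) ^ y) == lambda) has_root = 1;
        if (!has_root) break;
    }
    /* 2. beta: a root of x^8 + x^4 + x^3 + x + 1 in the composite field. */
    uint8_t beta = 0;
    for (int c = 2; c < 256 && beta == 0; c++) {
        uint8_t p[9];
        p[0] = 1;
        for (int i = 1; i <= 8; i++) p[i] = gf16sq_mul(p[i - 1], (uint8_t)c);
        if ((p[8] ^ p[4] ^ p[3] ^ p[1] ^ p[0]) == 0) beta = (uint8_t)c;
    }
    /* 3. The field isomorphism sends x^i to beta^i. */
    to_comp[0] = 1;
    for (int i = 1; i < 8; i++) to_comp[i] = gf16sq_mul(to_comp[i - 1], beta);
    /* 4. Invert the linear map by search (done once, in software only). */
    for (int j = 0; j < 8; j++)
        for (int v = 0; v < 256; v++)
            if (map_bits(to_comp, (uint8_t)v) == (1u << j)) { from_comp[j] = (uint8_t)v; break; }
}

static uint8_t rotl8(uint8_t x, int n) { return (uint8_t)((x << n) | (x >> (8 - n))); }

/* S-box without a 256-entry table: map, invert in GF((2^4)^2), map back, affine. */
uint8_t sbox_composite(uint8_t x)
{
    uint8_t inv = map_bits(from_comp, gf16sq_inv(map_bits(to_comp, x)));
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}
```

After composite_init(), sbox_composite() should reproduce the standard table (e.g. 0x00 -> 0x63, 0x53 -> 0xed), which is a handy self-check before committing a particular basis choice to HDL.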