FPGARelated.com

Re: fastest FPGA

Started by rickman September 6, 2006
Ray Andraka wrote:
> The SRL16's are actually one of the most versatile blocks on the FPGA.
> You can use them for:
>
> counters
> state machines
How can you use SRLs for these? I guess a shift register can be used for a simple FSM like a Johnson ring counter.
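For the Johnson ring counter case, a small Python model (not HDL, and not anyone's actual design; the function name `johnson_states` is made up for illustration) shows how an n-bit shift register with inverted feedback walks through 2n states:

```python
def johnson_states(n_bits):
    """Return the state sequence of an n-bit Johnson (twisted ring) counter.

    Each clock shifts the register by one position and feeds the inverse
    of the last bit back into the first position.
    """
    state = [0] * n_bits
    seen = []
    while tuple(state) not in seen:
        seen.append(tuple(state))
        # Shift by one, inserting the inverted last bit at the front.
        state = [1 - state[-1]] + state[:-1]
    return seen


if __name__ == "__main__":
    for s in johnson_states(4):
        print("".join(map(str, s)))
```

The only logic outside the shift register is the inverter in the feedback path, which is why this kind of FSM maps onto a shift register so easily.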
Ray Andraka wrote:
> The SRL16's are actually one of the most versatile blocks on the FPGA.
> You can use them for:
>
*snip*
> variable delays
> fixed delays
>
> The only one of these that is inferred by synthesis tools is the fixed
> delay, and that is also the least interesting out of all these.
Several years ago I managed to get Synplify 6.3 to infer SRL16s for a variable delay. Frankly I was amazed it worked :) I can dig up the code if anyone is interested.

cheers,
aaron
aholtzma@gmail.com wrote:
> Ray Andraka wrote:
>
>>The SRL16's are actually one of the most versatile blocks on the FPGA.
>>You can use them for:
>>
>
> *snip*
>
>>variable delays
>>fixed delays
>>
>>The only one of these that is inferred by synthesis tools is the fixed
>>delay, and that is also the least interesting out of all these.
>
>
> Several years ago I managed to get Synplify 6.3 to infer SRL16s for a
> variable delay. Frankly I was amazed it worked :) I can dig up the code
> if anyone is interested.
>
> cheers,
> aaron
>
Synplicity infers it as long as there are enough taps in the delay. It won't, for instance, infer an SRL16 for a variable delay with only 1,2,3 or 4 clocks delay, instead it infers flip-flops with a mux. Same template works fine with 9 taps.
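As a rough illustration of what a variable delay built on an SRL16 does, here is a behavioural sketch in Python. It is not the template discussed above and not vendor code; the class and function names are invented, and it assumes address `a` gives a delay of `a + 1` clocks with the stages starting out as zeros (no reset):

```python
class SRL16Model:
    """Behavioural model of a 16-deep addressable shift register.

    Assumption: the address selects one of 16 stages, stage 0 holding
    the most recently clocked-in sample.
    """

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, d):
        """Shift in d on a clock edge; the oldest sample falls off the end."""
        self.stages = [d] + self.stages[:-1]

    def read(self, addr):
        """Combinational tap select for addr 0..15."""
        return self.stages[addr & 0xF]


def delayed(samples, addr):
    """Feed samples through the model and record the output each cycle."""
    srl = SRL16Model()
    out = []
    for s in samples:
        out.append(srl.read(addr))  # value visible during this cycle
        srl.clock(s)                # input captured at the end of the cycle
    return out


if __name__ == "__main__":
    print(delayed(list(range(1, 11)), 0))  # one clock of delay
    print(delayed(list(range(1, 11)), 3))  # four clocks of delay
```

With a 2-bit address this covers the 1 to 4 clock case; the flip-flops-plus-mux structure that synthesis produces for short delays is functionally the same thing, just spread over more resources.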
Ray Andraka wrote:
> Synplicity infers it as long as there are enough taps in the delay. It
> won't, for instance, infer an SRL16 for a variable delay with only 1,2,3
> or 4 clocks delay, instead it infers flip-flops with a mux. Same
> template works fine with 9 taps.
While it's nice to have synthesis take care of all cases without concern, if you know you want to target an SRL (so you leave the reset out) why not make it longer so there's no problem inferring it "properly?" There's not a big loss in readability and the implementation is still clean.

I personally think Synplify synthesis tools do an excellent job in the first 23 miles of the marathon but that last mile is where things trip up a bit. Very decent results all around but there are particular nuisances that - in my opinion - should be handled better. The overall quality is still better than other tools I've known.
John_H wrote:
> Ray Andraka wrote:
>
>> Synplicity infers it as long as there are enough taps in the delay.
>> It won't, for instance, infer an SRL16 for a variable delay with only
>> 1,2,3 or 4 clocks delay, instead it infers flip-flops with a mux.
>> Same template works fine with 9 taps.
>
>
> While it's nice to have synthesis take care of all cases without
> concern, if you know you want to target an SRL (so you leave the reset
> out) why not make it longer so there's no problem inferring it
> "properly?" There's not a big loss in readability and the
> implementation is still clean.
>
> I personally think Synplify synthesis tools do an excellent job in the
> first 23 miles of the marathon but that last mile is where things trip
> up a bit. Very decent results all around but there are particular
> nuisances that - in my opinion - should be handled better. The overall
> quality is still better than other tools I've known.
Because the design needed a selection between 1 and 4 taps. It turned out to be considerably less work to just instantiate the SRL16 and be done with it than to keep pushing on a rope trying to get the tools to do what I wanted, at the cost of "portability" and an increase in simulation time. Instantiating things like this gives you confidence that the tools get it right when you already have a specific design approach in mind.
Ray Andraka wrote:
> The SRL16's are actually one of the most versatile blocks on the FPGA.
> You can use them for:
>
> reprogrammable LUTs --poor man's reconfiguration
> dual port memory --serial write port, parallel read
> synchronous FIFO --smallest FIFO implementation
> data reordering --this is really cool for sorting and other apps
> counters
> state machines
> variable delays
> fixed delays
Nice list. Care to elaborate on the "data reordering" use case? I can't think of any good examples.

Thanks,
Tommy
Tommy Thorn wrote:

> Nice list. Care to elaborate on the "data reordering" use case? I can't
> think of any good examples.
>
> Thanks,
> Tommy
OK, an example: A small FFT requires the input data in bit reversed order, but the circuit presents data in natural order. The data goes into the SRL16 in natural order, and then we permute the read address so that it reads out in bit reversed order:

input:           0 1 2 3 4 5 6 7 8 9 A B C D E F 0 1 2 3 4 5 6 7 8 9 A B C D E F
output:                                0 8 4 C 2 A 6 E 1 9 5 D 3 B 7 F
address=delay-2:                       9 2 7 0 B 4 9 2 x 9 E 7 y B x 9

x=10h y=12h

This case requires two cascaded SRL16's to get all the delays needed for the sequence. The address can be produced from an ordinary counter by passing the 4 bit count through a 4-input x 4-bit LUT.

Reversing the order of data in a set of 8 numbers (a left-right mirror) is similar, using just an SRL16 per bit and a 3 bit counter:

input:           0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
output:                            7 6 5 4 3 2 1 0
address=delay-2:                   0 2 4 6 8 A C E
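The read-address sequences can be worked out mechanically. The following Python sketch is only a model of the arithmetic, not of the circuit; the function names and the `offset` parameter are assumptions made for illustration, with `offset=2` standing in for the "address = delay - 2" relation in the post:

```python
def bit_reverse(k, width=4):
    """Reverse the low `width` bits of k."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def read_addresses(order, offset=2):
    """Per-output-slot read address for a reordering delay line.

    order[k] is the input index wanted in output slot k. Slot k is emitted
    at time start + k and the wanted sample arrived at time order[k], so
    its delay is start + k - order[k]. `start` is picked as early as
    possible, so that the smallest delay equals `offset` and the smallest
    address (delay - offset) is zero.
    """
    n = len(order)
    start = offset + max(order[k] - k for k in range(n))
    return [start + k - order[k] - offset for k in range(n)]


if __name__ == "__main__":
    fft_order = [bit_reverse(k) for k in range(16)]
    print([format(a, "X") for a in read_addresses(fft_order)])

    mirror = [7 - k for k in range(8)]
    print([format(a, "X") for a in read_addresses(mirror)])
```

For the 8-sample mirror this gives the even addresses 0 through E, so a single SRL16 per bit is enough. For the 16-point bit reversal the largest address comes out as 12h, which is more than one SRL16 can hold and is why two are cascaded.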
"Tommy Thorn" <tommy.thorn@gmail.com> wrote in message 
news:1158088454.332189.307350@b28g2000cwb.googlegroups.com...
> Ray Andraka wrote:
>> The SRL16's are actually one of the most versatile blocks on the FPGA.
>> You can use them for:
>>
>> reprogrammable LUTs --poor man's reconfiguration
>> dual port memory --serial write port, parallel read
>> synchronous FIFO --smallest FIFO implementation
>> data reordering --this is really cool for sorting and other apps
>> counters
>> state machines
>> variable delays
>> fixed delays
>
> Nice list. Care to elaborate on the "data reordering" use case? I can't
> think of any good examples.
>
> Thanks,
> Tommy
From my own experience:

A 2-D example using fixed length SRLs that comes to my mind is a 90 degree pixel rotation.

If you have a 16x16 array of vectors that come in in the order

A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 Aa Ab Ac Ad Ae Af
B0 B1 B2 B3 B4 ...
C0 C1 ...
.
.
.
P0 P1 P2 P3 ...

And want to send them back out rotated 90 degrees so the order is

A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0
A1 B1 C1 D1 E1 ...
A2 B2 ...
.
.
.
Af Bf Cf Df ...

You can do this completely pipelined for 16x16 blocks without intermediate load/unload cycles with 30 shift registers and a 16 bit barrel shift. The <100 LUTs is much better than the ~384 registers needed for the same functionality.

______

A 1-D example I was considering would use SRLs from byte wide Cyan, Magenta, Yellow, and Black lookup memory outputs (4 color printing) and rearrange into single-color words. This is effectively the 2-D rotate mentioned above, but the 8 instances of the 4x4 is small enough that I could effectively integrate half the fixed-length SRLs into dynamic length SRLs to feed the barrel shift.

Direct memory output
C0 M0 Y0 K0
C1 M1 Y1 K1
C2 M2 Y2 K2
C3 M3 Y3 K3
-- barrel shift --
C0 M0 Y0 K0
K1 C1 M1 Y1
Y2 K2 C2 M2
M3 Y3 K3 C3
-- dynamic srl --
C0 C1 C2 C3
M0 M1 M2 M3
Y0 Y1 Y2 Y3
K0 K1 K2 K3

It's cheaper to manipulate the addresses for 8 sets of SRLs than to increase the number of SRLs for the larger fixed 2-D example.

- John_H
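The barrel-shift stage of the CMYK example can be modelled in a few lines of Python. This is only a sketch of the data movement, not of the actual circuit; the helper names are invented, and where each byte ends up within the final output word is not modelled. It assumes word r is rotated right by r byte lanes, which is what the middle block of the example shows:

```python
COLOURS = "CMYK"


def memory_words(n=4):
    """Words as read from the lookup memories: word r is C_r M_r Y_r K_r."""
    return [[f"{c}{r}" for c in COLOURS] for r in range(n)]


def barrel_shift(words):
    """Rotate word r right by r byte lanes."""
    out = []
    for r, w in enumerate(words):
        s = r % len(w)
        out.append(w[-s:] + w[:-s] if s else list(w))
    return out


def lanes(words):
    """Regroup a list of words into one stream per byte lane."""
    return [list(col) for col in zip(*words)]


if __name__ == "__main__":
    shifted = barrel_shift(memory_words())
    for w in shifted:
        print(" ".join(w))
    for i, lane in enumerate(lanes(shifted)):
        print("lane", i, ":", " ".join(lane))
```

After the rotation every byte lane carries each colour exactly once, so the four bytes of any one colour sit in four different lanes. That is what lets a dynamic-length SRL per lane delay each byte by a different amount and present all four in the same clock.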
John_H wrote:
> A 2-D example using fixed length SRLs that comes to my mind is a 90 degree
> pixel rotation.
>
> If you have a 16x16 array of vectors that come in in the order
>
> A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 Aa Ab Ac Ad Ae Af
> B0 B1 B2 B3 B4 ...
> C0 C1 ...
> .
> .
> .
> P0 P1 P2 P3 ...
>
> And want to send them back out rotated 90 degrees so the order is
>
> A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0
> A1 B1 C1 D1 E1 ...
> A2 B2 ...
Just a nitpick but wouldn't this be a transpose? You'd need to invert in X or Y to get a 90 degree rotation.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
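The distinction can be checked on a small block. In this Python sketch (function names are illustrative only) a clockwise 90 degree rotation is a transpose followed by a left-right mirror of each row:

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def rotate_cw(m):
    """Rotate 90 degrees clockwise: transpose, then mirror each row."""
    return [row[::-1] for row in transpose(m)]


if __name__ == "__main__":
    block = [["A0", "A1", "A2"],
             ["B0", "B1", "B2"],
             ["C0", "C1", "C2"]]
    print(transpose(block))
    print(rotate_cw(block))
```

The transpose on its own gives the A0 B0 C0 ... ordering from the quoted post; the extra mirror is the "invert in X or Y" step.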
David Ashley wrote:
> John_H wrote:
>> A 2-D example using fixed length SRLs that comes to my mind is a 90 degree
>> pixel rotation.
>>
>> If you have a 16x16 array of vectors that come in in the order
>>
>> A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 Aa Ab Ac Ad Ae Af
>> B0 B1 B2 B3 B4 ...
>> C0 C1 ...
>> .
>> .
>> .
>> P0 P1 P2 P3 ...
>>
>> And want to send them back out rotated 90 degrees so the order is
>>
>> A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0
>> A1 B1 C1 D1 E1 ...
>> A2 B2 ...
>
> Just a nitpick but wouldn't this be a transpose? You'd need to
> invert in X or Y to get a 90 degree rotation.
>
> -Dave
If Sally comes into the room followed by Barbara, then Sheila and finally Carol, but what exits the room is four pairs of shoes followed by four nicely folded outfits followed by a basket of lingerie and finally four unclad women racing after their departed belongings, is it just a transposition? Things got very rearranged in the process.

In the example above the A values enter first followed by the B values and so on. When they exit the rotator scheme, they exit as the zero label values followed by the 1 label values and so on. The transpose is a 90 degree rotation of 16x16 blocks within a 256 element grid.

To get this to run continuously with simple registers would require 384 registers. When the resource usage can be nearly quartered, isn't it something to consider?

The issue at hand was data reordering. The rotation is a simple reorder, but one that isn't easy to parallelize at high speeds without throwing a huge number of resources at the problem when the information is available in a serial fashion.