Qs on HDL library code and pipelining

Started by Julio Di Egidio January 1, 2018
Hello everybody, and Happy New Year 2018!

I am new to digital design, here are some basic questions:

In particular for algorithm acceleration (e.g. arithmetic, cryptography,
etc.), does it make sense to think both FPGA and ASIC when writing HDL
library code?

And, about pipelining combinational logic, what maximum gate-delay
granularity would be good (for fmax) on FPGAs?   I am guessing a 2-gate
delay maximum granularity before introducing a register does not pay off.

And, would similar considerations and a 2-gate delay pipelining be best
for ASICs, too?

(Essentially, I am trying to find guidelines to write reusable HDL, but
at the moment I am not even sure that such a thing in fact makes sense.)

Thanks very much in advance for any insight,

Julio
> In particular for algorithm acceleration (e.g. arithmetic, cryptography,
> etc.), does it make sense to think both FPGA and ASIC when writing HDL
> library code?

I've been making cores to target both ASICs and FPGAs and it's difficult to make it completely portable. For one thing, the ASIC clock (for me) is twice as fast so I have to double the bus widths for the ASIC cores. The path delay in an FPGA can be 80% route time, but in the ASIC it's mostly logic delay. I also have to use specific FPGA primitives which don't exist in the ASIC. One section I constructed out of instantiated DSP48s for the Xilinx, but for the ASIC, I just wrote behavioral code and it synthesizes as a sea of flipflops. (This was a routing-intensive block so it was much easier on the ASIC.)

Portable, parameterizable code is something to strive for, but it's still only somewhat possible with today's tools, and you're going to have to make changes for every target, so you don't want to expend too much effort on it.
On Wednesday, January 3, 2018 at 1:48:22 AM UTC+1, Kevin Neilson wrote:
> > In particular for algorithm acceleration (e.g. arithmetic, cryptography,
> > etc.), does it make sense to think both FPGA and ASIC when writing HDL
> > library code?
>
> I've been making cores to target both ASICs and FPGAs and it's difficult to make
> it completely portable. For one thing, the ASIC clock (for me) is twice as fast
> so I have to double the bus widths for the ASIC cores.

You mean because the bus itself stays at the same frequency, right? I have just started trying to get my head around clock domain crossing and similar. If you don't mind me taking the chance: why would you double the bus width in that case, i.e. doesn't that still need a provider that is twice as fast? (Sorry, I guess I am just missing the particulars of the job involved.)

> The path delay in an FPGA can be 80% route time, but in the ASIC it's mostly logic
> delay. I also have to use specific FPGA primitives which don't exist in the ASIC.
> One section I constructed out of instantiated DSP48s for the Xilinx, but for the
> ASIC, I just wrote behavioral code and it synthesizes as a sea of flipflops.
> (This was a routing-intensive block so it was much easier on the ASIC.)
>
> Portable, parameterizable code is something to strive for, but it's still only
> somewhat possible with today's tools, and you're going to have to make changes
> for every target, so you don't want to expend too much effort on it.

OK, and thanks very much for your feedback, Kevin, appreciated. It's a fine line then...

Julio
> You mean because the bus itself stays at the same frequency, right? I have
> just started trying to get my head around clock domain crossing and similar.
> If you don't mind me taking the chance: why would you double the bus width
> in that case, i.e. doesn't that still need a provider that is twice as fast?
> (Sorry, I guess I am just missing the particulars of the job involved.)

I got that all backwards. I made a core that was to operate on an FPGA and when I ported it to the ASIC I doubled the clock speed and halved the bus width. (Of course I could've used the same clock and bus width but then the gate count would be twice as big as it really needed to be.) One might think that halving the bus width is a simple matter of changing a parameter, but of course it never works that way in hardware design.

Watch out for clock domain crossings!
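
On the clock/bus-width trade-off above: peak throughput of a bus that moves one word per cycle is simply width times clock frequency, so doubling the clock lets you halve the width for the same throughput. A quick sketch in Python; the 64-bit width and 200 MHz clock are made-up numbers for illustration only, not figures from the design described in this thread, and `throughput_bps` is a hypothetical helper name.

```python
def throughput_bps(bus_width_bits: int, clock_hz: int) -> int:
    """Peak throughput in bits/s of a bus moving one word per clock cycle."""
    return bus_width_bits * clock_hz


# Assumed FPGA configuration: wide bus, slower clock (illustrative numbers).
fpga = throughput_bps(bus_width_bits=64, clock_hz=200_000_000)

# Assumed ASIC port: clock doubled, bus width halved.
asic = throughput_bps(bus_width_bits=32, clock_hz=400_000_000)

# Same throughput, but the ASIC datapath is only half as wide.
print(fpga, asic, fpga == asic)
```

The arithmetic is the easy part. As noted above, everything sized around the old word width has to be reworked, which is why it is rarely just a parameter change.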
On Thursday, January 4, 2018 at 8:34:48 PM UTC+1, Kevin Neilson wrote:

> Watch out for clock domain crossings!
Yep, I am carefully following the reference designs I'm finding around! :)

Anyway, I don't see a way to escape the topic even very early in a beginner course: the simplest top level I am writing has at least 2 clock domains, a slow "user" domain for user input and output, and a fast "core" domain for the core logic. Most user inputs I need to bring forward to the core domain as control signals, and likewise I need to bring outputs from the core back out to the user for display/monitoring, which seems to me a very basic scenario...

(Anyway, never mind my beginner's adventures, I understand this is going to take years, but please tell me if I am missing something.)

Julio
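
For the scenario described above (a slowly changing single-bit control signal crossing into another clock domain), one common approach is a two-flop synchronizer: two flip-flops in series, both clocked by the destination domain. The sketch below is only a behavioral model in Python showing the resulting latency. It is an assumption that this matches what the reference designs do, the class name `TwoFlopSynchronizer` is hypothetical, and a software model cannot show metastability, which is the actual reason for the second flop.

```python
class TwoFlopSynchronizer:
    """Behavioral model of two back-to-back flip-flops in the destination domain."""

    def __init__(self) -> None:
        self.ff1 = 0  # first stage: samples the asynchronous input
        self.ff2 = 0  # second stage: the output the destination logic uses

    def clock(self, async_in: int) -> int:
        """Apply one destination-clock edge and return the synchronized output."""
        # Both flops update on the same edge: ff2 takes the old ff1 value,
        # ff1 takes the current input.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
inputs = [0, 1, 1, 1, 0, 0, 0]
outputs = [sync.clock(bit) for bit in inputs]
print(outputs)  # the input pattern, seen one destination-clock edge later
```

This only suits single-bit signals that stay stable for longer than a destination clock period. Multi-bit values need something else (a handshake, Gray coding, or an asynchronous FIFO), because independent synchronizers on each bit can resolve on different cycles.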