FPGARelated.com

addsubs on FPGA

Started by Unknown January 8, 2014
Hi,

I have a question about RTL design for addsub-based implementations.

I heard that addsubs are not preferred on FPGAs because they give worse area and timing QoR. Is that true? Is resource sharing generally not preferred on FPGAs?

However, when I try the very simple addsub design shown below, I see no difference. Maybe in small examples the difference in implementation is not evident, which is why I wanted to ask a broader audience.

Explaining the reasoning, with cases for both 'yes' and 'no', would help me understand the cause.

Thanks
Vipin


module addsub(a, b, oper, res);
	input        oper;
	input  [7:0] a;
	input  [7:0] b;
	output [7:0] res;
	reg    [7:0] res;
	always @(a or b or oper)
	begin
	   if (oper == 1'b0)
	      res = a + b;   // oper = 0: add
	   else
	      res = a - b;   // oper = 1: subtract
	end
endmodule
sh.vipin@gmail.com wrote:
 
> I have a query on the RTL designing for addsub based implementations.
> I heard that addsubs are not preferred on FPGAs as they produce
> worse area and timing QoR. Is it true? Is resource sharing not
> preferred in general on FPGAs.
> However, if I try a very simple design of addsub shown below
> it shows me no difference. May be in case of small examples,
> the difference in implementation might not be evident.
> That is why I wanted to ask a broader audience.
> The reasoning & cases for both 'yes' and 'no' will help in
> understanding the cause?
I first got interested in FPGA addition and subtraction in the XC4000 days. The XC4000 has special carry logic that may or may not support this operation. The carry logic changed completely between the XC4000 series and later series, though.

In the pre-IC days, it was common to build logic, called an ALU, that could implement add, subtract, and some bitwise logic operations using an optimal number of transistors or gates. Similar logic went into TTL.
> module addsub(a, b, oper, res);
> 	input        oper;
> 	input  [7:0] a;
> 	input  [7:0] b;
> 	output [7:0] res;
> 	reg    [7:0] res;
> 	always @(a or b or oper)
> 	begin
> 	   if (oper == 1'b0)
> 	      res = a + b;
> 	   else
> 	      res = a - b;
> 	end
> endmodule
Well, one possible implementation is an adder and a subtractor, followed by a mux to select between them. But modern logic optimization tools should be able to do better. You could also write:

res = a + (oper ? -b : b);

which may or may not fit the FPGA better. (It seems to me closer to the way the carry logic works, though.)

If you want optimal LUT use, or minimal delay, then you need to look more carefully at what the tool is doing. Otherwise, logic minimization applies to the whole system, so it may or may not matter.

-- glen
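To make the two implementations glen mentions concrete, here is a bit-accurate Python model (not HDL) of the OP's module, where oper = 0 adds and oper = 1 subtracts. One version computes the sum and difference separately and muxes them; the other uses a single adder whose b input is XORed with oper and whose carry-in is oper, which is roughly how an add/subtract is commonly mapped onto a carry chain. The function names and the 8-bit width are illustrative assumptions, and the exhaustive check only shows the two are functionally equivalent; it says nothing about what a particular synthesis tool will actually build.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def addsub_mux(a: int, b: int, oper: int) -> int:
    """Separate adder and subtractor, then a 2:1 mux (oper=0 adds, oper=1 subtracts)."""
    total = (a + b) & MASK
    diff = (a - b) & MASK
    return diff if oper else total

def addsub_shared(a: int, b: int, oper: int) -> int:
    """One adder: b is XORed with oper replicated WIDTH times, and oper is the carry-in.
    For oper=1 this computes a + ~b + 1, which is a - b in two's complement."""
    b_eff = b ^ (MASK if oper else 0)
    return (a + b_eff + oper) & MASK

if __name__ == "__main__":
    # Exhaustive equivalence check over all 8-bit inputs and both operations.
    for oper in (0, 1):
        for a in range(1 << WIDTH):
            for b in range(1 << WIDTH):
                assert addsub_mux(a, b, oper) == addsub_shared(a, b, oper)
    print("equivalent for all 8-bit inputs")
```

Since the two forms are logically identical, a tool is free to pick either; the shared-adder form is the one that needs only one carry chain.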
On Wednesday, January 8, 2014 4:33:39 PM UTC-6, sh.v...@gmail.com wrote:
> Hi, I have a query on the RTL designing for addsub based implementations. I
> heard that addsubs are not preferred on FPGAs as they produce worse area and
> timing QoR.
Statements often heard about what is preferred on FPGAs are not always applicable to all manufacturers' FPGAs, or even to all of the same manufacturer's FPGA families. Something that did not work well months or years ago may not be an issue today on another FPGA family. Your tests seem to show it works fine for your target FPGA and tools. Different synthesis tools (including different versions of the same tool) may also affect the results.

On a slightly different issue, IMHO, creating a design where an adder and/or subtractor is a separate module to be instantiated makes the larger project's code less readable and understandable, unless you are specifically trying to re-use a given adder or subtractor's implementation (not just the code) to save utilization on the project.

Don't borrow trouble unless you have to. Write the RTL so that you can understand the function it has to perform (not the way you'd design the hardware) first, then see if that meets your performance/utilization requirements (not your personal desire to make the "best" implementation). You'd be amazed what a good synthesis tool can do these days. The folks who have to maintain your design (which may be yourself in 6 weeks/months/years) will thank you for it.

Andy

glen herrmannsfeldt wrote:

> sh.vipin@gmail.com wrote:
> (snip)
> Well, one possible implementation is adder and subtractor,
> followed by mux to select. But modern logic optimization tools
> should be able to do better. You could also write:
>
> res = a + (oper ? -b : b);
>
> which may or may not fit the FPGA better. (Seems to me closer
> to the way that the carry logic works, though.)
The above has an ambiguous carry out, depending on how -b is implemented. If -b is implemented as ~b+1, then for subtraction res = a + ~b + 1, which makes the carry out the result of the +1 increment and not of the addition.

A simple test case is when a and b are both 0:

If -b is a true -b, then res = 0, carry = 0.
If -b is ~b+1, then res = 0, carry = 1.

It might be better to restate the above as

res = (oper ? -b : b) + a;

which doesn't have this ambiguity. I run into this a lot writing code generators for compilers.

w..
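Walter's a = b = 0 case can be checked with a small Python model (an illustration of the arithmetic, not of Verilog semantics). It assumes an 8-bit datapath and contrasts two readings of the subtract path: forming -b and truncating it to 8 bits before the add, versus feeding ~b into the adder with the +1 on the carry-in. The helper names are hypothetical.

```python
from typing import Tuple

WIDTH = 8
MASK = (1 << WIDTH) - 1

def sub_negate_first(a: int, b: int) -> Tuple[int, int]:
    """-b is formed and truncated to WIDTH bits first, then added to a.
    Returns (res, carry_out)."""
    neg_b = (-b) & MASK
    full = a + neg_b
    return full & MASK, full >> WIDTH

def sub_invert_carry_in(a: int, b: int) -> Tuple[int, int]:
    """~b goes into the adder and the +1 enters as the carry-in: a + ~b + 1.
    Returns (res, carry_out)."""
    full = a + (~b & MASK) + 1
    return full & MASK, full >> WIDTH

print(sub_negate_first(0, 0))     # (0, 0)
print(sub_invert_carry_in(0, 0))  # (0, 1)
```

Over all 8-bit inputs the two agree on res; the carries differ only when b = 0, because truncating -0 to 8 bits discards the carry that the +1 would otherwise produce. For b != 0 both give a carry of 1 exactly when a >= b.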
jonesandy@comcast.net wrote:
> On Wednesday, January 8, 2014 4:33:39 PM UTC-6, sh.v...@gmail.com wrote:
>> Hi, I have a query on the RTL designing for addsub based implementations. I
>> heard that addsubs are not preferred on FPGAs as they produce worse area and
>> timing QoR.
> Such statements often heard about preferences in FPGAs are not
> always applicable to all manufacturers' FPGAs or even all of the
> same manufacturer's FPGA families. What might not have worked well
> at some time months or years ago may not be an issue today with
> another FPGA family. Your tests seem to show it works fine for
> your target FPGA and tools. Different synthesis tools (including
> different versions of the same tool) may also affect the results.
Yes. As I noted, there was a big change after the XC4000.
> On a slightly different issue, IMHO, creating a design where
> an adder and/or subtractor is a separate module to be
> instantiated makes the larger project's code less readable
> and understandable, unless you are specifically trying to
> re-use a given adder or subtractor's implementation (not
> just the code) to save utilization on the project.
Hmm. Hard to say, but in the designs I work on, it is more readable as a separate module. It might also be that the OP was using this only to illustrate the question, and doesn't actually code that way. As far as I know, the tools flatten the netlist first, so it doesn't change the result at all.
> Don't borrow trouble unless you have to. Write the RTL so
> that you can understand the function it has to perform
> (not the way you'd design the hardware) first, then see
> if that meets your performance/utilization requirements
> (not your personal desire to make the "best" implementation).
It has always seemed to me that people who knew how to design hardware, and knew about gates and such, wrote better HDL. That is, they don't think of it as writing software (like C), but as wiring up gates. But yes, as with software, write for readability.
> You'd be amazed what a good synthesis tool can do these days.
> The folks that have to maintain your design (which may be
> yourself in 6 weeks/months/years) will thank you for it.
There are cases where the performance goal is "as fast as possible." In that case, compare the logic against that of a fixed adder. If it is the same speed, then use it. If it is a lot slower, then see why it is slow. Another possibility is to pipeline the complement stage before the adder.

-- glen
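One way to read the suggestion to pipeline the complement stage, sketched as a cycle-level Python model (not HDL; the class and method names are hypothetical). Stage 1 registers a, the conditionally inverted b, and oper as the carry-in; stage 2 registers the sum, so the add stage no longer contains the inversion. Whether this buys anything depends on the target, since on many FPGAs the inversion may already be absorbed into the LUTs that feed the carry chain.

```python
from typing import Optional

WIDTH = 8
MASK = (1 << WIDTH) - 1

class PipelinedAddSub:
    """Two-stage model: stage 1 registers (a, b ^ {WIDTH{oper}}, oper as carry-in);
    stage 2 registers the sum. A result is available after the second clock edge."""

    def __init__(self) -> None:
        self.s1 = None   # (a, b_eff, carry_in) held in the stage-1 registers
        self.res = None  # stage-2 output register

    def clock(self, a: int, b: int, oper: int) -> Optional[int]:
        """Apply one rising clock edge with new inputs; return the registered result."""
        # Stage 2 uses the OLD stage-1 contents, as with non-blocking assignments.
        if self.s1 is not None:
            a1, b_eff, cin = self.s1
            self.res = (a1 + b_eff + cin) & MASK
        # Stage 1 captures the complement step for the new inputs.
        self.s1 = (a, b ^ (MASK if oper else 0), oper)
        return self.res

pipe = PipelinedAddSub()
print(pipe.clock(10, 3, 0))  # None: pipeline still filling
print(pipe.clock(10, 3, 1))  # 13: 10 + 3 from the first edge
print(pipe.clock(0, 0, 0))   # 7: 10 - 3 from the second edge
```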
On Thursday, January 9, 2014 2:07:28 PM UTC-6, Walter Banks wrote:
> The above has an ambiguous carry out depending on how the -b is implemented.
Interesting, but since res, a, and b are all the same size (in bits) in this Verilog statement, there is no observable carry out, so there is no ambiguity. If res were wider than a and b, I'm not sure what it would do (but I'm sure it's defined somewhere). I use VHDL.

Andy
On Thursday, January 9, 2014 4:38:07 PM UTC-6, glen herrmannsfeldt wrote:
> It has always seemed to me that people who knew how to design hardware, knew
> about gates and such, wrote better HDL. That is, not think of it as writing
> software (like C), but as wiring up gates.
I'm almost the opposite. I see RTL written by very experienced HW (not HDL) designers, and it often reads like a netlist. They might as well have coded it in EDIF and saved the cost of a synthesis license.

It's not their fault. We don't spend time teaching HDL designers how a synthesis tool analyzes their code, and why it infers a register, a latch(!), a RAM, or combinatorial gates. We teach all these cookbook approaches to designing FPGAs and ASICs using the same primitive functions they used with schematics.

We are sequential thinkers, not parallel thinkers. Therefore, it is best that we describe the desired behavior (on a clock-cycle basis) in a sequential context (an always block or process), and let the synthesis tool infer parallelism where it is possible (they're excellent at that). Use functions and procedures to break out subsets of sequential behaviors. Instead of thinking in registers (circuit elements), think in clock cycles of delay (behavior). The registers are going to get shuffled around by retiming/pipelining optimizations anyway; the clock-cycle delays will still be there. Just be careful around asynchronous inputs!

Of course, when the functionality is so complex that it cannot be easily expressed in a single sequential context, then it must be broken up into separately instantiated parallel contexts (entities or modules), each including its own detailed behavior in a sequential context.

My point is, we can understand (and therefore express and maintain) more complex behavior when it is conveyed in a sequential context. Imagine a casserole recipe written in concurrent statements.
> There are cases where the performance goal is "as fast as possible."
In my professional experience, such cases are pretty rare. But fun when they happen.
> Another possibility is to pipeline the complement stage before an adder.
Especially if oper and b are both available early! andy
jonesandy@comcast.net wrote:
> On Thursday, January 9, 2014 2:07:28 PM UTC-6, Walter Banks wrote:
>> The above has an ambiguous carry out depending on how
>> the -b is implemented.
> Interesting, but since res, a and b are all the same size
> (in bits), in this Verilog statement, there is no observable
> carry out, so there is no ambiguity.
> If res were bigger than a and b, then I'm not sure what it
> would do (but I'm sure it's defined somewhere). I use VHDL.
I would have to look up the rule if I were actually doing it, but yes, Verilog knows about the carry if the register is wide enough, and it is supposed to ignore the carry if there aren't more bits. I have found some synthesis tools that complain about the loss of the carry. Unlike most programming languages, Verilog looks at the size of the destination (the left side of the assignment).

Well, I usually write continuous assignments, not behavioral assignments. I believe the rules are the same, but I am not sure about that.

Does VHDL have something like the Verilog continuous assignment?

-- glen
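A rough Python model of the sizing rule glen describes, assuming unsigned operands: in Verilog, the operands of `+` and `-` in an assignment are extended to the larger of the destination width and the operand widths before the operation, so a 9-bit destination keeps the extra bit and an 8-bit one drops it. The helper `verilog_assign` is hypothetical and models only this unsigned add/subtract case. One consequence relevant to the carry discussion: for subtraction the kept bit is a borrow (set when a < b), which is the complement of the carry out of an a + ~b + 1 adder.

```python
def verilog_assign(op: str, a: int, b: int, operand_width: int, dest_width: int) -> int:
    """Model `dest = a op b;` for unsigned vectors: evaluate at
    max(dest_width, operand_width) bits, then keep dest_width bits."""
    width = max(dest_width, operand_width)
    mask = (1 << width) - 1
    if op == "+":
        value = (a + b) & mask
    elif op == "-":
        value = (a - b) & mask
    else:
        raise ValueError("only + and - are modeled")
    return value & ((1 << dest_width) - 1)

# 8-bit operands a and b
print(verilog_assign("+", 200, 100, 8, 8))  # 44: carry dropped
print(verilog_assign("+", 200, 100, 8, 9))  # 300: carry kept in bit 8
print(verilog_assign("-", 3, 5, 8, 9))      # 510: bit 8 set means a borrow (a < b)
```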
On Friday, January 10, 2014 4:15:01 PM UTC-6, glen herrmannsfeldt wrote:
> Does VHDL have something like the verilog continuous assignment?
Yes, VHDL has concurrent assignment statements in several forms: direct, conditional and selected (like a case statement on the RHS), as well as concurrent procedure calls. It is difficult to describe an iterative behavior, such as priority encoding or "counting ones," with concurrent statements; these are much easier with sequential statements. Andy
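As a rough illustration of Andy's point, in Python rather than VHDL: both examples he names are short when written as sequential loops. The priority encoder uses "last assignment wins", which mirrors how such a loop reads inside a process; in synthesis a loop like this would be unrolled into a priority chain. The function names and the "highest set bit wins" convention are assumptions made for the example.

```python
from typing import Optional

def priority_encode(req: int, width: int = 8) -> Optional[int]:
    """Index of the highest set bit of req, or None if no bit is set.
    'Last assignment wins': later (higher) bits override earlier ones."""
    index = None
    for i in range(width):
        if (req >> i) & 1:
            index = i
    return index

def count_ones(x: int, width: int = 8) -> int:
    """The 'counting ones' example, also as a simple sequential loop."""
    total = 0
    for i in range(width):
        total += (x >> i) & 1
    return total

print(priority_encode(0b00101000))  # 5
print(count_ones(0b10110001))       # 4
```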
jonesandy@comcast.net wrote:
> On Thursday, January 9, 2014 4:38:07 PM UTC-6, glen herrmannsfeldt wrote:
>> It has always seemed to me that people who knew how to design hardware, knew
>> about gates and such, wrote better HDL. That is, not think of it as writing
>> software (like C), but as wiring up gates.
> I'm almost the opposite. I see RTL written by very experienced
> HW (not HDL) designers, and it often reads like a netlist.
> Might as well have coded it in edif and saved the cost of
> a synthesis license.
> It's not their fault. We don't spend time teaching HDL
> designers how a synthesis tool analyzes their code,
> and why it infers a register, a latch(!), a RAM, or
> combinatorial gates. We teach all these cook-book approaches
> to designing FPGAs and ASICs using the same primitive
> functions they used with schematics.
> We are sequential thinkers, not parallel thinkers.
OK, but HDL is inherently parallel, and so, more and more, is software programming, as multicore systems become more popular.
> Therefore, it is best that we describe the desired behavior
> (on a clock cycle basis) in a sequential context (an
> always block or process), and let the synthesis tool infer
> parallelism where it is possible (they're excellent at that).
I believe that C programmers, and other high-level language programmers, who know how to write assembler code tend to write better HLL code. They don't have to think about the generated code for each statement, but still know which constructs generate better code.
> Use functions and procedures to break out subsets of
> sequential behaviors. Instead of thinking in registers
> (circuit elements), think in clock cycles of delay (behavior).
> The registers are going to get shuffled around by
> retiming/pipelining optimizations anyway. The clock
> cycle delays will still be there. Just be careful around
> asynchronous inputs!
Some time ago, I was designing systolic arrays with the goal of at most two levels of logic (two LUTs) between registers. But registers are what make systolic arrays work, so there really isn't any ignoring them.
> Of course, when the functionality is so complex that it
> cannot be easily expressed in a single sequential context,
> then it must be broken up into separately instantiated
> parallel contexts (entities or modules), each including
> their own detailed behavior in a sequential context.
A systolic array is a long array, hundreds to thousands of stages, of fairly simple unit cells. Mostly, I don't have anything against behavioral HDL, but am less sure about people who want to write HDL in C.
> My point is, we can understand (and therefore express and
> maintain) more complex behavior when it is conveyed in a
> sequential context. Imagine a casserole recipe written
> in concurrent statements.
If you are building a factory to produce thousands of them a day, then you probably have to consider it in parallel. For home cooking, though, serial usually works.
>> There are cases where the performance goal is "as fast
>> as possible."
> In my professional experience, such cases are pretty rare.
> But fun when they happen.
(snip) -- glen