FPGARelated.com
Forums

Cascaded floating-point reduction?

Started by Saad Zafar August 21, 2013
y1 = 1.5f*y0 - x*y0*y0*y0

...Note that all quantities are in single-precision floating point. I can't write this equation in behavioral form for the synthesizer to optimize, because it has to be broken down and fed into FP multipliers. I have both y0 and x available in one clock cycle, ready to be plugged into this equation. Now I'm stuck on reducing this equation to the fewest cycles. Right now I have a cascaded series of FP multipliers feeding into a final FP subtractor, with each multiplication consuming one clock cycle.

What, in your opinion, would be the best way to map this equation to hardware? Is there an alternative form of this equation that would be more suitable for implementation?

Regards.
Saad Zafar <saad1024@gmail.com> wrote:
> y1= 1.5f*y0 - x*y0*y0*y0
The usual one would be to factor out a y0, so

y1 = (1.5f-x*y0*y0)*y0;

That saves one multiplier, but maybe the same number of pipeline stages.

If you factor it as

1.5f*y0-(x*y0)*(y0*y0);

then you can do it in one less pipeline stage, but with no fewer multipliers.

-- glen
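The trade-off between the three forms can be checked with a small model. This is only a sketch, assuming every FP multiplier and the FP subtractor each take exactly one pipeline stage (the original post states this only for the multipliers) and that the constant 1.5f is available for free. The names `Node`, `f32`, `original`, `factored`, `balanced` and `compare` are made up for this illustration.

```python
import struct


def f32(v):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


class Node:
    """A single-precision value plus the cost of the hardware that produced it.

    depth = pipeline stages on the longest path (assumed: one per FP operator)
    mults = number of FP multipliers used
    """

    def __init__(self, value, depth=0, mults=0):
        self.value = f32(value)
        self.depth = depth
        self.mults = mults

    def __mul__(self, other):
        return Node(self.value * other.value,
                    max(self.depth, other.depth) + 1,
                    self.mults + other.mults + 1)

    def __sub__(self, other):
        return Node(self.value - other.value,
                    max(self.depth, other.depth) + 1,
                    self.mults + other.mults)


def original(x, y0, c):
    return c * y0 - x * y0 * y0 * y0        # serial chain of multipliers


def factored(x, y0, c):
    return (c - x * y0 * y0) * y0           # y0 factored out


def balanced(x, y0, c):
    return c * y0 - (x * y0) * (y0 * y0)    # products paired up as a tree


def compare(x_val, y0_val):
    """Evaluate all three forms and return their Node results by name."""
    x, y0, c = Node(x_val), Node(y0_val), Node(1.5)
    return {f.__name__: f(x, y0, c) for f in (original, factored, balanced)}


if __name__ == "__main__":
    for name, node in compare(0.5, 1.25).items():
        print(name, "value:", node.value, "mults:", node.mults, "stages:", node.depth)
```

Under these assumptions the factored form uses three multipliers in four stages, and the tree form uses four multipliers in three stages, which matches the counts given above. The three forms round differently in general, so their results can differ in the last bit for arbitrary inputs.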
I think the ieee floating point library's * operator is synthesizable, but synthesis would try to build the fp multipliers out of fixed point multipliers (e.g. DSP blocks) itself, which may take more than one clock cycle.

If the above works, then you could enable retiming & pipelining, and then use your original expression, and run the result through multiple pipeline stages. Retiming/pipelining can redistribute the operations and/or logic among the pipeline stages.

I have seen cases where synthesis tools did this automatically when assembling smaller fixed point multipliers into one larger multiplier, so long as there were pipeline register stages (clock cycles) available to spread across.

Andy 
In article <593a4792-bb97-421a-a338-3f644de0256a@googlegroups.com>,
 <jonesandy@comcast.net> wrote:
>I think the ieee floating point library's * operator is synthesizable,
>but synthesis would try to build the fp multipliers out of fixed point
>multipliers (e.g. DSP blocks) itself, which may take more than
>one clock cycle.
Not in any synthesizer I know. Floating point types aren't handled at all, much less operations like multiplication on them. I wouldn't expect them to do so *EVER*. Too much overhead, and too little of a customer base would need/want it.

Regards,
Mark
On Wednesday, August 21, 2013 6:55:15 PM UTC-5, Mark Curry wrote:
> Not in any synthesizer I know. Floating point types aren't
> handled at all, much less operations like multiplication on them.
> I wouldn't expect them to do so *EVER*. Too much overhead,
> and too little of a customer base would need/want it.
Mark,

Ok, I checked our FPGA synthesis tool's documentation.

The Synplify Pro reference guide states the following in regards to the built-in "real" data type:

"When one of the following constructs in encountered, compilation continues, but will subsequently error out if logic must be generated for the construct.
• real data types (real data expressions are supported in VHDL-2008 IEEE float_pkg.vhd) – real data types are supported as constant declarations or as constants used in expressions as long as no floating point logic must be generated"

Thus, you cannot use the built-in real data type or expressions thereof to generate logic.

However, the reference guide also states the following:

"The following packages are supported in VHDL 2008:
• fixed_pkg.vhd, float_pkg.vhd, fixed_generic_pkg.vhd, float_generic_pkg.vhd, fixed_float_types.vhd – IEEE fixed and floating point packages
...
String and text I/O functions in the above packages are not supported. These functions include read(), write(), to_string()."

Significantly, it states no other limitations on the support for float_pkg.

The float_generic_pkg (the generic package which float_pkg instantiates) defines the "*" operator for type float.

From ieee.float_generic_pkg-body.vhdl, the following indicates that the package is synthesizable:

  -- This deferred constant will tell you if the package body is synthesizable
  -- or implemented as real numbers, set to "true" if synthesizable.
  constant fphdlsynth_or_real : BOOLEAN := true;  -- deferred constant

So, while I have not tried it to see, it appears that there are at least definite plans, if not the current ability, to synthesize floating point hardware long before *EVER* gets here.

The resulting hardware may not be particularly efficient, and may not be operable in a single clock cycle at any reasonable clock rate, but that is where retiming and pipelining come in.

Andy
In article <3d527338-9687-41dc-b4ab-a60e7a1bba19@googlegroups.com>,
 <jonesandy@comcast.net> wrote:
>On Wednesday, August 21, 2013 6:55:15 PM UTC-5, Mark Curry wrote:
>> Not in any synthesizer I know. Floating point types aren't
>> handled at all, much less operations like multiplication on them.
>> I wouldn't expect them to do so *EVER*. Too much overhead,
>> and too little of a customer base would need/want it.
>
>Mark,
>
>Ok, I checked our FPGA synthesis tool's documentation.
>
>However, the reference guide also states the following:
>
>"The following packages are supported in VHDL 2008:
>• fixed_pkg.vhd, float_pkg.vhd, fixed_generic_pkg.vhd, float_generic_pkg.vhd,
>fixed_float_types.vhd – IEEE fixed and floating point packages
>...
>String and text I/O functions in the above packages are not supported. These
>functions include read(), write(), to_string()."
>
>Significantly, it states no other limitations on the support for float_pkg.
>
>The float_generic_pkg (the generic package which float_pkg instantiates)
>defines the "*" operator for type float.
>
>From ieee.float_generic_pkg-body.vhdl, the following indicates that the
>package is synthesizable:
>
>  -- This deferred constant will tell you if the package body is synthesizable
>  -- or implemented as real numbers, set to "true" if synthesizable.
>  constant fphdlsynth_or_real : BOOLEAN := true;  -- deferred constant
>
>So, while I have not tried it to see, it appears that there are at least
>definite plans, if not the current ability, to synthesize floating point
>hardware long before *EVER* gets here.
>
>The resulting hardware may not be particularly efficient, and may not be
>operable in a single clock cycle at any reasonable clock rate, but that is
>where retiming and pipelining come in.
>
Andy,

I stand corrected. Being a Verilog user, I wasn't familiar with these updates for VHDL-2008. Looks like they've done it correctly. There's default support for IEEE 754 32-bit and IEEE 754 64-bit. But users can (and very likely should) use the generic float types, specifying all the settings including exponent width, fraction width, rounding options, normalization options, etc. One wonders, however, how exceptions (NaN, etc.) will be handled in synthesis.

The generic 32-bit (and worse, 64-bit) IEEE 754 floating point formats are rarely EVER appropriate for FPGA (and even ASIC) designs. For both, you're almost always designing something for a specific problem. There aren't going to be many valid cases where a specific wire needs all that dynamic range. For generic processors (and DSPs), yeah, it may be appropriate.

But more controlled "floating point" like these libraries provide might be useful. I tend to think they'll also be dangerous in the hands of inexperienced HW designers, who will just take the defaults and go.

Thanks for the pointer.

Mark
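To get a feel for what choosing the exponent and fraction widths buys or costs, here is a rough Python sketch that rounds a value to a reduced-width binary floating point format. It is a toy model and not the behavior of float_pkg: the function name `quantize` and its parameters are invented here, it assumes round-to-nearest-even, flushes anything below the smallest normal number to zero, and turns overflow into infinity.

```python
import math


def quantize(value, exp_bits, frac_bits):
    """Round value to a toy binary float with the given field widths.

    Assumptions (for illustration only): IEEE-style bias, all-ones exponent
    reserved, no denormals (underflow flushes to zero), round to nearest even.
    """
    if value == 0.0 or math.isnan(value) or math.isinf(value):
        return value
    bias = (1 << (exp_bits - 1)) - 1
    e_max = bias          # largest exponent of a normal number
    e_min = 1 - bias      # smallest exponent of a normal number
    m, e = math.frexp(abs(value))   # abs(value) = m * 2**e with 0.5 <= m < 1
    e -= 1                          # rewrite as (2*m) * 2**e with 1 <= 2*m < 2
    if e < e_min:
        return math.copysign(0.0, value)
    scale = 1 << frac_bits
    sig = round(m * 2 * scale)      # integer significand including hidden bit
    if sig == 2 * scale:            # rounding carried into the next binade
        sig = scale
        e += 1
    if e > e_max:
        return math.copysign(math.inf, value)
    return math.copysign(math.ldexp(sig, e - frac_bits), value)


if __name__ == "__main__":
    for exp_bits, frac_bits in ((8, 23), (6, 12), (5, 6)):
        print(exp_bits, frac_bits, quantize(3.14159, exp_bits, frac_bits))
```

With 8 exponent bits and 23 fraction bits this model agrees with single precision for normal values. Shrinking the fields shows how quickly precision and range fall off, which is the trade-off a designer would make per signal instead of taking the 32-bit default.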
jonesandy@comcast.net wrote:
> On Wednesday, August 21, 2013 6:55:15 PM UTC-5, Mark Curry wrote:
>> Not in any synthesizer I know. Floating point types aren't
>> handled at all, much less operations like multiplication on them.
>> I wouldn't expect them to do so *EVER*. Too much overhead,
>> and too little of a customer base would need/want it.
(snip)
> "When one of the following constructs in encountered,
> compilation continues, but will subsequently error out if
> logic must be generated for the construct."
Most of the time, you want internal pipelining on the floating point operations. There is nowhere to specify that with the usual arithmetic operators, but it is easy if you reference a module to do it.

-- glen
On Thursday, August 22, 2013 1:15:40 PM UTC-5, glen herrmannsfeldt wrote:
> Most of the time, you want internal pipelining on the floating point
> operations. There is nowhere to specify that with the usual arithmetic
> operators, but it is easy if you reference a module to do it.
Most of the time you will need the extra pipelining if you want to infer built-in multipliers.

This is where retiming and pipelining synthesis optimizations come in handy. If you follow up (and/or precede) the expression assignment with a few extra clock cycles of latency (pipeline register stages), the synthesis tool can distribute the HW across the extra clock cycles automatically.

Whether synthesis can do it as well as you can manually, I don't know. But if it is good enough to work, does it really need to be as good as you could have done manually? I'd rather have the maintainability of the mathematical expression, if it will work.

Andy
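At the cycle level, "the expression followed by a few register stages" behaves like a plain delay line, and that behavior is what retiming has to preserve while it moves logic between the stages. The sketch below is a behavioral model only and does not model retiming or timing. The names `nr_step` and `run_pipeline` are hypothetical, and the expression is evaluated in Python double precision, not single precision.

```python
from collections import deque


def nr_step(x, y0):
    """The expression from this thread, written as one readable assignment."""
    return (1.5 - x * y0 * y0) * y0


def run_pipeline(samples, extra_stages):
    """Feed one (x, y0) pair per clock and return the output seen each clock.

    The whole expression is computed in one step and then shifted through
    extra_stages registers. None marks cycles before the first valid result.
    """
    regs = deque([None] * extra_stages)
    outputs = []
    for x, y0 in samples:
        regs.appendleft(nr_step(x, y0))   # new result enters the first register
        outputs.append(regs.pop())        # oldest result leaves the last one
    return outputs


if __name__ == "__main__":
    samples = [(0.5, 1.25), (0.25, 2.0), (1.0, 1.0), (2.0, 0.5), (0.5, 1.0)]
    for cycle, out in enumerate(run_pipeline(samples, extra_stages=3)):
        print("cycle", cycle, "output", out)
```

The first result appears after `extra_stages` clocks and a new one follows every clock after that. However the tool redistributes the multipliers and the subtractor across those registers, the latency and throughput seen from outside stay the same.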
jonesandy@comcast.net wrote:

(snip regarding pipelining)

> Most of the time you will need the extra pipelining if you want
> to infer built-in multipliers.

> This is where retiming and pipelining synthesis optimizations
> come in handy. If you follow up (and/or precede) the expression
> assignment with a few extra clock cycles of latency
> (pipeline register stages), the synthesis tool can distribute
> the HW across the extra clock cycles automatically.
Which tools do that? That sounds pretty useful.

As I am not the OP, the things that I try to do are different. One that I have wondered about is the ability to add extra register stages to speed up the critical path. I work on very long, fixed point pipelines, so usually there are, at some point, some very long routes which limit the speed. If I could put registers in them, it could run a lot faster.
> Whether synthesis can do it as well as you can manually,
> I don't know. But if it is good enough to work, does it
> really need to be as good as you could have done manually?
> I'd rather have the maintainability of the mathematical
> expression, if it will work.
Well, for really large problems every ns counts. For a 5% difference, maybe I wouldn't worry about it, but 20% or 30% is worth working for.

-- glen
In article <kv86fb$h37$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>jonesandy@comcast.net wrote:
>
>(snip regarding pipelining)
>
>> Most of the time you will need the extra pipelining if you want
>> to infer built-in multipliers.
>
>> This is where retiming and pipelining synthesis optimizations
>> come in handy. If you follow up (and/or precede) the expression
>> assignment with a few extra clock cycles of latency
>> (pipeline register stages), the synthesis tool can distribute
>> the HW across the extra clock cycles automatically.
>
>Which tools do that? That sounds pretty useful.
In Xilinx XST, the switch you're looking for is:

  -register_balancing yes

I now leave it on by default; it rarely makes things worse. It seems to help: I notice in the log file that it does move flops forward and backward through the combinational logic in an attempt to better balance the pipeline paths. How well it does the job, I've not dug in that deep.
>As I am not the OP, the things that I try to do are different.
>One that I have wondered about is the ability to add extra register
>stages to speed up the critical path. I work on very long, fixed point
>pipelines, so usually there is at some point some very long routes
>which limit the speed. If I could put registers in them, it could
>run a lot faster.
Sounds just like what the tool is targeting. If you have access to it, I'd suggest giving it a shot.

Regards,
Mark