
about fast adder

Started by Giox July 7, 2005
I'm interested in implementing a fast adder for 32-bit data.
The CLA is too expensive, so I'm looking for something different. Can
you point me to some references?
I think a Ling adder could be a good choice, but I'm not sure.
Thanks a lot

Giox wrote:
> I'm interested in the implementation of a fast adder for 32 bit data.
> The CLA is too expensive so I'm searching for something different, can
> you provide me some reference?
> I think that Ling adder can be a good choice, but I don't know..
> Thanks a lot
On an FPGA, going faster than the dedicated fast carry ripple chain for only 32 bits of data might not be easy. What is your target speed, and what is your current speed?

Sylvain

Hi, I'm using a Virtex 300E, and after the synthesis step (not place and
route) the frequency is estimated at 82.129 MHz.
The performance is better than what I need, but the occupied
area is considerable.
Gio


Giox wrote:
> Hi, I'm using a virtex 300E and after the synthesis step (not place and
> route), the frequency is estimated as 82.129MHz.
> The performances are better than whose that I need, but the occupied
> area is considerable.
> Gio
Which software do you use to synthesize your project? Did you try pipelining? Can you show your source code?

des00

I'm using the Xilinx ISE pack, with its synthesis pack.
I'm not able to show the code, but it is simply a 32-bit CLA built from 4
different 8-bit CLAs with group propagate and generate.
Gio
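
For readers who have not seen the structure described above, here is a minimal sketch of a 32-bit adder built from four 8-bit blocks with group propagate and generate. It is not Gio's code, which was not posted. The module names cla8 and cla32 and all signal names are made up for illustration. Inside each 8-bit block the carries are written as a recurrence and the tool is left to flatten them; only the block-level carries are written out as expanded lookahead terms.

```verilog
// Hypothetical 8-bit block: sum plus group propagate/generate outputs.
module cla8 (
    input  [7:0] a, b,
    input        cin,
    output [7:0] sum,
    output       gp,    // group propagate: a carry-in passes through all 8 bits
    output       gg     // group generate: block produces a carry-out on its own
);
    wire [7:0] p = a ^ b;   // per-bit propagate
    wire [7:0] g = a & b;   // per-bit generate
    wire [8:0] c;           // c[i]  = carry into bit i for the real carry-in
    wire [8:0] gc;          // gc[i] = carry into bit i assuming carry-in is 0
    assign c[0]  = cin;
    assign gc[0] = 1'b0;
    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : bits
            assign c[i+1]  = g[i] | (p[i] & c[i]);
            assign gc[i+1] = g[i] | (p[i] & gc[i]);
        end
    endgenerate
    assign sum = p ^ c[7:0];
    assign gp  = &p;
    assign gg  = gc[8];
endmodule

// Hypothetical top level: block carries come from lookahead terms,
// so no block waits for the carry-out of the block below it.
module cla32 (
    input  [31:0] a, b,
    input         cin,
    output [31:0] sum,
    output        cout
);
    wire [3:0] gp, gg;
    wire [4:0] c;           // c[k] = carry into 8-bit block k
    assign c[0] = cin;
    assign c[1] = gg[0] | (gp[0] & c[0]);
    assign c[2] = gg[1] | (gp[1] & gg[0]) | (gp[1] & gp[0] & c[0]);
    assign c[3] = gg[2] | (gp[2] & gg[1]) | (gp[2] & gp[1] & gg[0])
                | (gp[2] & gp[1] & gp[0] & c[0]);
    assign c[4] = gg[3] | (gp[3] & gg[2]) | (gp[3] & gp[2] & gg[1])
                | (gp[3] & gp[2] & gp[1] & gg[0])
                | (gp[3] & gp[2] & gp[1] & gp[0] & c[0]);
    assign cout = c[4];
    genvar k;
    generate
        for (k = 0; k < 4; k = k + 1) begin : blk
            cla8 u (
                .a(a[8*k +: 8]), .b(b[8*k +: 8]), .cin(c[k]),
                .sum(sum[8*k +: 8]), .gp(gp[k]), .gg(gg[k])
            );
        end
    endgenerate
endmodule
```

All of the propagate/generate terms here are ordinary combinational logic, so on an FPGA they would be expected to land in general-purpose LUTs and routing.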

Have you tried a simple adder?
Verilog:
module myadd ( input clk, input [31:0] a, b, output reg [31:0] y );
   // Registered output; nonblocking assignment for clocked logic.
   always @(posedge clk)  y <= a + b;
endmodule

The dedicated adder circuitry is very fast silicon.  Trying to best the 
native performance of the adder is difficult.  Most people have their 
performance hurt by having more than one (or two) levels of logic in the 
adder.  If you go from registered inputs to registered outputs you 
should get significantly better performance than the CLA structure 
you're trying.

Let us know how your performance changes with a simple 32-bit adder.
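
To make the "registered inputs to registered outputs" point concrete, here is a minimal sketch that adds input registers in front of the same adder. The module name myadd_reg and the register names a_r and b_r are made up for illustration; the carry out is dropped, as in the module above.

```verilog
// Sketch: 32-bit adder with registered inputs and a registered output,
// so the only logic between flip-flops is the adder itself.
module myadd_reg (
    input             clk,
    input      [31:0] a, b,
    output reg [31:0] y
);
    reg [31:0] a_r, b_r;    // input registers

    always @(posedge clk) begin
        a_r <= a;           // capture the operands
        b_r <= b;
        y   <= a_r + b_r;   // add the previously captured operands
    end
endmodule
```

With this arrangement the sum appears on y two clock edges after the operands are presented, and the timing report for the clock covers only the register-to-register adder path.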


Giox wrote:
> I'm using the Xilinx ISE pack, with it's synthesis pack. > I'm not able to show the code, but it is simply a 32 bit CLA bit from 4 > different 8 bit CLA with group propagate and generate > Gio
Especially for you, I ran a simple test.
This code:

library ieee;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_1164.all;

entity adder is
	port
	(
	in_clock 	: in std_logic;
	in_reset_b 	: in std_logic;
	in_dataA 	: in std_logic_vector(31 downto 0);
	in_dataB 	: in std_logic_vector(31 downto 0);
	in_strobe  	: in std_logic;
	out_data	: out std_logic_vector(32 downto 0);
	out_strobe  : out std_logic
	);
end entity adder;

architecture adder of adder is
begin
	process (in_clock, in_reset_b) is
	begin
		if (in_reset_b = '0') then
			out_data <= (others => '0');
		elsif (rising_edge(in_clock)) then
			if (in_strobe = '1') then
				out_data <= ext(in_dataA, out_data'length)
				          + ext(in_dataB, out_data'length);
			end if;
		end if;
	end process;
	process (in_clock) is
	begin
		if (rising_edge(in_clock)) then
			out_strobe <= in_strobe;
		end if;
	end process;

end architecture adder;

I synthesized it with Synplify 8.1 for xcv300efg256-6.
The Synplify report is:
Worst slack in design: 3.605
...................................
                  Requested    Estimated    Requested    Estimated             Clock       Clock
Starting Clock    Frequency    Frequency    Period       Period       Slack    Type        Group
--------------------------------------------------------------------------------------------------------------
adder|in_clock    100.0 MHz    156.4 MHz    10.000       6.395        3.605    inferred    Inferred_clkgroup_0
==============================================================================================================
...............................
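
The numbers in this report are consistent with each other. Assuming the period and slack columns are in nanoseconds (which matches 100.0 MHz against 10.000), the slack and the estimated frequency follow directly from the two periods:

```latex
\begin{align*}
\text{slack} &= T_{\text{requested}} - T_{\text{estimated}}
              = 10.000\,\text{ns} - 6.395\,\text{ns} = 3.605\,\text{ns} \\
f_{\text{estimated}} &= \frac{1}{T_{\text{estimated}}}
              = \frac{1}{6.395\,\text{ns}} \approx 156.4\,\text{MHz}
\end{align*}
```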
Then I ran P&R in ISE 6.3 SP2 full,
and the report is:
...............
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of Slice Flip Flops:        26 out of  6,144    1%
  Number of 4 input LUTs:            32 out of  6,144    1%
Logic Distribution:
    Number of occupied Slices:                          17 out of  3,072    1%
    Number of Slices containing only related logic:     17 out of     17  100%
    Number of Slices containing unrelated logic:         0 out of     17    0%
        *See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:        32 out of  6,144    1%
   Number of bonded IOBs:           100 out of    176   56%
      IOB Flip Flops:                               8
   Number of GCLKs:                   1 out of      4   25%
   Number of GCLKIOBs:                1 out of      4   25%
.................
There were no errors anywhere. The only constraints were:
NET "in_clock" TNM_NET = "in_clock";
TIMESPEC "TS_in_clock" = PERIOD "in_clock" 10.000 ns HIGH 50.00%;
#End clock constraints

# Output Constraints
OFFSET = OUT : 10.000 : AFTER in_clock ;
# Input Constraints
OFFSET = IN : 10.000 : BEFORE in_clock ;

That's why I recommend you use Synplify :) Good luck!

Gulp, interesting.
I tested your code with my tools; it is faster with Synplify than with
my tools. However, it seems that the biggest problem is the use of the CLA:
the synthesis tool achieves better results than the
CLA that I implemented by hand. I'm not as experienced as you, but is it
possible that a standard (textbook)
implementation of a CLA generates conflicts that prevent the use of
specific features of the FPGA?
It seems so, but I would like your advice.
Thanks again, Giovanni

Giox wrote:
> Gulp, interesting.
> I tested your code with my tools, it is faster with simplify than with
> my tools. However it seems that the biggest trouble is the use of CLA,
> it seems that the synthesis process allows for better results than the
> CLA that I implemented by hand. I'm not as experienced as you but is it
> possible that a standard (read from standard university book)
> implementation of CLA generate conflicts that disable the use of
> specific feature of the FPGA?
You mean you didn't try the simple + first?

All modern FPGAs have a dedicated carry ripple chain that allows very quick propagation of the carry from one LogicCell to the adjacent one. So by using it, you need only n LogicCells for an n-bit adder, and the carry is handled by dedicated logic. When you built your CLA, you used only generic logic, so you added extra delays. Using an architecture other than the simple + for addition only pays off for very big adders.

Sylvain

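
As an illustration of the "n LogicCells for n bits" point, here is a behavioral sketch of how one adder bit divides between a LUT and the dedicated carry logic, assuming the usual arrangement of one LUT, one carry multiplexer and one XOR gate per bit. The module name ripple_add and the signal names are made up for illustration, and this is only a model: the synthesizer performs this mapping itself when given a + b, so there is normally no reason to write it by hand.

```verilog
// Behavioral model of a carry-chain adder, one slice of logic per bit.
module ripple_add #(parameter N = 32) (
    input  [N-1:0] a, b,
    input          cin,
    output [N-1:0] sum,
    output         cout
);
    wire [N:0] c;               // c[i] = carry into bit i
    assign c[0] = cin;
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : bit_slice
            wire p = a[i] ^ b[i];               // the only LUT function per bit
            assign sum[i] = p ^ c[i];           // XOR in the carry logic
            // Carry mux: pass the incoming carry when the bit propagates;
            // otherwise a[i] == b[i], so the carry out equals a[i].
            assign c[i+1] = p ? c[i] : a[i];
        end
    endgenerate
    assign cout = c[N];
endmodule
```

The carry signal never leaves the chain of multiplexers, which is why the ripple structure can be fast despite its linear depth.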
Sorry, but I have no experience in this field, so I thought the
simple approach would not be productive and skipped it.
Thanks for your help.
Giovanni