FPGARelated.com
Forums

Embedded Multipliers in Altera Cyclone

Started by dani...@gmail.com July 26, 2010
Hi all,

In my Cyclone 4-based design I'm getting an embedded multiplier
inferred, as expected, from the following VHDL:

C <= A * B;

(where A and B are registered 12-bit values and the output C is
subsequently registered, with no other logic in the path)

However, I'm seeing a timing violation on this path. The timing
reports show a delay of nearly 2 ns between the output of the
multiplier and the flop. Pulling in some of that 2 ns would sort out
the negative slack problem.

I looked through the documentation for the embedded multipliers, and
as expected there are input and output registers as part of the
embedded multiplier block. But with that 2 ns delay the output
register clearly isn't being used. So my question is: how do I write my
code so that synthesis uses the output registers in the embedded
multipliers? I tried a number of coding styles, including putting the
multiplication directly inside a clocked process, but none had any
impact on timing. I definitely don't want to instantiate the embedded
multiplier directly. Are there any VHDL attributes that might help
(anything other than MULTSTYLE DSP/LOGIC)?

Any suggestions or pointers to documents would be greatly appreciated!

On Jul 26, 9:49 pm, "daniel.lar...@gmail.com"
<daniel.lar...@gmail.com> wrote:
> So my question is: how do I write my code to infer the use of the output registers in the embedded multipliers?

I would try this:

Mult: process (iClk, inResetAsync) is
begin
  if inResetAsync = '0' then
    C <= (others => '0');
  elsif rising_edge(iClk) then  -- rising clock edge
    C <= A * B;
  end if;
end process Mult;

I thought I'd already tried that, but it looks like I forgot to reset
the output (C in this case), so the result didn't use the output
register. Problem solved now. Thanks!
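For anyone who wants a self-contained version of the fix described above, here is a minimal sketch: registered 12-bit inputs feeding a product register with an asynchronous reset, in the same style as the Mult process from the reply. The entity name mult12_reg, its port names and the choice of unsigned operands are assumptions (the thread does not say whether A and B are signed), and whether Quartus actually packs these registers into the embedded multiplier may depend on tool version and settings, so check the fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name and ports; 12-bit unsigned operands are an
-- assumption, since the thread does not say whether A and B are signed.
entity mult12_reg is
  port (
    iClk         : in  std_logic;
    inResetAsync : in  std_logic;              -- active-low asynchronous reset
    a_i          : in  unsigned(11 downto 0);
    b_i          : in  unsigned(11 downto 0);
    c_o          : out unsigned(23 downto 0)   -- 12 + 12 bit product
  );
end entity mult12_reg;

architecture rtl of mult12_reg is
  signal A, B : unsigned(11 downto 0);
  signal C    : unsigned(23 downto 0);
begin
  -- Input registers: candidates for the multiplier block's input registers.
  InRegs: process (iClk, inResetAsync) is
  begin
    if inResetAsync = '0' then
      A <= (others => '0');
      B <= (others => '0');
    elsif rising_edge(iClk) then
      A <= a_i;
      B <= b_i;
    end if;
  end process InRegs;

  -- Output register with asynchronous reset, as in the reply's Mult process.
  -- Per the thread, adding this reset is what let the original poster's
  -- Quartus build use the multiplier's output register.
  Mult: process (iClk, inResetAsync) is
  begin
    if inResetAsync = '0' then
      C <= (others => '0');
    elsif rising_edge(iClk) then
      C <= A * B;
    end if;
  end process Mult;

  c_o <= C;
end architecture rtl;
```

The product appears on c_o two rising edges after the operands are applied: one edge captures A and B, the next captures C.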


That's odd. I'd have expected the output to be registered whether or not it was asynchronously reset. Is this a bug in the synthesis tool?

Nial.

On Jul 26, 8:49 pm, "daniel.lar...@gmail.com"
<daniel.lar...@gmail.com> wrote:
> So my question is: how do I write my code to infer the use of the output registers in the embedded multipliers?

The following works; I have tens of thousands of instantiations in a similar number of FPGAs actually in the field. The multstyle attribute may be what you need: synthesis might not use a DSP block if there is no timing or power need for one.

-- ============================================================================
-- COPYRIGHT (c) 2010 DAVID GREIG. This source file is the property of David Greig.
-- This work must not be copied without permission from David Greig.
--
-- Any copy or derivative of this source file must include this copyright statement.
--
-- ----------------------------------------------------------------------------
-- File        : SyS_Mult.vhd
-- Author      : David Greig (email :
-- Revision    :
-- Description : signed input data multiplier with clken output reg
-- ----------------------------------------------------------------------------
-- Notes       : 2 clock cycle delay
-- ============================================================================
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
-- ============================================================================
entity SyS_Mult is -- 2 clock cycle delay
  generic(
    Gdawidth   : natural;
    Gdbwidth   : natural;
    Gmult_pref : string
  );
  port(
    arstn : in  std_logic;
    clk   : in  std_logic;
    clken : in  std_logic;
    da_i  : in  std_logic_vector(Gdawidth - 1 downto 0);
    db_i  : in  std_logic_vector(Gdbwidth - 1 downto 0);
    q_o   : out std_logic_vector(Gdawidth + Gdbwidth - 1 downto 0)
  );
end entity SyS_Mult;
-- ============================================================================
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
architecture rtl of SyS_Mult is
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  attribute multstyle : string; -- Implementation style, "logic" "dsp"
  ------------------------------------------------------------------------------
  signal da_r : signed(Gdawidth - 1 downto 0);
  signal db_r : signed(Gdbwidth - 1 downto 0);
  signal p_r  : signed(Gdawidth + Gdbwidth - 1 downto 0);
  attribute multstyle of p_r : signal is Gmult_pref;
  ------------------------------------------------------------------------------
begin
  ------------------------------------------------------------------------------
  prcs_SyS_Mult : process(arstn, clken, clk)
  begin
    if (arstn = '0') then
      da_r <= (others => '0');
      db_r <= (others => '0');
      p_r  <= (others => '0');
    elsif (clken = '0') then
      null;
    elsif rising_edge(clk) then
      da_r <= signed(da_i);
      db_r <= signed(db_i);
      p_r  <= (da_r * db_r);
    end if;
  end process prcs_SyS_Mult;
  ------------------------------------------------------------------------------
  q_o <= std_logic_vector(p_r);
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
end architecture rtl;
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- component SyS_Mult is -- 2 clock cycle delay
--   generic(
--     Gdawidth   : natural;
--     Gdbwidth   : natural;
--     Gmult_pref : string
--   );
--   port(
--     arstn : in  std_logic;
--     clk   : in  std_logic;
--     clken : in  std_logic;
--     da_i  : in  std_logic_vector(Gdawidth - 1 downto 0);
--     db_i  : in  std_logic_vector(Gdbwidth - 1 downto 0);
--     q_o   : out std_logic_vector(Gdawidth + Gdbwidth - 1 downto 0)
--   );
-- end component SyS_Mult;

-- i_ : SyS_Mult -- 2 clock cycle delay
--   generic map(
--     Gdawidth   => ,
--     Gdbwidth   => ,
--     Gmult_pref =>
--   )
--   port map(
--     arstn => ,
--     clk   => ,
--     clken => ,
--     da_i  => ,
--     db_i  => ,
--     q_o   =>
--   );
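SyS_Mult attaches multstyle to the product signal p_r. As a hedged variant, here is a sketch that attaches the DSP preference to a whole entity instead, so it covers every multiplier inferred inside it. The entity-level form (attribute multstyle of <entity> : entity is "dsp") is assumed from the Quartus synthesis attribute documentation and is not confirmed in this thread; the entity name mult12_dsp, its ports and the 12-bit signed widths are hypothetical, and SyS_Mult's clock enable is left out to keep it short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 12 x 12 signed multiplier, 2 clock cycle delay.
entity mult12_dsp is
  port (
    clk   : in  std_logic;
    arstn : in  std_logic;              -- active-low asynchronous reset
    da_i  : in  signed(11 downto 0);
    db_i  : in  signed(11 downto 0);
    q_o   : out signed(23 downto 0)
  );
  attribute multstyle : string;
  -- Assumed entity-level form of the Quartus multstyle attribute; a
  -- simulator simply treats it as a user-defined attribute.
  attribute multstyle of mult12_dsp : entity is "dsp";
end entity mult12_dsp;

architecture rtl of mult12_dsp is
  signal da_r : signed(11 downto 0);
  signal db_r : signed(11 downto 0);
  signal p_r  : signed(23 downto 0);
begin
  Regs: process (clk, arstn) is
  begin
    if arstn = '0' then
      -- Reset every register, including the product register.
      da_r <= (others => '0');
      db_r <= (others => '0');
      p_r  <= (others => '0');
    elsif rising_edge(clk) then
      da_r <= da_i;
      db_r <= db_i;
      p_r  <= da_r * db_r;  -- product registered one cycle after the inputs
    end if;
  end process Regs;

  q_o <= p_r;
end architecture rtl;
```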