FPGARelated.com
Forums

How to RLOC adders in VHDL/Synplify to avoid broken carry chains?

Started by Ken November 19, 2003
Hello folks,

I am implementing a filter on a -5 Virtex-II part (3000) and the critical
path is one of the longest adder carry chains in the design (28 bits).

I have noticed that the minimum period of my design is being clobbered by
the carry chain of the longest adder changing CLB column half-way through
instead of carrying on up the carry chain in the column it started in...

So, I would like to be able to put in my VHDL code an RLOC constraint (or
something) that would inform Synplify Pro not to do any clever optimisation
that will prevent Xilinx ISE 5.2.03i from keeping the carry chain in one
column (an old Ray Andraka post has led me to believe this is what is
happening).

Googling about has yielded some discussions on it but I cannot see exactly
how I would specify this in my VHDL to ensure that the carry chain remains
in one column.

Can someone give me some pointers please (ideally a quick code snippet to
demonstrate  :-)   )?

Thanks in advance for your time,

Ken



-- 
To reply by email, please remove the _MENOWANTSPAM from my email address.

Ken,

Have you read the constraints guide in the Xilinx software manuals? Look
for the RLOC section. You end up with stuff in your UCF like :-

INST "*un6_burp_cry_0" RLOC = "X6Y4";
INST "*un6_burp_cry_1" RLOC = "X6Y4";
INST "*un6_burp_cry_2" RLOC = "X6Y5";
INST "*un6_burp_cry_3" RLOC = "X6Y5";
INST "*un6_burp_cry_4" RLOC = "X6Y8";
etc...

I used the floorplanner to get the names of things I want to RLOC. For
your problem, you could place the carry chain with floorplanner and send the
output to a temporary UCF to give you a start on your RLOC stuff. Hope that
makes sense! Read about H_SETs, HU_SETs and U_SETs too.

good luck, Syms.

Hi

I prefer the way used in one of Xilinx's TechXclusives to embed RLOC
attributes directly in VHDL (Relationally Placed Macros). Here's an example
of an RPM that performs a registered a + b on the carry chain, using the
U_SET attribute.

-- begin VHDL code
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
use work.rlocs.all;

entity a_plus_b_reg is
  generic (width: integer := 32; setn: integer := 1);
  port (
    clock  : IN  std_logic;
    enable : IN  std_logic;
    a      : IN  std_logic_vector (width-1 downto 0);
    b      : IN  std_logic_vector (width-1 downto 0);
    q      : OUT std_logic_vector (width-1 downto 0)
  );
end a_plus_b_reg;

architecture rpm_arch of a_plus_b_reg is

  attribute INIT:  string;
  attribute BEL:   string;
  attribute RLOC:  string;
  attribute U_SET: string;

  signal prexor_int_q: std_logic_vector (width-1 downto 0);
  signal int_carry:    std_logic_vector (width-1 downto 0);
  signal y:            std_logic_vector (width-1 downto 0);

begin

  int_carry(0) <= '0';

  -- output registers: two bits per slice, stacked up one column
  reg: for i in 0 to width-1 generate
    attribute U_SET of q_reg: label is "uset" & integer'image(setn);
    attribute RLOC of q_reg: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_reg: label is "FF" & belname_xy(i);
  begin
    q_reg: FDE port map (
      D => y(i), CE => enable, C => clock,
      Q => q(i));
  end generate;

  -- bits 0 to width-2: LUT (a xor b), carry mux and sum xor
  gena: for i in 0 to width-2 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_mxy: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_mxy: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_mxy: MUXCY port map (
      DI => a(i), CI => int_carry(i), S => prexor_int_q(i),
      O => int_carry(i+1) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

  -- top bit: no carry out is needed, so there is no MUXCY
  genb: for i in width-1 to width-1 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

end rpm_arch;
-- end VHDL code

The resulting RPM is a column of 1 x w/2 slices, where w is the value
assigned to the width generic.

The setn generic lets you create different U_SET names for different
instances of the entity (if the instances have no relative positions), or
use the same U_SET name and apply different RLOCs to each instance (if the
instances have relative positions).

The rlocs package contains a couple of simple functions that return the
strings "F" or "G", or "X" or "Y", to differentiate the LUTs/FFs inside a
single slice. Read the constraints guide about RLOC, RLOC_ORIGIN and the
different kinds of sets you can create. And the RPM TechXclusive, of course.

If you prefer the placer to select the absolute position of the RPM, then
that's all you need. If you want total control, you can fix the RPM position
by attaching an RLOC_ORIGIN to the U_SET name in the UCF file.

I've successfully used this entity on the Virtex-II architecture with XST.
I don't know how to tell Synplify Pro to attach those attributes, but it
shouldn't be that difficult.

The drawback is that your design is no longer portable. You're stuck with
Xilinx parts that use the XY coordinate system (not all of them do). But you
can create different versions for different architectures, of course.

Best regards

Francisco Rodriguez

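The rlocs package referenced by "use work.rlocs.all" was not posted. A
minimal sketch of what it might look like follows, assuming belname_fg and
belname_xy take the bit index and return a one-character string, with even
bits going to the F LUT / X flip-flop and odd bits to the G LUT / Y
flip-flop. The function bodies are a guess based on the description in the
post, not Francisco's actual code.

```vhdl
-- Hypothetical reconstruction of the "rlocs" helper package.
-- Only the function names come from the post; the bodies are assumed.
package rlocs is
  -- "F" for even bit positions, "G" for odd ones (LUT / XOR BEL suffix)
  function belname_fg (i : integer) return string;
  -- "X" for even bit positions, "Y" for odd ones (flip-flop BEL suffix)
  function belname_xy (i : integer) return string;
end package rlocs;

package body rlocs is

  function belname_fg (i : integer) return string is
  begin
    if i mod 2 = 0 then
      return "F";
    else
      return "G";
    end if;
  end function belname_fg;

  function belname_xy (i : integer) return string is
  begin
    if i mod 2 = 0 then
      return "X";
    else
      return "Y";
    end if;
  end function belname_xy;

end package body rlocs;
```

With these definitions, bit 0 of the adder would get the BEL values F, XORF
and FFX, and bit 1 would get G, XORG and FFY, both at RLOC X0Y0.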
Syms,

Thanks for the reply,

I am familiar with the Xilinx constraints guide but I would like to put the
constraint in the VHDL rather than the ucf and I do not want to make it
Xilinx specific.

An adder is such a simple thing and the device has specific wires to
implement it quickly - surely there must be a way to inform the tools to use
the carry chain in one column only for max speed?

Cheers,

Ken

Ken wrote:
> I am familiar with the Xilinx constraints guide but I would like to put the
> constraint in the VHDL rather than the ucf and I do not want to make it
> Xilinx specific.

If you are using RLOC's, aren't you making it Xilinx specific?

Not only that, are RLOC's guaranteed to even be the same from one Xilinx
family to another Xilinx family?

> An adder is such a simple thing and the device has specific wires to
> implement it quickly - surely there must be a way to inform the tools to use
> the carry chain in one column only for max speed?

I'm sure you've already thought of this, but can you not break the adder up?

Good luck,

Marc

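For reference, breaking the adder up could look something like the sketch
below: a 28-bit add split into two 14-bit halves, with the carry out of the
low half registered and fed into the high half one clock later. The entity
name, the 14/14 split and the extra clock of latency are all assumptions for
illustration; whether the added latency is acceptable depends on the filter
structure, which the thread does not describe.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: 28-bit adder pipelined as two 14-bit halves.
entity split_adder28 is
  port (
    clock : in  std_logic;
    a     : in  std_logic_vector(27 downto 0);
    b     : in  std_logic_vector(27 downto 0);
    q     : out std_logic_vector(27 downto 0)  -- a + b, two clocks later
  );
end split_adder28;

architecture rtl of split_adder28 is
  signal lo_sum : unsigned(14 downto 0);  -- 14-bit low sum plus carry out
  signal a_hi   : unsigned(13 downto 0);  -- high halves, delayed one clock
  signal b_hi   : unsigned(13 downto 0);
begin
  process (clock)
  begin
    if rising_edge(clock) then
      -- Stage 1: add the low halves, keeping the carry in bit 14,
      -- and delay the high halves so they line up with that carry.
      lo_sum <= resize(unsigned(a(13 downto 0)), 15)
              + resize(unsigned(b(13 downto 0)), 15);
      a_hi   <= unsigned(a(27 downto 14));
      b_hi   <= unsigned(b(27 downto 14));

      -- Stage 2: add the high halves plus the registered carry,
      -- and pass the low sum through to keep both halves aligned.
      q(13 downto 0)  <= std_logic_vector(lo_sum(13 downto 0));
      q(27 downto 14) <= std_logic_vector(a_hi + b_hi + lo_sum(14 downto 14));
    end if;
  end process;
end rtl;
```

Each carry chain is then only 14 bits long, at the cost of extra registers
and one more clock of latency than a single registered adder.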
> If you are using RLOC's, aren't you making it Xilinx specific?
>
> Not only that, are RLOC's guaranteed to even be the same from one Xilinx
> family to another Xilinx family?

I would rather not use RLOCs - I just want to inform the tools that using
the carry chain in one column is more important than any fancy optimisations
that save a few slices but cause the fast carry chain to be broken.

> > An adder is such a simple thing and the device has specific wires to
> > implement it quickly - surely there must be a way to inform the tools to
> > use the carry chain in one column only for max speed?
>
> I'm sure you've already thought of this, but can you not break the adder
> up?

Quite possibly but that would be a pain in the neck. I just don't see why
this should be difficult.

Cheers,

Ken

Francisco,

Many thanks for your detailed response and the code.

If I go down the road of abandoning trying to get synthesis to accomplish
this then I will certainly be referring to your implementation.

Cheers,

Ken



Ken wrote:

>> If you are using RLOC's, aren't you making it Xilinx specific?
>>
>> Not only that, are RLOC's guaranteed to even be the same from one Xilinx
>> family to another Xilinx family?
>
> I would rather not use RLOCs - I just want to inform the tools that using
> the carry chain in one column is more important than any fancy optimisations
> that save a few slices but cause the fast carry chain to be broken.

I agree - if the FPGA supports it, there is no reason the synthesis tool
shouldn't. I'd talk with the synthesis vendors about it if I were you.
Synplicity seems quite responsive.

Or perhaps you could get the synthesis tool to do what you are wanting
by placing a tiny period constraint on that portion of the design,
thereby forcing the tool to do everything in its power to make it
absolutely as fast as possible.

Marc

Marc,

> I agree - if the FPGA supports it, there is no reason the synthesis tool
> shouldn't. I'd talk with the synthesis vendors about it if I were you.
> Synplicity seems quite responsive.

I have emailed Synplicity support - they have been very good in the past and
I expect they will be on this too.

> Or perhaps you could get the synthesis tool to do what you are wanting
> by placing a tiny period constraint on that portion of the design,
> thereby forcing the tool will do everything in its power to make it
> absolutely as fast as possible.

Probably could - but the problem would then fall to another adder that is
1 microsecond behind the one just fixed. In a design with many adders, I
think global control is needed to force use of the carry chains in one
column.

Cheers,

Ken

Ken wrote: 
> An adder is such a simple thing and the device has specific wires
> to implement it quickly - surely there must be a way to inform the
> tools to use the carry chain in one column only for max speed?

If you don't want to RLOC the primitives, perhaps the next best thing
to try is to put a syn_keep attribute on the input operand signals of
the adder; if it is in fact a logic optimization that is causing an
irregularity which breaks the carry chain placement, that will usually
put a stop to it.

If one of the operands is a constant, that can often cause this sort
of problem; you'll need to assign the constant to a signal having a
syn_keep rather than placing the syn_keep on the constant itself.
(at least you used to need to do that, I haven't used Synplify since
last year)

If this is a counter, also note that Synplify has some hardcoded
internal thresholds below which it will implement random logic instead
of carry chain logic, which can cause similar problems for short
counters.

Brian
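
As a rough illustration of the syn_keep suggestion, here is a sketch of a
28-bit registered adder with one constant operand, where both operands go
through signals carrying syn_keep. The entity name, signal names and the
constant value are made up for the example, and whether this actually keeps
the carry chain in one column for a given design is not something the sketch
can guarantee.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: adder operands protected with syn_keep.
entity keep_adder is
  port (
    clock : in  std_logic;
    a     : in  std_logic_vector(27 downto 0);
    q     : out std_logic_vector(27 downto 0)
  );
end keep_adder;

architecture rtl of keep_adder is
  -- Synplify attribute, declared locally so no vendor package is needed.
  attribute syn_keep : boolean;

  signal op_a : unsigned(27 downto 0);
  signal op_k : unsigned(27 downto 0);

  -- Ask the synthesis tool to preserve both operand nets.
  attribute syn_keep of op_a : signal is true;
  attribute syn_keep of op_k : signal is true;
begin
  op_a <= unsigned(a);
  -- The constant is assigned to a kept signal rather than used directly
  -- in the addition (arbitrary value, for illustration only).
  op_k <= to_unsigned(1000, 28);

  process (clock)
  begin
    if rising_edge(clock) then
      q <= std_logic_vector(op_a + op_k);
    end if;
  end process;
end rtl;
```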