Hello folks,

I am implementing a filter on a -5 Virtex-II part (3000) and the critical path is one of the longest adder carry chains in the design (28 bits).

I have noticed that the minimum period of my design is being clobbered by the carry chain of the longest adder changing CLB column half-way through instead of carrying on up the carry chain in the column it started in...

So, I would like to be able to put in my VHDL code an RLOC constraint (or something) that would inform Synplify Pro not to do any clever optimisation that will prevent Xilinx ISE 5.2.03i from keeping the carry chain in one column (an old Ray Andraka post has led me to believe this is what is happening).

Googling about has yielded some discussions on it but I cannot see exactly how I would specify this in my VHDL to ensure that the carry chain remains in one column.

Can someone give me some pointers please (ideally a quick code snippet to demonstrate :-) )?

Thanks in advance for your time,

Ken

--
To reply by email, please remove the _MENOWANTSPAM from my email address.
How to RLOC adders in VHDL/Synplify to avoid broken carry chains?
Started by ●November 19, 2003
Reply by ●November 19, 2003
"Ken" <aeu96186_MENOWANTSPAM@yahoo.co.uk> wrote in message news:bpg5k3$4a$1@dennis.cc.strath.ac.uk...
> So, I would like to be able to put in my VHDL code an RLOC constraint (or
> something) that would inform Synplify Pro not to do any clever optimisation
> that will prevent Xilinx ISE 5.2.03i from keeping the carry chain in one
> column (an old Ray Andraka post has led me to believe this is what is
> happening).

Ken,

Have you read the constraints guide in the Xilinx software manuals? Look for the RLOC section. You end up with stuff in your UCF like:

INST "*un6_burp_cry_0" RLOC = "X6Y4";
INST "*un6_burp_cry_1" RLOC = "X6Y4";
INST "*un6_burp_cry_2" RLOC = "X6Y5";
INST "*un6_burp_cry_3" RLOC = "X6Y5";
INST "*un6_burp_cry_4" RLOC = "X6Y8";
etc...

I used the floorplanner to get the names of things I want to RLOC. For your problem, you could place the carry chain with floorplanner and send the output to a temporary UCF to give you a start on your RLOC stuff. Hope that makes sense! Read about H_SETs, HU_SETs and U_SETs too.

good luck, Syms.
Reply by ●November 19, 2003
Hi,

"Symon" <symon_brewer@hotmail.com> wrote in message news:bpg87l$1nmpu4$1@ID-212844.news.uni-berlin.de...
> Have you read the constraints guide in the Xilinx software manuals? Look
> for the RLOC section.

I prefer the way used in one of Xilinx's TechXclusives to embed RLOC attributes directly in VHDL (Relationally Placed Macros). Here's an example of an RPM that performs a registered a + b on the carry chain, using the U_SET attribute.

-- begin VHDL code
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;
use work.rlocs.all;

entity a_plus_b_reg is
  generic (width: integer := 32; setn: integer := 1);
  port (
    clock  : IN  std_logic;
    enable : IN  std_logic;
    a      : IN  std_logic_vector (width-1 downto 0);
    b      : IN  std_logic_vector (width-1 downto 0);
    q      : OUT std_logic_vector (width-1 downto 0)
  );
end a_plus_b_reg;

architecture rpm_arch of a_plus_b_reg is

  attribute INIT:  string;
  attribute BEL:   string;
  attribute RLOC:  string;
  attribute U_SET: string;

  signal prexor_int_q: std_logic_vector (width-1 downto 0);
  signal int_carry:    std_logic_vector (width-1 downto 0);
  signal y:            std_logic_vector (width-1 downto 0);

begin

  int_carry(0) <= '0';

  -- output registers, two per slice
  reg: for i in 0 to width-1 generate
    attribute U_SET of q_reg: label is "uset" & integer'image(setn);
    attribute RLOC of q_reg: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_reg: label is "FF" & belname_xy(i);
  begin
    q_reg: FDE port map (
      D => y(i), CE => enable, C => clock,
      Q => q(i));
  end generate;

  -- bits 0 to width-2: LUT, carry mux and sum XOR
  gena: for i in 0 to width-2 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_mxy: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_mxy: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_mxy: MUXCY port map (
      DI => a(i), CI => int_carry(i), S => prexor_int_q(i),
      O => int_carry(i+1) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

  -- top bit: no carry out, so no MUXCY
  genb: for i in width-1 to width-1 generate
    attribute INIT of q_lut: label is "6";
    attribute U_SET of q_lut: label is "uset" & integer'image(setn);
    attribute U_SET of q_xor: label is "uset" & integer'image(setn);
    attribute RLOC of q_lut: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute RLOC of q_xor: label is "X0" & "Y" & integer'image(integer(i/2));
    attribute BEL of q_lut: label is belname_fg(i);
    attribute BEL of q_xor: label is "XOR" & belname_fg(i);
  begin
    q_lut: LUT2
      --synthesis off
      generic map (INIT => x"6")
      --synthesis on
      port map (
        I1 => b(i), I0 => a(i),
        O => prexor_int_q(i) );
    q_xor: XORCY port map (
      LI => prexor_int_q(i), CI => int_carry(i),
      O => y(i) );
  end generate;

end rpm_arch;
-- end VHDL code

The resulting RPM is a column of 1 x w/2 slices, where w is the value assigned to the width generic.

The setn generic lets you create different U_SET names for different instances of the entity (if the instances have no relative positions), or reuse the same U_SET name and apply different RLOCs to each instance (if the instances have relative positions).

The rlocs package contains a couple of simple functions to return the strings "F" or "G", or the pair "X" or "Y", to differentiate the LUTs/FFs inside a single slice. Read the constraints guide about RLOC, RLOC_ORIGIN and the different kinds of sets you can create. And the RPM TechXclusive, of course.

If you prefer the placer to select the absolute positioning of the RPM, then that's all you need. If you want total control, then you can select the RPM position by attaching an RLOC_ORIGIN to the U_SET name in the UCF file.

I've successfully used this entity on the Virtex-II architecture with XST. I don't know how to tell Synplify Pro to attach those attributes, but it shouldn't be that difficult.

The drawback is that your design is no longer portable. You're stuck with Xilinx parts that use the XY coordinate system (not all of them). But you can create different versions for different architectures, of course.

Best regards

Francisco Rodriguez
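The rlocs package used by the entity above was not posted. A minimal sketch of what it might contain follows, assuming that belname_fg and belname_xy simply alternate between the two LUTs (F, G) and the two flip-flops (FFX, FFY) of a slice according to whether the bit index is even or odd. The function bodies are a guess that matches how the entity calls them, not Francisco's actual code.

```vhdl
-- Hypothetical reconstruction of the "rlocs" helper package.
-- Assumption: even bits use the F LUT / FFX, odd bits use the G LUT / FFY,
-- so that the carry runs F -> G inside each slice.
package rlocs is
  function belname_fg (i : natural) return string;
  function belname_xy (i : natural) return string;
end package rlocs;

package body rlocs is

  -- Returns "F" for even bit indices and "G" for odd ones.
  -- Used directly as the LUT BEL name and, prefixed with "XOR",
  -- as the XORCY BEL name (XORF / XORG).
  function belname_fg (i : natural) return string is
  begin
    if i mod 2 = 0 then
      return "F";
    else
      return "G";
    end if;
  end function belname_fg;

  -- Returns "X" for even bit indices and "Y" for odd ones.
  -- Prefixed with "FF" this gives the flip-flop BEL name (FFX / FFY).
  function belname_xy (i : natural) return string is
  begin
    if i mod 2 = 0 then
      return "X";
    else
      return "Y";
    end if;
  end function belname_xy;

end package body rlocs;
```

With a package like this compiled into work, instantiating a_plus_b_reg with width => 28 would describe the 28-bit adder from the original question as a single column of 14 slices.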
Reply by ●November 20, 2003
> Have you read the constraints guide in the Xilinx software manuals? Look
> for the RLOC section.

Syms,

Thanks for the reply. I am familiar with the Xilinx constraints guide, but I would like to put the constraint in the VHDL rather than the UCF, and I do not want to make it Xilinx specific.

An adder is such a simple thing and the device has specific wires to implement it quickly - surely there must be a way to inform the tools to use the carry chain in one column only for max speed?

Cheers,

Ken
Reply by ●November 20, 2003
Ken wrote:
> I am familiar with the Xilinx constraints guide but I would like to put the
> constraint in the VHDL rather than the ucf and I do not want to make it
> Xilinx specific.

If you are using RLOC's, aren't you making it Xilinx specific?

Not only that, are RLOC's guaranteed to even be the same from one Xilinx family to another Xilinx family?

> An adder is such a simple thing and the device has specific wires to
> implement it quickly - surely there must be a way to inform the tools to use
> the carry chain in one column only for max speed?

I'm sure you've already thought of this, but can you not break the adder up?

Good luck,

Marc
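Breaking the adder up, as Marc suggests, could look something like the following sketch. It splits the 28-bit add into two 14-bit halves and registers the carry between them, so each carry chain is half as long. The entity name add28_split and the use of numeric_std are illustrative choices, and the approach assumes the filter can tolerate the result appearing two clocks after the inputs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: 28-bit adder split into two pipelined 14-bit halves.
entity add28_split is
  port (
    clk : in  std_logic;
    a   : in  unsigned(27 downto 0);
    b   : in  unsigned(27 downto 0);
    sum : out unsigned(27 downto 0)  -- valid two clocks after a and b
  );
end entity add28_split;

architecture rtl of add28_split is
  signal lo_sum : unsigned(14 downto 0);  -- low 14 bits plus carry out in bit 14
  signal a_hi_d : unsigned(13 downto 0);  -- high halves delayed by one clock
  signal b_hi_d : unsigned(13 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: add the low halves, keeping the carry out as an extra bit,
      -- and delay the high halves so they line up with that carry.
      lo_sum <= resize(a(13 downto 0), 15) + resize(b(13 downto 0), 15);
      a_hi_d <= a(27 downto 14);
      b_hi_d <= b(27 downto 14);

      -- Stage 2: add the high halves plus the registered carry,
      -- and pass the low half through so both halves stay aligned.
      sum(27 downto 14) <= a_hi_d + b_hi_d + resize(lo_sum(14 downto 14), 14);
      sum(13 downto 0)  <= lo_sum(13 downto 0);
    end if;
  end process;
end architecture rtl;
```

The trade-off is extra registers and one more clock of latency in exchange for shorter carry chains; it does nothing by itself to stop the placer from splitting a chain across columns.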
Reply by ●November 20, 2003
> If you are using RLOC's, aren't you making it Xilinx specific?
>
> Not only that, are RLOC's guaranteed to even be the same from one Xilinx
> family to another Xilinx family?

I would rather not use RLOCs - I just want to inform the tools that using the carry chain in one column is more important than any fancy optimisations that save a few slices but cause the fast carry chain to be broken.

> I'm sure you've already thought of this, but can you not break the adder
> up?

Quite possibly but that would be a pain in the neck. I just don't see why this should be difficult.

Cheers,

Ken
Reply by ●November 20, 2003
Francisco,

Many thanks for your detailed response and the code. If I go down the road of abandoning trying to get synthesis to accomplish this then I will certainly be referring to your implementation.

Cheers,

Ken

> I prefer the way used in one of Xilinx's TechXclusives to embed RLOC
> attributes directly in VHDL (Relationally Placed Macros). Here's an example
> of a RPM to perform a registered a + b, using the carry chain using the
> U_SET attribute.
Reply by ●November 20, 2003
Ken wrote:
> I would rather not use RLOCs - I just want to inform the tools that using
> the carry chain in one column is more important than any fancy optimisations
> that save a few slices but cause the fast carry chain to broken.

I agree - if the FPGA supports it, there is no reason the synthesis tool shouldn't. I'd talk with the synthesis vendors about it if I were you. Synplicity seems quite responsive.

Or perhaps you could get the synthesis tool to do what you want by placing a tiny period constraint on that portion of the design, thereby forcing the tool to do everything in its power to make it absolutely as fast as possible.

Marc
Reply by ●November 20, 2003
Marc,

> I agree - if the FPGA supports it, there is no reason the synthesis tool
> shouldn't. I'd talk with the synthesis vendors about it if I were you.
> Synplicity seems quite responsive.

I have emailed Synplicity support - they have been very good in the past and I expect they will be on this too.

> Or perhaps you could get the synthesis tool to do what you are wanting
> by placing a tiny period constraint on that portion of the design,
> thereby forcing the tool will do everything in its power to make it
> absolutely as fast as possible.

Probably could - but the problem would then fall to another adder that is 1 microsecond behind the one just fixed. In a design with many adders, I think global control is needed to force use of the carry chains in one column.

Cheers,

Ken
Reply by ●November 20, 2003
Ken wrote:
> An adder is such a simple thing and the device has specific wires
> to implement it quickly - surely there must be a way to inform the
> tools to use the carry chain in one column only for max speed?

If you don't want to RLOC the primitives, perhaps the next best thing to try is to put a syn_keep attribute on the input operand signals of the adder; if it is in fact a logic optimization that is causing an irregularity which breaks the carry chain placement, that will usually put a stop to it.

If one of the operands is a constant, that can often cause this sort of problem; you'll need to assign the constant to a signal having a syn_keep rather than placing the syn_keep on the constant itself. (At least you used to need to do that; I haven't used Synplify since last year.)

If this is a counter, also note that Synplify has some hardcoded internal thresholds below which it will implement random logic instead of carry chain logic, which can cause similar problems for short counters.

Brian
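As a rough illustration of Brian's suggestion, the sketch below puts syn_keep on both adder operands, with the constant operand first assigned to its own signal. The entity name add28_keep, the signal names and the constant value are made up for the example; syn_keep is declared locally as a boolean attribute, which is the form Synplify expects. Whether this is enough to keep a given carry chain in one column would still need checking in the placed design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: registered 28-bit add of an input and a constant,
-- with syn_keep on both operand signals.
entity add28_keep is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(27 downto 0);
    sum : out std_logic_vector(27 downto 0)
  );
end entity add28_keep;

architecture rtl of add28_keep is
  -- Synplify attribute, declared locally so no vendor package is needed.
  attribute syn_keep : boolean;

  signal op_a : unsigned(27 downto 0);
  signal op_k : unsigned(27 downto 0);  -- carries the constant operand

  attribute syn_keep of op_a : signal is true;
  attribute syn_keep of op_k : signal is true;
begin
  op_a <= unsigned(a);

  -- The constant goes onto a kept signal rather than straight into the
  -- adder expression. The value 12345 is arbitrary.
  op_k <= to_unsigned(12345, 28);

  process (clk)
  begin
    if rising_edge(clk) then
      sum <= std_logic_vector(op_a + op_k);
    end if;
  end process;
end architecture rtl;
```

Other tools will simply ignore an attribute they do not recognise, so this form stays closer to the vendor-neutral VHDL Ken asked for than RLOC attributes do.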





