Optimizing an inferred counter

Started by Marty Ryba March 19, 2008
Hello everyone,

    After banging our heads for the last few weeks (sometimes literally), I 
figure I'll query the group of experts here. We have a design that is 
functionally correct (ModelSim test bench) but it appears to be very iffy 
when it gets on the real chip. I have a couple copies of "identical" boards 
with Virtex2-1000 chips on them. I'll check again soon, but I believe they 
are -6 parts. We've been synthesizing the design in Synplify Pro (v7.5 
though others are available; this design has some history of working fine). 
Sometimes it works on one or more boards, other times after I load it (and 
verify) with iMPACT this counter acts screwy (it messes up critical timing, 
and it also looks all wrong on Chipscope). Today I got brave enough to load 
it into the EEPROM; it worked this afternoon but who knows tomorrow (grrr). 
Looking at the Synplify timing report (with -4 speed setting in Synplify), 
the timing is marginal for a specified clock of 100 MHz around this path, 
but the chip is really running at 66 MHz (PCI clock). The key code is very 
simple (some syntax may be a bit off since I'm doing it from memory). We 
tried trimming the size of the counter from 32 bits down to 20 and it seems 
to help some.

signal my_counter : std_logic_vector(COUNT_WIDTH-1 downto 0);

countdown_process: process (CLK)
begin
  if rising_edge(CLK) then        -- do everything synchronous
    if RESET = '1' then
      my_counter <= ZEROIZE_COUNT;
    elsif counter_load = '1' then
      my_counter <= input_bus(COUNT_WIDTH-1 downto 0);
    elsif making_data = '1' then
      -- "-" on std_logic_vector assumes std_logic_unsigned (or similar) is in use
      my_counter <= my_counter - '1';
    end if;
  end if;  -- CLK
end process countdown_process;

Another process block checks that this counter is nonzero and a few other 
requirements to set the value of making_data true or false.

Ideas/suggestions? My main FPGA engineer has been working with the local 
Xilinx FAE but no "Eureka!" moments yet. A very similar (if anything, 
somewhat larger) design has never shown this trouble on the same boards. The 
current design takes about 45% of the LUTs.

I notice the RTL shows this using an adder (it fans out making_data to 
COUNT_WIDTH bits so that it becomes either the 0 or -1 to add). Isn't there 
a simpler structure to define a count(down) counter? I sure can buy a simple 
counter in 74xx series logic. I notice neither arith_std or numeric_std 
define special operators (a la C's ++ and -- operators) to specify 
increment/decrement, so maybe there is no simple way to create a counter 
structure in fewer FPGA logic elements. Seems odd to me (non-EE).

Thanks in advance for your sagacity.

-Dr. Marty Ryba
Mad GNSS scientist


Marty Ryba wrote:

> Another process block checks that this counter is nonzero and a few other
> requirements to set the value of making_data true or false.
>
> Ideas/suggestions?

Combine the two processes into one.

-- Mike Treseler
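A minimal sketch of what combining the two processes might look like, assuming numeric_std and a zero reset value. The entity name `countdown_single`, the 32-bit `input_bus` width and the `run_request` input (a stand-in for the unspecified "few other requirements") are hypothetical, and ZEROIZE_COUNT is replaced by all zeros purely for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity countdown_single is
  generic (COUNT_WIDTH : positive := 20);  -- assumed <= 32 here
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    run_request  : in  std_logic;  -- hypothetical: the "other requirements"
    input_bus    : in  std_logic_vector(31 downto 0);
    making_data  : out std_logic
  );
end entity countdown_single;

architecture rtl of countdown_single is
  signal my_counter : unsigned(COUNT_WIDTH-1 downto 0) := (others => '0');
begin
  -- One clocked process owns both the counter and the making_data flag.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        my_counter  <= (others => '0');  -- assumption: ZEROIZE_COUNT is zero
        making_data <= '0';
      elsif counter_load = '1' then
        my_counter  <= unsigned(input_bus(COUNT_WIDTH-1 downto 0));
        making_data <= '0';
      elsif my_counter /= 0 and run_request = '1' then
        my_counter  <= my_counter - 1;   -- constant -1, gated by the branch
        making_data <= '1';              -- registered: high after a decrement
      else
        making_data <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that making_data here is a registered output, so it goes high one clock later than a combinational version would. Whether that delay is acceptable depends on how the rest of the design uses it.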
Marty Ryba wrote:

> countdown_process: process (CLK)
> begin
>   if rising_edge(CLK) then -- do everything synchronous
>     if RESET = '1' then
>       my_counter <= ZEROIZE_COUNT;
>     elsif counter_load = '1' then
>       my_counter <= input_bus(COUNT_WIDTH-1 downto 0);
>     elsif making_data = '1' then
>       my_counter <= my_counter - '1';
>     endif
>   endif -- CLK
> end -- process
>
> Another process block checks that this counter is nonzero and a few other
> requirements to set the value of making_data true or false.

I've been banging my head as well for the last month, trying to get poor legacy code to pass timing at 250 MHz. Based on this experience, I'd suggest registering *everything*... well, as much as possible. Make sure your making_data is a flop, because if it is combinatorial and based on my_counter, that's a recipe for failure.

HTH,
-P@
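One way to make making_data a flop without letting the counter run past zero is to decide one cycle early: clear the flag on the same edge that takes the counter from 1 to 0. The sketch below is only an illustration under assumptions; the entity name `countdown_registered` and the `load_value` port are hypothetical, the reset value is assumed to be zero, and the "few other requirements" from the original post are left out:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity countdown_registered is
  generic (COUNT_WIDTH : positive := 20);
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    load_value   : in  std_logic_vector(COUNT_WIDTH-1 downto 0);
    making_data  : out std_logic
  );
end entity countdown_registered;

architecture rtl of countdown_registered is
  signal my_counter    : unsigned(COUNT_WIDTH-1 downto 0) := (others => '0');
  signal making_data_r : std_logic := '0';  -- registered "counter is nonzero"
begin
  making_data <= making_data_r;

  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        my_counter    <= (others => '0');
        making_data_r <= '0';
      elsif counter_load = '1' then
        my_counter <= unsigned(load_value);
        -- Set the flag from the value being loaded, not from the old count.
        if unsigned(load_value) /= 0 then
          making_data_r <= '1';
        else
          making_data_r <= '0';
        end if;
      elsif making_data_r = '1' then
        my_counter <= my_counter - 1;
        -- Look ahead: the count is about to become zero, so stop next cycle.
        if my_counter = 1 then
          making_data_r <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The wide compare still exists, but it ends at a flip-flop instead of feeding straight back into the counter's enable, so the counter's enable path starts from a single register.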
Marty Ryba wrote:

> Hello everyone,
>
>     After banging our heads for last few weeks (sometimes literally), I
> figure I'll query the group of experts here. We have a design that is
> functionally correct (ModelSim test bench) but it appears to be very iffy
> when it gets on the real chip. I have a couple copies of "identical" boards
> with Virtex2-1000 chips on them. I'll check again soon, but I believe they
> are -6 parts. We've been synthesizing the design in Synplify Pro (v7.5
> though others are available; this design has some history of working fine).
> Sometimes it works on one or more boards, other times after I load it (and
> verify) with iMPACT this counter acts screwy (it messes up critical timing,
> and it also looks all wrong on Chipscope). Today I got brave enough to load
> it into the EEPROM; it worked this afternoon but who knows tomorrow (grrr).
> Looking at the Synplify timing report (with -4 speed setting in Synplify),
> the timing is marginal for a specified clock of 100 MHz around this path,
> but the chip is really running at 66 MHz (PCI clock). The key code is very
> simple (some syntax may be a bit off since I'm doing it from memory). We
> tried trimming the size of the counter from 32 bits down to 20 and it seems
> to help some.

When you have symptoms like this, which suggest the real limit is lower than the tools report, have you tried varying the clock speed to check whether it DOES work properly at 10 MHz or 1 MHz?

> I notice the RTL shows this using an adder (it fans out making_data to
> COUNT_WIDTH bits so that it becomes either the 0 or -1 to add). Isn't there
> a simpler structure to define a count(down) counter? I sure can buy a simple
> counter in 74xx series logic. I notice neither arith_std or numeric_std
> define special operators (a la C's ++ and -- operators) to specify
> increment/decrement, so maybe there is no simple way to create a counter
> structure in fewer FPGA logic elements. Seems odd to me (non-EE).

In a counter, you usually need to 'see' the state of the lower bits to decide when to toggle the upper bit, and in an FPGA the carry chain is often faster than other paths, so that makes adders a natural counter solution. Certainly easy to write.

For long counters, the carry path can limit the speed; then you can split the counter, making it more complex but faster. Look at the 74161 for a faster carry scheme.

-jg
> I notice neither arith_std or numeric_std
> define special operators (a la C's ++ and -- operators) to specify
> increment/decrement, so maybe there is no simple way to create a counter
> structure in fewer FPGA logic elements. Seems odd to me (non-EE).

That's because ++ and -- are nothing special: they still require an adder with the first input fed by the adder's own registered output and the second tied to +1 or -1. An FPGA is just an array of LUTs, flip-flops and RAMs, not a lot more (some FPGAs have dedicated multipliers too).

As for "making_data" becoming the second adder input, I'm surprised. It might be better to try to force the tool to synthesize "making_data" as the adder's register enable rather than as the second adder input, so that the second adder input stays a constant -1. As to how to do this, I'm not sure. How about changing "my_counter" into an unsigned (or signed, it makes no difference) using the numeric_std package (implementation is IEEE defined) instead of the std_logic_arith package (implementation is vendor defined and non-standard)?

"Marty Ryba" <martin.ryba.nospam@verizon.net> writes:

> Hello everyone,
>
>     After banging our heads for last few weeks (sometimes literally), I
> figure I'll query the group of experts here. We have a design that is
> functionally correct (ModelSim test bench) but it appears to be very iffy
> when it gets on the real chip. I have a couple copies of "identical" boards
> with Virtex2-1000 chips on them. I'll check again soon, but I believe they
> are -6 parts. We've been synthesizing the design in Synplify Pro (v7.5
> though others are available; this design has some history of working fine).
> Sometimes it works on one or more boards, other times after I load it (and
> verify) with iMPACT this counter acts screwy (it messes up critical timing,
> and it also looks all wrong on Chipscope). Today I got brave enough to load
> it into the EEPROM; it worked this afternoon but who knows tomorrow (grrr).
> Looking at the Synplify timing report (with -4 speed setting in Synplify),
> the timing is marginal for a specified clock of 100 MHz around this path,
> but the chip is really running at 66 MHz (PCI clock).

What does the Xilinx timing report say? Have you constrained the clock correctly (or indeed at all :-)?

Synplify's report is an educated guess on the part of the tools. Xilinx's represents what they think the absolute worst case is, so if it thinks you meet your timing constraints, then any chip you get will run that design. Of course, that depends on your constraints being right :-)

You say this design has a history of working fine - what's changed since then?

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
On 19 Mar, 04:13, "Marty Ryba" <martin.ryba.nos...@verizon.net> wrote:
> [full quote of the original post snipped]
Just some ideas:

Are all control signals synchronous to "CLK"?
Is the clock "clean"?
What about supply voltage (DC level, ripple, decoupling, etc.)?
Could it be a board layout problem (insufficient ground plane, crosstalk)?

If the long carry chain is the problem, you may divide the counter into 2 smaller counters with a pipelined carry chain.

/Peter
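The split-counter idea can be sketched as two half-width down-counters, where the borrow into the upper half comes from a registered "low half is zero" flag rather than from a carry chain running through the whole width. This is a rough sketch under assumptions, not the original design: the entity name `split_countdown`, the `enable` and `load_value` ports and the two width generics are all hypothetical, and the counter wraps at zero rather than stopping:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity split_countdown is
  generic (
    LOW_WIDTH  : positive := 10;
    HIGH_WIDTH : positive := 10
  );
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    enable       : in  std_logic;
    load_value   : in  std_logic_vector(LOW_WIDTH+HIGH_WIDTH-1 downto 0);
    count        : out std_logic_vector(LOW_WIDTH+HIGH_WIDTH-1 downto 0)
  );
end entity split_countdown;

architecture rtl of split_countdown is
  signal low_part    : unsigned(LOW_WIDTH-1 downto 0)  := (others => '0');
  signal high_part   : unsigned(HIGH_WIDTH-1 downto 0) := (others => '0');
  signal low_is_zero : std_logic := '1';  -- registered copy of (low_part = 0)
begin
  count(LOW_WIDTH-1 downto 0)                    <= std_logic_vector(low_part);
  count(LOW_WIDTH+HIGH_WIDTH-1 downto LOW_WIDTH) <= std_logic_vector(high_part);

  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        low_part    <= (others => '0');
        high_part   <= (others => '0');
        low_is_zero <= '1';
      elsif counter_load = '1' then
        low_part  <= unsigned(load_value(LOW_WIDTH-1 downto 0));
        high_part <= unsigned(load_value(LOW_WIDTH+HIGH_WIDTH-1 downto LOW_WIDTH));
        if unsigned(load_value(LOW_WIDTH-1 downto 0)) = 0 then
          low_is_zero <= '1';
        else
          low_is_zero <= '0';
        end if;
      elsif enable = '1' then
        low_part <= low_part - 1;        -- wraps to all ones from zero
        -- The low half will be zero after this edge only if it is 1 now.
        if low_part = 1 then
          low_is_zero <= '1';
        else
          low_is_zero <= '0';
        end if;
        -- Borrow: the low half is zero now, so the high half steps down too.
        if low_is_zero = '1' then
          high_part <= high_part - 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Each half now has its own short carry chain, and the only signal crossing between them is a single flip-flop. The tools may already handle a 20- or 32-bit carry chain comfortably at 66 MHz, so this is worth trying only if the place-and-route timing report actually points at the carry path.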
"Martin Thompson" <martin.j.thompson@trw.com> wrote in message 
news:u4pb3m2ti.fsf@trw.com...
> "Marty Ryba" <martin.ryba.nospam@verizon.net> writes:
>
> What does the Xilinx timing report say? Have you constrained the
> clock correctly (or indeed at all :-)?

^^^ What he said.

Syms.
On Mar 18, 11:13 pm, "Marty Ryba" <martin.ryba.nos...@verizon.net> wrote:
> Hello everyone,
>
>     After banging our heads for last few weeks (sometimes literally), I
> figure I'll query the group of experts here. We have a design that is
> functionally correct (ModelSim test bench) but it appears to be very iffy
> when it gets on the real chip. I have a couple copies of "identical" boards
> with Virtex2-1000 chips on them. I'll check again soon, but I believe they
> are -6 parts. We've been synthesizing the design in Synplify Pro (v7.5
> though others are available; this design has some history of working fine).

Which design are you referring to here that has some history of working fine? The PCB design or the FPGA design?

> Sometimes it works on one or more boards, other times after I load it (and
> verify) with iMPACT this counter acts screwy (it messes up critical timing,
> and it also looks all wrong on Chipscope).

You might want to clarify what you mean by 'messes up critical timing' and 'looks all wrong'. I'm assuming here that the counter starts off correctly and just doesn't decrement properly, which would lead one to suspect problems related in some way to the signal 'making_data', but again you should clarify this.

In any case, the problem is one of the following (not in any particular order):

1. Inadequate power supply. Check the Vcc at the chip with a high-speed scope and good probing techniques; make sure that you're within spec. If it is only 'slightly' out, that's not likely the cause of your symptoms but is still something that needs to be addressed.

2. Timing. Are the signals 'RESET', 'counter_load' and 'making_data' all synchronized to 'CLK'? As I said, I'm not sure exactly which symptoms you're seeing, but I'm guessing that it resets and initializes properly and just doesn't count correctly, in which case 'making_data' is the likely culprit.

3. More timing. There is more to timing than just clock frequency. There are also setup/hold time requirements. Do 'counter_load' or 'making_data' come from external I/O pins? If so, then
   - Do the signals on the board meet the timing requirements that you specified?
   - You did specify a timing requirement on the inputs?
   - Did the computed setup time from the P&R timing report (not Synplify's estimated timing) meet all requirements?

4. Yet more timing. 'CLK' isn't a gated clock, is it?

5. Clock signal quality. Put a scope on the input clock. Is it absolutely monotonic through the entire Vih voltage range? Both edges? No dips and bounces anywhere between Vih(min) and Vih(max)?

Go through the above checklist and I'm fairly confident that you'll find the cause.

> We
> tried trimming the size of the counter from 32 bits down to 20 and it seems
> to help some.

This is a symptom of failing timing (see items 2, 3 and 4) or of double clocking (see item 5 above).

> Another process block checks that this counter is nonzero and a few other
> requirements to set the value of making_data true or false.

This is a process block that is clocked by 'CLK', I presume? How about the inputs into that process block? The same sort of timing considerations mentioned previously apply here as well. Violating timing may cause 'making_data' to miss or double hit occasionally.

> I notice the RTL shows this using an adder (it fans out making_data to
> COUNT_WIDTH bits so that it becomes either the 0 or -1 to add). Isn't there
> a simpler structure to define a count(down) counter? I sure can buy a simple
> counter in 74xx series logic. I notice neither arith_std or numeric_std
> define special operators (a la C's ++ and -- operators) to specify
> increment/decrement, so maybe there is no simple way to create a counter
> structure in fewer FPGA logic elements. Seems odd to me (non-EE).

The RTL viewer is a graphical view of your SOURCE code; it is not a view of the final routed design. Have no fear, the adder that adds -1 and the muxer that selects the final output will get optimized appropriately.

Good luck,

Kevin Jennings
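On items 2 and 3 of the checklist above: if a single-bit control signal such as 'counter_load' does turn out to be asynchronous to 'CLK', the usual remedy is a two-flip-flop synchronizer ahead of the logic that uses it. A minimal sketch follows; the entity and port names (`sync_2ff`, `async_in`, `sync_out`) are made up for illustration and are not from the original design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_2ff is
  port (
    CLK      : in  std_logic;
    async_in : in  std_logic;  -- hypothetical: a control input not timed to CLK
    sync_out : out std_logic   -- safe to use in CLK-domain logic
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, given a full cycle to settle
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      meta   <= async_in;
      stable <= meta;
    end if;
  end process;

  sync_out <= stable;
end architecture rtl;
```

This is suitable only for individual control bits. A multi-bit value such as input_bus cannot be synchronized bit by bit this way, because the bits may be captured on different clock edges.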
On Wed, 19 Mar 2008 03:13:40 GMT, "Marty Ryba" <martin.ryba.nospam@verizon.net>
wrote:

>Hello everyone,
>
>    After banging our heads for last few weeks (sometimes literally), I
>figure I'll query the group of experts here. We have a design that is
>functionally correct (ModelSim test bench) but it appears to be very iffy
>when it gets on the real chip.

>the timing is marginal for a specified clock of 100 MHz around this path,
>but the chip is really running at 66 MHz (PCI clock).

One thought: Are you using anything like a DCM or (since it's a Virtex) DLL to clean up the clock?

The PCI clock can be stopped, and switched between 33 and 66 MHz, during a PC's boot sequence (I have watched this on a scope). This can confuse a DLL; you may need some means to reset it after any such change, or use an alternative (constant-frequency) clock.

- Brian