FPGARelated.com
Forums

unstable fpga design

Started by Moti Cohen October 20, 2004
Hello all,

First, a little background.
I'm currently working on an FPGA design (in VHDL) targeting the
Xilinx Spartan-IIE (XC2S400E).
My design uses about 1300 slices (about a third of the chip
capacity).
My problem is as follows:
Sometimes when I change my top-level design and re-synthesize it, some
"parts" of my code stop working properly, i.e. some of my FPGA
blocks work as usual and some don't (e.g. FSMs).
This happens not only after large code changes; sometimes it happens when
I "just" change an output pin to be '0' instead of '1' (a very minor
change).
My static timing analysis looks OK (at least for the paths that I've
constrained), and I really don't know where to start looking.
I checked my design over and over for "bad code" but didn't find
anything that might explain this.

I would really like to know if any of you have experienced something
similar in the past, and if not, maybe someone can give me a tip to
start with.

thanks in advance, Moti.

Well, off the top of my head I can't say what specific problem you may be
facing, but in general, if you have access to more than one synthesis tool,
try compiling the code with a second one and see if there is a difference.
If your code is OK, both tools should give similar results (apart from
timing), but if there is a small bug, a second tool that maps the design
differently can sometimes help you isolate the problem.

I assume you use XST, so you could try Synplify, just as an example. If you
don't have access to it, you can ask for a 30-day evaluation version (which
is full featured) and try your design with it.

Regards
Arash
"Moti Cohen" <moti@terasync.net> wrote in message
news:c04bfe33.0410200517.391ab8d9@posting.google.com...

> my static timing analisys looks o.k. (at least the paths that i've
> constrained). and I realy dont know where to start looking.
> I checked my design over and over for "bad code" parts but didnt found
> anything that might explain this.

What is your background in FPGA design? This might be teaching you to
suck eggs, but....

Ideally you want your whole design to be written synchronously
off one clock. The only constraints you will then need are
one for the clock and those for input/output timing.

If you have ripple/gated clocks the timing analysis tools can't do a
proper analysis on what's actually going to happen and you're likely
to have a lot of clock races etc.

Nial.

------------------------------------------------
Nial Stewart Developments Ltd
FPGA and High Speed Digital Design
Cyclone Based 'Easy PCI' proto board
www.nialstewartdevelopments.co.uk

In my experience, FSMs generally stop working when there
are asynchronous inputs.  Note that in XST you need to try
REALLY hard to end up with FSMs synthesized any way other than one-hot.
When an asynchronous input affects the next-state decision in
the FSM, the state register can go "zero-hot" or "more-than-one-hot"
if the async input violates setup and hold requirements, because
different state flops may capture different values of it.
This can also happen if the FSM comes out of reset asynchronously.
There was a recent thread about this...
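
As a rough illustration of the asynchronous-input point above, here is a minimal VHDL sketch of a two-flop synchronizer placed in front of an FSM input. The entity and signal names (sync_2ff, async_in, sync_out) are made up for the example and do not come from the original design. This is one common approach, not necessarily what the thread mentioned above recommended.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flop synchronizer; all names are illustrative.
entity sync_2ff is
  port (
    clk      : in  std_logic;  -- clock of the FSM that consumes the signal
    async_in : in  std_logic;  -- input that is asynchronous to clk
    sync_out : out std_logic   -- synchronized copy for the next-state logic
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta_ff : std_logic := '0';  -- first stage, may go metastable
  signal sync_ff : std_logic := '0';  -- second stage, gives it a cycle to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta_ff <= async_in;  -- sample the raw input
      sync_ff <= meta_ff;   -- re-sample one clock later
    end if;
  end process;

  sync_out <= sync_ff;
end architecture rtl;
```

The FSM would then use only sync_out, so that every state flop decides on the same sampled value instead of each one sampling the raw pin on its own.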

Nial Stewart wrote:
> What is your background is FPGA design? This might be teaching you to
> suck eggs, but....
>
> Ideally you want your whole design to be written synchronously
> off one clock. The only constraints you will then need are
> 1 for the clock and for input/output timing.
>
> If you have ripple/gated clocks the timing analysis tools can't do a
> proper analysis on what's actually going to happen and you're likely
> to have a lot of clock races etc.

I am not sure what you mean about the clocks. A gated clock, as in
"enabled", is still easy to analyze timing. This becomes multicycle
which I do all the time. I am not sure what a ripple clock is, but when
you use multiple clocks, you can specify the timing relationship between
them and the tool will do all the work.

--
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

>I am not sure what you mean about the clocks. A gated clock, as in
>"enabled", is still easy to analyze timing. This becomes multicycle
>which I do all the time. I am not sure what a ripple clock is, but when
>you use multiple clocks, you can specify the timing relationship between
>them and the tool will do all the work.

I think "gated clock" in the ASIC world means a clock made by ANDing
the enable with the clock signal. Just like we used to do in the
old TTL days when only a few chips had enable pins. (Worked OK as
long as you ran your normal/main clock through a dummy gate to keep
the clock skew reasonably close.) They have the advantage of saving
the power on the rest of the clock distribution chain when the clock
isn't enabled.

"Ripple clock" probably refers to ripple counters. You build a
counter out of toggle FFs. The 0=>1 transition of bit N clocks
bit N+1. I'd guess it is small but I don't know if it's used much.

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
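
To make the two terms concrete, here is a small VHDL sketch showing a gated clock, the clock-enable style that is usually preferred in FPGAs, and a two-stage ripple clock. All entity, port and signal names are invented for the illustration and are not from anyone's actual design. How a given synthesis tool maps each form may vary.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example; all names are illustrative only.
entity clock_styles is
  port (
    clk      : in  std_logic;
    enable   : in  std_logic;
    d        : in  std_logic;
    q_gated  : out std_logic;  -- register driven by a gated clock
    q_enable : out std_logic;  -- register using a clock enable
    div4     : out std_logic   -- output of a two-stage ripple divider
  );
end entity clock_styles;

architecture rtl of clock_styles is
  signal gated_clk : std_logic;
  signal div2_ff   : std_logic := '0';
  signal div4_ff   : std_logic := '0';
begin
  -- Gated clock: logic sits in the clock path, so the flop's clock
  -- arrives late and can glitch when enable changes while clk is high.
  gated_clk <= clk and enable;

  process (gated_clk)
  begin
    if rising_edge(gated_clk) then
      q_gated <= d;
    end if;
  end process;

  -- Clock enable: the flop is always clocked by clk itself and
  -- enable only decides whether new data is loaded.
  process (clk)
  begin
    if rising_edge(clk) then
      if enable = '1' then
        q_enable <= d;
      end if;
    end if;
  end process;

  -- Ripple clock: the output of one flop is used as the clock of the next.
  process (clk)
  begin
    if rising_edge(clk) then
      div2_ff <= not div2_ff;
    end if;
  end process;

  process (div2_ff)
  begin
    if rising_edge(div2_ff) then
      div4_ff <= not div4_ff;
    end if;
  end process;

  div4 <= div4_ff;
end architecture rtl;
```

A quick way to spot the first and third forms in existing code is to look for rising_edge() or 'event applied to anything other than a signal that comes straight from a clock input or clock buffer.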
In no particular order:

1) Start with the synthesis log files.  It can be easy to overlook the
information presented there - particularly if it is listed as a
"WARNING" as opposed to an "ERROR". Look carefully at all the
WARNINGs; sometimes there is information about clock, latch and
asynchronous inference that may explain why you are getting
behavior you did not intend.

2) Scan through your code and look for reset circuits in *all* of your
registers.  Did you intend to reset all flops?  Did you actually do
that?  If you did not, sometimes the powerup state of a flop can
change when re-building the design.

3) Simulate, simulate, simulate.  Have you verified this design in
logic simulation?  Does your simulation "sky-wire" into the design to
force internal nodes?  If so, your lab behavior will be different.

4) Give us more details about the design style you use.  I think the
suggestion to make the entire design synchronous is a very good one. 
Gating clocks and / or generating clocks from internal registers can
cause mismatches between simulation and synthesized designs.

Good luck,
Chris


You're probably on the edge with the timing. The reason it works differently when you re-synthesize with only minor changes is because the place and route software starts with a random seed. You could actually make no changes at all and it would work one time and not work the next (though even if it "works" it would probably break under hot/low power conditions).

You might want to try a couple things -- take a heat gun set on low (or a hair dryer) and heat up the chip a little bit. Does the problem get worse? Spray it with cold spray -- does it get better? Also, raise the internal power supply voltage SLIGHTLY (like 1.5 volts to 1.55). Does the problem get better? If you lower the PS a little bit (like 1.45 volts) does it crap out? The silicon will run a little faster when it's cold and also when its voltage is a little high, so if your timing is on the edge then either of these factors might affect your circuit. If you've got a race condition, however, things might actually get worse when you cool the chip down or increase voltage.

Ideally, you want your FPGA design to be stable even if it's got to suffer with both bad conditions -- hot AND low voltage.
On Thu, 21 Oct 2004 16:21:58 -0700,  wrote:

> You're probably on the edge with the timing. The reason it works
> differently when you re-synthesize with only minor changes is because
> the place and route software starts with a random seed. You could
> actually make no changes at all and it would work one time and not work
> the next (though even if it "works" it would probably break under
> hot/low power conditions).
>

Agreed - this is a likely problem. First thing to do is to perform a
static timing analysis post-P&R on the entire design. Doing just some of
the paths is not adequate.

Hopefully you are not using gated clocks (where the main clock goes
through a gate before it hits the clock pin of some of the flip-flops),
ripple clocks (where the output of one f/f is used as the clock on
another flip-flop) or asynchronous logic (typically placing gates on the
set or reset pins of flip-flops). If you are using any of these
techniques, pay extremely close attention to the timing, and try to avoid
these techniques if you can. It should be possible to avoid using these
techniques, as they are typically used in an attempt to reduce the size
of a design or to maximize the clock rate.

If you have multiple clocks in the design, you also have to take a close
look at all of the signals that cross the boundary between clock domains.
The issue is not so much metastability (unless you are, perhaps, really
pushing the clock rates) but the handling of signals that must be
processed as a coherent set. For example, if you have a binary counter
that is incremented on one clock (clock A, say) but latched (registered)
by another clock (call it clock B), it is very likely that the signals
will be mis-read. Say the counter changes from 0111 to 1000 on a
particular edge of clock A; clock B may sample some of the bits before
they change, and some of the bits after they have changed (due to
slightly different clock-to-output delays on the counter flops, to
routing delays that vary from signal to signal, and to slightly different
setup times on the sampling flops). In principle, the sampling register
could contain any value at all. You have to be very careful in your
design to prevent this type of problem when dealing with multiple clock
domains, especially if the clocks are asynchronous with respect to each
other.

> You might want to try a couple things -- take a heat gun set on low (or
> a hair dryer) and heat up the chip a little bit. Does the problem get
> worse? Spray it with cold spray -- does it get better? Also, raise the
> internal power supply voltage SLIGHTLY (like 1.5 volts to 1.55). Does
> the problem get better? If you lower the PS a little bit (like 1.45
> volts) does it crap out? The silicon will run a little faster when it's
> cold and also when its voltage is a little high, so if your timing is on
> the edge then either of these factors might affect your circuit. If
> you've got a race condition, however, things might actually get worse
> when you cool the chip down or increase voltage.
>

This is useful for verifying the presence of further problems after
checking the things mentioned above, but as a go-nogo test it is not
particularly helpful in isolating and fixing the problem. These tests (as
well as running with both a fast clock and a slow clock) are useful for
determining if you have a problem(s), but provide little insight in
fixing them.

> Ideally, you want your FPGA design to be stable even if it's got to
> suffer with both bad conditions -- hot AND low voltage.

-- Phil
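
The 0111-to-1000 counter example above can be sketched in VHDL. One common way to move a counter value between clock domains (a standard technique, not something the post itself proposes) is to Gray-code it in the source domain, so that only one bit changes per increment. A sample taken in the middle of a transition then yields either the old value or the new one, never an unrelated value. The entity name, generic and port names below are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical Gray-coded counter crossing from clk_a to clk_b.
entity gray_cdc_counter is
  generic (WIDTH : positive := 4);
  port (
    clk_a  : in  std_logic;  -- domain in which the counter increments
    clk_b  : in  std_logic;  -- domain in which the counter is read
    gray_b : out std_logic_vector(WIDTH-1 downto 0)  -- Gray value, clk_b domain
  );
end entity gray_cdc_counter;

architecture rtl of gray_cdc_counter is
  signal count_a : unsigned(WIDTH-1 downto 0) := (others => '0');
  signal gray_a  : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal gray_b1 : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal gray_b2 : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
begin
  -- Source domain: binary counter plus a registered Gray-coded copy.
  process (clk_a)
    variable next_count : unsigned(WIDTH-1 downto 0);
  begin
    if rising_edge(clk_a) then
      next_count := count_a + 1;
      count_a    <= next_count;
      -- Binary to Gray: g = b xor (b shifted right by one).
      -- Registering it keeps decode glitches out of the crossing.
      gray_a <= std_logic_vector(next_count xor shift_right(next_count, 1));
    end if;
  end process;

  -- Destination domain: two register stages on the Gray value.
  process (clk_b)
  begin
    if rising_edge(clk_b) then
      gray_b1 <= gray_a;
      gray_b2 <= gray_b1;
    end if;
  end process;

  gray_b <= gray_b2;
end architecture rtl;
```

The sketch leaves the result in Gray code; converting back to binary in the clk_b domain is omitted. It also assumes the counter only ever steps by one, since a jump of more than one count can change several Gray bits at once.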
Hi all,
First, I would like to thank all of you for answering.

Here are some more facts. Before posting my question I had already tried the
following (with no success).

Multiple clocks - my design does use a few clock sources, but each of them
drives a separate block (i.e. only one clock drives the data in/out of a
single FSM).

Async reset:
Before I encountered the problem I used an async reset such as

process (resetn, clk)
begin
	if resetn = '0' then
	...
	elsif rising_edge(clk) then
	...
	end if;
end process;

but when the instability problem arose I immediately changed my whole design
to work with a sync reset such as

process (clk)  -- a sync reset needs only clk in the sensitivity list
begin
	if rising_edge(clk) then
		if resetn = '0' then
		....
		else
		...
		end if;
	end if;
end process;

but it did not help.

Undesired latch inference:
I also searched my entire design for warnings about latch inference or any
other warning that might indicate an unintended logic implementation
(I found one warning regarding a latch and fixed it, but the problem
didn't go away).

Gated clocks - I don't think that I'm using them, but I would be very happy if
someone could give me a VHDL code example that results in a gated clock, so I
can be sure what you guys meant by "gated clock".

Simulation - I'm not familiar with the term "sky-wire" (mentioned by Chris);
maybe someone can explain what it means.

FSM encoding: I also changed my FSM encoding to Gray instead of one-hot,
because from my past experience it helps in some cases (but it didn't help
here).

Synthesis - Chris mentioned that even the same code could be synthesized
differently on each run. In my case that is not so! I use SourceSafe for
version control and save a version of every "good" synthesis, and I have seen
that whenever I re-synthesize code that has worked before, it always continues
to work.

In the past when I encountered such problems I used the ChipScope LA to find
and debug them, but now the problem moves from block to block, and ChipScope
itself changes the logic when it is inserted (it uses the device's LUTs and
RAM resources), so I can't really use it.

I haven't yet tried running the entire design at a lower clock frequency; that
was a good suggestion and I will try it. I will also try to "play" a little
with the power supply and the temperature.

I was wondering if something in my synthesis/MAP/P&R configuration is wrong;
maybe you can give me a few tips on this subject too (I'm using Xilinx Project
Navigator 6.1 with XST).

again, lots of thanks.
Regards, Moti.
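
On the reset change described above: with a synchronous reset, an external resetn that is released asynchronously to clk is itself an asynchronous input, which ties in with the earlier remark about FSMs coming out of reset asynchronously. A minimal VHDL sketch of one common remedy, a reset synchronizer (asynchronous assert, synchronous release), is given below. The entity and signal names are hypothetical, and each clock domain would need its own copy. Whether this is actually the cause of the problem in this design is not known from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reset synchronizer; names are illustrative only.
entity reset_sync is
  port (
    clk        : in  std_logic;  -- clock of the domain being reset
    resetn_in  : in  std_logic;  -- raw active-low reset, asynchronous to clk
    resetn_out : out std_logic   -- active-low reset, released in step with clk
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal ff1 : std_logic := '0';
  signal ff2 : std_logic := '0';
begin
  process (clk, resetn_in)
  begin
    if resetn_in = '0' then
      -- Assert immediately, even if clk is not running.
      ff1 <= '0';
      ff2 <= '0';
    elsif rising_edge(clk) then
      -- Release only after two rising edges of clk, so the
      -- deassertion seen downstream is synchronous to clk.
      ff1 <= '1';
      ff2 <= ff1;
    end if;
  end process;

  resetn_out <= ff2;
end architecture rtl;
```

The processes in that clock domain would then test resetn_out instead of the raw resetn pin, so that all of their flops leave reset on the same clock edge.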