Soft failures (?) 9536XL

Started by Josep Duran January 21, 2004
I have a small circuit using the 9536XL CPLD. The complete machine uses
64 of these circuits. I have tested it in the lab and everything works
just fine.

The problem is that the other day, at the client's premises, I saw
something wrong with one of the boards. The CPLD stopped responding to the
commands sent by the computer. As I had no test equipment available, I just
tried sending some reset commands to the board and got no response. I turned
the power off to change the board, but just before replacing it I gave it
another try. To my surprise, everything worked fine this time. I did some
intensive testing of the board, and again everything went OK.

To me, it looks like the CPLD lost its configuration.
Is this at all possible? If so, what can I do to prevent it from
happening?
Has anybody seen something like this before?


NB - it is a 2 layer board (no GND plane) about 3 sq inches.


Thank you for your time.

 Josep Duran


The XC9536XL is a Flash-based CPLD. Unlike an SRAM-based FPGA, it is not
reconfigured just by cycling Vcc. The CPLD would need a fresh in-system
programming operation, which is neither automatic nor something that
happens by accident.
So, what might have happened is that your design got into an illegal
state from which it cannot escape, but which did not affect the configuration.

A more far-fetched explanation is based on the fact that a small part of
the Flash-based configuration actually gets transferred into internal
latches (like in an FPGA), which of course might get upset, and that
would be fixed by cycling Vcc.
All CPLD manufacturers use this convenient mechanism, but hardly anybody
talks about it, since it creates the impression of "volatility"...

Peter Alfke
===================================
Peter Alfke wrote:
> The XC9536XL is a Flash-based CPLD. Different from an SRAM-based FPGA,
> you do not reconfigure it just by cycling Vcc. The CPLD would need a
> fresh in-system programming operation, which is not automatic nor
> happens by accident.
> So, what might have happened is that your design got into an illegal
> state, out of which it cannot escape, but which did not affect the
> configuration.

Correct. Check whether you have any state machines, and what they do from
illegal states.

> A more far-fetched explanation is based on the fact that a small part of
> the Flash-based configuration actually gets transferred into internal
> latches (like in an FPGA), which of course might get upset, and that
> would be fixed by cycling Vcc.
> All CPLD manufacturers use this convenient mechanism, but hardly anybody
> talks about it, since it creates the impression of "volatility"...
>
> Peter Alfke

You can sometimes find this hidden in the fine print, in the approximate
form of "Vcc must be reduced to < 0.9V before being increased again".
Systems that are prone to brown-out and non-monotonic Vcc are more at risk
in this area.

If you have the resource room, you can add read-back or a similar
'check-it-actually-happened' function to the PLD code, and have your system
watch for any out-to-lunch behaviour.

-jg
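One way the host side of such a read-back check might look is sketched below in Python. Everything in it is an assumption made for illustration: the FakeBoard class stands in for whatever bus the real machine uses, and the single 8-bit register, the 0xA5 test pattern and the retry count are all hypothetical. It is a sketch of the idea, not the actual system.

```python
class FakeBoard:
    """Hypothetical stand-in for one CPLD board with a single 8-bit register."""

    def __init__(self, stuck=False):
        self.stuck = stuck  # True models a board that has stopped responding
        self.reg = 0

    def write(self, value):
        if not self.stuck:
            self.reg = value & 0xFF

    def read(self):
        return self.reg


def write_verified(board, value, retries=3):
    """Write a value, read it back, and report whether the board echoed it."""
    for _ in range(retries):
        board.write(value)
        if board.read() == (value & 0xFF):
            return True
    return False


def find_unresponsive(boards, pattern=0xA5):
    """Return the indices of the boards that fail the read-back check."""
    bad = []
    for index, board in enumerate(boards):
        # Use the pattern and its complement so that a register stuck at
        # either value is still caught.
        ok = write_verified(board, pattern) and write_verified(board, pattern ^ 0xFF)
        if not ok:
            bad.append(index)
    return bad
```

Run periodically over all 64 boards, a check of this kind would at least tell the host which board went quiet and when, which helps separate a one-off upset from a board that fails repeatedly.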
Did you analyze your design for nominal conditions
or worst case operating conditions?

I have seen boards fail when they got warm after
being closed up (or after a friend set their engineering
notebook on top of the box and it heated up a little).

This happened to a board that I was interfacing to.
Problem went away after we drilled larger vent
holes in the product.  I guess that is what you
get when you get a product that is still in beta
testing.

Cheers,
Jim



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jim Lewis
Director of Training             mailto:Jim@SynthWorks.com
SynthWorks Design Inc.           http://www.SynthWorks.com
1-503-590-4787

Expert VHDL Training for Hardware Design and Verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sounds like you have an asynchronous circuit in your design.
I have seen this kind of trouble many times in the past with CPLDs and/or
FPGAs. Every time, the trouble came from some kind of asynchronous circuit
or design approach. With an asynchronous circuit, temperature can affect
your design, sometimes and sometimes not.

First, make sure you synchronize all your input signals before using
them! Make sure your design does NOT use combinatorial latches!

Regards,
Laurent
www.amontec.com



Thank you Peter,

"Peter Alfke" <peter@xilinx.com> wrote in message
news:400F2439.EC57ADB0@xilinx.com...
> The XC9536XL is a Flash-based CPLD. Different from an SRAM-based FPGA,
> you do not reconfigure it just by cycling Vcc. The CPLD would need a
> fresh in-system programming operation, which is not automatic nor
> happens by accident.

Yes. That part I understand.

> So, what might have happened is that your design got into an illegal
> state, out of which it cannot escape, but which did not affect the
> configuration.

This was my first thought. I double-checked the state machine, and I don't
think there is a problem there.

> A more far-fetched explanation is based on the fact that a small part of
> the Flash-based configuration actually gets transferred into internal
> latches (like in an FPGA), which of course might get upset, and that
> would be fixed by cycling Vcc.

This is the part I am actually concerned about. Could a noisy or poorly
decoupled Vcc be the source of the problems? How far-fetched is this
explanation? Is it really possible?

If I read the configuration through the JTAG port, do I get the actual
internal RAM configuration, or the Flash configuration?

Or should I be looking for a bad solder joint or some other, more
mechanical explanation?

Josep Duran
Josep Duran wrote:
> Thank you Peter,
>
> "Peter Alfke" <peter@xilinx.com> wrote in message
> news:400F2439.EC57ADB0@xilinx.com...
> > The XC9536XL is a Flash-based CPLD. Different from an SRAM-based FPGA,
> > you do not reconfigure it just by cycling Vcc. The CPLD would need a
> > fresh in-system programming operation, which is not automatic nor
> > happens by accident.
>
> Yes. That part I understand.
>
> > So, what might have happened is that your design got into an illegal
> > state, out of which it cannot escape, but which did not affect the
> > configuration.
>
> This was my first thought, I double checked the state machine, and I don't
> think there is a problem there.

Illegal states have to do with combinations of your state FFs that are not
accounted for in your machine. Likewise, if you have more than one machine
and have not accounted for all the state combinations, you can get into
trouble. Sometimes two machines interact in such a way that they need to be
considered one machine. Make sure you have a bubble in your state diagram
that corresponds to every possible state encoding; then there are no
"illegal" states. Also account for all combinations of inputs at every state.
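
The "bubble for every encoding" check can be done mechanically. The Python sketch below uses a made-up three-state controller (IDLE, BUSY, DONE and the start input are hypothetical, not the design discussed in this thread) held in two flip-flops, so encoding 0b11 is unused. It lists every encoding from which IDLE can never be reached. The unused encoding is modelled as holding its value, which is an assumption; real synthesized logic does something design-dependent there.

```python
# Hypothetical controller: three intended states in two flip-flops.
IDLE, BUSY, DONE = 0b00, 0b01, 0b10
STATE_BITS = 2


def next_state_unsafe(state, start):
    """Transition table that only covers the three intended states."""
    table = {
        (IDLE, 0): IDLE, (IDLE, 1): BUSY,
        (BUSY, 0): DONE, (BUSY, 1): DONE,
        (DONE, 0): IDLE, (DONE, 1): IDLE,
    }
    # Assumption: an encoding missing from the table simply holds its value.
    return table.get((state, start), state)


def next_state_safe(state, start):
    """Same machine, but every unused encoding is sent back to IDLE."""
    if state not in (IDLE, BUSY, DONE):
        return IDLE
    return next_state_unsafe(state, start)


def trapped_encodings(next_state):
    """Return every encoding from which IDLE is unreachable for any inputs."""
    trapped = []
    for first in range(2 ** STATE_BITS):
        seen, frontier = {first}, [first]
        while frontier:
            current = frontier.pop()
            for start in (0, 1):
                following = next_state(current, start)
                if following not in seen:
                    seen.add(following)
                    frontier.append(following)
        if IDLE not in seen:
            trapped.append(first)
    return trapped
```

Calling trapped_encodings on the first table reports encoding 3 as a trap, while the second table reports none. The same search extends to two interacting machines by treating their combined flip-flops as one state word.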

> > A more far-fetched explanation is based on the fact that a small part of
> > the Flash-based configuration actually gets transferred into internal
> > latches (like in an FPGA), which of course might get upset, and that
> > would be fixed by cycling Vcc.
>
> This is the part I am actually concerned. Could a noisy or poorly decoupled
> Vcc be the source of the problems. How far-fetched explanation is this ? Is
> it really possible ?

Yes, noise on Vcc can cause trouble for any design that has volatile
storage, state machine or not. Your state FFs can be corrupted by noise on
Vcc.

> If I read the configuration through the JTAG port, do I get the
> internal-actual-RAM configuration, or the Flash configuration ?
>
> Or should I be looking for a bad solder point or other more mechanical
> explanation ?
Your problem reminds me of a problem I had a while ago. The FPGA locked up
just as your CPLD was doing. By digging a bit, I found that ISE had
implemented my state machines as one-hot, so I thought that somehow the FSM
had gone into an illegal state. Forcing the FSM to binary encoding
reinforced my belief.

To shorten the story, it turned out the FPGA wasn't really going into an
illegal state: the problem was that there was poor signal integrity on the
clock signal, which occasionally would have a double edge, causing the bit
in the one-hot encoding to be lost.

You might want to check out the clock after looking at the Vcc.
If you have clock glitch problems, and cannot resolve them by proper
attention to board-level signal integrity methods (which you should!),
then there is always a band-aid method to make the problem vanish. A few
years ago, I published a way to suppress clock glitches, which has saved
several designs already:

http://www.xilinx.com/xcell/xl34/xl34_54.pdf

Peter Alfke, Xilinx Applications
============================
Hi,
What I've done in the past is this. Get your noisy clock into the
FPGA, call it CLKA. Feed it through spare unbonded IOBs with the input
delay feature turned on to make a delayed version, CLKB. Make the
delay longer than the glitch time by using as many IOB delays as
necessary. Now make two signals,
SET <= CLKA and CLKB;
RESET <= not(CLKA) and not(CLKB);
Use SET to set a latch and RESET to reset it. The output of the latch
is your debounced clock, which you feed to your circuit. Disgusting
but effective.
Cheers, Syms.
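
The behaviour of the SET/RESET latch described above can be checked on a sampled waveform before committing it to hardware. The following Python sketch is only a behavioural model under stated assumptions: the clock is a list of 0/1 samples, the IOB delay chain is represented by a whole number of samples (delay), and the function names are made up for this example.

```python
def filter_clock(clka, delay):
    """Model the SET/RESET latch on a sampled clock (list of 0/1 values)."""
    out = []
    q = clka[0]  # assume the latch starts at the clock's initial level
    for t, a in enumerate(clka):
        # CLKB is CLKA delayed by `delay` samples.
        b = clka[t - delay] if t >= delay else clka[0]
        if a and b:              # SET   <= CLKA and CLKB
            q = 1
        elif not a and not b:    # RESET <= not(CLKA) and not(CLKB)
            q = 0
        # Otherwise CLKA and CLKB disagree and the latch holds its value.
        out.append(q)
    return out


def count_rising_edges(wave):
    """Count the 0-to-1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


# Three clean clock periods, then a one-sample dropout just after a rising
# edge (a double edge) and a one-sample spike during a low phase.
clean = ([0] * 10 + [1] * 10) * 3
noisy = list(clean)
noisy[31] = 0
noisy[5] = 1
filtered = filter_clock(noisy, delay=3)
```

Here count_rising_edges(noisy) sees the extra edges while count_rising_edges(filtered) sees one per clock period, as long as each glitch is shorter than the delay. The price is that every edge of the filtered clock arrives delay samples late.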
