FPGARelated.com
Forums

Weird JTAG lockup issue, where is the BUG?

Started by Antti July 9, 2006
Hi

I have several Spartan3 boards that have a very weird issue. When they
are configured with one specific VHDL design using Impact with verify
off, the first programming attempt fails (status: fail with CRC check!),
after which the JTAG chain is reported broken before the FPGA. Further
configuration, or even reading the JTAG IDCODE, is not possible until
the FPGA is completely powered off. With the Impact verify option on,
however, the same bitstream can be used to configure the boards
multiple times, and the JTAG lockup doesn't happen. It is not caused by
a bad bitstream, because the VHDL design (a LEON3 system) shows the
same behaviour when compiled for a different FPGA (S3-1500 or S3-4000).
The boards in question (2 different PCBs) seem to work with all other
designs I have tested.

To my understanding the JTAG TAP controller should be a completely
separate function block from the FPGA fabric, so no matter what is
loaded as the FPGA configuration, it should not make the JTAG TAP
unscannable. So the issue could only be related to power supply
behaviour, perhaps some voltage spike at FPGA startup?

Any ideas what to test or where to look? I would really like to get to
the bottom of the problem and understand how a LEON3 design can make
the JTAG chain die (this is what it looks like at the moment).

The FPGAs on the boards where I see this behaviour have date codes
mentioned in

http://direct.xilinx.com/bvdocs/notifications/xcn06018.pdf

but I don't think this could be the issue?

Antti

"Antti" <Antti.Lukats@xilant.com> wrote:

[snip]
I've had this problem with Spartan2 FPGAs. I even cooked a few! As
far as I could trace the problem, it had to do with power supply
current capability and bypassing. Sometimes the FPGA will draw a huge
amount of current during configuration. If the power supply system
(including the bypass capacitors) can't supply this current, you'll get
latch-up in the FPGA.

-- 
Reply to nico@nctdevpuntnl (punt=.)
You can find businesses and shops at www.adresboekje.nl

Nico Coesel wrote:

[snip]
Hi,

thanks for the answer, and yes, that is what I also think the problem
could be. But I assumed the Spartan 3 has no special requirement for
huge startup currents. Both the 1.2V and 2.5V power supplies are 6A
step-downs from LT and look like they were really designed by the book.
Gosh, I really hate it if I need to troubleshoot them.

I still wonder why the latch-up never happens when I select "verify on"
in Impact!? I guess I need to set up a DSO trigger on DONE=1 and
monitor all the supplies at the transition time.

Antti

I'm not that familiar with Xilinx FPGAs, but I did have an issue with an 
Altera FPGA that turned out to be power supply related.  The problem was 
that the power-up configuration was unstable, sometimes it would work and 
other times it wouldn't.  But, if I powered up, then initiated a 
configuration (from an on board push-button), it always worked.  This led me 
to look at the power rails. In my case, I had a power supply that was 
generating a non-monotonic rise on VCCint.  Once I fixed the rise so that it 
was smooth the problem went away.
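
A non-monotonic rise like this can be checked offline from a scope capture. The following is a minimal sketch, assuming the rail voltage has been exported as a plain list of samples in volts; the function name `find_dips`, the `tolerance` parameter, and the sample values are hypothetical and only serve as illustration.

```python
def find_dips(samples, tolerance=0.0):
    """Return the indices where a ramp falls below its running peak.

    samples   -- rail voltage samples in volts, in time order
    tolerance -- allowed droop in volts before a sample counts as a dip
                 (use it to ignore scope noise)
    """
    if not samples:
        return []
    dips = []
    peak = samples[0]
    for i, v in enumerate(samples[1:], start=1):
        if v < peak - tolerance:
            dips.append(i)          # voltage dropped back during the rise
        peak = max(peak, v)
    return dips


# Hypothetical captures: one clean ramp and one with a dip at index 3.
clean_ramp = [0.0, 0.3, 0.6, 0.9, 1.1, 1.2]
bad_ramp = [0.0, 0.3, 0.6, 0.5, 0.8, 1.1, 1.2]

print(find_dips(clean_ramp, tolerance=0.02))
print(find_dips(bad_ramp, tolerance=0.02))
```

An empty result means the captured rise was monotonic within the chosen tolerance. A non-empty result only points at where to zoom in on the scope; it does not say anything about the cause.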

Can you initiate, or re-initiate, the configuration cycle after you are 
powered up and the voltage rails are stable?  If so, try it, and see what 
happens.  It may give you another clue.

Take care,
Rob



"Antti" <Antti.Lukats@xilant.com> wrote in message 
[snip]

Rob wrote:

[snip]
Hi Rob,

1) I can configure and reconfigure the board with many, many different
designs and never see an issue at all.

2) When using one specific design/bitstream, I can configure and
reconfigure any number of times when Xilinx Impact is set to perform
configure and verify. Impact even reports programming and verify
success!!

3) Using the same bitstream and Impact with configure but no verify,
the first configuration attempt reports a configure error (CRC error),
and after that the JTAG chain is reported as broken before the FPGA.
The power supplies are still at the proper voltage and stable, and the
FPGA does not get hot. But it needs to be power cycled for the JTAG TAP
to come alive again.

I understand that the power supply is the most likely issue, but why
does the issue never happen when the JTAG operation is set to
configure_and_verify, yet lock up the JTAG TAP 100% of the time when
attempting to configure without verify?

I bet this remains a "Xilinx mystery" forever.

Antti

Antti,

All devices after Virtex E (Spartan 2E) have no extra current required 
over that which is specified in the data sheet for minimum power on current.

Is it possible that the configuration you are loading requires more 
power than you have available?

I have seen DONE go high, only for the power supply to crash, fold back, 
  and the part starts to reconfigure again.

As for the JTAG state machine, it is definitely possible for it to enter 
a "bad" state from which it may never recover.  It is only with Virtex 
4, and now Virtex 5, that we have worked carefully on the state machines 
to harden them from soft errors, which might place them in an 
unrecoverable state.  Irradiation with neutrons can quickly find those 
hidden bad states!

Austin

Austin Lesea wrote:

[snip]
Hi Austin,

I also thought there is no extra power surge at configuration on the
S3, and I do not think the design takes more power than is available.
I was just porting the LEON3 design onto some new boards to have more
designs for the board test. To my great surprise the LEON3 design never
started up correctly. I made the design smaller by disabling the MMU
and caches, and the problem persisted. The design uses 13% of an
S3-4000 and is set to run at 25MHz. All power supplies are rated at 6A.
The same board successfully runs a MicroBlaze design with two separate
SDRAM controllers, Ethernet and TFT display cores at 72MHz; I would bet
that design definitely burns more dynamic current than the plain
vanilla LEON3 design.

OK, I can't measure the LEON3 design's power as it never comes up
alive. Wrong, I can: I have one Memec board with an S3-1500. I can load
the design that fails on my board onto the Memec board and measure the
current, and then measure the current on my boards with some designs
that do work. That should tell whether the boards that fail can supply
the current that the LEON3 design requires.

As for JTAG dead states: the fact that Xilinx has only ironed them out
for V4 and V5 really surprises me. A JTAG TAP isn't rocket science.

--

I haven't been able to test with non-JTAG configuration methods yet.
Maybe the whole issue is only with the Impact software: the JTAG chain
contains an Atmel AT91SAM7S ARM with JTAGSEL=0, i.e. the ARM ICE JTAG
chain is selected. It is remotely possible that the ARM JTAG is getting
messed up somehow. This could even explain why there is a difference
between configuring with verify on and off. Well, does that mean I
have stumbled into some very nasty Impact bug?

I know that the ARM ICE JTAG is not 100% proper JTAG, but as long as
it.. hmm, maybe I solved the issue at this very moment: the ARM JTAG
has a bug that disturbs some JTAG operations when the JTAG clock is
faster than the system clock, and the Atmel ARM powers up on an
internal 128KHz clock, so it is remotely possible it gets upset
somehow.

As I had not seen a problem so far, I assumed the ARM BYPASS also works
at higher speeds (but all assumptions are wrong). If I think about it,
it sounds like this must be the problem. It is just weird that every
other design worked so far and one design doesn't.

Antti
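
The TCK versus core clock relationship suspected above can be put into numbers. This is a minimal sketch, not a statement about the AT91SAM7S: the 1/6 ratio is an assumed rule of thumb often quoted for ARM7TDMI-class cores and should be checked against the device documentation, the 128KHz figure is simply the one quoted in the post, and the 6MHz cable speed and 48MHz PLL clock are hypothetical values.

```python
def max_safe_tck_hz(core_clock_hz, ratio=6):
    """Highest TCK (Hz) allowed if TCK must stay at or below core_clock/ratio.

    ratio=6 is an ASSUMED rule of thumb, not a datasheet value.
    """
    return core_clock_hz // ratio


def tck_is_safe(tck_hz, core_clock_hz, ratio=6):
    """True if the JTAG clock respects the assumed TCK <= core_clock/ratio limit."""
    return tck_hz * ratio <= core_clock_hz


SLOW_CLOCK_HZ = 128_000      # power-up clock figure quoted in the post
PLL_CLOCK_HZ = 48_000_000    # hypothetical core clock once the PLL is enabled
CABLE_TCK_HZ = 6_000_000     # hypothetical programming cable speed

print(max_safe_tck_hz(SLOW_CLOCK_HZ))            # limit on the slow clock
print(tck_is_safe(CABLE_TCK_HZ, SLOW_CLOCK_HZ))  # cable speed vs slow clock
print(tck_is_safe(CABLE_TCK_HZ, PLL_CLOCK_HZ))   # cable speed vs PLL clock
```

Under these assumptions a device still running from its slow power-up clock tolerates only a TCK of a few tens of kHz, far below typical cable speeds, while the same cable speed fits within the limit once a faster core clock is running.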
I didn't think that was the problem, but I thought I would throw it out 
there.  Bizarre problem indeed.  Please post when you find the answer.

"Antti" <Antti.Lukats@xilant.com> wrote in message 
[snip]

Rob wrote:

[snip]
Mystery solved!

The issue is a bug in the ARM core netlist that Atmel licensed for the
AT91SAM7S! The problem was in no way related to the Xilinx FPGA or the
power supplies, despite the weird 'effect' that the issue was only
visible with one specific FPGA design, only with Impact, and only when
the configuration attempt was done with the verify OFF setting.

When the AT91SAM7S has its PLL enabled, the issue with the same 'bad'
bitstream doesn't occur anymore.

Antti

Austin Lesea wrote:
> [snip]
> As for the JTAG state machine, it is definitely possible for it to enter
> a "bad" state from which it may never recover.  It is only with Virtex
> 4, and now Virtex 5, that we have worked carefully on the state machines
> to harden them from soft errors, which might place them in an
> unrecoverable state.  Irradiation with neutrons can quickly find those
> hidden bad states!
>
I am surprised there: I thought the JTAG standard had defined that
state machine so that a limited number (5, from memory) of clocks with
TMS=1 would force it out of anything.
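
The "5 clocks with TMS=1" property can be checked against the 16-state TAP controller diagram of IEEE 1149.1. The sketch below models only the state diagram from the standard; the names `TAP` and `clocks_to_reset` are chosen here for illustration.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clocks_to_reset(state):
    """Count TCK cycles with TMS held at 1 until Test-Logic-Reset is reached."""
    clocks = 0
    while state != "Test-Logic-Reset":
        state = TAP[state][1]   # follow the TMS=1 edge
        clocks += 1
    return clocks


# Worst case over all 16 defined states.
worst_case = max(clocks_to_reset(s) for s in TAP)
print(worst_case)
```

This confirms the figure for the 16 states the standard defines. It says nothing about an implementation whose state register ends up in an encoding outside those 16 states, or about a device in the chain that mishandles TCK, which is the kind of situation discussed in this thread.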