
Re: XC3000 non-recoverable lockup problem

Started by Unknown March 22, 2005
A second failure took place.  I reset all of the ICs, disabled the
card's master clock and left all of the FPGAs in the unprogrammed state.
Looking around, I was not able to tell whether the 1 MHz signal was
present or not.  It is so far down in the noise floor that it is
virtually undetectable.

I decided to start looking at wider bandwidths.  It appears that the
internal clock is not 1 MHz, but much higher.  Doing a sweep from 500 kHz
to 50 MHz and comparing the peaks, the IC that is in the strange state is
missing a peak at around 16-17 MHz.

This signal changes from part to part, which I would expect for a sloppy
oscillator.
Again, the data sheets do not mention this.  I will try to call Xilinx
today and see if they can confirm that this is the internal clock.

To further verify that the 16 MHz signal is the internal clock, I tried
changing the temperature of the device to see how it affects the
frequency, and indeed it does.  Just what you would expect from an RC
design.  I am very confident that the oscillator is the problem.

I did some searching and came across an app note from 1997 that talks
about the 1 MHz clock on the 3000.

"The nominal frequency of this oscillator is 1 MHz with a
max deviation of +25% to -10%. The clock frequency,
therefore, is between 1.25 MHz and 0.5 MHz. In the
XC4000 family, the 1-MHz clock is derived from an internal
8-MHz clock that also can be used as CCLK source."
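
As a quick sanity check on the arithmetic in the quoted passage, here is a
minimal Python sketch. The function name is made up for illustration. It
only converts a nominal frequency and an asymmetric percentage tolerance
into a range. Note that a lower bound of 0.5 MHz corresponds to -50%, not
-10%, so one of the two figures in the quote as posted may be a
transcription error.

```python
def freq_range_mhz(nominal_mhz, plus_pct, minus_pct):
    """Return (low, high) in MHz for a nominal frequency and an
    asymmetric tolerance given in percent."""
    low = nominal_mhz * (1 - minus_pct / 100)
    high = nominal_mhz * (1 + plus_pct / 100)
    return low, high

# Tolerance exactly as written in the quote: +25% / -10%
print(freq_range_mhz(1.0, 25, 10))

# Tolerance that would match the quoted 0.5 MHz lower bound: +25% / -50%
print(freq_range_mhz(1.0, 25, 50))
```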

I have provided Xilinx with the lot codes on these parts and I am
guessing that at some point the oscillator was changed to 16MHz on the
3000.

I am trying more tests now to try and get other oscillators to fail.

lecroy,

The oscillator itself is at a much higher frequency, and is divided down 
to the number listed in the data sheet.  At least, we still do it that 
way, even today.

The accuracy of this oscillator would be from 1/2 to 2X the nominal (it 
just isn't critical).
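
To put rough numbers on the "divided down" idea, here is a small Python
sketch. It assumes the 16-17 MHz peak reported earlier in the thread is
the raw oscillator and that the divider is a power of two. Neither is
confirmed by any data sheet. The helper names are hypothetical.

```python
import math

def nearest_pow2_divider(raw_hz, nominal_hz):
    """Power-of-two divide ratio closest (on a log scale) to raw/nominal."""
    ratio = raw_hz / nominal_hz
    return 2 ** round(math.log2(ratio))

def within_half_to_double(freq_hz, nominal_hz):
    """True if freq is inside the 1/2x to 2x window around nominal."""
    return 0.5 * nominal_hz <= freq_hz <= 2.0 * nominal_hz

NOMINAL_HZ = 1e6  # data-sheet figure for the configuration clock

for raw_hz in (16e6, 17e6):  # observed spectrum-analyzer peak range
    div = nearest_pow2_divider(raw_hz, NOMINAL_HZ)
    out_hz = raw_hz / div
    print(raw_hz, div, out_hz, within_half_to_double(out_hz, NOMINAL_HZ))
```

Under those assumptions a divide-by-16 lands the output between 1.0 and
about 1.06 MHz, comfortably inside the 1/2 to 2X window.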

Since this part still had paper schematics (REALLY) it is far too old 
for us to go look at its design.

Phil is on the right track.

This part did have a brownout issue (if the voltage dropped just 
right, for just the right amount of time, and came back up) that would 
place it in a locked state that could not be recovered until the power 
was cycled.

I solved this problem 15 years ago by using a Dallas Semi Power on Reset 
part to reset the power supply if it detected a glitch.

The product was an optical multiplexer for then AT&T (and then Lucent).

We had sold more than 100K units in three years.  I think you can still 
buy them even today.

They are used in some applications that are actually critical, so they 
went through an amazing battery of tests (for the audio radio channels 
at all US and Canadian Airports, for example).

Austin


lecroy7200@chek.com wrote:
> A second failure took place. I reset all of the ICs, disabled the
> card's master clock and left all of the FPGAs in the unprogrammed state.
> Looking around I was not able to tell if the 1MHz signal was present
> or not. It is so far down in the noise floor that it is virtually
> undetectable.
>
> I decided to start looking at wider BWs. It appears that the internal
> clock is not 1MHz, but much higher. Doing a sweep from 500KHz to 50MHz
> and comparing the peaks, the IC that is in the strange state is missing
> a peak at around 16-17MHz.
>
> This signal changes part to part which I would expect for a sloppy
> oscillator.
> Again, the data sheets do not mention this. I will try and call Xilinx
> today and see if they can confirm that this is the internal clock.

That frequency makes more sense than 1 MHz for the buried oscillator. 1 MHz is relatively slow, so it needs more specialised die area; in the old process of the 3000, a ring oscillator will land in the 16-17 MHz region. Dividers are simple.

If you need additional confirmation that it is inside the FPGA, you could give the chip a squirt of freeze: ring oscillators are temperature dependent.

They are likely to gate the loader oscillator to save power, so this may only confirm that you have exited the first power-up load state but are unable to get back into the load state.

-jg

Austin Lesea wrote:

> lecroy,
> Phil is on the right track.
>
> This part did have a brownout issue (if the voltage dropped just
> right, for just the right amount of time, and came back up) that would
> place it in a locked state that could not be recovered until the power
> was cycled.

Do you recall how low the Vcc had to cycle, in order to correctly recover ?

> I solved this problem 15 years ago by using a Dallas Semi Power on Reset
> part to reset the power supply if it detected a glitch.

Sounds just like my power removal wdog.... :)
How did you 'detect a glitch' - was that simply via Vcc lowering, or did that get an "I'm OK" signal from the FPGA ?

I have wondered why more regulator chips do not offer this type of 'wide hysteresis' in their operation.

-jg

Jim,

See below,

Austin

Jim Granville wrote:
> Austin Lesea wrote:
>
>> lecroy,
>> Phil is on the right track.
>>
>> This part did have a brownout issue (if the voltage dropped just
>> right, for just the right amount of time, and came back up) that would
>> place it in a locked state that could not be recovered until the power
>> was cycled.
>
> Do you recall how low the Vcc had to cycle, in order to correctly recover ?

As I recall, it had to go below 150 mV to 300 mV to recover.

>> I solved this problem 15 years ago by using a Dallas Semi Power on
>> Reset part to reset the power supply if it detected a glitch.
>
> Sounds just like my power removal wdog.... :)
> How did you 'detect a glitch' - was that simply via Vcc lowering, or
> did that get an "I'm OK" signal from the FPGA ?

The POR IC had a settable threshold with an external resistive divider. It responded very quickly. I set it to the voltage range I knew I never wanted to be in. I think that was anything below 2.5V. For a 5V supply, I figured many bad things would happen if I went below 2.5V.
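
For anyone wanting to reproduce this kind of threshold, here is a minimal
Python sketch of the divider arithmetic. The specific Dallas part is not
named in this thread, so the 1.25 V comparator reference and the 10 kΩ
bottom resistor are purely assumed values for illustration. Check the data
sheet of whatever supervisor you use.

```python
def top_resistor_ohms(v_trip, v_ref, r_bottom_ohms):
    """Top resistor of a two-resistor divider so that the tap equals
    v_ref when the monitored rail equals v_trip.

    Tap voltage: v_rail * r_bottom / (r_top + r_bottom)
    Solving tap == v_ref at v_rail == v_trip gives the expression below.
    """
    if v_trip <= v_ref:
        raise ValueError("trip voltage must be above the reference")
    return r_bottom_ohms * (v_trip / v_ref - 1.0)

V_TRIP = 2.5        # volts, trip point mentioned in the post
V_REF = 1.25        # volts, ASSUMED comparator reference
R_BOTTOM = 10_000   # ohms, ASSUMED bottom resistor

print(top_resistor_ohms(V_TRIP, V_REF, R_BOTTOM))
```

With these assumed values the two resistors come out equal, since the trip
point is exactly twice the reference.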

> I have wondered why more regulator chips do not offer this type of
> 'wide hysteresis' in their operation.
>
> -jg

The problem is how do you tell? A band gap reference takes a lot of area, and is hard to make accurate in the really deep sub micron technologies. So if you can't measure more accurately than +/-5%, why bother?

Austin Lesea wrote:
>> I have wondered why more regulator chips do not offer this type of
>> 'wide hysteresis' in their operation.
>>
>> -jg
>
> The problem is how do you tell? A band gap reference takes a lot of
> area, and is hard to be accurate in the really deep sub micron
> technologies. So if you can't measure more accurately than +/-5%, why
> bother?

I did say regulator chip, not FPGA :). In the analog realm of regulators this is a no-brainer: all the support silicon is already there, it just needs a difference in the enable/disable details.

Regulators/reset generators on FPGAs are another topic entirely... The best indicators of what is possible are the MOSFET charge based Vref chips from Xicor (now Intersil), and the bigger embedded controllers, especially towards the automotive area, where on-chip regulators are more and more common.

Today's FPGAs are such power hogs that this is less practical, but on the 'zero power' CPLDs it makes sense to engineer it better than the present numbers.

-jg

> The oscillator itself is at a much higher frequency, and is divided down
> to the number listed in the data sheet. At least, we still do it that
> way, even today.

This is not what the data sheet states. The 4000 data sheet makes a distinction that it runs at 8 MHz and divides down to 1 MHz, whereas the 3000 is at 1 MHz. I am not disagreeing with you. I believe that the 3000 was changed over time, that the clock was part of these changes, and that it now runs at around 16 MHz. The documents were never updated to reflect this change because it was "transparent" to the end user. Of course this is all a guess on my part.

> The accuracy of this oscillator would be from 1/2 to 2X the nominal (it
> just isn't critical).

Agree, it just needs to work. Too bad it seems to have problems.

> Since this part still had paper schematics (REALLY) it is far too old
> for us to go look at its design.

Funny, we can still pull up our paper documents if needed. I agree, it's not fun, but sometimes you just have to roll up your sleeves and dig in.

> Phil is on the right track.
>
> This part did have a brownout issue (if the voltage dropped just
> right, for just the right amount of time, and came back up) that would
> place it in a locked state that could not be recovered until the power
> was cycled.

Again, I read Xilinx's app note on the brownout problem, and it makes it clear that the part can be reset without removing power. I don't dispute that the internal logic could get into a locked state or that there was a problem with brownout. I also think it is very possible that the devices currently being sold have a second problem with the internal oscillator. The brownout app note makes no mention anywhere of the oscillator failing to start or locking up. I am sure that if Xilinx had known about this, it would have been documented and the power cycle requirements would have been called out, which they are not.

> I solved this problem 15 years ago by using a Dallas Semi Power on Reset
> part to reset the power supply if it detected a glitch.

Again, power cycling the device, no matter how it could be done, is not an option for this system.

It sounds like Xilinx is not willing to dig into the root problem of the oscillator. I can understand this to some degree. After all, the software has not supported the device in several years. So my next question: are you able to tell me whether the oscillator design used in the currently sold 3000s is being used in other Xilinx devices?

>> Do you recall how low the Vcc had to cycle, in order to correctly recover ?
>
> As I recall, it had to go below 150 mV to 300 mV to recover.

After testing the second failure, I tried the power cycle test again. The second part behaved the same as the first. Removing power from the device and shorting the supply (much less than 150 mV) for over 1 ms would not cause the oscillator to restart (observed with the spectrum analyzer).

lecroy,

Regardless of what any piece of paper claims, it is the memory of many 
here that the only way to recover is by powering down.

As a 15 year old problem, it is one that we only have our (failing) 
memories to rely upon.

There was no answer database in those days.

There was no hotline.

Austin
