FPGARelated.com
Forums

Virtex 4 FIFO16 blocks - Corruption ?

Started by Sylvain Munaut November 18, 2005
Hi,


We're facing a strange problem ...
While investigating a bug in one design, we could only observe the
behavior on the real board, not in simulation.

Using ChipScope, we finally traced down the problem by monitoring
both the write and read ports of a FIFO16 configured as 18x1024, using
the same read/write clock. That FIFO was used in a "weird" way: we set
the ALMOSTFULL threshold very high (but still within spec), so that it
turns on very quickly. What we observed is this: we push a word with
some parity bits set (these are not 'true' parity but some critical
control), we continue to push, ALMOSTFULL goes up (normal), and we keep
pushing (we still have plenty of room). At the same time we read back,
but more slowly (not on every clock cycle). When we finally read back
the word where the parity bit was set, data bits (15:0) are there but
the parity bit is not; it's just 0 ...

The ChipScope 'probes' were tied directly to the FIFO signals, with no
logic in between. That FIFO is supposed to cross clock domains, but for
debugging we just sent the same clock everywhere. And the behavior of
the surrounding logic is consistent with that bit being missed.

Instead of using ALMOSTFULL set to a very high value, we used NOT
ALMOSTEMPTY (OK here, since we're debugging with just one clock
domain), and with that it looks like we never observe such a miss.
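One way to judge captures like these is to compare them against a simple behavioral reference. The sketch below is a hypothetical Python golden model, not the Xilinx FIFO16 primitive or its simulation model: an 18-bit-wide (16 data + 2 parity), 1024-deep single-clock FIFO whose ALMOSTFULL flag is assumed to assert once the number of free locations drops to the offset. The names `ReferenceFifo`, `mismatches` and `run` are illustrative only. It writes every cycle, reads every third cycle, and marks one word with a control bit in the parity field, mirroring the scenario described above.

```python
from collections import deque

DEPTH = 1024  # 18x1024 configuration: 16 data bits + 2 parity bits per word


class ReferenceFifo:
    """Behavioral single-clock FIFO used as a golden model (not the FIFO16 primitive)."""

    def __init__(self, depth=DEPTH, almost_full_offset=1000):
        self.depth = depth
        # Assumption: ALMOSTFULL asserts once the number of free locations is
        # at or below the offset, so a large offset makes it assert early.
        self.almost_full_offset = almost_full_offset
        self.words = deque()

    @property
    def almost_full(self):
        return self.depth - len(self.words) <= self.almost_full_offset

    def write(self, data, parity):
        if len(self.words) >= self.depth:
            raise OverflowError("write while full")
        self.words.append((data & 0xFFFF, parity & 0b11))

    def read(self):
        if not self.words:
            raise IndexError("read while empty")
        return self.words.popleft()


def mismatches(written, read_back):
    """Indices where a read-back (data, parity) pair differs from what was written."""
    return [i for i, (w, r) in enumerate(zip(written, read_back)) if w != r]


def run(cycles=60, read_every=3, marked=5):
    """Write on every cycle, read on every `read_every`-th cycle, and set a
    control bit in the parity field of word number `marked`."""
    fifo = ReferenceFifo()
    written, read_back = [], []
    for cycle in range(cycles):
        word = (cycle & 0xFFFF, 0b01 if cycle == marked else 0b00)
        fifo.write(*word)
        written.append(word)
        if cycle % read_every == 0:
            read_back.append(fifo.read())
    return fifo, written, read_back


if __name__ == "__main__":
    fifo, written, read_back = run()
    print(fifo.almost_full)                 # True: 40 words queued, 984 free
    print(mismatches(written, read_back))   # []
```

In the model ALMOSTFULL goes high while there is still plenty of room and every parity bit comes back intact. Feeding logged write-port and read-port samples into `mismatches` in place of the simulated lists would, under these assumptions, flag the word whose parity bit came back as 0.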


Has anyone ever observed such behavior?



	Sylvain
Have you got any resolution on this? Have you opened a case with Xilinx? What does Xilinx have to say about it?

I am aware that some people have had problems with the FIFO16 not working correctly. I had an issue with trying to use the FIFO as a synchronous FIFO (it is async, so there is a possibility of some ambiguity in the flag latency when both clocks are the same). I have asked Xilinx repeatedly to document this behavior prominently in the user guide, but so far they have only quietly acknowledged that the user has to be careful if the read and write clocks are the same.

That said, your problem is different from the one I experienced and appears to be a more serious problem in the FIFO16 logic. You are not the first person I've heard say they had problems with the FIFO16 async behavior. There may be some issues with the flag logic for asynchronous use as well.

I do find it interesting that Altera was forthcoming about their recent problems with dual-port memories. I hope that Xilinx is equally forthcoming if there is indeed a problem with the FIFO16 logic.
Ray,

The bug for use of the async FIFO synchronously has been acknowledged, 
and we apologize for not getting it out there more prominently.  But:

In our defense, it is unusual (or at least, so far we think it is 
unusual) for the read and write clocks to be tied directly together (why 
use a FIFO at all?  I guess it is such a useful structure that even 
when used this way it is too useful to ignore....?).

The solution is to not source the two clocks from the same source 
directly, but place a small delay in one, or the other.

The problem does not exist in the asynchronous case, as it takes two 
subsequent clock cycles on BOTH clocks (at exactly the wrong times) to 
cause the problem.  As long as two adjacent clock cycles are unlikely 
to arrive on both clocks at exactly the same time just as you are 
getting full (or is it empty? I'm not the expert on this), it works fine.
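The failure condition described here (two consecutive cycles in which edges on both clocks land at exactly the wrong times) can be illustrated with a deliberately crude timing model. This is a hypothetical sketch, not Xilinx's characterization of the FIFO16 flag logic: it assumes a made-up "collision window" in which a read edge counts as coinciding with a write edge, and it counts back-to-back write cycles that both see such a coincidence. All times are integers in picoseconds; the 50 ps window and the clock periods are assumed figures chosen only for illustration.

```python
def near_read_edge(t, t_rd, offset, window):
    """True if some read-clock edge (at offset + k * t_rd) lies within
    +/- window of time t. All times are integers in picoseconds."""
    phase = (t - offset) % t_rd
    return min(phase, t_rd - phase) <= window


def consecutive_collisions(t_wr, t_rd, offset, window, n_cycles):
    """Count pairs of back-to-back write cycles that both see a nearly
    coincident read edge (hypothetical failure condition)."""
    hits = [near_read_edge(k * t_wr, t_rd, offset, window) for k in range(n_cycles)]
    return sum(1 for a, b in zip(hits, hits[1:]) if a and b)


if __name__ == "__main__":
    # 2500 ps = 400 MHz; the 50 ps collision window is an assumed figure.
    print(consecutive_collisions(2500, 2500, 0, 50, 100))    # same clock: 99
    print(consecutive_collisions(2500, 2500, 200, 50, 100))  # skewed by 200 ps: 0
    print(consecutive_collisions(2500, 2600, 0, 50, 1000))   # unrelated clocks: 0
```

In this model, tied clocks meet the condition on every cycle, a fixed skew larger than the window (the "small delay" workaround) never meets it, and the two unrelated clocks line up only once every 26 cycles, never twice in a row. The model says nothing about how large the real window is or whether it holds for every clock ratio.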

Sometimes with problems like this (that are difficult to even cause) it 
doesn't make sense to put up a billboard that it is an issue, as then 
everyone comes down with the disease (mass hypochondria) when they 
don't really have the problem.

Now, if the feature is just plain broke, then it is a different story, 
and we will end the pain as soon as we are sure it is just plain broke.

No one is intentionally hiding anything, but we are judiciously placing 
(obscure) bug information only with the hotline and support community, 
rather than broadcasting it across the entire user community publicly.

If, for any reason, you feel that you have caught the disease (have a 
bug we haven't shared universally), the entry of a webcase will get you 
the help you need, as the hotline will search for all such issues.  If 
yours is there, then we will immediately share with you the solution.

These are known as "internal answers" and it isn't that we don't want to 
share them, we just don't think they are likely issues for everyone. 
Better to talk to you and find out what the problem is, first.

If these internal answers are made external, we imagine there would be 
thousands of designers running down debug paths that are so obscure, 
there is almost no chance they will find this as their problem.  Then we 
get a bad reputation, and the hotline is overwhelmed with folks who all 
think they have this obscure problem!

I hope folks will appreciate that sometimes telling every strange and 
obscure story causes more trouble than selectively understanding each 
issue that arises, and dealing with it directly.

Support:  it is an art.

Austin


Ray Andraka wrote:

Austin,

You are kidding as far as the usefulness of a synchronous fifo (one 
which has both sides clocked by the same clock), right?  This is a
rather common structure in pipelined designs, it is an elastic buffer. 
Useful, for example, for processing bursty data at a more relaxed rate 
than the data is presented. I'd be hard pressed to find one of my 
designs that does NOT have a synchronous FIFO in it.  The solution with 
the "small" delay is fine if you are not pushing the performance 
envelope, but it will destroy timing closure in designs that are.  For 
example, I have a floating point FFT design with a target clock rate of 
400 MHz in an SX55-10 part... basically running at the DSP48/memory 
speed.  It has synchronous FIFOs in it, and there is no room in the 
timing for adding small delays to clocks.  This is a real limitation to 
the FIFO16 design, and has cost me several weeks of debug and redesign 
time to find and work around it.  It should be prominently highlighted 
in the user guide under the section that describes the use of the 
FIFO16.  I am sure other users are going to encounter the same issue. No 
one looks at the answers database until they have a problem and have 
identified the source of the problem.  The synchronous FIFO issue could 
easily be considered a limitation rather than an outright bug, but it 
does have to be made clear to the user before he does the design, not 
when trying to figure out why it isn't working. By keeping it close to 
your chest as an internal answer, I suspect you'll wind up generating a 
heck of a lot more hotline cases than if you put it in black and white 
right in the user's guide that this is the way the FIFO16's work and 
that these are the things you need to do to work around the limitation 
if the clocks are the same on both sides. BTW, I don't think this is an 
"obscure" issue either, as anyone attempting to use the FIFO16 as a 
synchronous FIFO is going to encounter it.
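The elastic-buffer use case can be sketched behaviorally. The model below is a hypothetical, cycle-based Python illustration (not a description of the FIFO16 or of the FFT design mentioned above): both sides run on the same clock, data arrives in a burst at one word per cycle, and it is consumed at a slower fixed rate. The peak occupancy it reports is, in this model, the depth the FIFO needs in order to absorb the burst.

```python
from collections import deque


def peak_occupancy(arrivals, drain_every):
    """Single-clock elastic buffer. arrivals[c] is True when a word is pushed on
    cycle c; one word is popped on every `drain_every`-th cycle if available.
    Returns (peak occupancy, cycles until the buffer has drained)."""
    buf = deque()
    peak = 0
    cycle = 0
    while cycle < len(arrivals) or buf:
        if cycle < len(arrivals) and arrivals[cycle]:
            buf.append(cycle)          # remember the arrival cycle of each word
        peak = max(peak, len(buf))     # measured after the push, before the pop
        if cycle % drain_every == 0 and buf:
            buf.popleft()
        cycle += 1
    return peak, cycle


if __name__ == "__main__":
    burst = [True] * 8                 # eight back-to-back words
    print(peak_occupancy(burst, 1))    # (1, 8): output keeps up, no buffering
    print(peak_occupancy(burst, 2))    # (4, 15): half-rate drain needs 4 slots
```

Because reader and writer share one clock, nothing in this structure needs clock-domain crossing; it is purely a rate-matching buffer, which is why a hardware FIFO that misbehaves when both clocks are identical is a real limitation for this kind of design.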

The flip answers regarding the synchronous FIFO (things like such a 
structure is not useful, and just add delays to the clock when I've 
explained that it is not a viable solution for maximum performance 
designs), combined with the reluctance to make it clear to users that 
this is a limitation of the FIFO16 design, makes it appear that either 
Xilinx doesn't understand the issue or that they are trying to sweep it 
under the rug.  I presume and hope it is the former, although neither is 
a particularly good outcome.

I am reluctant to enter a webcase on an issue such as this unless it has 
become critical for the project. Invariably, the result of entering a 
webcase is my having to generate and submit testcases to prove the 
problem, and often having to come up with my own work-around because the 
fix won't be available until the next major release.  Nobody pays me for 
the time spent doing testcases to ferret out the source of a bug in the 
software or silicon.  There have been months recently where I've spent 
more than a quarter of my time identifying and generating test cases for 
problems in the tools (not just Xilinx).  Naturally, I'd like to avoid 
that as much as practical.

Regarding the asynchronous FIFO behavior, I don't have any direct 
experience with the FIFO16 behaving badly as an async FIFO, but I 
haven't used it in that mode in a design that has made it to testing.
Sylvain's description does sound as though the FIFO may be misbehaving, 
and it jibes with things I've heard from others.  This is why I asked 
him if he had opened a case with Xilinx and what the resolution of that 
case was.  It is important to know if there is a potential problem so 
that I can avoid it during the design rather than discover it during 
integration.  I am currently working on a design that has several async 
FIFO16's in it, and would like to believe that they will work for me, 
however these rumblings have me concerned, hence my asking Sylvain about 
his resolution.  So far, the workarounds I am aware of have used the 
coregen FIFO instead of the FIFO16, which does not have the same clock 
performance as the FIFO16.

I didn't intend to kick over the beehive here, I was only trying to 
collect more data so that I might avoid a problem in my own design if it 
does exist.





Ray, your comments are again right on target with my own feelings about bugs 
and support.  The webcase submission issue especially hits home.  One minor 
difference for me may be that when I find unusual behavior and have it 
isolated to a functional portion of the design, I may check the (externally 
available) knowledge database for any information relating to my problem 
area before spending a few more days to further isolate the cause.  I've had 
several instances where the information is in *a* database, just not one I 
can get to.

For ANYONE who is concerned with whether or not to air the dirty laundry of 
their EDA tools and silicon, PLEASE read through Ray's note and understand 
where designers come from.  Our company has had TOO many issues with silicon 
(non-FPGA as well) and EDA tools ("you knew about this for how many 
months?") that when we encounter known bugs that are "hidden" from plain 
view, we are LIVID.  There is no excuse to withhold information that WILL 
affect designs if there is a way to communicate the issues externally.

In this instance, there is a way.

John, Ray,

I never said, nor implied we would intentionally withhold information.

That is bad.  Really bad.

I understand that you may consider our choice of distribution of 
information (through answers externally available, or through answers 
internally available to case workers and FAEs) to be unacceptable.

I will entertain any other solutions.  One that I might suggest is that 
you sign up for a push email whenever something happens that you 
indicate you are interested in.  Maybe it's been tried, maybe not.  I 
know we do have push email systems in place now.  Perhaps we need to add 
features?

As far as async FIFO issues go, I am getting some emails on that subject 
as well.  So, even though I thought (and witnessed) the extensive FIFO 
testing on V4, the problem with a test that passes is that a test is 
never the application.

I will reserve a whole-hearted endorsement of perfection until I hear 
more about what the alleged issues are with async mode.

And Ray, I do appreciate the synchronous use of the FIFO (my earlier 
comments were tongue firmly in cheek); I just never thought about it 
before.  Making an async FIFO is so much black magic that you spend all 
your time looking at the async case, and no time on the sync case 
(obviously the problem here).

Austin

Hi Ray,

Ray Andraka wrote:
> Have you got any resolution on this? Have you opened a case with
> Xilinx? What does Xilinx have to say about it?

My colleague had some contact with our distributor but, as far as I know, no news yet.

Looking at the Xilinx answer records, I saw that FIFOs using FIFO16 blocks generated with an old version of the FIFO Generator could show some data corruption problems, and that using the new version is recommended... but I didn't use CORE Generator; I instantiated FIFO16 directly (CORE Generator doesn't offer first-word fall-through anyway...).

I haven't opened a webcase myself yet... Before "bothering" Xilinx people, I want to be sure ;p I've tried to reproduce the problem with a far simpler design, but so far no luck... (even in the full design it's quite rare, but once is enough to lock it up...)

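When trying to reproduce a rare corruption like this, it can help to compare the write-port and read-port captures word by word and report which field changed. The sketch below assumes the 18x1024 configuration splits each word into 16 data bits and 2 "parity" (control) bits, and that the two captures start at the same word; `pack`, `compare_streams` and `describe` are hypothetical helpers, not part of any Xilinx tool.

```python
from typing import List, Tuple

DATA_BITS = 16                     # data field width in the 18x1024 configuration
DATA_MASK = (1 << DATA_BITS) - 1   # mask for the 16 data bits

def pack(data: int, parity: int) -> int:
    """Combine a 16-bit data word and the 2 'parity' (control) bits into one 18-bit word."""
    return ((parity & 0b11) << DATA_BITS) | (data & DATA_MASK)

def compare_streams(written: List[int], read: List[int]) -> List[Tuple[int, int, int]]:
    """Compare write-port and read-port captures in FIFO (first-in, first-out) order.

    Returns (index, expected, observed) for each mismatching word. Only the
    words present in both captures are compared.
    """
    return [(i, w, r) for i, (w, r) in enumerate(zip(written, read)) if w != r]

def describe(expected: int, observed: int) -> str:
    """Say whether a mismatch is in the data field, the parity field, or both."""
    diff = expected ^ observed
    parts = []
    if diff & DATA_MASK:
        parts.append(f"data bits {diff & DATA_MASK:#06x}")
    if diff >> DATA_BITS:
        parts.append(f"parity bits {diff >> DATA_BITS:02b}")
    return ", ".join(parts) or "no difference"

# Example: the control bit written with word 0 comes back as 0.
written = [pack(0x1234, 0b01), pack(0x5678, 0b00)]
read = [pack(0x1234, 0b00), pack(0x5678, 0b00)]
for idx, exp, obs in compare_streams(written, read):
    print(idx, describe(exp, obs))
```

Reporting the data and parity fields separately makes the symptom described here (data intact, control bit lost) stand out immediately in a long capture.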
> I am aware that some people have had problems with the FIFO16 not
> working correctly. I had an issue with trying to use the FIFO as a
> synchronous fifo (it is async, so there is a possibility with some
> ambiguity on the flag latency when both clocks are the same). I have
> asked Xilinx repeatedly to document this behavior prominently in the
> user guide, but so far they have only quietly acknowledged that the user
> has to be careful if read and write clocks are the same.

What exactly is the problem if the clocks are the same? (What behavior could happen?)

> That said, your problem is different than the one I experienced and
> appears to be a more serious problem in the FIFO16 logic. You are not
> the first person I've heard state they had problems with the fifo16
> async behavior. There may be some issues with the flag logic for
> asynchronous use as well.

Well, here we use the FIFO synchronously... In the future they are meant to be used asynchronously, but for testing we've put everything on the same clock. Other parts of the design will always use them synchronously, though, so I must get them working in both modes...

Sylvain

Austin Lesea wrote:
> Ray,
>
> The bug for use of the async FIFO synchronously has been acknowledged,
> and we apologize for not getting it out there more prominently. But:

Where can I get detailed info about it? (To be sure not to run into it, or at least that it doesn't cause trouble in my design.)

Sylvain

Sylvain,

I expect the fastest way is to open a webcase requesting the information.

As I already stated, if both read and write clocks are from the same
BUFG net, then this may (and probably will) be an issue at some
process/voltage/temperature corner (hence the insidiousness of the issue).

A quick fix is to drive one of the clocks from the other edge (one 
rising, one falling) which may require another BUFG resource (in order 
to be sure the delay doesn't put you right back where you started).
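A back-of-the-envelope sketch of what the opposite-edge fix costs: any path launched on one edge and captured on the other only gets half a period. This assumes a 50% duty cycle and ignores skew, jitter, clock-to-out and setup time, so real margins are smaller; `opposite_edge_budget_ns` is a hypothetical helper.

```python
def opposite_edge_budget_ns(freq_mhz: float) -> float:
    """Nominal time between a rising edge and the following falling edge of
    a 50%-duty-cycle clock, i.e. the budget for a path launched on one edge
    and captured on the other. Ignores skew, jitter, clock-to-out and setup."""
    period_ns = 1000.0 / freq_mhz
    return period_ns / 2.0

for f in (100.0, 200.0, 400.0):
    print(f"{f:.0f} MHz: period {1000.0 / f:.2f} ns, "
          f"opposite-edge budget {opposite_edge_budget_ns(f):.3f} ns")
```

At 400 MHz that leaves about 1.25 ns for any logic between the write-side and read-side registers, which is why the fix is easy at modest clock rates but hard to close timing on near the block RAM's maximum frequency.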

It is my understanding that a macro will be created to instantiate the 
sync FIFO with the required offset delay automatically in the best way 
we can (probably using fabric resources, like a LUT, doubles, hexes, etc.).

The issue, as I was told, is that at the critical instant the almost
full/almost empty flag assertions will be correct, but if the event
occurs again on the very next clock cycle, the flag will reset to 0,
which is not correct (the FIFO is still almost full, or almost empty,
if nothing was read out or written in on that cycle).
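The symptom described above (the flag asserting correctly, then dropping to 0 on the next cycle while the FIFO is still almost full) can be checked against a ChipScope capture of a single-clock FIFO by recomputing the occupancy. This is a minimal sketch under stated assumptions: the threshold convention (flag high when free entries are at most `af_offset`) and the flag latency are hypothetical and should be checked against the FIFO16 documentation before use.

```python
from typing import Iterable, List, Tuple

def check_almost_full(trace: Iterable[Tuple[bool, bool, bool]],
                      depth: int, af_offset: int) -> List[int]:
    """Return the cycles where the observed ALMOSTFULL disagrees with the
    value implied by the FIFO occupancy.

    trace: one (wr_en, rd_en, observed_almost_full) tuple per clock cycle,
    all sampled on the same edge (single-clock case).
    Assumptions (hypothetical): ALMOSTFULL is high when the number of free
    entries is <= af_offset, and the flag sampled in a cycle reflects all
    reads/writes of the previous cycles.
    """
    count = 0
    mismatches = []
    for cycle, (wr, rd, observed) in enumerate(trace):
        expected = (depth - count) <= af_offset
        if observed != expected:
            mismatches.append(cycle)
        # Reads from an empty FIFO and writes to a full FIFO are ignored.
        do_rd = rd and count > 0
        do_wr = wr and count < depth
        count += int(do_wr) - int(do_rd)
    return mismatches

# Six writes with the flag low, a seventh write with the flag correctly high,
# then the flag dropping back to 0 although nothing was read.
trace = [(True, False, False)] * 6 + [(True, False, True), (False, False, False)]
print(check_almost_full(trace, depth=8, af_offset=2))
```

The depth of 8 only keeps the example short; the same check applies to a 1024-deep capture.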

There may be other simpler solutions (that we haven't thought of yet).

Again, the jury is out on the async case....

Austin

Sylvain Munaut wrote:

> Austin Lesea wrote:
>
>>Ray,
>>
>>The bug for use of the async FIFO synchronously has been acknowledged,
>>and we apologize for not getting it out there more prominently. But:
>
> Where can I get detailed info about it? (To be sure not to run into
> it, or at least that it doesn't cause trouble in my design.)
>
> Sylvain

This is exactly what I mean by the problem being hidden. I searched the answers database for FIFO16 and did not turn up anything regarding the known synchronous behavior problem, nor any async problems. It may still only be in the internal database, if it is even there.

In debugging stuff like this, I've always assumed the silicon is good and that any problems are a result of the design until I can prove otherwise. As a result, you don't suspect the FIFO itself of being the problem. That can lead to a tremendous amount of debugging effort before finding out there is a problem or unpublished limitation with the silicon. Considering how much time I spent fiddling with this problem, I suspect there are literally thousands of man-hours put into debugging the same problem in different projects simply because Xilinx doesn't want to advertise a limitation with their design.

The problem with the synchronous usage is that the flag circuit is an async design. When the clock is the same on both sides, and a read and a write are done on the same clock cycle, the flag circuit displays a one-clock jitter in the timing of the flag outputs, such that the word written in at the same time the last one is read out may or may not make the FIFO show empty. If empty does get set, it then takes something like 3 clocks to go away, so you wind up with non-deterministic behavior. It is an artifact of using an async flag circuit.

BTW, finding stuff in the answers database is a lot like finding a needle in a haystack, provided you even know what you are looking for.
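For comparison with the behavior described above, here is a generic cycle-level reference model of a single-clock FIFO whose flags come from an occupancy count. In this model a read and a write on the same edge with one word stored leave the count at 1, so EMPTY stays low deterministically. It is a hypothetical golden model for simulation or capture comparison, not a model of the FIFO16's internal flag logic.

```python
from collections import deque
from typing import Deque, Optional

class SyncFifoModel:
    """Cycle-level reference model of a single-clock FIFO whose flags are
    derived from an occupancy count. Generic model, not the FIFO16 internals."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem: Deque[int] = deque()

    @property
    def empty(self) -> bool:
        return len(self.mem) == 0

    @property
    def full(self) -> bool:
        return len(self.mem) == self.depth

    def clock(self, wr_en: bool, din: int, rd_en: bool) -> Optional[int]:
        """Apply one clock edge and return the word read (None if no read).

        Enables are qualified with the flags as they were before the edge,
        as registered flags would be; a write while full is dropped even if
        a read happens on the same edge (a conservative choice)."""
        do_rd = rd_en and not self.empty
        do_wr = wr_en and not self.full
        dout = self.mem.popleft() if do_rd else None
        if do_wr:
            self.mem.append(din)
        return dout

fifo = SyncFifoModel(depth=4)
fifo.clock(wr_en=True, din=0xA, rd_en=False)          # one word stored
print(fifo.clock(wr_en=True, din=0xB, rd_en=True))    # reads 0xA, writes 0xB
print(fifo.empty)                                     # still one word stored
```

Comparing a hardware capture against a model like this cycle by cycle would show exactly where the FIFO16 flags deviate from deterministic behavior.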