Best Async FIFO Implementation

Started by Davy October 16, 2005
Hi all,

Does there exist a best implementation of Asynchronous FIFO?

Any suggestions will be appreciated!
Best regards,
Davy

Davy wrote:
> Hi all,
>
> Does there exist a best implementation of Asynchronous FIFO?
>
> Any suggestions will be appreciated!
> Best regards,
> Davy
I guess it depends on what you're looking for. At minimum, it should *work*. Beyond that, it's a compromise between resources, speed, features (almost empty/full flags, ...), and perhaps reliability.
Sylvain

All members of the Virtex-4 family from Xilinx have a
hard-coded (full-custom) FIFO controller in each of their BlockRAMs. It
accepts different clocks for read and write (called "asynchronous
operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the
width is 4 to 36 bits, and the depth is accordingly from 4K down to 512
addresses (depth and width can easily be expanded with additional
BlockRAMs).
There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an
ALMOST FULL flag, both fully programmable (with 1-address granularity).
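
As a rough illustration of the width/depth trade-off described above, here is a minimal Python sketch. It assumes the depth is simply the largest power of two that fits in one 18 Kbit block at a given width. The function name `fifo_depth` and the intermediate widths 9 and 18 are assumptions made for illustration; the post only gives the 4-bit and 36-bit endpoints.

```python
# Sketch only: depth of a single 18 Kbit block at a given port width,
# assuming depth = largest power of two that fits (not vendor-confirmed).
BLOCK_BITS = 18 * 1024  # 18 Kbits, as stated in the post


def fifo_depth(width_bits: int) -> int:
    """Return the largest power-of-two depth that fits in one block."""
    if not 0 < width_bits <= BLOCK_BITS:
        raise ValueError("width must be between 1 and the block size")
    max_words = BLOCK_BITS // width_bits
    # Round down to a power of two.
    return 1 << (max_words.bit_length() - 1)


if __name__ == "__main__":
    # 9 and 18 are assumed intermediate widths, not taken from the post.
    for width in (4, 9, 18, 36):
        print(f"width {width:2d} bits -> depth {fifo_depth(width)}")
```

At the two widths the post mentions, this gives 4096 (4K) addresses for 4 bits and 512 addresses for 36 bits.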

I designed the crucial asynchronous empty arbitration logic, and it
works perfectly: We tested it by writing data at ~200 MHz into the
FIFO and reading it out at ~500 MHz, and the asynchronous empty-detect
logic worked flawlessly for all of those >10^14 operations when we
stopped the test after a week.
Probably no real FIFO application will ever go empty 200 million times
a second...
The high performance is due to very fast and compact full-custom logic,
and our long experience in analyzing and dealing with the effects of
metastability.

Peter Alfke, Xilinx Applications (posting from home)

Peter Alfke wrote:
> All members of the Virtex-4 family from Xilinx have a
> (hard-coded=full-custom) FIFO controller in each of their BlockRAMs. It
> accepts different clocks for read and write (called "asynchronous
> operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the
> width is 4 to 36 bits, and the depth is accordingly from 4K to 512
> addresses (depth and width can easily be expanded with additional
> BlockRAMs)
> There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an
> ALMOST FULL flag, both fully programmable (with 1-address granularity).
>
> I designed the crucial asynchronous empty arbitration logic, and it
> works perfectly: We tested it by writing data at ~200 MHz into the
> FIFO, and reading it out at ~500 MHz, and the asynchronous empty-detect
> logic had worked flawlessly for all those >10^14 operations when we
> stopped the test after a week.

Why stop after 1 week? Sounds like the sort of app that would be nice to have spinning in the corner of the lab forever... Did you also test the full detect, or is that expected to be the same by symmetry?

> No real FIFO application will probably ever go empty 200 million times
> a second...
> The high performance is due to very fast and compact full-custom logic,
> and our long experience in analyzing and dealing with the effects of
> metastability.

So does that mean devices without this full-custom logic can expect lower performance, and if so, how much lower? [e.g. Spartan-3 / 3E?]
-jg

Hi, Jim.
We stopped after a week because we were satisfied. In one week we
proved 10^14; it would take 10 weeks to prove 10^15, and 2 years to
prove 10^16. Diminishing returns... But we definitely did NOT stop
because we found an error. No cheating on my watch!
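
The diminishing-returns arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes one empty-detect event per cycle of the ~200 MHz write clock mentioned earlier in the thread, and the function name `weeks_to_reach` is made up for illustration.

```python
# Sketch: how long a test must run to accumulate a given number of events,
# assuming one event per 200 MHz clock cycle (an assumption, see lead-in).
SECONDS_PER_WEEK = 7 * 24 * 3600


def weeks_to_reach(events: float, rate_hz: float = 200e6) -> float:
    """Return the test duration in weeks needed to observe `events` events."""
    return events / rate_hz / SECONDS_PER_WEEK


if __name__ == "__main__":
    for exponent in (14, 15, 16):
        weeks = weeks_to_reach(10.0 ** exponent)
        print(f"10^{exponent} events: {weeks:.1f} weeks ({weeks / 52:.2f} years)")
```

Under that assumption the three targets come out at roughly 0.8, 8 and 83 weeks, which is in line with the "one week, 10 weeks, 2 years" estimate.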

For some strange reason (fixed in "Virtex-5") there is a
one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
FULL is not as important as EMPTY, since a properly designed system
should never overflow the FIFO, whereas it might be nice to empty it
completely. (I often use the savings-account analogy).

Yes, using the fabric to implement the FIFO controller might limit the
speed to 250 MHz.
The reasons for the "hard" FIFO controller were
higher performance, guaranteed reliable operation without user
involvement, and savings in fabric resources as well as power consumption.
The same reasoning will be used for future "hard" subfunctions. It's
the best way to increase speed, functionality, and user-friendliness.
How else can we improve by a factor of 2 or even more?
Peter Alfke

Peter Alfke wrote:
> Hi, Jim.
> We stopped after a week because we were satisfied. In one week, we
> proved 10^14, it would take 10 weeks to prove 10^15, and 2 years to
> prove 10^16. Diminishing returns...But we definitely did NOT stop
> because we found an error. No cheating on my watch!
>
> For some strange reason (fixed in "Virtex-5") there is a
> one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
> FULL is not as important as EMPTY, since a properly designed system
> should never overflow the FIFO, whereas it might be nice to empty it
> completely. (I often use the savings-account analogy).

Wow! They pay you so much that you have to worry about overflowing your savings account?! ;)
-jg

That's exactly the point. You want to be able to move to another bank
or leave town, and get your last cent or penny out of the account. But
you really don't worry about overflow.
I know you understood...
And you are right: the pay is OK, considering the fun I am still
having...
Peter

In most datacomm applications, filling a buffer can be caused by network 
congestion, so to prevent dropped packets, you'd want to correctly detect 
FIFO full, and backpressure accordingly.

The Virtex-4 has a FULL flag that is synchronous with the write clock
(obviously, the read clock does not care), but the FULL flag is
activated one clock period late. (The EMPTY flag, synchronous with the
read clock, does not have this latency; it gets activated by the same
clock edge that read the last valid data. Doing that right and fast is
the art of asynchronous FIFO design...)
I claim that it is easy to use the ALMOST FULL flag, since the exact
max capacity of a FIFO is not critical. Set it to 1020 for a 1024-deep
FIFO, and you will never be bothered by the latency; you actually get
an early warning...
Peter Alfke
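
To make the effect of a late flag concrete, here is a small behavioral sketch in Python (not HDL, and not a model of the actual Virtex-4 circuit). It assumes a worst case where the reader is stalled and the writer pushes one word per write clock until it sees the stop flag. The function name `peak_occupancy`, and the choice to give ALMOST FULL the same one-clock latency as FULL, are assumptions made for illustration.

```python
from collections import deque


def peak_occupancy(stop_level: int, flag_latency: int,
                   max_clocks: int = 10_000) -> int:
    """Count the words a writer pushes before the stop flag halts it.

    The flag the writer sees is computed from the occupancy as it was
    `flag_latency` write clocks earlier. The reader is assumed stalled.
    """
    # Sliding window of recent occupancy values; seen[0] is the oldest,
    # which is what the delayed flag is based on.
    seen = deque([0] * (flag_latency + 1), maxlen=flag_latency + 1)
    count = 0
    for _ in range(max_clocks):
        if seen[0] >= stop_level:
            break  # writer sees the flag and stops
        count += 1  # one more word written this clock
        seen.append(count)
    return count


if __name__ == "__main__":
    depth = 1024
    print("timely FULL:       ", peak_occupancy(depth, 0))
    print("FULL one clock late:", peak_occupancy(depth, 1))
    print("ALMOST FULL at 1020:", peak_occupancy(1020, 1))
```

In this model, a writer that trusts a one-clock-late FULL attempts one word more than the 1024-deep FIFO can hold, while stopping on a threshold of 1020 leaves a few words of margin even with the same latency.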

Xilinx's asynchronous FIFOs have a depth of (power of 2) - 1 words.
According to my analysis of Xilinx's application notes, the reason
is that the full flag may actually be generated one write clock period
after it is expected. To prevent overflow, the FIFO depth is
decreased by 1.
Alex