Best Async FIFO Implementation
Started by ●October 16, 2005

Hi all,

Does there exist a best implementation of Asynchronous FIFO?

Any suggestions will be appreciated!

Best regards,
Davy
Reply by ●October 16, 2005
Davy wrote:
> Hi all,
>
> Does there exist a best implementation of Asynchronous FIFO?
>
> Any suggestions will be appreciated!
> Best regards,
> Davy

I guess it depends on what you're looking for. At minimum, it should *work* ... Then the rest is a compromise of resources / speed / features (like almost empty/full flags, ...) / ... (reliability?)

Sylvain
Reply by ●October 16, 2005
All members of the Virtex-4 family from Xilinx have a (hard-coded=full-custom) FIFO controller in each of their BlockRAMs. It accepts different clocks for read and write (called "asynchronous operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the width is 4 to 36 bits, and the depth is accordingly from 4K to 512 addresses (depth and width can easily be expanded with additional BlockRAMs).

There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an ALMOST FULL flag, both fully programmable (with 1-address granularity).

I designed the crucial asynchronous empty arbitration logic, and it works perfectly: We tested it by writing data at ~200 MHz into the FIFO, and reading it out at ~500 MHz, and the asynchronous empty-detect logic had worked flawlessly for all those >10^14 operations when we stopped the test after a week. No real FIFO application will probably ever go empty 200 million times a second...

The high performance is due to very fast and compact full-custom logic, and our long experience in analyzing and dealing with the effects of metastability.

Peter Alfke, Xilinx Applications (posting from home)
Reply by ●October 16, 2005
Peter Alfke wrote:
> All members of the Virtex-4 family from Xilinx have a
> (hard-coded=full-custom) FIFO controller in each of their BlockRAMs. It
> accepts different clocks for read and write (called "asynchronous
> operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the
> width is 4 to 36 bits, and the depth is accordingly from 4K to 512
> addresses (depth and width can easily be expanded with additional
> BlockRAMs).
> There is an EMPTY and a FULL flag, and also an ALMOST EMPTY and an
> ALMOST FULL flag, both fully programmable (with 1-address granularity).
>
> I designed the crucial asynchronous empty arbitration logic, and it
> works perfectly: We tested it by writing data at ~200 MHz into the
> FIFO, and reading it out at ~500 MHz, and the asynchronous empty-detect
> logic had worked flawlessly for all those >10^14 operations when we
> stopped the test after a week.

Why stop after 1 week? Sounds like the sort of app that would be nice to have spinning in the corner of the lab forever...

Did you also test the full detect, or is that expected to be the same by symmetry?

> No real FIFO application will probably ever go empty 200 million times
> a second...
> The high performance is due to very fast and compact full-custom logic,
> and our long experience in analyzing and dealing with the effects of
> metastability.

So does that mean devices without this full-custom logic can expect lower performance, and if so, how much lower? [e.g. Spartan 3 / 3E?]

-jg
Reply by ●October 16, 2005
Hi, Jim..

We stopped after a week because we were satisfied. In one week, we proved 10^14, it would take 10 weeks to prove 10^15, and 2 years to prove 10^16. Diminishing returns... But we definitely did NOT stop because we found an error. No cheating on my watch!

For some strange reason (fixed in "Virtex-5") there is a one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead. FULL is not as important as EMPTY, since a properly designed system should never overflow the FIFO, whereas it might be nice to empty it completely. (I often use the savings-account analogy).

Yes, using the fabric to implement the FIFO controller might limit the speed to 250 MHz.

The reasons for the "hard" FIFO controller were: higher performance, guaranteed reliable operation without user involvement, and saving fabric resources as well as power consumption. The same reasoning will be used for future "hard" subfunctions. It's the best way to increase speed, functionality, and user-friendliness. How else can we improve by a factor of 2 or even more?

Peter Alfke
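The diminishing-returns arithmetic can be checked with a few lines of Python. This is only a rough sketch: it assumes one empty-detect event per write-clock cycle at exactly 200 MHz (the post says "~200 MHz"), and the helper name `weeks_to_reach` is made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

def weeks_to_reach(target_ops: float, rate_hz: float = 200e6) -> float:
    """Weeks of continuous testing needed to accumulate target_ops events.

    Assumes (hypothetically) one event per clock cycle at rate_hz.
    """
    return target_ops / (rate_hz * SECONDS_PER_WEEK)

# Events accumulated in one week at the assumed 200 MHz rate.
ops_per_week = 200e6 * SECONDS_PER_WEEK

print(f"one week at 200 MHz: {ops_per_week:.3e} events")
for exponent in (14, 15, 16):
    weeks = weeks_to_reach(10.0 ** exponent)
    print(f"10^{exponent} events: {weeks:.1f} weeks ({weeks / 52:.2f} years)")
```

Under these assumptions one week accumulates about 1.2 x 10^14 events, so each additional decade of confidence costs ten times the test time: roughly 8 weeks for 10^15 and about 1.6 years for 10^16, in line with the rounded figures quoted in the post.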
Reply by ●October 16, 2005
Peter Alfke wrote:
> Hi, Jim..
> We stopped after a week because we were satisfied. In one week, we
> proved 10^14, it would take 10 weeks to prove 10^15, and 2 years to
> prove 10^16. Diminishing returns... But we definitely did NOT stop
> because we found an error. No cheating on my watch!
>
> For some strange reason (fixed in "Virtex-5") there is a
> one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
> FULL is not as important as EMPTY, since a properly designed system
> should never overflow the FIFO, whereas it might be nice to empty it
> completely. (I often use the savings-account analogy).

Wow! They pay you so much you have to worry about overflow of your savings account?! ;)

-jg
Reply by ●October 16, 2005
That's exactly the point. You want to be able to move to another bank or leave town, and get your last cent or penny out of the account. But you really don't worry about overflow. I know you understood...

And you are right: the pay is o.k., considering the fun I am still having...

Peter
Reply by ●October 17, 2005
In most datacomm applications, a buffer can fill up because of network congestion, so to prevent dropped packets you'd want to detect FIFO full correctly and apply backpressure accordingly.

"Peter Alfke" <alfke@sbcglobal.net> wrote in message news:1129501822.643986.219070@f14g2000cwb.googlegroups.com...
> Hi, Jim..
> We stopped after a week because we were satisfied. In one week, we
> proved 10^14, it would take 10 weeks to prove 10^15, and 2 years to
> prove 10^16. Diminishing returns... But we definitely did NOT stop
> because we found an error. No cheating on my watch!
>
> For some strange reason (fixed in "Virtex-5") there is a
> one-clock-pulse latency for FULL. I suggest using ALMOST FULL instead.
> FULL is not as important as EMPTY, since a properly designed system
> should never overflow the FIFO, whereas it might be nice to empty it
> completely. (I often use the savings-account analogy).
>
> Yes, using the fabric to implement the FIFO controller might limit the
> speed to 250 MHz.
> The reasons for the "hard" FIFO controller were:
> Higher performance, guaranteed reliable operation without user
> involvement, and saving fabric resources as well as power consumption.
> The same reasoning will be used for future "hard" subfunctions. It's
> the best way to increase speed, functionality, and user-friendliness.
> How else can we improve by a factor of 2 or even more?
> Peter Alfke
Reply by ●October 17, 2005
The Virtex-4 has a FULL flag that is synchronous with the write clock (obviously, the read clock does not care), but the FULL flag is activated one clock period late. (The EMPTY flag, synchronous with the read clock, does not have this latency; it gets activated by the same clock edge that read the last valid data. Doing that right and fast is the art of asynchronous FIFO design...)

I claim that it is easy to use the ALMOST FULL flag, since the exact max capacity of a FIFO is not critical. Set it for 1020 for a 1024-deep FIFO, and you will never be bothered by the latency; you actually get an early warning...

Peter Alfke
Reply by ●October 17, 2005
Xilinx's asynchronous FIFOs have a depth of (power of 2) - 1 words. According to my analysis, based on Xilinx's application notes, the reason is that the FULL flag may be asserted one write-clock period later than expected. To prevent overflow, the usable FIFO depth is reduced by 1.

Alex
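The flag-latency argument in this thread can be illustrated with a toy model. The sketch below is not a model of the Virtex-4 hardware: it assumes a single write clock, a reader that is completely stalled (as under congestion), and a stop flag that the writer sees a fixed number of cycles late. The function `simulate_writes` and its parameters are hypothetical names chosen for illustration.

```python
from collections import deque

def simulate_writes(depth, threshold, flag_latency, cycles=2000):
    """Toy model: a writer pushes one word per cycle unless it sees the flag.

    depth        -- physical capacity of the FIFO in words
    threshold    -- occupancy at which the stop flag is computed as true
    flag_latency -- cycles before the writer sees the computed flag
    Returns (final_occupancy, overflow_attempts). The reader never pops.
    """
    occupancy = 0
    overflow_attempts = 0
    # Flag values already computed but not yet visible to the writer.
    in_flight = deque([False] * flag_latency)
    for _ in range(cycles):
        in_flight.append(occupancy >= threshold)
        visible_flag = in_flight.popleft()
        if visible_flag:
            continue                 # writer holds off (backpressure)
        if occupancy < depth:
            occupancy += 1           # normal write
        else:
            overflow_attempts += 1   # write into a full FIFO: word lost
    return occupancy, overflow_attempts

cases = {
    "FULL at 1024, no latency":         (1024, 1024, 0),
    "FULL at 1024, one cycle late":     (1024, 1024, 1),
    "ALMOST FULL at 1020, one late":    (1024, 1020, 1),
    "declared depth 1023, one late":    (1024, 1023, 1),
}
for label, args in cases.items():
    print(label, "->", simulate_writes(*args))
```

In this model a FULL flag that arrives one cycle late lets exactly one write hit an already-full FIFO, while either workaround (an ALMOST FULL threshold a few words below capacity, or treating the FIFO as one word shallower than it physically is) leaves enough slack to absorb the late flag without losing data.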





