fifo or sdram bug?

Started by kaz July 30, 2015
kaz <37480@fpgarelated> wrote:
> >>Is the problem that the data is off by 8, or that the data gets stored
> >> in a location that is off by 8? If it's that the data is off by 8,
> >> then the number of sdram pages or the sdram muxing is not relevant.

(snip)

> data is 16 bits wide, nothing wrong with bits. all sample values are
> correct.
> odd samples do not follow their even members e.g. if a correct stream is
> indexed as 0,1,2,3,4,5,6,7,8,9,10...etc then what we get is:
> 0,9,2,11,4,13,6,15,8,17,10

Somehow this reminds me of something from years ago, which was using
real IC FIFOs instead of FPGA ones. Somehow the system wasn't following
FULL and ALMOST FULL, and would wrap the FIFO.

But that usually results in data loss, which you seem to indicate
doesn't happen.

I don't know SDRAM timing enough to say. If all the data paths are
16 bits, it is funny to have an offset on eight bit boundaries!

-- glen
I will try to give more details instead of my "reduced simplified version"
and hopefully answer some of your questions.

I am talking about a DPD functionality where software reads from sdram
2,457,600 samples of each of TxI, TxQ, sRxI, sRxQ.
All four slots are 16-bit signed and interleaved in the above order,
giving a total stream size of 2,457,600 x 4 samples.

inside FPGA:
TxI and TxQ are first concatenated as (16 x 2 bits), then passed through a
small dc fifo for clock crossing.

sRxI and sRxQ are data received from Tx after going through DAC & PA then
sampled back by an ADC for DPD algorithm. sRxI and sRxQ are also
concatenated as 16 x 2 bits. They also go through their dc fifo for clock
crossing. 

Then all four data are concatenated as 16 x 4 = 64 bits. 

The stream is then passed as 128 bits using sc fifo for sdram controller
IF (Altera sdram controller). At the i/o data is passed as two streams
each 16 bits and each has its own sdram. Thus we have two sdrams (one for
Tx data and one for sRx data)
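
As an illustration of the 16 x 4 = 64 bit concatenation described above, here is a minimal Python sketch. The helper names (`pack64`, `unpack64`, `to_u16`, `to_s16`) are made up for this example, and the bit ordering (TxI in the most significant 16 bits) is an assumption; the post does not say how the fields are ordered inside the word.

```python
def to_u16(sample: int) -> int:
    """Encode a signed 16-bit sample as an unsigned two's-complement field."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample does not fit in 16 signed bits")
    return sample & 0xFFFF


def to_s16(field: int) -> int:
    """Decode an unsigned 16-bit field back to a signed sample."""
    return field - 0x10000 if field & 0x8000 else field


def pack64(tx_i: int, tx_q: int, srx_i: int, srx_q: int) -> int:
    """Concatenate four samples into one 64-bit word.

    Assumption: TxI occupies the most significant 16 bits.
    """
    word = 0
    for sample in (tx_i, tx_q, srx_i, srx_q):
        word = (word << 16) | to_u16(sample)
    return word


def unpack64(word: int) -> tuple:
    """Split a 64-bit word back into (TxI, TxQ, sRxI, sRxQ)."""
    return tuple(to_s16((word >> shift) & 0xFFFF) for shift in (48, 32, 16, 0))


print(hex(pack64(1, -2, 3, -4)))
print(unpack64(pack64(1, -2, 3, -4)))
```

A round trip through `pack64` and `unpack64` returns the original four samples, which is a quick way to check that pure concatenation logic cannot by itself reorder samples in time.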

Almost all field units work without any problem. Occasionally, it is
reported that DPD algorithm fails and when I looked at captured files I
noticed that sRx data was ok but TxI and TxQ each shows same problem I
described where their odd samples had shifted location relative to even
ones. So instead of the normal order of 0,1,2,3,4,...etc. I noticed it was
0,9,2,11,4,13,6,15,8,... from beginning to the end of the 2,457,600 samples.

Apart from that there is no other error and all values are correct judging
by spectrum and time domain.
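
The symptom can be checked mechanically on a capture. The following Python sketch assumes the capture has already been turned into a list of stream indices (for example by matching against a known reference); `odd_sample_offset` and `realign` are hypothetical helper names, not part of any existing tool.

```python
def odd_sample_offset(indices):
    """Return k if every even position p holds index p and every odd
    position p holds index p + k; return None if the pattern does not fit."""
    if len(indices) < 2:
        return None
    k = indices[1] - 1
    for position, index in enumerate(indices):
        expected = position + (k if position % 2 else 0)
        if index != expected:
            return None
    return k


def realign(samples, k):
    """Undo an odd-sample lead of k positions (k assumed even).

    Odd position p takes the value captured at position p - k. The first
    odd positions have no source inside the capture and are left as None.
    """
    out = list(samples)
    for position in range(1, len(samples), 2):
        source = position - k
        out[position] = samples[source] if source >= 0 else None
    return out


captured = [0, 9, 2, 11, 4, 13, 6, 15, 8, 17, 10]
k = odd_sample_offset(captured)
print("odd samples lead by", k)
print(realign(captured, k))
```

With a single capture the first few odd samples cannot be recovered this way, so the sketch marks them as `None` rather than guessing.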

What happens at the moment of the glitch we don't know, I haven't tested
any failed units in the lab though I requested that. We have inserted some
extra logic to capture data directly from fifos in case of the event but
we failed to reproduce the error. Units are in different countries and it
is hard to keep track of debugging.

My first conclusion is that there must be memory involved and it must be a
case of read/write toggling. The basic fpga concatenation logic does not
involve storage and so is ruled out. FPGA fifos are block ram based and we
have hundreds of them all across the design for various parts without
issues.
sdram controller and i/o timing have been done by Altera experts.

Design is timing clean, lab tested across full range of temperature.

Kaz




---------------------------------------
Posted through http://www.FPGARelated.com
kaz <37480@fpgarelated> wrote:
> I will try to give more details instead of my "reduced simplified version"
> and hopefully answer some of your questions.

(snip)

> inside FPGA:
> TxI and TxQ are first concatenated as(16 x 2 bits), then
> passed through a small dc fifo for clock crossing.
How small is this FIFO? (depth x width)

By the way, it is usual to use Gray code when passing the FIFO
address across the clock domain. I think they convert back, but
maybe they just address the BRAM with Gray code.

-- glen
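
For readers unfamiliar with the Gray code point: consecutive Gray-coded pointer values differ in exactly one bit, so a pointer sampled in the other clock domain mid-transition is off by at most one count. A minimal Python sketch of the standard binary/Gray conversions follows; the function names are chosen for this example only.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a reflected Gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


DEPTH = 16  # pointer range of a 16-word fifo
for count in range(DEPTH):
    nxt = (count + 1) % DEPTH
    # Gray code: one bit flips per increment, including the 15 -> 0 wrap.
    assert bits_changed(bin_to_gray(count), bin_to_gray(nxt)) == 1
    assert gray_to_bin(bin_to_gray(count)) == count

# Plain binary, by contrast, flips all four bits on the 15 -> 0 wrap.
print(bits_changed(15, 0))
```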
On 7/30/2015 4:24 PM, kaz wrote:
> I will try to give more details instead of my "reduced simplified version"
> and hopefully answer some of your questions.
>
> I am talking about a DPD functionality where software reads from sdram
> 2,457,600 samples of each of TxI,TxQ,sRxI,sRxQ.
> all these four slots are 16 bits signed and interleaved in above order
> giving a total stream size of 2,457,600 x 4 samples.
>
> inside FPGA:
> TxI and TxQ are first concatenated as(16 x 2 bits), then passed through a
> small dc fifo for clock crossing.
Should I assume DC means "dual clock"? So this FIFO is 32 bits wide?
> sRxI and sRxQ are data received from Tx after going through DAC & PA then
> sampled back by an ADC for DPD algorithm. sRxI and sRxQ are also
> concatenated as 16 x 2 bits. They also go through their dc fifo for clock
> crossing.
>
> Then all four data are concatenated as 16 x 4 = 64 bits.
>
> The stream is then passed as 128 bits using sc fifo for sdram controller
> IF (Altera sdram controller). At the i/o data is passed as two streams
> each 16 bits and each has its own sdram. Thus we have two sdrams (one for
> Tx data and one for sRx data)
I don't find this part clear at all. Above you say the data stream is 64 bits, then 128, then two streams of 16 bits. So the data is packed with one sample of each of the four data streams (TxI,TxQ,sRxI,sRxQ) to make 64 bits, then two words of this are grouped to make 128 bits. But then it is all broken back down into 16 bit individual samples?
> Almost all field units work without any problem. Occasionally, it is
> reported that DPD algorithm fails and when I looked at captured files I
> noticed that sRx data was ok but TxI and TxQ each shows same problem I
> described where their odd samples had shifted location relative to even
> ones. So instead of the normal order of 0,1,2,3,4,...etc. I noticed it was
> 0,9,2,11,4,13,6,15,8,... from beginning to the end of 2,457,600
So you can't say what happened to samples 1, 3, 5, etc? The data is being handed to the SDRAM as 16 bit samples, TxI0, TxQ0, TxI1, TxQ1,...? So when you have the glitch the alignment is shifted for both TxI and TxQ or just one? If both, that would be 16 samples of 16 bit data, right?
> Apart from that there is no other error and all values are correct judging
> by spectrum and time domain.
>
> What happens at the moment of the glitch we don't know, I haven't tested
> any failed units in the lab though I requested that. We have inserted some
> extra logic to capture data directly from fifos in case of the event but
> we failed to reproduce the error. Units are in different countries and it
> is hard to keep track of debugging.
>
> My first conclusion is that there must be memory involved and it must be a
> case of read/write toggling. The basic fpga concatenation logic does not
> involve storage and so is ruled out. FPGA fifos are block ram based and we
> have hundreds of them all across the design for various parts without
> issues.
> sdram controller and i/o timing have been done by Altera experts.
>
> Design is timing clean, lab tested across full range of temperature.
>
> Kaz
-- Rick
>> inside FPGA:
>> TxI and TxQ are first concatenated as(16 x 2 bits), then
>> passed through a small dc fifo for clock crossing.
>
>How small is this FIFO? (depth x width)
>
>By the way, it is usual to use Gray code when passing the FIFO
>address across the clock domain. I think they convert back, but
>maybe just address the BRAM with Gray code.
>
>-- glen

It is a dual clock fifo (368.64 MHz => 245.76 MHz), 32 bits wide, 16 words
deep. It is an Altera core; we just write/read under our rate control logic,
avoiding empty/full situations.

The sRx fifo is 245.76 MHz => 245.76 MHz with the same width/depth as above.

Kaz
>On 7/30/2015 4:24 PM, kaz wrote:
>I don't find this part clear at all. Above you say the data stream is
>64 bits, then 128, then two streams of 16 bits. So the data is packed
>with one sample of each of the four data streams (TxI,TxQ,sRxI,sRxQ) to
>make 64 bits, then two words of this are grouped to make 128 bits. But
>then it is all broken back down into 16 bit individual samples?
>
correct, that is how it is designed (I assume it is to do with SOPC interface)
>
>So you can't say what happened to samples 1, 3, 5, etc? The data is
>being handed to the SDRAM as 16 bit samples, TxI0, TxQ0, TxI1, TxQ1,...?
>

correct

I should correct myself about the offset value: it is 16 samples (not 8) in
the sense of stream index, i.e. I get samples in the order
0,17,2,19,4,21,...etc

>So when you have the glitch the alignment is shifted for both TxI and
>TxQ or just one? If both, that would be 16 samples of 16 bit data,
>right?
>
both I and Q symmetrically, if I reverse the offset of both I get proper signal. I don't have two captures and I assume the error wraps.
>> Kaz
>
>--
>Rick
On Thu, 30 Jul 2015 04:05:45 -0500, kaz wrote:

> In our system a signal is passed through a couple of fifos inside FPGA
> and then onto external sdram to be read by application software. All
> looks ok except that some units in the field show occasional errors in
> that signal read from sdram. The error is as follows: odd samples are
> offset by 8 samples from the even. So if we remove this offset then
> signal looks ok.
>
> I can't reproduce the error in the lab. So I depend on some
> speculations. It could be the fifos or the sdram. Anyone has come across
> such issue? my suspicion is on the sdram as it is configured as 8 pages?
> Also the sdram itself has an internal fifo that muxes 128 bits onto 16
> (again factor of 8)? any input appreciated.

Focus on reproducing it in the lab - or in simulation.

Xilinx FPGAs have multiple clock modules (DCMs) - you're using Altera so
you'll have to translate terms. These have ways of generating a derivative
clock with adjustable timing for clock phase adjustment: I have attacked
similar problems by setting up the timing in a software-writable register,
running memtests with every possible phase adjustment and mapping out the
valid range of timings.

If you have one or more of these spare, attach it to the SDRAM clock, and if
you have another, attach it to your incoming data register, or your SDRAM
address bus output, etc... Now you can run memory tests, stretching the
timings until it fails. Hopefully one of the failure modes (but not more
than one) will reproduce the error you are seeing.

In my case, having found the likely failure mode this way, I was able to
reproduce the effect in simulation, the rest was plain sailing.

Incidentally I also recall a correlation between memory manufacturer and one
failure mechanism: I concluded there was nothing specific wrong with the
memory itself, but some disagreement between it and my RAM interface; I
could make that one go away by specifying another memory. Is there any such
variability in your case?

-- Brian
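
The phase-sweep procedure described here can be sketched on the host side as follows. This is only an outline in Python: `run_memtest` stands for whatever routine programs the phase register and runs the memory test on real hardware (it is a placeholder, not an existing API), and `fake_memtest` is a made-up stand-in so the sketch runs on its own. Phase wrap-around is ignored for simplicity.

```python
def sweep_phases(run_memtest, phase_steps):
    """Run the memory test once per phase setting; return (phase, passed) pairs."""
    return [(phase, bool(run_memtest(phase))) for phase in phase_steps]


def widest_passing_window(results):
    """Return (first, last) phase of the longest run of consecutive passes,
    or None if no setting passed."""
    best = None
    best_len = 0
    run_start = None
    run_len = 0
    for phase, passed in results:
        if passed:
            if run_len == 0:
                run_start = phase
            run_len += 1
            if run_len > best_len:
                best, best_len = (run_start, phase), run_len
        else:
            run_len = 0
    return best


# Stand-in for real hardware: pretend only phase steps 5..11 pass.
def fake_memtest(phase):
    return 5 <= phase <= 11


results = sweep_phases(fake_memtest, range(16))
window = widest_passing_window(results)
print("passing window:", window)
```

The settings just outside the passing window are the interesting ones: running the capture there and comparing the failure pattern with the field symptom is the step that may or may not reproduce the odd/even offset.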
I am still trying to figure out this issue of odd/even offset. My
suspicion has fallen on a dual clock fifo (32 bits wide) because when its
depth was 8 at an earlier stage, the odd/even offset was 8 samples. Now its
depth is 16 and the odd/even offset is 16.
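
One way to see why an offset equal to the fifo depth points at a pointer problem is a toy ring-buffer model in Python. It is purely illustrative and makes no claim about what the Altera core does internally: sample n is stored at address n mod depth, and a read of position p returns the newest stored sample sharing that address. The function name `read_ring` and the `writer_lead` callback are invented for this sketch.

```python
def read_ring(depth, n_reads, writer_lead):
    """Model reads from a ring buffer of the given depth.

    writer_lead(p) is how many samples ahead of read position p the writer
    is at the moment position p is read (must be at least 1). A healthy
    fifo keeps this between 1 and depth.
    """
    out = []
    for p in range(n_reads):
        written = p + writer_lead(p)  # samples 0 .. written-1 are stored
        # Newest stored sample that lives at address p % depth.
        newest = written - 1 - ((written - 1 - p) % depth)
        out.append(newest)
    return out


# Healthy case: writer always 4 samples ahead.
print(read_ring(16, 6, lambda p: 4))

# Faulty case: odd reads see the writer a full lap further ahead.
print(read_ring(16, 6, lambda p: 4 + (16 if p % 2 else 0)))
print(read_ring(8, 6, lambda p: 4 + (8 if p % 2 else 0)))
```

In this model any slip between the pointers of a whole lap shows up as an index error of exactly one depth, which matches the observation that the offset followed the depth from 8 to 16. It does not explain why only odd reads would be affected.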

The next question is why a fifo would behave like that, even if the clocks
change phase or the fifo gets empty/full.

The fifo is protected against empty/full, preventing read/write. The clocks
are asynchronous. It is a straightforward dc fifo with several others like
it in the design, but only this one shows the problem occasionally.

I am planning to use dual port ram instead but wanted to know what has
gone wrong.

Kaz
On Thursday, August 13, 2015 at 9:00:13 AM UTC-4, kaz wrote:
>
> The next question is why would a fifo behave like that even if clocks
> change phase or fifo gets empty/full.
>
If your suspicion is correct, then it is because there is a bug in the dual clock design. I know that sounds trite, but you can't discount the obvious.
> The fifo is protected against empty/full preventing read/write. The clocks
> are asynchronous. It is a straight forward dc fifo with several other like
> it in the design but only this one shows the problem occasionally.
>
Home grown fifo design?
> I am planning to use dual port ram instead but wanted to know what has
> gone wrong.
>

Even if the fifo is not home grown, I would suggest switching to another
dual clock fifo design first for the following reasons:
- It's going to be quicker to check this out since in some sense all you
  have to change is the entity that is being instantiated and possibly
  renaming parameters and ports
- You get another tidbit of information. If the design still fails in the
  same way, then it 'could' be that the fifo is OK after all. If the design
  works, then it 'could' be that you're right about there being a problem
  in the fifo.

If you go down the 'use dual port ram instead' path, this is simply
re-inventing the dual clock fifo. Dual clock fifo designs are already based
on dual port ram anyway. Going down that road is probably not the best way
to go about getting to a solution.

If the problem really is in the fifo, then it would be better to mentally
trace it back in order to figure out what you could then instrument. In this
particular case what I mean is that from what you describe, it looks like a
bit in the read address is perhaps getting into the wrong state. So accept
that as a given and work out what the implications of that condition are.
One implication if the address pointer is suddenly wrong might be an unusual
change in the number of entries in the fifo (like increasing after a read,
or decreasing after a write or just changing by a 'large' amount over a
short period). Now add some logic to monitor that condition and bring the
results of that monitor out to a pin that you can trigger on. For example,
maybe the number of words in the fifo should never change by more than 4
between any two read side clock cycles. So add some code that will detect
that condition. Repeat for other conditions that you can come up with.

Kevin Jennings
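
The monitor idea can be prototyped offline before committing logic to the FPGA. The Python sketch below assumes a trace of the fifo fill level sampled once per read-side clock cycle is available (for instance from a capture RAM); the function name `find_level_jumps` and the example trace are invented for illustration, and the threshold of 4 is the example figure from the post.

```python
def find_level_jumps(levels, max_step=4):
    """Return (cycle, previous, current) for every cycle where the fill
    level changed by more than max_step since the previous cycle."""
    jumps = []
    for cycle in range(1, len(levels)):
        prev, cur = levels[cycle - 1], levels[cycle]
        if abs(cur - prev) > max_step:
            jumps.append((cycle, prev, cur))
    return jumps


# Hypothetical trace: level hovers around 6-7, then jumps by 8 at cycle 5.
trace = [6, 7, 6, 7, 6, 14, 13, 14]
print(find_level_jumps(trace))
```

In hardware the same comparison would be a registered copy of the fill level, a subtractor, and a sticky flag routed to a pin or trigger input.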
The fifo is not home grown. It is altera fifo core. We never discard well
tried cores for home made work.

DC fifo is built by Altera around dual ram but if (as in my case) the
clock rates are predictable then one can control wr/rd pointers each in
their clock domain without having to cross clock domains thus reducing
risk and resource. That is my point and is well known design
recommendation.

The fifo in question is just a 32 bit wide dc fifo from the Altera core with
internal pipe set to 3, rd/wr protected, connected to the 368.64 MHz clock
on the write side (enabled 2/3 of the time) and to the 245.76 MHz clock on
the read side (always enabled). Initially the read enable is delayed to wait
for a few words (even though it is protected).
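
As a quick sanity check on the rate matching described above, the average write rate and the read rate can be compared exactly with Python's `fractions` module. The sketch reads "enabled 2/3" as a write enable that is active on two out of every three write clock cycles, which is an interpretation of the post rather than something it states outright.

```python
from fractions import Fraction


def effective_rate_mhz(clock_mhz: str, enable_duty: Fraction) -> Fraction:
    """Average word rate in MHz for a clock gated by an enable duty cycle."""
    return Fraction(clock_mhz) * enable_duty


write_rate = effective_rate_mhz("368.64", Fraction(2, 3))
read_rate = effective_rate_mhz("245.76", Fraction(1))

print("write:", float(write_rate), "MHz")
print("read: ", float(read_rate), "MHz")
print("balanced:", write_rate == read_rate)
```

The two average rates are equal, so in steady state the fill level should only wander by the short-term unevenness of the 2-in-3 write pattern plus any drift between the two asynchronous clocks.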

Timing is clean. I imagine the write pointer is working but the read
pointer is toggling between 0 and 15 with two clock delays leading to
samples 0,17,2,19 ...etc. Just a guess.

I have added a ram to capture some data from this fifo in the field when
the problem occurs, and I am awaiting results.