fifo or sdram bug?

Started by kaz July 30, 2015
In our system a signal is passed through a couple of FIFOs inside the FPGA and
then on to external SDRAM, to be read by application software. All looks OK,
except that some units in the field show occasional errors in the signal
read from SDRAM. The error is as follows: the odd samples are offset by 8
samples from the even samples. If we remove this offset, the signal looks OK.

I can't reproduce the error in the lab, so I am relying on speculation.
It could be the FIFOs or the SDRAM. Has anyone come across such an issue? My
suspicion is on the SDRAM, as it is configured as 8 pages. Also, the SDRAM
itself has an internal FIFO that muxes 128 bits onto 16 (again a factor of
8). Any input appreciated.

Kaz
---------------------------------------
Posted through http://www.FPGARelated.com
On Thursday, July 30, 2015 at 5:05:50 AM UTC-4, kaz wrote:
> In our system a signal is passed through a couple of FIFOs inside the FPGA and
> then on to external SDRAM, to be read by application software. All looks OK,
> except that some units in the field show occasional errors in the signal
> read from SDRAM. The error is as follows: the odd samples are offset by 8
> samples from the even samples. If we remove this offset, the signal looks OK.
>
> I can't reproduce the error in the lab, so I am relying on speculation.
> It could be the FIFOs or the SDRAM. Has anyone come across such an issue? My
> suspicion is on the SDRAM, as it is configured as 8 pages. Also, the SDRAM
> itself has an internal FIFO that muxes 128 bits onto 16 (again a factor of
> 8). Any input appreciated.
>
> Kaz

Is the problem that the data is off by 8, or that the data gets stored in a location that is off by 8? If it's that the data is off by 8, then the number of SDRAM pages or the SDRAM muxing is not relevant.

What exactly do you mean by 'off by 8'? Is data bit 3 in the wrong state? Is it that data bit 3 is always 1 when wrong, or is it that data bit 3 is wrong, and when it is wrong that bit might be 1 or it might be 0?

I would also highly doubt that the problem is in the commercial SDRAM; almost without doubt it is in something that you have designed, not elsewhere.

- Has your design passed static timing analysis?
- Are all of the I/O and clock frequencies correctly specified to the timing analysis tool?
- Try warming up the part with a heat gun or cooling it off with cool spray in the lab. Does the design still work in the lab? If not, you have a timing problem.

In fact, based on your description so far, it is almost certainly a timing issue, so that would be the best place to start looking.

Kevin

>Is the problem that the data is off by 8, or that the data gets stored in
>a location that is off by 8? If it's that the data is off by 8, then the
>number of SDRAM pages or the SDRAM muxing is not relevant. What exactly do
>you mean by 'off by 8'? Is data bit 3 in the wrong state? Is it that data
>bit 3 is always 1 when wrong or is it that data bit 3 is wrong, and when
>it is wrong that bit might be 1 or it might be 0.

The data is 16 bits wide and there is nothing wrong with the bits; all sample values are correct. The odd samples do not follow their even neighbours. For example, if a correct stream is indexed as

0,1,2,3,4,5,6,7,8,9,10...

then what we get is

0,9,2,11,4,13,6,15,8,17,10...

Thus all samples are correct individually. The even stream is correct as 0,2,4,6,8... and the odd stream is also a correct sequence 9,11,13,15..., but there is this offset: instead of 0,1,2,3... I get 0,9,2,11...

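
To make the symptom easy to check on a captured buffer, here is a minimal Python sketch of a detector for this pattern. It assumes each captured sample can be mapped back to its original index (for example by injecting a known ramp as a test signal); that mapping and the name `odd_even_offset` are hypothetical and not part of the actual system.

```python
def odd_even_offset(indices):
    """Return the constant offset of the odd-position samples, or None.

    `indices` holds the original index of each captured sample, in capture
    order (assumption: a test ramp makes this mapping known).
    Returns 0 for a healthy stream, 8 for the fault described in the thread,
    and None if the capture does not fit the "constant odd offset" pattern.
    """
    evens = indices[0::2]
    odds = indices[1::2]
    # Even positions are expected to hold exactly their own index.
    if any(value != 2 * k for k, value in enumerate(evens)):
        return None
    # Every odd position should be displaced by the same amount.
    offsets = {value - (2 * k + 1) for k, value in enumerate(odds)}
    return offsets.pop() if len(offsets) == 1 else None


observed = [0, 9, 2, 11, 4, 13, 6, 15, 8, 17, 10]
print(odd_even_offset(observed))         # 8
print(odd_even_offset(list(range(11))))  # 0
```

Logging this value per capture in the field units would also show whether the offset is always exactly 8 and when it appears and disappears.
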
>- Has your design passed static timing analysis?

Yes, certainly.

>- Are all of the I/O and clock frequencies correctly specified to the
>timing analysis tool?

Yes.

>- Try warming up the part with a heat gun or cooling it off with cool
>spray in the lab. Does the design still work in the lab? If not, you have a
>timing problem.

We can't do that in the field; the units are concealed mobile radio heads. We have deployed many thousands of them, and only a tiny percentage shows the issue.

Kaz

>- Try warming up the part with a heat gun or cooling it off with cool
>spray in the lab. Does the design still work in the lab? If not, you have a
>timing problem.
>
>In fact, based on your description so far, it is almost certainly a timing
>issue, so that would be the best place to start looking.
>
>Kevin

We have done that in the lab, warming and freezing across the full range, but could not reproduce the issue. The test was iterated over a thousand times to catch any intermittent behaviour, but all iterations passed.

Kaz

"kaz" wrote in message
news:0p6dneVQkdmemifInZ2dnUU7-b2dnZ2d@giganews.com...


We can't do that in the field; the units are concealed mobile radio heads. We have deployed many thousands of them, and only a tiny percentage shows the issue.

*****************************************************************

Sounds like you have a conflict between your SDRAM column addressing and the setting of the SDRAM's burst mode.

If your column address does not step in lumps of the SDRAM burst length, then your output data sequencing can (and will) get screwed up.

Make sure you have correctly specified the burst length for your SDRAM driver, and make sure your column address stepping agrees.

Andy

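
As a rough illustration of the point about column stepping, the Python sketch below models the wrapping sequential burst order of a classic SDR SDRAM, where a burst stays inside its burst-aligned block of columns. The function names are hypothetical, and the part and controller actually used in the thread are not known; DDR2/DDR3 devices define their own burst orderings, so treat this as a toy model rather than a description of the real hardware.

```python
def sequential_burst(start_col, burst_len):
    """Column order of one wrapping sequential burst (SDR SDRAM style).

    burst_len must be a power of two (2, 4 or 8 on typical parts).
    """
    base = start_col & ~(burst_len - 1)  # burst-aligned block holding start_col
    return [base | ((start_col + i) & (burst_len - 1)) for i in range(burst_len)]


def stream_columns(n_bursts, step, burst_len=8):
    """Columns touched by consecutive bursts when the address steps by `step`."""
    return [col
            for burst in range(n_bursts)
            for col in sequential_burst(burst * step, burst_len)]


# Step equal to the burst length: columns advance linearly.
print(stream_columns(2, step=8))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

# Step smaller than the burst length: the second burst wraps inside the
# first block and overwrites it instead of moving on.
print(stream_columns(2, step=4))
# [0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 0, 1, 2, 3]
```

In this model a mismatch reorders or overwrites data within a burst-sized block, which is why the burst length programmed into the mode register and the controller's address increment have to agree.
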
On 7/30/2015 5:05 AM, kaz wrote:
> In our system a signal is passed through a couple of FIFOs inside the FPGA and
> then on to external SDRAM, to be read by application software. All looks OK,
> except that some units in the field show occasional errors in the signal
> read from SDRAM. The error is as follows: the odd samples are offset by 8
> samples from the even samples. If we remove this offset, the signal looks OK.
>
> I can't reproduce the error in the lab, so I am relying on speculation.
> It could be the FIFOs or the SDRAM. Has anyone come across such an issue? My
> suspicion is on the SDRAM, as it is configured as 8 pages. Also, the SDRAM
> itself has an internal FIFO that muxes 128 bits onto 16 (again a factor of
> 8). Any input appreciated.

You haven't said anything about your FPGA design in terms of where you could lose 8 samples of data or how your data is split between the odd and even samples. If your FPGA design does not split the data between odd and even, I'm not sure how you could have this problem.

You also don't mention if the "lost" 8 samples of odd data ever show up somewhere or if it is just a synchronization problem from the first sample.

Is there a place in your design where the odd and even samples are handled separately? Are you using two ADCs to sample the same analog signal at a higher rate, for example?

Do you write the odd/even samples to the SDRAM separately?

-- 
Rick

>You haven't said anything about your FPGA design in terms of where you
>could lose 8 samples of data or how your data is split between the odd
>and even samples. If your FPGA design does not split the data between
>odd and even, I'm not sure how you could have this problem.

The 8 samples are not lost; the odd substream is regularly offset from the even substream.

Inside the FPGA the data is never split into odd/even streams. The 16-bit-wide data enters a FIFO (a dual-clock FIFO with a 16-bit output width), then goes into another FIFO (single-clock, with a 32-bit output width), and is then converted back to 16 bits at the I/O to the SDRAM.

The two FIFOs are only a few words deep but could be a cause in theory, i.e. if a FIFO pointer toggles between two separate counting sequences. However, Altera experts looked at them and were happy with the design, and we added an extra pipeline stage just in case.

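
The "pointer toggling between two counting sequences" idea can be pictured with a toy Python model. This is only a sketch of the hypothesis, not of the real FIFO: it returns sample indices rather than data, ignores FIFO depth and address wraparound, and the name `toggling_pointer_reads` is made up for illustration.

```python
def toggling_pointer_reads(n_reads, offset=8):
    """Indices read when two counters are selected alternately.

    Counter 0 serves even positions and starts at 0; counter 1 serves odd
    positions and starts `offset` samples ahead of where it should be.
    Each counter advances by 2 every time it is used.
    """
    pointers = [0, 1 + offset]
    reads = []
    for i in range(n_reads):
        which = i % 2           # alternate between the two counting sequences
        reads.append(pointers[which])
        pointers[which] += 2
    return reads


print(toggling_pointer_reads(11))
# [0, 9, 2, 11, 4, 13, 6, 15, 8, 17, 10]
print(toggling_pointer_reads(6, offset=0))
# [0, 1, 2, 3, 4, 5]
```

With an offset of 8 the model reproduces the observed ordering, which shows the hypothesis is at least consistent with the symptom; it says nothing about whether a FIFO only a few words deep could actually produce it.
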
>Do you write the odd/even samples to the SDRAM separately?
>
>Rick

No.

The problem is also sometimes self-rectifying after some time.

I assume that a glitch in the control signals to the SDRAM may change the column addressing mechanism, as suggested by Andy, but the SDRAM is organised as 8k x 128 x 128 x 8 banks, so the 16-bit data is muxed as 128 bits into each cell!

Kaz

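
As a quick sanity check of that organisation, the arithmetic can be written out in Python. The reading of "8k x 128 x 128 x 8 banks" as rows x columns x bits per cell x banks is an assumption made for this sketch; the thread does not name the part.

```python
# Assumed reading of "8k x 128 x 128 x 8 banks" (not confirmed in the thread):
rows_per_bank = 8 * 1024   # 8k rows
columns_per_row = 128      # 128 internal columns per row
bits_per_cell = 128        # each internal access is 128 bits wide
banks = 8
sample_bits = 16           # width of one sample on the external data bus

total_bits = rows_per_bank * columns_per_row * bits_per_cell * banks
samples_per_cell = bits_per_cell // sample_bits

print(total_bits // 2**30, "Gbit")  # 1 Gbit
print(samples_per_cell)             # 8 samples share one 128-bit cell
```

Under that reading the device is a 1 Gbit part, and eight 16-bit samples share each 128-bit internal access, which is the same factor of 8 as the observed offset.
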
On Thursday, July 30, 2015 at 7:32:59 AM UTC-4, kaz wrote:
> >- Try warming up the part with a heat gun or cooling it off with cool
> >spray in the lab. Does the design still work in the lab? If not, you have a
> >timing problem.
> >
> >In fact, based on your description so far, it is almost certainly a timing
> >issue, so that would be the best place to start looking.
> >
> >Kevin
>
> We have done that in the lab, warming and freezing across the full range, but
> could not reproduce the issue. The test was iterated over a thousand times to
> catch any intermittent behaviour, but all iterations passed.
>
> Kaz

Some next steps to consider...

- You did the thermal testing with field return boards that exhibited the problem, correct?
- Are there other differences between lab use and field use that could contribute, such as with the power supply? The power supply is probably a stretch given the symptoms you describe, but just wondering what environmental differences might be going on.
- What kind of DRAM are you using?
- Commercial IP for the controller or home grown?
- You said in another post that the data starts at 16 bits, widens to 32 (I presume at the input to the DRAM controller) and then narrows back down to 16 at the I/O pins. Are clock domains being crossed along the way? Is your timing analysis set to ignore crossings? If so, shut that off and re-analyze each crossing.
- How does the DDR controller receive input commands? By that I mean is it given addresses for each read/write to be performed, or is it given a start address and a burst size? If given a start address and burst size, then that would likely exonerate everything that is upstream of the DRAM controller (except for possible clock domain crossing issues).
- Review the PCB routing and look at signal integrity on the PCB?

Focus on getting the field returns to fail in the lab. Without that you'll have no way to verify any potential fix candidate.

Kevin Jennings

On Thu, 30 Jul 2015 06:14:11 -0500, kaz wrote:

> >>- Has your design passed static timing analysis?
> yes certainly,

Then are you certain all your timing constraints are correct? I'm with KJ, this problem description makes me immediately leap to a timing problem.

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.

On 7/30/2015 12:56 PM, kaz wrote:
>> You haven't said anything about your FPGA design in terms of where you
>> could lose 8 samples of data or how your data is split between the odd
>> and even samples. If your FPGA design does not split the data between
>> odd and even, I'm not sure how you could have this problem.
>
> The 8 samples are not lost; the odd substream is regularly offset from the
> even substream.
>
> Inside the FPGA the data is never split into odd/even streams. The
> 16-bit-wide data enters a FIFO (a dual-clock FIFO with a 16-bit output
> width), then goes into another FIFO (single-clock, with a 32-bit output
> width), and is then converted back to 16 bits at the I/O to the SDRAM.

How large is a "sample"?

> The two FIFOs are only a few words deep but could be a cause in theory, i.e.
> if a FIFO pointer toggles between two separate counting sequences. However,
> Altera experts looked at them and were happy with the design, and we added
> an extra pipeline stage just in case.

??? The two FIFOs are sequential, not parallel, right? So how would they cause a shift in the odd/even data?

Do the FIFOs use block RAM? I don't recall Altera having distributed memory, so I guess block RAM is the only thing available. That means the FIFO memory is one block of memory unless you have fairly large FIFOs. Is any of this right?

>> Do you write the odd/even samples to the SDRAM separately?
>>
>> Rick
>
> No.
>
> The problem is also sometimes self-rectifying after some time.
>
> I assume that a glitch in the control signals to the SDRAM may change the
> column addressing mechanism, as suggested by Andy, but the SDRAM is organised
> as 8k x 128 x 128 x 8 banks, so the 16-bit data is muxed as 128 bits into
> each cell!

Not following this well. I think you are simply saying that the internal writes in the SDRAM are 128 bits, so your 16-bit samples(?) are written 8 at a time. Unless you have some separation of odd/even samples I don't see how that would matter.

How do you have your burst addressing set? There are different modes with different addressing. Only one is sequential. It has been too long since I've worked with SDRAM and I don't recall what that is all about. If this is the issue, it won't reproduce the symptoms as you have described.

I believe you say that at the beginning of the fault 8 odd samples are dropped, leaving the rest of the sequence out of alignment with the even samples. If they aren't dropped, where do they show up? With a burst addressing error the samples would be moved about, scrambled in some way, but not lost at all.

When the unit "recovers", where does the extra data come from? Are 8 odd samples repeated? If you can figure out more details of the glitch at the beginning and end of the error sequence, it might help explain where the problem is.

-- 
Rick