
serial protocol specs and verification

Started by alb July 26, 2013
[snip]
> A frame is defined as follows:
>
> - sync  : '111'
> - header: dtype (4) - n.u. (2) - length (10)
> - data  : (16) * length
>
> in principle between frames there can be any number of zeros (with bit stuffing). An 'all zero' pattern in this sense might be of any number of bits.

[snip]

Unless 'length' is limited, your worst case has header "0000001111111111" (with an extra bit stuffed) followed by 16 * 1023 = 16368 zeros, which will have 2728 ones stuffed into them. Total line packet length is 19113 symbols. If the clocks are within 1/19114 of each other, the same number of symbols will be received as sent, ASSUMING no jitter. You can't assume that, but if there is 'not much' jitter then perhaps 1/100k will be good enough for relative drift to not need to be corrected for.

So, for version 1, use the 'sync' to establish the start of frame and the sampling point, simulate the 'Rx fast' and 'Rx slow' cases in parallel, and see whether it works.

BTW, this is off-topic for C.A.F., as it is a system design problem not related to the implementation method.

---------------------------------------
Posted through http://www.FPGARelated.com
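The worst-case arithmetic in the post above can be checked with a short sketch. It assumes, as the quoted figures imply (16368 / 2728 = 6), that a '1' is stuffed after every six consecutive zeros, and it leaves the sync pattern out of the count as the post does. The names `worst_case_symbols` and `STUFF_AFTER_ZEROS` are illustrative only.

```python
# Worst-case line length for the frame format quoted above.
# Assumption: one '1' is stuffed after every 6 consecutive zeros
# (inferred from 16368 / 2728 = 6); sync bits are not counted.
STUFF_AFTER_ZEROS = 6


def worst_case_symbols(max_length_words: int = 1023, word_bits: int = 16) -> int:
    header_bits = 16        # dtype (4) + n.u. (2) + length (10)
    header_stuffed = 1      # the six leading zeros of the header trigger one stuffed bit
    data_bits = word_bits * max_length_words
    data_stuffed = data_bits // STUFF_AFTER_ZEROS
    return header_bits + header_stuffed + data_bits + data_stuffed


symbols = worst_case_symbols()
# The clocks may slip by less than one symbol over the whole frame.
tolerance = 1.0 / (symbols + 1)
print(symbols, f"{tolerance * 1e6:.1f} ppm")
```

The tolerance here is the no-jitter, no-resynchronisation bound from the post, not a design margin.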
On Wednesday, July 31, 2013 1:44:17 PM UTC+2, rickman wrote:
> On 7/31/2013 3:36 AM, alb wrote:
> > On 29/07/2013 22:14, rickman wrote:
> > []
> >>> Everyone's old favorite asynchronous serial RS232 usually uses a clock at 16x, though I have seen 64x. From the beginning of the start bit, it counts half a bit time (in clock cycles), verifies the start bit (and not random noise) then counts whole bits and decodes at that point. So, the actual decoding is done with a 1X clock, but with 16 (or 64) possible phase values. It resynchronizes at the beginning of each character, so it can't get too far off.
> >>
> >> Yes, that protocol requires a clock matched to the sender's clock to at least 2.5% IIRC. The protocol the OP describes has much longer char sequences which implies much tighter clock precision at each end and I'm expecting it to use a clock recovery circuit... but maybe not. I think he said they don't use one but get "frequent" errors.
> >
> > At the physical level the bit stuffing will allow to resync continuously therefore I'm not concerned if there's a clock recovery circuit.
> >
> > We are using 40MHz (0.5 ppm stability) but after few seconds you can already see how many cycles two clocks can drift apart.
> >
> >> I've never analyzed an async design with longer data streams so I don't know how much precision would be required, but I'm sure you can't do reliable data recovery with a 2x clock (without a pll). I think this would contradict the Nyquist criterion.
> >
> > <nitpick mode on>
> > Nyquist criterion has nothing to do with being able to sample data. As a matter of fact your internal clock is perfectly capable to sample data flowing in your fpga without the need to be 2x the data rate.
> > <nitpick mode off>
>
> I don't know what you are talking about. If you asynchronously sample, you very much do have to satisfy the Nyquist criterion. A 2x clock, because it isn't *exactly* 2x, can *not* be used to capture a bitstream so that you can find the transitions and know which bit is which. Otherwise there wouldn't be so many errors in the existing circuit.
>
> >> In my earlier comments when I'm talking about a PLL I am referring to a digital PLL. I guess I should have said a DPLL.
> >
> > Why bothering? If you have a PLL on your FPGA you can profit of it, otherwise you need something fancier.
>
> Not sure of your context. You can't use the PLL on the FPGA to recover the clock from an arbitrary data stream. It is not designed for that and will not work because of the gaps in data transitions. It is designed to allow the multiplication of clock frequencies. A DPLL can be easily designed to recover the clock, but needs to be greater than 3x the data rate in order to distinguish the fast condition from the slow condition.
>
> You can use the FPGA PLL to multiply your clock from 2x to 4x to allow the DPLL to work correctly.
Or do like many USB PHYs: assume the 2x clock is reasonably 50/50 and use a DDR input flop to sample at 4x.

-Lasse
On 7/31/2013 2:30 PM, langwadt@fonz.dk wrote:
> On Wednesday, July 31, 2013 1:44:17 PM UTC+2, rickman wrote:
>>
>> You can use the FPGA PLL to multiply your clock from 2x to 4x to allow the DPLL to work correctly.
>
> or do like many USB PHYs, assume the 2x clock is reasonably 50/50 and use a DDR input flop to sample at 4x

Yes, that would be interesting to design, actually. The logic gets two bits at the same time rather than one, so I guess the machine becomes a bit more complicated: you have to deal with four states and four possible input combinations. Still, not a big deal, just a bit of work on paper to understand the logic needed.

--
Rick
On 31/07/2013 13:44, rickman wrote:
[]
>> <nitpick mode on>
>> Nyquist criterion has nothing to do with being able to sample data. As a matter of fact your internal clock is perfectly capable to sample data flowing in your fpga without the need to be 2x the data rate.
>> <nitpick mode off>
>
> I don't know what you are talking about. If you asynchronously sample, you very much do have to satisfy the Nyquist criterion. A 2x clock, because it isn't *exactly* 2x, can *not* be used to capture a bitstream so that you can find the transitions and know which bit is which.

A data stream which flows at *exactly* frequency f can be sampled *exactly* with a clock of frequency f; it happens continuously in your synchronous logic. What happened to the Nyquist theorem?

If you have a protocol with data and clock, does it mean that you will recognize only half of the bits because your clock rate is just equal to your data rate? I'm confused...

IMO calling a signal 'asynchronous' does not make any difference. Mr. Nyquist referred to reconstructing an analog signal from discrete samples (no quantization error involved). How does that apply to digital transmission?
> Otherwise there wouldn't be so many errors in the existing circuit.
It fails not because of the Nyquist limit, but because a phase shift cannot be recovered with just two clocks per bit.

[]

> You can use the FPGA PLL to multiply your clock from 2x to 4x to allow the DPLL to work correctly.
This is what I meant indeed. I believe I confused DPLL with ADPLL...
On Wednesday, July 31, 2013 11:37:59 PM UTC+2, alb wrote:
> On 31/07/2013 13:44, rickman wrote:
> []
> >> <nitpick mode on>
> >> Nyquist criterion has nothing to do with being able to sample data. As a matter of fact your internal clock is perfectly capable to sample data flowing in your fpga without the need to be 2x the data rate.
> >> <nitpick mode off>
> >
> > I don't know what you are talking about. If you asynchronously sample, you very much do have to satisfy the Nyquist criterion. A 2x clock, because it isn't *exactly* 2x, can *not* be used to capture a bitstream so that you can find the transitions and know which bit is which.
>
> A data stream which is *exactly* flowing with a frequency f can be *exactly* sampled with a clock frequency f, it happens continuously in your synchronous logic. What happened to Nyquist theorem?
>
> If you have a protocol with data and clock, does it mean that you will recognize only half of the bits because your clock rate is just equal to your data rate? I'm confused...
>
> IMO calling a signal 'asynchronous' does not make any difference. Mr. Nyquist referred to reconstructing an analog signal with a discrete sampling (no quantization error involved). How does that applies to digital transmission?
>
> > Otherwise there wouldn't be so many errors in the existing circuit.
>
> It does not work not because of Nyquist limit, but because the recovery of a phase shift cannot be done with just two clocks per bit.

It may not technically be the Nyquist limit, but like so many things in nature the same relations are repeated: if you take NRZ you'll notice that the highest "frequency" (0101010101...) is only half of the data rate.

-Lasse
On 31/07/2013 15:36, RCIngham wrote:
> [snip]
>>
>> A frame is defined as follows:
>>
>> - sync  : '111'
>> - header: dtype (4) - n.u. (2) - length (10)
>> - data  : (16) * length
>>
>> in principle between frames there can be any number of zeros (with bit stuffing). An 'all zero' pattern in this sense might be of any number of bits.
>>
> [snip]
>
> Unless 'length' is limited, your worst case has header "0000001111111111" (with an extra bit stuffed) followed by 16 * 1023 = 16368 zeros, which will have 2728 ones stuffed into them. Total line packet length is 19113 symbols.

Why did you exclude the sync symbol?

> If the clocks are within 1/19114 of each other, the same number of symbols will be received as sent, ASSUMING no jitter.
1/19114 is about 5e-5 (roughly 52 ppm), which is a very large difference; we are using 0.5 ppm oscillators. The number of symbols received has to take the phase shift into account, otherwise bits will be lost or sampled twice.
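A rough way to compare the two figures in this exchange is to express the drift in bit times. The sketch below assumes "0.5 ppm" is a tolerance on each oscillator and that the two are off in opposite directions (worst case); it ignores jitter and the initial phase offset. The helper names are illustrative.

```python
def drift_in_bits(n_bits: int, ppm_tx: float, ppm_rx: float) -> float:
    """Worst-case slip, in bit times, accumulated over n_bits with no resync."""
    relative = (ppm_tx + ppm_rx) * 1e-6   # assumed: errors add in opposite directions
    return n_bits * relative


def cycles_per_second(f_hz: float, ppm_tx: float, ppm_rx: float) -> float:
    """Clock cycles by which two nominally equal clocks diverge in one second."""
    return f_hz * (ppm_tx + ppm_rx) * 1e-6


frame_drift = drift_in_bits(19113, 0.5, 0.5)     # slip over one worst-case frame
per_second = cycles_per_second(40e6, 0.5, 0.5)   # divergence of two 40 MHz clocks
print(f"{frame_drift:.4f} bit per frame, {per_second:.0f} cycles per second")
```

Under these assumptions the slip within a single worst-case frame is a small fraction of a bit, while free-running clocks still diverge by tens of cycles per second, so the sampling phase has to be re-established at least once per frame.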
> You can't assume that, but if there is 'not much' jitter then perhaps 1/100k will be good enough for relative drift to not need to be corrected for.

Still not sure what you are trying to say.

> So, for version 1, use the 'sync' to establish the start of frame and the sampling point, simulate the 'Rx fast' and 'Rx slow' cases in parallel, and see whether it works.

By 'in parallel', do you mean a data stream with some bits slower and some faster? I think the main problem lies in the slight difference in clock frequencies, which leads to an increasing phase shift up to the point where a bit is lost or sampled twice.

> BTW, this is off-topic for C.A.F., as it is a system design problem not related to the implementation method.

IMO it is an implementation issue; no spec will tell me how many times I need to sample the data stream. The system design does not have a problem IMO, it simply specifies the protocol between two modules. But I will be more than happy if you can point me to a more appropriate group.
On 7/31/13 9:36 AM, RCIngham wrote:
> [snip]
>
> Unless 'length' is limited, your worst case has header "0000001111111111" (with an extra bit stuffed) followed by 16 * 1023 = 16368 zeros, which will have 2728 ones stuffed into them. Total line packet length is 19113 symbols. If the clocks are within 1/19114 of each other, the same number of symbols will be received as sent, ASSUMING no jitter. You can't assume that, but if there is 'not much' jitter then perhaps 1/100k will be good enough for relative drift to not need to be corrected for.
>
> So, for version 1, use the 'sync' to establish the start of frame and the sampling point, simulate the 'Rx fast' and 'Rx slow' cases in parallel, and see whether it works.
>
> BTW, this is off-topic for C.A.F., as it is a system design problem not related to the implementation method.

Since you can resynchronize your sampling clock on each transition received, you only need to "hold lock" for the maximum time between transitions, which is 7 bit times. This means that with a nominal 4x clock, some sample points will be only 3 clocks apart (if you are slow) and some will be 5 clocks apart (if you are fast), while most will be 4 clocks apart. This is the reason for the 1 bit stuffing.
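The resynchronise-on-every-transition idea can be tried in a small behavioural model before any HDL is written. The sketch below is only an illustration under stated assumptions: ideal edges, no jitter, a stream whose runs of identical bits never exceed 7 (the figure given above), a 4x sample clock that is off by a fixed fraction `eps`, and a decision sample taken two samples after the one that saw the edge. The function names and the phase-counter scheme are hypothetical, not taken from the thread.

```python
import random


def make_bits(n: int, max_run: int = 7, seed: int = 1) -> list:
    """Random bits starting with the '111' sync, with runs capped at max_run."""
    rng = random.Random(seed)
    bits = [1, 1, 1]
    while len(bits) < n:
        b = rng.getrandbits(1)
        if len(bits) >= max_run and all(x == b for x in bits[-max_run:]):
            b ^= 1                      # force a transition (stands in for stuffing)
        bits.append(b)
    return bits


def receive(bits: list, eps: float, osr: int = 4, resync: bool = True) -> list:
    """Sample the line at osr*(1+eps) samples per bit and recover the bits."""
    ts = 1.0 / (osr * (1.0 + eps))      # receiver sample period, in transmit bit times
    t = -0.3                            # arbitrary start while the line is idle (0)
    prev, cnt, locked, out = 0, 0, False, []
    while t < len(bits):
        s = bits[int(t)] if t >= 0 else 0
        edge = s != prev
        prev = s
        if edge and (resync or not locked):
            cnt, locked = 0, True       # realign: this sample is the first of a bit
        elif locked:
            cnt = (cnt + 1) % osr
            if cnt == osr // 2:         # near mid-bit: take the decision sample
                out.append(s)
        t += ts
    return out


bits = make_bits(400)
rx_fast = receive(bits, +0.02)                    # receiver clock 2% fast
rx_slow = receive(bits, -0.02)                    # receiver clock 2% slow
rx_free = receive(bits, +0.02, resync=False)      # lock on the first edge only
print(rx_fast == bits, rx_slow == bits, rx_free == bits)
```

In this idealised model a 2% mismatch is absorbed when the phase counter is realigned on every edge, whereas locking once at the start of the frame drifts off within a few dozen bits. Real margins will be smaller once jitter and edge rates are included.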
> On 7/31/13 9:36 AM, RCIngham wrote:
>> [snip]
>>
>> Unless 'length' is limited, your worst case has header "0000001111111111" (with an extra bit stuffed) followed by 16 * 1023 = 16368 zeros, which will have 2728 ones stuffed into them. Total line packet length is 19113 symbols. If the clocks are within 1/19114 of each other, the same number of symbols will be received as sent, ASSUMING no jitter. You can't assume that, but if there is 'not much' jitter then perhaps 1/100k will be good enough for relative drift to not need to be corrected for.
>>
>> So, for version 1, use the 'sync' to establish the start of frame and the sampling point, simulate the 'Rx fast' and 'Rx slow' cases in parallel, and see whether it works.
>>
>> BTW, this is off-topic for C.A.F., as it is a system design problem not related to the implementation method.
>
> Since you can resynchronize your sampling clock on each transition received, you only need to "hold lock" for the maximum time between transitions, which is 7 bit times. This would mean that if you have a nominal 4x clock, some sample points will be only 3 clocks apart (if you are slow) or some will be 5 clocks apart (if you are fast), while most will be 4 clock apart. This is the reason for the 1 bit stuffing.
The bit-stuffing in long sequences of zeroes is almost certainly there to facilitate a conventional clock recovery method, which I am proposing not using PROVIDED THAT the clocks at each end are within a sufficiently tight tolerance. Detect the ones in the as-sent stream first, then decide which are due to bit-stuffing, and remove them.

Deciding how tight a tolerance is 'sufficiently tight' is probably non-trivial, so I won't be doing it for free.
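The "decide which ones are due to bit-stuffing, and remove them" step can be sketched as follows. This assumes the rule implied by the figures earlier in the thread, namely that a '1' is inserted after every six consecutive zeros; the actual protocol rule is not stated in the thread, and the function names are illustrative.

```python
STUFF_AFTER_ZEROS = 6   # assumed rule, inferred from 16368 zeros -> 2728 stuffed ones


def stuff(bits: list) -> list:
    """Transmit side: insert a '1' after every run of six zeros."""
    out, zeros = [], 0
    for b in bits:
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
        if zeros == STUFF_AFTER_ZEROS:
            out.append(1)               # stuffed bit breaks the run
            zeros = 0
    return out


def destuff(line_bits: list) -> list:
    """Receive side: drop the '1' that follows every run of six zeros."""
    out, zeros, expect_stuffed = [], 0, False
    for b in line_bits:
        if expect_stuffed:
            if b != 1:
                raise ValueError("stuffing violation: expected a stuffed '1'")
            expect_stuffed = False      # discard the stuffed bit
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
        if zeros == STUFF_AFTER_ZEROS:
            expect_stuffed, zeros = True, 0
    return out


header = [int(c) for c in "0000001111111111"]
payload = [0] * (16 * 1023)
line = stuff(header + payload)
print(len(line), destuff(line) == header + payload)
```

A seventh consecutive zero in the received stream is reported as an error here, which is one cheap way to notice that the receiver has slipped a bit.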
On 01/08/2013 11:56, RCIngham wrote:
[]
>> Since you can resynchronize your sampling clock on each transition received, you only need to "hold lock" for the maximum time between transitions, which is 7 bit times. This would mean that if you have a nominal 4x clock, some sample points will be only 3 clocks apart (if you are slow) or some will be 5 clocks apart (if you are fast), while most will be 4 clock apart. This is the reason for the 1 bit stuffing.
>
> The bit-stuffing in long sequences of zeroes is almost certainly there to facilitate a conventional clock recovery method, which I am proposing not using PROVIDED THAT the clocks at each end are within a sufficiently tight tolerance. Detect the ones in the as-sent stream first, then decide which are due to bit-stuffing, and remove them.
What is the gain of not using 'conventional clock recovery'?
On 8/1/2013 8:55 AM, alb wrote:
> On 01/08/2013 11:56, RCIngham wrote:
> []
>>> Since you can resynchronize your sampling clock on each transition received, you only need to "hold lock" for the maximum time between transitions, which is 7 bit times. This would mean that if you have a nominal 4x clock, some sample points will be only 3 clocks apart (if you are slow) or some will be 5 clocks apart (if you are fast), while most will be 4 clock apart. This is the reason for the 1 bit stuffing.
>>
>> The bit-stuffing in long sequences of zeroes is almost certainly there to facilitate a conventional clock recovery method, which I am proposing not using PROVIDED THAT the clocks at each end are within a sufficiently tight tolerance. Detect the ones in the as-sent stream first, then decide which are due to bit-stuffing, and remove them.
>
> What is the gain of not using 'conventional clock recovery'?

I think the point is that if the sequences are short enough that the available timing tolerance is adequate, then you just don't need to recover timing from the bit stream.

I've been looking at this, then working on other issues, and have lost my train of thought. I believe that a PLL (or DPLL) is not needed as long as the input can be sampled fast enough and the reference frequency is matched closely enough. But it is still important to correct for "phase", as the OP puts it (IIRC), so that you can tell where the bits are and not sample on transitions, just like a conventional UART does. With frequent enough transitions, the phase can be detected and aligned while the exact frequency does not need to be recovered.

--
Rick