FPGARelated.com
Forums

serial protocol specs and verification

Started by alb July 26, 2013
On Saturday, July 27, 2013 1:59:46 AM UTC+2, rickman wrote:
> On 7/26/2013 11:22 AM, alb wrote:
>> Hi all,
>>
>> I have the following specs for the physical level of a serial protocol:
>>
>>> For the communication with Frontend asynchronous LVDS connection is used.
>>> The bitrate is set to 20 Mbps.
>>> Data encoding on the LVDS line is NRZI:
>>> - bit '1' is represented by a transition of the physical level,
>>> - bit '0' is represented by no transition of the physical level,
>>> - insertion of an additional bit '1' after 6 consecutive bits '0'.
>>
>> Isn't there a missing requirement on reset condition of the line?
>> System clock is implicitly defined on a different section of the specs
>> and is set at 40MHz.
>>
>> At the next layer there's a definition of a 'frame' as a sequence of 16
>> bit words preceded by a 3 bit sync pattern (111) and a header of 16 bits
>> defining the type of the packet and the length of the packet (in words).
>>
>> I'm writing a test bench for it and I was wondering whether there's any
>> recommendation you would suggest. Should I take care about randomly
>> select the phase between the system clock and the data?
>
> Async, eh? At 2x clock to data? Not sure I would want to design this.
> I assume you have to phase lock to the data stream somehow? I think
> that is the part I would worry about.
>
> In simulation I would recommend that you both jitter the data clock at a
> high bandwidth and also with something fairly slow. The slow variation
> will test the operation of your data extraction with a variable phase
> and the high bandwidth jitter will check for problems from only having
> two samples per bit. I don't know how this can be expected to work myself.
>
> I did something similar where I had to run a digital phase locked loop
> on standard NRZ data (no encoding) and used a 4x clock, but I think I
> proved to myself I could do it with a 3x clock, it just becomes
> impossible to detect when you have a sample error... lol.
Doesn't sound so different from USB full speed, which is usually done by sampling the 12 Mbit/s stream with a 48 MHz clock, or on both the rising and falling edges of a 24 MHz clock. -Lasse
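For reference, a minimal software model of the stimulus being discussed (the spec as quoted above, plus the slow-drift and fast-jitter suggestion) might look like the sketch below. The names, the ppm offset and the jitter figure are invented for illustration only.

# Minimal Python model of a test bench stimulus, assuming the spec quoted
# above (20 Mbps, NRZI, a '1' stuffed after six consecutive '0's).
import random

BIT_TIME_NS = 50.0  # 20 Mbps nominal

def stuff_bits(bits):
    """Insert a '1' after every run of six consecutive '0's."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == 6:
            out.append(1)
            run = 0
    return out

def nrzi_events(bits, ppm_error=50.0, jitter_ns=1.0):
    """Return (time_ns, level) per transmitted bit: a '1' toggles the line,
    a '0' leaves it alone.  ppm_error models the slow frequency offset,
    jitter_ns the fast edge jitter suggested above."""
    level, t, events = 0, 0.0, []
    bit_time = BIT_TIME_NS * (1.0 + ppm_error * 1e-6)
    for b in stuff_bits(bits):
        if b:
            level ^= 1
        events.append((t + random.gauss(0.0, jitter_ns), level))
        t += bit_time
    return events

if __name__ == "__main__":
    frame = [1, 1, 1] + [random.randint(0, 1) for _ in range(32)]
    for t_ns, lvl in nrzi_events(frame)[:8]:
        print(f"{t_ns:8.2f} ns  level={lvl}")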
On 7/29/2013 4:36 PM, glen herrmannsfeldt wrote:
> rickman<gnuarm@gmail.com> wrote:
>
> (snip, I wrote)
>
>>> Everyone's old favorite asynchronous serial RS232 usually uses a
>>> clock at 16x, though I have seen 64x. From the beginning of the
>>> start bit, it counts half a bit time (in clock cycles), verifies
>>> the start bit (and not random noise) then counts whole bits and
>>> decodes at that point. So, the actual decoding is done with a 1X
>>> clock, but with 16 (or 64) possible phase values. It resynchronizes
>>> at the beginning of each character, so it can't get too far off.
>
>> Yes, that protocol requires a clock matched to the senders clock to at
>> least 2.5% IIRC. The protocol the OP describes has much longer char
>> sequences which implies much tighter clock precision at each end and I'm
>> expecting it to use a clock recovery circuit... but maybe not. I think
>> he said they don't use one but get "frequent" errors.
>
> (snip)
>
>>> Seems to me that it should depend on how far off you can get.
>>> For async RS232, you have to stay within about a quarter bit time
>>> over 10 bits, so even if the clock is 2% off, it still works.
>>> But as above, that depends on having a clock of the appropriate
>>> phase.
>
>> Not sure why you mention phase. In 232 type character async you have
>> *no* phase relationship between clocks. There is no PLL so you aren't
>> phase locked to the data either. I guess you mean a clock with enough
>> precision?
>
> The reason for the 16x clock is that it can then clock the bits
> in one at a time with any of 16 different phases. That is, the actual
> bits are only looked at once (usually).
>
>> I've never analyzed an async design with longer data streams
>> so I don't know how much precision would be required, but I'm
>> sure you can't do reliable data recovery with a 2x clock (without
>> a pll). I think this would contradict the Nyquist criterion.
>
> If you start from the leading edge of the start bit, choose which
> cycle of the 2x clock is closest to the center, and count from there,
> seems to me you do pretty well if the clocks are close enough. Also,
> the bit times should be pretty close to correct.
That is the point. With a 2x clock there isn't enough resolution to "pick" an edge. The clock that detects the edge is somewhere in the first *half* of the start bit and the following clock is somewhere in the second half of the start bit... which do you use? Doesn't matter, if the clock detecting the start bit is close enough to the wrong point, one or the other will be far too close to the next transition to guarantee that you are sampling data from the correct bit.
>> In my earlier comments when I'm talking about a PLL I am
>> referring to a digital PLL. I guess I should have said a DPLL.
>
> I was thinking of an analog one. I still remember when analog (PLL
> based) data separators were better for floppy disk reading.
> Most likely by now, digital ones are better, possibly because
> of a higher clock frequency.
If you have an analog PLL then you just need to make sure your sample clock is *faster* than 2x the bit rate. Then you can be certain of how many bits are between adjacent transitions. But if at any time, due to frequency error or jitter, you sample on the wrong side of a transition you will get an unrecoverable error.

When it comes to analog media like disk drives, where the position of the bit pulse can jitter significantly, I would expect a significantly higher clock rate to be very useful.

It all comes down to distinguishing which half of the bit time the transition falls into. With a run of six zeros (no transition) between 1 bits (transition) it becomes more important to sample with adequate resolution with a DPLL, or to use an analog PLL.

I did a DPLL design for a data input to an IP circuit-to-packet card. It worked well in simulation and in product test and verification. I'm not sure they have used this feature in the field though. It was added to the product "just in case" and that depends on the customer needing the feature.

-- Rick
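To make the "count whole bits between transitions" idea concrete, here is a rough software sketch. The helper name is hypothetical, it assumes an oversampled capture of the line, and it ignores the stuffed bits and any trailing zeros after the last edge.

# Count samples between transitions and round to a whole number of bit
# times to recover how many '0's sit between each pair of '1's (NRZI).
def nrzi_decode(samples, oversample=4):
    """samples: 0/1 line levels captured at `oversample` times the bit rate."""
    bits, last, run = [], samples[0], 0
    for level in samples[1:]:
        run += 1
        if level != last:                         # a transition marks a '1'
            nbits = max(1, round(run / oversample))
            bits.extend([0] * (nbits - 1) + [1])  # the zeros in between, then the '1'
            last, run = level, 0
    return bits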
rickman <gnuarm@gmail.com> wrote:

(snip, I wrote)
>> If you start from the leading edge of the start bit, choose which >> cycle of the 2x clock is closest to the center, and count from there, >> seems to me you do pretty well if the clocks are close enough. Also, >> the bit times should be pretty close to correct.
> That is the point. With a 2x clock there isn't enough resolution to > "pick" an edge. The clock that detects the edge is somewhere in the > first *half* of the start bit and the following clock is somewhere in > the second half of the start bit... which do you use?
The easy way is to use the opposite edge of the clock. I suppose that really means that the clock is 4x, though, so maybe that doesn't count. Say you clock on the falling edge. If the clock is currently high, the next falling edge will be less than half a cycle away. If it is currently low, then it will be more. Using that, you can find the falling edge closest to the center.

The hard way is to have the receive clock slightly faster or slightly slower. That is, the speed such that if the first edge is in the first half, later edges will be later in the bit time, and not past the 3/4 mark.

Now, having different receive and transmit clocks is inconvenient, but not impossible.
> Doesn't matter, if the clock detecting the start bit is close > enough to the wrong point, one or the other will be far too > close to the next transition to guarantee that you are sampling > data from the correct bit.
(snip)
>> I was thinking of an analog one. I still remember when analog (PLL >> based) data separators were better for floppy disk reading. >> Most likely by now, digital ones are better, possibly because >> of a higher clock frequency.
> If you have an analog PLL then you just need to make sure your sample > clock is *faster* than 2x the bit rate. Then you can be certain of how > many bits are between adjacent transitions. But if at any time due to > frequency error or jitter you sample on the wrong side of a transition > you will get an unrecoverable error.
It is interesting in the case of magnetic media. The read head reads changes in the recorded magnetic field. For single density (FM) there is a flux transition at the edge of the bit cell (clock bit), and either is or isn't one in the center (data bit). So, including jitter, the data bit is +/- one quarter bit time from the center, and the clock bits are +/- one quarter from the cell boundary. The data rate is half the maximum flux transition rate. The time between transitions is either 1/2 or 1 bit time.

For the usual IBM double density (MFM), the data bits are again in the center of the bit cell, but clock bits only occur on bit cell boundaries between two zero (no transition) bits. The data rate is then equal to the maximum flux transition rate. The time between transitions is then either 1, 1.5 or 2 bit times. The result, though, as you noted, is that it is more sensitive to jitter.

In the case of magnetic media response, though, there is a predictable component to the transition times. As the field doesn't transition infinitely fast, the result is that as two transitions get closer together, when read back they come slightly farther apart than you might expect. Precompensation is then used to correct for this. Transitions are moved slightly earlier or slightly later, depending on the expected movement of the read pulse.
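As an aside, the MFM rule just described fits in a few lines of code. This is only an illustration of the encoding rule (each bit cell split into a clock half and a data half, 1 meaning a flux transition in that half-cell), not anything drive-specific.

def mfm_encode(bits):
    """A '1' puts a transition in the data half-cell; a '0' puts one in the
    clock half-cell only when the previous bit was also '0'."""
    halves, prev = [], 1          # pretend the preceding bit was a '1'
    for b in bits:
        clock = 1 if (b == 0 and prev == 0) else 0
        data = 1 if b == 1 else 0
        halves.extend([clock, data])
        prev = b
    return halves

print(mfm_encode([1, 0, 0, 1, 1, 0]))   # -> [0,1, 0,0, 1,0, 0,1, 0,1, 0,0]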
> When it comes to analog media like disk drives where the position > of the bit pulse can jitter significantly I would expect a > significantly higher clock rate would be very useful.
One way to do the precompensation is to run a clock fast enough such that you can move the transition one cycle early or late. The other way is with an analog delay line.
> It all comes down to distinguishing which half of the bit time > the transition falls into. With a run of six zeros (no transition) > between 1 bits (transition) it becomes more important to sample > with adequate resolution with a DPLL or to use an analog PLL.
The early magnetic tape used NRZI coding, flux transition for one, no transition for zero. Odd parity means at least one bit will change for every character written to tape. Even parity means at least two will change, but you can't write the character with all bits zero. Both were used for 7 track (six bit characters) and odd parity was used for 800 BPI 9 track tapes. There can be long runs of zero (no transition) for any individual track, but taken together there is at least one.

For 1600 BPI tapes, IBM changed to PE, which is pretty similar to that used for single density floppies. The flux transition rate can be twice the bit rate (3200/inch) but each track has its own clock pulse. It is fairly insensitive to head azimuth, unlike 800 BPI NRZI. There are no long periods without a transition on any track. Reading tapes is much more reliable, especially on a different drive than the data was written on.

IBM 6250 tapes use GCR, with more complicated patterns of bit transitions, and more variation in time between transitions. Again, much more reliable than its predecessor.
> I did a DPLL design for a data input to an IP circuit to packet card. > It worked well in simulation and in product test and verification. I'm > not sure they have used this feature in the field though. It was added > to the product "just in case" and that depends on the customer needing > the feature.
-- glen
On 7/29/13 5:09 AM, alb wrote:
> On 29/07/2013 03:05, Richard Damon wrote:
>> On 7/26/13 11:22 AM, alb wrote:
>>> Hi all,
>>>
>>> I have the following specs for the physical level of a serial protocol:
>>>
>>>> For the communication with Frontend asynchronous LVDS connection is used.
>>>> The bitrate is set to 20 Mbps.
>>>> Data encoding on the LVDS line is NRZI:
>>>> - bit '1' is represented by a transition of the physical level,
>>>> - bit '0' is represented by no transition of the physical level,
>>>> - insertion of an additional bit '1' after 6 consecutive bits '0'.
>>>
>>> Isn't there a missing requirement on reset condition of the line?
>>> System clock is implicitly defined on a different section of the specs
>>> and is set at 40MHz.
> []
>> You don't need to specify a reset state, as either level will work. At
>> reset the line will be toggling every 7 bit times due to the automatic
>> insertion of a 1 after 6 0s.
>
> Uhm, since there's a sync pattern of '111' I have to assume that no
> frame is transmitted when only zeros are flowing (with the '1' stuffed
> every 6 zeros).
My assumption for the protocol would be that between frames an "all zero" pattern is sent (note that this is on the layer above the raw transport level, where a 1 is added every time 6 zeros are sent). Thus all frames will begin with three 1s in a row as a start-of-frame signal, which also gives a lot of transitions to help lock the clock if using a PLL.
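Reading the framing that way, the first two steps on the receive side can be modelled roughly as below. The helper names are made up, and it assumes the bit that follows six '0's is always the stuffed '1'.

def unstuff(bits):
    """Remove the '1' stuffed after every six consecutive '0's."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:                  # drop the stuffed bit (should be a '1';
            skip = False          # a '0' here would be a stuffing error)
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == 0 else 0
        if run == 6:
            skip = True
    return out

def find_sync(bits):
    """Return the index of the first '111' in the unstuffed stream, or -1."""
    for i in range(len(bits) - 2):
        if bits[i:i + 3] == [1, 1, 1]:
            return i
    return -1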
>> I would be hard pressed to use 40 MHz as a system clock, unless I was
>> allowed to use both edges of the clock (so I could really sample at a 4x
>> rate).
>
> I'm thinking about having a system clock multiplied internally via PLL
> and then go for a x4 or x8 in order to center the bit properly.
I would think that sampling at 4x the data rate is a minimum; faster will give you better margins for frequency errors. So with a 20 Mbps data rate you need to sample the data at 80 MHz. Faster can help, and will cause less jitter in your recovered data clock out.

Note that the first level of processing will perform data detection and clock recovery, and this might be where the 40 MHz came from: a 40 MHz processing system can be told to take data every other clock cycle most of the time, but it has the bandwidth to take data on two consecutive clocks at times when the data is coming in slightly faster. You don't want to make this clock much faster than that, as then it becomes harder to design for no benefit. Any higher speed bit detection clock needs to have its results translated into this domain for further processing. (You could also generate a recovered clock, but that starts you down the road to an async design, as the recovered clock isn't well related to your existing clock, being a combinatorial result of registers clocked on your sampling clock.)
>> For a test bench, I would build something that could be set to work
>> slightly "off frequency" and maybe even with some phase jitter in the
>> data clock.
>
> Rick was suggesting a phase jitter with a high and a low frequency
> component. This can be even a more realistic case since it models slow
> drifts due to temperature variations... I do not know how critical it
> would be to simulate *all* jitter components of a clock (they may depend
> on temperature, power noise, ground noise, ...).
>
>> I am assuming that system clock does NOT travel between
>> devices, or there wouldn't be as much need for the auto 1 bit, unless
>> this is just a bias leveling, but it isn't real great for that.
>
> Your assumption is correct. No clock distribution between devices.
Hi Rick,

On 29/07/2013 17:19, rickman wrote:
[]
>> what do you mean by saying 'it becomes impossible to detect when you
>> have a sample error'?
>
> I was assuming that perhaps you were doing something I didn't quite
> understand, but I'm pretty sure I am on target with this. You *must* up
> your sample rate by a sufficient amount so that you can guarantee you
> get a minimum of two samples per bit. Otherwise you have no way to
> distinguish a slipped sample due to clock mismatch. Clock frequency
> mismatch is guaranteed, unless you are using the same clock somehow. Is
> that the case? If so, the sampling would just be synchronous and I
> don't follow where the problem is.
There's no clock distribution, therefore each end has its own clock on-board. We are certainly talking about the same nominal oscillator frequency, but how well the two actually match is something we *do not* want to rely on.
> It is not just a matter of phase, but of frequency. With a 2x clock, > seeing a transition 3 clocks later doesn't distinguish one bit time from > two bit times.
I agree with you, the 2x clock is not fine enough to adjust for phase shifts and/or frequency mismatch.
> I'm having trouble expressing myself I think, but I'm trying to say the > basic premise of this design is flawed because the sample clock is only > 2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x > the samples have four states, expected timing, fast timing, slow timing > and "error" timing meaning the loop control isn't working.
Uhm, I didn't quite follow what you mean by 'fast timing' and 'slow timing'. With perfect frequency matching I would expect a bit to have a transition on cycle #2 (see graph). If the bit is slightly shifted I would notice the transition in either cycle 2 or cycle 3, depending on whether it is slightly earlier or slightly later than the clock edge.

               bit
              center
                ^
                |
 cycles   2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
 Data     ________--------________--------________--------_____
 SmplClk  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
 SmplData __________--------________--------________--------___

On a perfect frequency match SmplData will be 1 clock delayed.
> Data     ____----____----____----____----____----____----____
> SmplClk  --__--__--__--__--__--__--__--__--__--__--__--__--__
> SmplData -----____----____----____----____----____----____----
>
> This is how you expect it to work. But if the data is sampled slightly
> off it looks like this.
Uhm, this graphic shows a clock frequency which is 1x the data rate... Am I missing something??? This will never work, of course...
> The sample clock does not need to be any particular ratio to the data > stream if you use an NCO to control the sample rate. Then the phase > detection will bump the rate up and down to suit.
I might use the internal PLL to multiply the clock frequency to 4x the data frequency (= 80 MHz) and then phase lock on the data just by looking at the transitions. If for some reason I see a transition earlier or later I would adjust my recovered clock accordingly. I'm sure this stuff has been implemented a gazillion times.
> Do you follow what I am saying? Or have I mistaken what you are doing?
I follow partially... I guess you understood what I'm saying, but I'm losing you somewhere in the middle of the explanation (especially with the graph representing a 1x clock rate...).
On 29/07/2013 19:40, glen herrmannsfeldt wrote:
[]
> Everyone's old favorite asynchronous serial RS232 usually uses a > clock at 16x, though I have seen 64x. From the beginning of the > start bit, it counts half a bit time (in clock cycles), verifies > the start bit (and not random noise) then counts whole bits and > decodes at that point. So, the actual decoding is done with a 1X > clock, but with 16 (or 64) possible phase values. It resynchronizes > at the beginning of each character, so it can't get too far off.
I believe that with 4x or 8x you could easily resync at the bit level. The first transition comes into a shift register (4 FFs or 8 FFs); when the shift register has half of its bits set and half reset, you generate a clock to sample the data. The second transition comes in and the same mechanism happens. The recovered clock is adjusted so that the transition lands in the middle of the shift register. Since the protocol is bit stuffed, it won't get too far off.

[]
>> I'm having trouble expressing myself I think, but I'm trying to say the
>> basic premise of this design is flawed because the sample clock is only
>> 2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
>> the samples have four states, expected timing, fast timing, slow timing
>> and "error" timing meaning the loop control isn't working.
>
> Seems to me that it should depend on how far off you can get.
> For async RS232, you have to stay within about a quarter bit time
> over 10 bits, so even if the clock is 2% off, it still works.
> But as above, that depends on having a clock of the appropriate
> phase.
IMO a phase shift does not matter too much, while a frequency mismatch will accumulate timing error and lead the transmitter and receiver to drift apart. But if you lock on the phase, it means you lock on the frequency as well.
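The shift-register trick described a few paragraphs up can be modelled in a handful of lines. This is a loose sketch with invented names, assuming a 4x oversampled line.

def centered_transition(sr):
    """sr: the last four line samples, oldest first.  True when the older
    half and the newer half are each uniform but disagree (0011 or 1100),
    i.e. the transition sits in the middle of the shift register."""
    old, new = sr[:2], sr[2:]
    return old[0] == old[1] and new[0] == new[1] and old[0] != new[0]

# Example with a 4x-oversampled line: a strobe fires once per transition.
samples = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
sr = [0, 0, 0, 0]
for i, s in enumerate(samples):
    sr = sr[1:] + [s]             # shift the new sample in
    if centered_transition(sr):
        print(f"transition centered at sample {i}")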
On 7/30/2013 1:01 PM, alb wrote:
> Hi Rick,
>
> On 29/07/2013 17:19, rickman wrote:
> []
>>> what do you mean by saying 'it becomes impossible to detect when you
>>> have a sample error'?
>>
>> I was assuming that perhaps you were doing something I didn't quite
>> understand, but I'm pretty sure I am on target with this. You *must* up
>> your sample rate by a sufficient amount so that you can guarantee you
>> get a minimum of two samples per bit. Otherwise you have no way to
>> distinguish a slipped sample due to clock mismatch. Clock frequency
>> mismatch is guaranteed, unless you are using the same clock somehow. Is
>> that the case? If so, the sampling would just be synchronous and I
>> don't follow where the problem is.
>
> There's no clock distribution, therefore each end has its own clock
> on-board. We are certainly talking about the same nominal oscillator
> frequency, but how well the two actually match is something we *do not*
> want to rely on.
>
>> It is not just a matter of phase, but of frequency. With a 2x clock,
>> seeing a transition 3 clocks later doesn't distinguish one bit time from
>> two bit times.
>
> I agree with you, the 2x clock is not fine enough to adjust for phase
> shifts and/or frequency mismatch.
Ok, we are on the same page then.
>> I'm having trouble expressing myself I think, but I'm trying to say the
>> basic premise of this design is flawed because the sample clock is only
>> 2x the data rate. I say you need 3x and I strongly encourage 4x. At 4x
>> the samples have four states, expected timing, fast timing, slow timing
>> and "error" timing meaning the loop control isn't working.
>
> Uhm, I didn't quite follow what you mean by 'fast timing' and 'slow
> timing'. With perfect frequency matching I would expect a bit to have a
> transition on cycle #2 (see graph). If the bit is slightly shifted I
> would notice the transition in either cycle 2 or cycle 3, depending on
> whether it is slightly earlier or slightly later than the clock edge.
>
>                bit
>               center
>                 ^
>                 |
>  cycles   2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
>  Data     ________--------________--------________--------_____
>  SmplClk  -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
>  SmplData __________--------________--------________--------___
>
> On a perfect frequency match SmplData will be 1 clock delayed.
No point in even discussing the "perfect" frequency match.
>> Data     ____----____----____----____----____----____----____
>> SmplClk  --__--__--__--__--__--__--__--__--__--__--__--__--__
>> SmplData -----____----____----____----____----____----____----
>>
>> This is how you expect it to work. But if the data is sampled slightly
>> off it looks like this.
>
> Uhm, this graphic shows a clock frequency which is 1x the data rate...
> Am I missing something??? This will never work, of course...
Yes, you are right, but your diagram above shows a 4x clock. That will work all day long. It is the 2x clock that doesn't work well. A 3x clock will work but can't provide any info on whether it is sync'd or not. A 4x clock can tell if the data has slipped, giving an error.

What I meant further up by the timing is that your circuit will detect the data transitions and try to sample near the middle of the stable portion. So with a 4x clock, if it sees a transition where it expects one, it is "on time". If it sees a transition one clock early it knows it is "slow", and if it sees a transition one clock late it knows it is "fast". When it sees a transition in the fourth phase, it should assume that it is out of sync and needs to go into hunt mode. Or you can get fancier and use some hysteresis for the transitions between "hunt" and "locked" modes.

I designed this with an NCO controlled PLL. With your async protocol you should be able to receive a packet based on the close frequency matching of the two ends. This would really just be correcting for the phase of the incoming data and not worrying about the frequency mismatch... like a conventional UART. This circuit can realign every 7 pulses max. That would work I think.

I was making this a bit more complicated because in my case I didn't have matched frequency clocks, it was specified in the software to maybe 1-2%, and the NCO had to PLL to the incoming data to get a frequency lock. I also didn't have bit stuffing, so a long enough string without transitions would cause a lock slip.
>> The sample clock does not need to be any particular ratio to the data
>> stream if you use an NCO to control the sample rate. Then the phase
>> detection will bump the rate up and down to suit.
>
> I might use the internal PLL to multiply the clock frequency to 4x the
> data frequency (= 80 MHz) and then phase lock on the data just by looking
> at the transitions. If for some reason I see a transition earlier or later
> I would adjust my recovered clock accordingly.
Yes, that is it exactly. The bit stuffing will give you enough transitions that you should never lose lock. It is trying to do this at 2x that won't work well because you can't distinguish early from late.
> I'm sure this stuff has been implemented a gazillion times.
>
>> Do you follow what I am saying? Or have I mistaken what you are doing?
>
> I follow partially... I guess you understood what I'm saying, but I'm
> losing you somewhere in the middle of the explanation (especially with
> the graph representing a 1x clock rate...).
Sorry. If this is not clear now, I'll try the diagram again... lol

I would give you my code, but in theory it is proprietary to someone else. Just think of a state machine that outputs a clock enable every four states, then either adds a state or skips a state to stay in alignment, but only when it sees data transitions. If it sees a transition in the fourth state, it is not in alignment. If there is no transition the FSM just counts...

A timing diagram is worth a thousand words.

-- Rick
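Since the actual code isn't available, here is a rough stand-in for the state machine described above, written as a software model rather than HDL. The phase numbering and the crude re-hunt on an "error"-phase edge are assumptions, not anyone's proprietary design.

# 4x-oversampling DPLL model: a phase counter 0..3 advances every sample;
# data is taken in phase 2 (roughly mid-bit) and line transitions are
# expected in phase 0.  An edge one phase early or late nudges the counter
# back into step (the "add a state / skip a state" behaviour); an edge in
# phase 2 is treated as loss of alignment and re-locks on that edge.
def recover_bits(samples):
    """samples: 0/1 line levels captured at ~4x the bit rate."""
    bits, phase, prev = [], 0, samples[0]
    for s in samples[1:]:
        if s != prev:            # a transition on the line
            if phase in (1, 3):  # one state late or early: nudge back
                phase = 0
            elif phase == 2:     # the "error" phase: assume we lost lock
                phase = 0        # and restart the bit counter on this edge
        if phase == 2:
            bits.append(s)       # take the sample near the bit centre
        phase = (phase + 1) % 4
        prev = s
    return bits                  # line levels; NRZI decoding comes after this

# Quick check with a perfectly aligned 4x stream of line levels 1,1,0,0,1:
print(recover_bits([1]*4 + [1]*4 + [0]*4 + [0]*4 + [1]*4))  # -> [1, 1, 0, 0, 1]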
On 29/07/2013 22:14, rickman wrote:
[]
>> Everyone's old favorite asynchronous serial RS232 usually uses a
>> clock at 16x, though I have seen 64x. From the beginning of the
>> start bit, it counts half a bit time (in clock cycles), verifies
>> the start bit (and not random noise) then counts whole bits and
>> decodes at that point. So, the actual decoding is done with a 1X
>> clock, but with 16 (or 64) possible phase values. It resynchronizes
>> at the beginning of each character, so it can't get too far off.
>
> Yes, that protocol requires a clock matched to the senders clock to at
> least 2.5% IIRC. The protocol the OP describes has much longer char
> sequences which implies much tighter clock precision at each end and I'm
> expecting it to use a clock recovery circuit... but maybe not. I think
> he said they don't use one but get "frequent" errors.
At the physical level the bit stuffing allows resyncing continuously, therefore I'm not concerned about whether there's a clock recovery circuit.

We are using 40 MHz oscillators (0.5 ppm stability), but after a few seconds you can already see by how many cycles two clocks can drift apart.
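As a rough back-of-the-envelope check using the figures in this post (my arithmetic, not from the spec):

# Two 0.5 ppm oscillators can differ by up to ~1 ppm.  At 20 Mbps, how long
# until the accumulated error reaches a quarter of a bit?
rel_error = 1e-6                      # worst-case relative frequency error
bit_time_s = 1 / 20e6                 # 50 ns per bit
bits_to_quarter_slip = 0.25 / rel_error
print(f"{bits_to_quarter_slip:.0f} bits, about "
      f"{bits_to_quarter_slip * bit_time_s * 1e3:.1f} ms")
# -> 250000 bits, about 12.5 ms: without resyncing on the stuffed-bit
#    transitions, even very good oscillators slip within milliseconds.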
> I've never analyzed an async design with longer data streams so I don't
> know how much precision would be required, but I'm sure you can't do
> reliable data recovery with a 2x clock (without a pll). I think this
> would contradict the Nyquist criterion.
<nitpick mode on>
The Nyquist criterion has nothing to do with being able to sample data. As a matter of fact, your internal clock is perfectly capable of sampling data flowing in your FPGA without needing to be 2x the data rate.
<nitpick mode off>
> In my earlier comments when I'm talking about a PLL I am referring to a
> digital PLL. I guess I should have said a DPLL.
Why bother? If you have a PLL on your FPGA you can profit from it, otherwise you need something fancier.
On 30/07/2013 06:45, Richard Damon wrote:
[]
>> Uhm, since there's a sync pattern of '111' I have to assume that no
>> frame is transmitted when only zeros are flowing (with the '1' stuffed
>> every 6 zeros).
>
> My assumption for the protocol would be that between frames an "all
> zero" pattern is sent (note that this is on the layer above the raw
> transport level, where a 1 is added every time 6 zeros are sent). Thus
> all frames will begin with three 1s in a row as a start-of-frame signal,
> which also gives a lot of transitions to help lock the clock if using a
> PLL.
A frame is defined as follows:

- sync  : '111'
- header: dtype (4) - n.u. (2) - length (10)
- data  : (16) * length

In principle between frames there can be any number of zeros (with bit stuffing). An 'all zero' pattern in this sense might be of any number of bits. (A small generator sketch built from these fields is at the end of this post.)

[]
>> I'm thinking about having a system clock multiplied internally via PLL
>> and then go for a x4 or x8 in order to center the bit properly.
>
> I would think that sampling at 4x the data rate is a minimum; faster
> will give you better margins for frequency errors. So with a 20 Mbps data
> rate you need to sample the data at 80 MHz. Faster can help, and will
> cause less jitter in your recovered data clock out.
I also agree with you, no way a 2x would be sufficient to recover a phase shift.
> > Note that the first level of processing will perform data detection and > clock recovery, and this might be where the 40 MHz came from, a 40 MHz > processing system can be told most of the time to take data every other > clock cycle, but have bandwidth to at times if the data is coming in > slightly faster to take data on two consecutive clocks.
A 40 MHz would be sampling 2x, which is clearly not sufficient.
> You don't want > to make this clock much faster than that, as then it becomes harder to > design for no benefit. Any higher speed bit detection clock needs to > have the results translated to this domain for further processing. (You > could also generate a recovered clock, but that starts you down the road > to an async design as the recovered clock isn't well related to your > existing clock, being a combinatorial result of registers clocked on > your sampling clock.)
The deframed data (the data portion of the frame structure mentioned above) go into a FIFO; I think I can rework it into a dual-clock FIFO to cross the clock domain.
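Putting the frame definition from earlier in this post into a quick generator for the test bench might look like the sketch below. The bit order within the header (MSB first) is an assumption; the spec snippet doesn't pin it down.

def build_frame(dtype, words):
    """dtype: 4-bit packet type, words: list of 16-bit data words."""
    def field(value, width):
        # MSB-first bit expansion of a field
        return [(value >> (width - 1 - i)) & 1 for i in range(width)]
    bits = [1, 1, 1]                       # sync pattern
    bits += field(dtype & 0xF, 4)          # dtype
    bits += field(0, 2)                    # not used
    bits += field(len(words) & 0x3FF, 10)  # length in words
    for w in words:
        bits += field(w & 0xFFFF, 16)      # payload words
    return bits                            # feed this into the stuffer / NRZI model

print(len(build_frame(0x3, [0xBEEF, 0x1234])))   # 3 + 16 + 32 = 51 bits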
On 7/31/2013 3:36 AM, alb wrote:
> On 29/07/2013 22:14, rickman wrote:
> []
>>> Everyone's old favorite asynchronous serial RS232 usually uses a
>>> clock at 16x, though I have seen 64x. From the beginning of the
>>> start bit, it counts half a bit time (in clock cycles), verifies
>>> the start bit (and not random noise) then counts whole bits and
>>> decodes at that point. So, the actual decoding is done with a 1X
>>> clock, but with 16 (or 64) possible phase values. It resynchronizes
>>> at the beginning of each character, so it can't get too far off.
>>
>> Yes, that protocol requires a clock matched to the senders clock to at
>> least 2.5% IIRC. The protocol the OP describes has much longer char
>> sequences which implies much tighter clock precision at each end and I'm
>> expecting it to use a clock recovery circuit... but maybe not. I think
>> he said they don't use one but get "frequent" errors.
>
> At the physical level the bit stuffing allows resyncing continuously,
> therefore I'm not concerned about whether there's a clock recovery circuit.
>
> We are using 40 MHz oscillators (0.5 ppm stability), but after a few
> seconds you can already see by how many cycles two clocks can drift apart.
>
>> I've never analyzed an async design with longer data streams so I don't
>> know how much precision would be required, but I'm sure you can't do
>> reliable data recovery with a 2x clock (without a pll). I think this
>> would contradict the Nyquist criterion.
>
> <nitpick mode on>
> The Nyquist criterion has nothing to do with being able to sample data.
> As a matter of fact, your internal clock is perfectly capable of sampling
> data flowing in your FPGA without needing to be 2x the data rate.
> <nitpick mode off>
I don't know what you are talking about. If you asynchronously sample, you very much do have to satisfy the Nyquist criterion. A 2x clock, because it isn't *exactly* 2x, can *not* be used to capture a bitstream so that you can find the transitions and know which bit is which. Otherwise there wouldn't be so many errors in the existing circuit.
>> In my earlier comments when I'm talking about a PLL I am referring to a
>> digital PLL. I guess I should have said a DPLL.
>
> Why bother? If you have a PLL on your FPGA you can profit from it,
> otherwise you need something fancier.
Not sure of your context. You can't use the PLL on the FPGA to recover the clock from an arbitrary data stream. It is not designed for that and will not work because of the gaps in data transitions. It is designed to allow the multiplication of clock frequencies.

A DPLL can be easily designed to recover the clock, but needs to be greater than 3x the data rate in order to distinguish the fast condition from the slow condition. You can use the FPGA PLL to multiply your clock from 2x to 4x to allow the DPLL to work correctly.

-- Rick