FPGARelated.com
Forums

Virtex 4 Tapped Delay Lines

Started by al99999 November 26, 2005
Hi,

I was wondering if anybody could help.  I'm looking for a way to
create tapped delay lines on a Xilinx Virtex 4 without having to
specify which logic slices should be used.  I'm trying to create a
time interval analyser to measure accurately (to within approx 500
picoseconds) the length of time between a start and a stop pulse.

If anybody has any simpler ideas of how to do this instead of using
tapped delay lines, I'd be very grateful.  The alternative I'd been
thinking of was to up the clock speed (currently at 10 MHz) to, say,
250 MHz and then have 8 phase shifts of this clock.  When the start
(or stop) pulse arrives, it would only AND correctly with one set of
phases, specifying which part of the original clock period it was in,
but this might be prone to strange delays etc. through the AND gate.

Thanks a lot!

Alastair

On Sat, 26 Nov 2005 19:15:36 -0600, alastairlynch@blueyonder.co-dot-uk.no-spam.invalid (al99999) wrote:
>Hi,
>
>I was wondering if anybody could help. I'm looking for a way to
>create tapped delay lines on a Xilinx Virtex 4 without having to
>specify which logic slices should be used. I'm trying to create a
>time interval analyser to measure
>accurately (to within approx 500 picoseconds) the length of time
>between a start and a stop pulse.

Xilinx has a tapped delay line in each IOB for skewing the data path,
and this can be between the input pad and the input FF.

"Virtex-4 modules have an IDELAY module in the input path of every user
I/O. IDELAY allows the implementation of deskew algorithms to correctly
capture incoming data. IDELAY can be applied to data signals, clock
signals, or both. IDELAY features a fully-controllable, 64-tap delay
line. Each tap delay is carefully calibrated to provide an absolute
delay value of 78 ps independent of process, voltage, and temperature
variations."

So the range is 5 ns, with 78 ps resolution.

http://toolbox.xilinx.com/docsan/xilinx7/books/data/docs/v4lsc/v4lsc0122_113.html

>If anybody has any simpler ideas of how to do this instead of using
>tapped delay lines i'd be very grateful. The alternative i'd been
>thinking of were to up the clock speed (currently at 10MHz) to say
>250MHz and then have 8 phase shifts of this clock. When the start
>(or stop) pulse arrives it would only AND correctly with one set of
>phases specifying which part of the original clock pulse it was in,
>but this might be prone to strange delays etc. through the AND gate.

So you are on the right track. I think with IDELAY and your example
numbers, you would bring the input signal in on 8 input pins, and set
the delays to span 0 to 3.5 ns across those pins. All the input
flip-flops are clocked by your 250 MHz clock, and then you decode the
resulting 8-bit result to get your fine timing, and count the 250 MHz
clock for coarse timing.

You can either double your resolution or halve the number of inputs if
you also take advantage of the DDR capability: at 250 MHz, your cycle
time is 4 ns, and half cycle time is 2 ns, so a DDR input FF pair, with
IDELAY set to 0, gives you sampling at 0 and 2 ns. With IDELAY set to 7
(546 ps) you get sampling at 0.546 and 2.546 ns.

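
As a rough illustration of the numbers above, here is a small Python model (not HDL, and not taken from any Xilinx tool). It picks the nearest IDELAY tap for each of the 8 pins and combines a coarse clock count with a fine phase index. The 78.125 ps tap size assumes a 200 MHz IDELAY reference clock, and the function names are made up for this example. Whether the fine part is added or subtracted depends on how the 8-bit code is decoded, so treat the sign as an assumption.

```python
TAP_PS = 78.125          # assumed tap size (200 MHz IDELAY reference clock)
CLK_PERIOD_PS = 4000.0   # 250 MHz sampling clock
N_PINS = 8               # one copy of the input signal per pin


def tap_settings(n_pins=N_PINS, period_ps=CLK_PERIOD_PS, tap_ps=TAP_PS):
    """Nearest IDELAY tap for n_pins delays spread evenly over one period."""
    step_ps = period_ps / n_pins
    return [round(i * step_ps / tap_ps) for i in range(n_pins)]


def timestamp_ps(coarse_count, fine_index,
                 n_pins=N_PINS, period_ps=CLK_PERIOD_PS):
    """Combine the coarse clock count with the decoded fine phase index."""
    return coarse_count * period_ps + fine_index * (period_ps / n_pins)


taps = tap_settings()
print(taps)                         # tap value per pin
print([t * TAP_PS for t in taps])   # resulting delays in ps
print(timestamp_ps(10, 3))          # coarse count 10, fine index 3
```

Because 500 ps is not a whole number of taps, each pin lands within half a tap of its ideal delay, which is one more reason to calibrate.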
>Thanks a lot!
>
>Alastair

I would also add some sort of calibration capability to the design,
such as being able to drive all the inputs from a reference signal
(make the IOBs bidirectional), and then go through a calibration
process to figure out the phase relationship to the effective 250 MHz
clock edge, and the sampling time of the inputs. The result might feed
a barrel shifter on the 8-bit code to handle the phase correction.

A totally different approach might be to use the SerDes blocks if you
have them, with all the 8B/10B decoding and other protocol logic
disabled. You would then get a stream of 20- or 40-bit words with the
SerDes receiver running from the transmitter clock, set up for
2.000 gigabits per second. Again, you would need some sort of
calibration process to correct for the arbitrary phase relationship
between the TX clock and the deserialized RX data. Once you figure this
out, it should remain stable until you reset or cycle the power.

Have fun,
Philip
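
To make the SerDes idea a little more concrete, here is a Python sketch of the post-processing only. At 2.000 Gb/s each received bit is a 500 ps sample, so finding the first 0-to-1 transition in the word stream gives the edge time. The bit ordering (earliest sample in the most significant bit), the `offset_bits` calibration parameter and the function name are assumptions for illustration, not properties of the actual transceiver.

```python
BIT_PS = 500      # 2.000 Gb/s -> 500 ps per bit
WORD_BITS = 20    # deserialized word width (20 or 40 in the post)


def first_rising_edge_ps(words, word_bits=WORD_BITS, bit_ps=BIT_PS,
                         offset_bits=0):
    """Return the time of the first 0->1 transition, or None if none is seen.

    words: iterable of ints, each holding word_bits samples, with the
    earliest sample in the most significant bit (assumed ordering).
    offset_bits: calibration correction for the TX/RX phase, in bit times.
    """
    prev = None
    for w_idx, word in enumerate(words):
        for b_idx in range(word_bits):
            bit = (word >> (word_bits - 1 - b_idx)) & 1
            if prev == 0 and bit == 1:
                return (w_idx * word_bits + b_idx - offset_bits) * bit_ps
            prev = bit
    return None


# The signal goes high 12 bit times into the second word.
print(first_rising_edge_ps([0x00000, 0x000FF]))
```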
Quote:
Xilinx has a tapped delay line in each IOB for skewing the data path,
and this can be between the input pad and the input FF.

The range is 5 ns, with 78 ps resolution.

http://toolbox.xilinx.com/docsan/xilinx7/books/data/docs/v4lsc/v4lsc0122_113.html


Ok, looking at the datasheet for IDELAY in fixed delay mode, the only
output is 'O' which is the data output. Do I not need to be able to
access the output of the tap multiplexer?

Quote:
So you are on the right track. I think with IDELAY and your example
numbers, you would bring the input signal in on 8 input pins, and set
the delay from 0 to 3.5 nS , all the input flipflops are clocked by
your 250 MHz clock, and then you decode the resulting 8 bit result
to get your fine timing, and count the 250 MHz clock for coarse
timing.


What do you mean by bringing the input signal in on 8 input pins?
Physically wire up the input pulse to 8 of the Virtex 4 I/O pins?

Sorry, just a little bit confused!! I'd be grateful for any more detail
you could provide on how to go about doing this.

Thanks!

Alastair

Al, Philip gave you good advice:
For each input pin, you can specify a delay from the pad to the O. The
granularity (given a 200 MHz calibration frequency) is 78.125 ps, but
each tap has its own non-cumulative error of about 15 ps.
I would improve your accuracy by using 16 inputs, each having a
different IDELAY value, so that you divide the 5 ns into 16 steps of
312 ps each (give or take a 15 ps non-cumulative error). The tap
delays are unaffected by any jitter of the 200 MHz clock.
You interconnect all 16 inputs. When an edge comes in, it will be
delayed differently in each IDELAY, and you use your 200 MHz clock to
register a 16-bit input word which has ones on one end, and zeros on
the other.
It's then your job to find the transition point (look-up tables are
good for that), and that 4-bit binary value identifies the time as a
fraction of your 5 ns timing (200 MHz).
This means you have an absolute time for the rising as well as for the
falling edge, and the difference is your pulse width.  Worst-case error
is thus +/- one tap.
Peter Alfke, from home
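
The decode step described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: index 0 of the sample is the input with the smallest IDELAY setting, the tap values 0, 4, 8, ..., 60 give the 312.5 ps steps, fixed pad and routing delays are ignored, and all names are invented for the example.

```python
CLK_PERIOD_PS = 5000.0                # 200 MHz sampling clock
N_INPUTS = 16
STEP_PS = CLK_PERIOD_PS / N_INPUTS    # 312.5 ps, i.e. 4 taps of 78.125 ps

# IDELAY tap value per input: 0, 4, 8, ..., 60
TAP_SETTINGS = [4 * i for i in range(N_INPUTS)]

# Look-up table from a clean thermometer pattern to its transition point.
# A rising edge shows up as ones at the low-delay end, zeros at the other.
RISING_LUT = {
    tuple([1] * k + [0] * (N_INPUTS - k)): k for k in range(1, N_INPUTS)
}


def decode_edge(sample):
    """Return (kind, k) for a 16-sample word, or None if no clean edge."""
    key = tuple(sample)
    if key in RISING_LUT:
        return "rising", RISING_LUT[key]
    inverted = tuple(1 - b for b in key)
    if inverted in RISING_LUT:
        return "falling", RISING_LUT[inverted]
    return None


def edge_time_ps(coarse_count, k):
    """Edge time: k steps before the clock edge that captured the word."""
    return coarse_count * CLK_PERIOD_PS - k * STEP_PS


rise = decode_edge([1] * 5 + [0] * 11)   # captured at clock count 100
fall = decode_edge([0] * 9 + [1] * 7)    # captured at clock count 103
width_ps = edge_time_ps(103, fall[1]) - edge_time_ps(100, rise[1])
print(rise, fall, width_ps)
```

The transition point k runs from 1 to 15, so it fits the 4-bit value mentioned above; in hardware the dictionary would become a LUT or priority encoder.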

Peter Alfke wrote:

 > Al, Philip gave you good advice:
 > For each input pin, you can specify a delay from the pad to the O. The
 > granularity (given a 200 MHz calibration frequency) is 78.125 ps,but
 > each tap has its own non-cumulative error of about 15 ps.
 > I would improve your accuracy by using 16 inputs, each having a
 > different IDELAY value, so that you divide the 5 ns into 16 steps of
 > 312 ps each (give or take a 15 ps non-accumulative error).


Are the pin-captures within this 15ps window, or is that just the
error of the delay elements themselves ?

 > The tap
 > delays are unaffected by any jitter of the 200 MHz clock.
 > You interconnect all 16 inputs. When an edge comes in, it will be
 > delayed differently in each IDELAY, and you use your 200 MHz clock to
 > register a 16-bit input word which has ones on one end, and zeros on
 > the other.
 > It's then your job to find the transition point (look-up-tables are
 > good for that),and that 4-bit binary value identifies the time as a
 > fraction of your 5 ns timing (200 MHz)
 > This means you have an absolute time for the rising as well as for the
 > falling edge, and the difference is your pulse width.  Worst-case error
 > is thus +/- one tap.
 > Peter Alfke, from home


This sounds like a good app-note...., Peter ?

Such an app note could also cover :
a) If you use just one FPGA pin (eg existing PCB design), what are the
alternatives ?

b) Trickiest portion of this, I can see, will be crossing the 'phase 
boundary' between the delay line capture, and the counter-capture.
Edge detect flag could be as simple as Sample.0 <> Sample.15.

For the Calibrate Philip mentions, and this ease of edge detect, the
delay block should be toleranced to be always greater than the clock - 
ie  maybe 6ns for 200MHz.

  The Clock can be scaled, to match the FPGAs ability to count/capture
the edges - which will be related a little to the max time between edges
- longer counters are slower

c) Pattern detect might need to be single sample error tolerant.
ie a pattern of 111110100000000 might occur ?

-jg
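
Points (b) and (c) can be sketched in Python as follows. The edge flag is the Sample.0 <> Sample.15 test from above. For tolerance to a single stray sample, one possible approach (an assumption for illustration, not something proposed in this thread) is to count samples instead of matching an exact pattern. The function names are hypothetical.

```python
def edge_seen(sample):
    """Edge-detect flag: the first and last samples differ."""
    return sample[0] != sample[-1]


def transition_by_count(sample):
    """Estimate the transition point as the number of samples equal to sample[0].

    Counting instead of pattern matching means one stray sample near the
    transition moves the answer by one step instead of producing an
    invalid code.
    """
    first = sample[0]
    return sum(1 for b in sample if b == first)


bits = [int(c) for c in "111110100000000"]   # pattern from point (c)
print(edge_seen(bits), transition_by_count(bits))
```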



The 15 ps are what I remember as the difference between the ideal delay
from pin to O vs the measured delay, because the taps are not perfectly
equal. As a difference between further non-adjacent taps, this
statistical error actually gets smaller. The total delay over the 64
taps is exactly 5 ns = one period of the 200 MHz clock. It is
servo-controlled. The 200 MHz is allowed to vary by +/-10% (causing,
of course, an inversely proportional change in tap delay), although
that is not described in the data sheet.
I could imagine calibrating this with a variable frequency input of
<<200 MHz, effectively measuring the half-period of the incoming
signal. Any discontinuities could be attributed to wrong tap settings
and/or different pc-board-to-chip (package) delays. This can of course
be remedied by changing individual tap settings (the design in question
uses only 25% of the available tap settings). Sampling errors as Jim
showed should be impossible, once the design is properly adjusted.

I think IDELAY is one of the most exciting innovations in Virtex-4
(together with the FIFO controller).
Peter Alfke, from home.
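
The relationship between the reference clock and the tap delay stated above (64 taps servoed to one reference period) works out as follows. This Python snippet is just that arithmetic; the function name is made up, and the +/-10% range is taken from the post, not from the data sheet.

```python
N_TAPS = 64


def tap_delay_ps(ref_mhz):
    """Tap delay when 64 taps are servoed to one reference clock period."""
    period_ps = 1e6 / ref_mhz
    return period_ps / N_TAPS


for f_mhz in (180.0, 200.0, 220.0):   # nominal 200 MHz and +/-10%
    print(f_mhz, round(tap_delay_ps(f_mhz), 3))
```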

Thanks for all your help.  One quick last question: is it possible to
internally connect the pins, or do I need to physically wire them up
external to the FPGA?  Thanks again,

Alastair

On Sun, 27 Nov 2005 11:15:45 -0600, alastairlynch@blueyonder.co-dot-uk.no-spam.invalid (al99999) wrote:
>
>Ok, looking at the datasheet for IDELAY in fixed delay mode, the only
>output is 'O' which is the data output. Do I not need to be able to
>access the output of the tap multiplexer?

You need to look a bit further (which I admit is not easy, as the data
sheet has to cover a massive amount of information, and if you don't
know what you are looking for, it can be hard to find).

In the user's guide (ug070.pdf)

http://www.xilinx.com/bvdocs/userguides/ug070.pdf

(just documenting the I/O runs from page 215 through 384), figure 7-1
on page 309 shows some of what is in the input part of the I/O tile.
It doesn't show ISERDES or I/O standards selection, for example, but it
does show the IDELAY and the DDR structure. Note that all the little
muxes are config bits. The figure shows that the output from IDELAY can
either be used directly (O) or feed the input flip-flops.

>> So you are on the right track. I think with IDELAY and your example
>> numbers, you would bring the input signal in on 8 input pins, and set
>> the delay from 0 to 3.5 nS , all the input flipflops are clocked by
>> your 250 MHz clock, and then you decode the resulting 8 bit result
>> to get your fine timing, and count the 250 MHz clock for coarse
>> timing.
>
>What do you mean bring the input signal on to 8 input pins?
>Physically wire up the input pulse to 8 of the virtex 4 IO pins?
Right. Be careful of signal skew on your PCB. Or 4 inputs if you use my IDELAY + DDR suggestion.
>Sorry, just a little bit confused!! I'd be grateful for any more
>detail you could provide on how to go about doing this.

This is tricky stuff. Unfortunately, getting the most out of what these
chips have to offer requires a lot of study. 169 pages of user guide is
a lot to read when looking for one specific detail, but the payoff of
investing in learning this stuff is that it is easier to find next
time :-)

>Thanks!
>
>Al

Cheers,
Philip

On Mon, 28 Nov 2005 03:15:36 -0600, alastairlynch@blueyonder.co-dot-uk.no-spam.invalid (al99999) wrote:
>Thanks for all your help. One quick last question, is it possible to
>internally connect the pins, or do I need to physically wire them up
>external to the fpga? Thanks again,
>
>Alastair

You could bring the signal in on 1 pin, then set up 8 other I/Os as
bidirectional, send the signal out on all 8, and bring it back in on
those 8 with the IDELAY stuff. Doing this will make the external pins
wiggle, so they would all have to be "no connection" externally.

Overall, I would not recommend this structure, as you will not have
good control of the delay to each of the output circuits, and this
would therefore add to the error in timing. I think it is best to
distribute the signal on your PCB.

Philip