FPGARelated.com
Forums

How To Synchronize FPGAs

Started by Leroy Tanner September 22, 2004
Hello newsreaders,

For a while I have been confronted with the following task, which I find
quite challenging but unfortunately have not managed to solve yet.
What I want to do is to use 2-4 FPGAs (Xilinx Virtex-II Pro) together on one
printed circuit board (PCB). They are used to process a large amount of
incoming serial data (data rates of several GHz). My idea is to handle
that data in parallel with the 2-4 FPGAs. But now the problem arises of how to
adequately split the data and, in particular, how to synchronize the FPGAs
with one another.
Is it possible, or even a realistic idea, to synchronize multiple
FPGAs in the GHz range? How can this be done without much protocol
overhead? I would like to avoid adding an extra transfer protocol
between the FPGAs just for that purpose! So far I have not found a
proper solution.
Maybe someone can give me a hint? Any ideas how to solve this problem?

Regards,    Leroy Tanner


Maybe I am missing something, but wouldn't you just drive all the chips with 
one onboard clock then in your code trigger the processes on the rising 
edge?

Don

It gets tricky when you have multiple FPGAs clocked at hundreds of MHz. I don't have any direct experience there, but I think it would be worth looking for app notes on vendor sites that address "board-level de-skew" (using FPGA clocking resources to account for clock-distribution headaches) and, specifically for Xilinx, "channel bonding" (using multiple RocketIO transceivers to receive data in parallel).

The RocketIO transceivers are difficult beasts, at least if you're not using a standard protocol. I'm not sure whether channel bonding can span multiple V2Pro devices, but I know it can span multiple transceivers.

Not sure of your budget or application requirements, but it may be worthwhile going to a single, larger part that contains the resources you need. That at least partially removes the headache of high-speed PCB design/layout.

--Josh Model
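To make "channel bonding" a little more concrete: the transmitter sends a known marker on all lanes at the same instant, and the receiver uses where that marker shows up in each lane to line the lanes back up before combining them. The following is a simplified behavioral model in Python, not the RocketIO implementation; the marker symbol `"CB"` and the function name `bond_lanes` are hypothetical.

```python
# Simplified behavioral model of channel bonding (not the RocketIO logic).
# Assumption: the transmitter inserts the same marker symbol on every lane
# at the same instant; each lane then arrives with a different delay.
MARKER = "CB"  # hypothetical bonding-marker symbol


def bond_lanes(lanes: list[list[str]]) -> list[tuple[str, ...]]:
    """Deskew lanes on their markers and return one tuple per bonded word."""
    # Position of the marker in each lane = that lane's relative delay.
    offsets = [lane.index(MARKER) for lane in lanes]
    # Discard everything up to and including each lane's marker, which is
    # the effect an elastic buffer gets by moving its read pointer.
    aligned = [lane[off + 1:] for lane, off in zip(lanes, offsets)]
    # Only emit words for which every lane has a symbol.
    usable = min(len(a) for a in aligned)
    return [tuple(a[k] for a in aligned) for k in range(usable)]


if __name__ == "__main__":
    lanes = [
        ["x", "CB", "a0", "a1"],        # lane 0: one symbol of extra delay
        ["CB", "b0", "b1", "y"],        # lane 1: earliest lane
        ["x", "x", "CB", "c0", "c1"],   # lane 2: two symbols of extra delay
    ]
    print(bond_lanes(lanes))
```

After alignment, each tuple holds the symbols that were sent side by side, so the three lanes behave like one wider channel.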
...or at least take all the high-speed serial stuff into one FPGA and
distribute it from that one to the others at a slower parallel rate. Also,
it looks like V4 (Virtex-4) could take care of this with its ChipSync thingy
for source-synchronous applications.
Cheers, Syms.
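As a rough illustration of why fanning out at a slower parallel rate helps, the sketch divides a serial bit rate across several downstream parallel buses. The 4.8 Gb/s rate, the two downstream FPGAs and the 8-bit buses are hypothetical numbers, chosen only so the result lands near the ~300 MHz Leroy mentions; line-coding overhead (e.g. 8b/10b) is ignored.

```python
# Back-of-envelope rate check for a "hub" FPGA that receives the serial
# stream and fans it out over parallel buses. All numbers are hypothetical.

def bus_clock_hz(serial_bps: int, n_downstream: int, bus_width: int) -> float:
    """Clock needed on each bus if the stream is split evenly (SDR)."""
    return serial_bps / (n_downstream * bus_width)


if __name__ == "__main__":
    # e.g. 4.8 Gb/s split across 2 downstream FPGAs, 8-bit bus each
    f = bus_clock_hz(4_800_000_000, 2, 8)
    print(f"{f / 1e6:.0f} MHz per bus")
    # Doubling the bus width halves the required clock
    print(f"{bus_clock_hz(4_800_000_000, 2, 16) / 1e6:.0f} MHz per bus")
```

Clocking the buses DDR would halve the figure again, at the cost of a tighter timing window.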
Yes, you *are* missing something...  ;)

Don Golding wrote:
> Maybe I am missing something, but wouldn't you just drive all the chips with
> one onboard clock then in your code trigger the processes on the rising
> edge?
--
Rick "rickman" Collins
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design

Leroy Tanner wrote:

> But now there arises the problem how to
> adequately split the data and how to synchronize the FPGAs among one
> another, in particular?
> Is it possible or first of all a realistic idea to synchronize multiple
> FPGAs in the GHz range? How can this be done without much protocoll
> overhead?
I believe the most important thing is to latch the signals in the IOBs first, to minimize clock-skew problems. Otherwise, use an external shift register to generate bit-parallel signals for input to the FPGA.

-- glen
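A minimal behavioral model of glen's second suggestion, assuming an external serial-in/parallel-out shift register whose contents are latched every `width` bits. The MSB-first bit order and framing from the very first bit are assumptions; a real link needs some way to find word boundaries. The point is that the FPGA then only has to capture one word every `width` bit times.

```python
# Behavioral model of an external serial-in/parallel-out shift register.
# Assumptions: bits arrive MSB first and words are framed from bit 0
# (real designs need a framing/alignment mechanism).

def deserialize(bits: list[int], width: int) -> list[int]:
    """Shift bits in one at a time; emit a word every `width` clocks."""
    words = []
    shift_reg = 0
    count = 0
    mask = (1 << width) - 1
    for b in bits:
        shift_reg = ((shift_reg << 1) | (b & 1)) & mask
        count += 1
        if count == width:  # a full word is in the register: latch it
            words.append(shift_reg)
            count = 0
    return words


if __name__ == "__main__":
    stream = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    print([hex(w) for w in deserialize(stream, 8)])
```

With an 8-bit register, the parallel word rate is one eighth of the serial bit rate.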
"Symon" <symon_brewer@hotmail.com>:
> ...or at least take all the high speed serial stuff into one FPGA and
> distribute it from that one to the others at a slower parallel rate.

OK, I agree, and that might be a good approach to minimize skew in the first stage. But I still have to synchronize the other FPGAs with each other, not at a rate of several GHz but at around 300 MHz. In my opinion a central clock isn't an appropriate solution!?
Think about what a central clock entails from purely a routing perspective.
Let's assume you're an SI wizard, and have no issues there.

300 MHz would be ~3.3 ns per clock cycle. If I remember my rule of thumb,
an electrical signal travels about 6 inches per ns in FR-4 material. So the
worst-case mismatch between all your data lines and all clock lines for all
FPGAs will be the skew that eats into your timing budget.

Just as an example (I'm not really a layout person, so it's my posterior
speaking), matching all lines to 4 FPGAs to within +/- 3 inches seems
relatively tricky, but not completely unreasonable. That 6-inch spread is
about 1 ns, so now ~1/3 of your entire clock cycle is wasted (more, if you
were assuming DDR) before you even get to the FPGA fabric. It makes laying
out your design that much more tricky.
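Josh's arithmetic can be checked in a few lines of Python. This assumes his ~6 in/ns rule of thumb for FR-4 and reads "+/- 3 inches" as a 6-inch worst-case spread between the shortest and longest traces; `skew_fraction` is a hypothetical helper name.

```python
# Rule-of-thumb skew budget. Inputs are the example figures from the post;
# the ~6 in/ns FR-4 propagation speed is an approximation.

def skew_fraction(clock_hz: float, mismatch_in: float,
                  in_per_ns: float = 6.0, ddr: bool = False) -> float:
    """Fraction of the usable data window consumed by trace-length skew."""
    period_ns = 1e9 / clock_hz
    window_ns = period_ns / 2 if ddr else period_ns  # DDR: data every half cycle
    skew_ns = mismatch_in / in_per_ns
    return skew_ns / window_ns


if __name__ == "__main__":
    # +/- 3 inches of matching => 6 inches worst-case spread => ~1 ns
    print(f"SDR: {skew_fraction(300e6, 6.0):.0%} of the cycle")
    print(f"DDR: {skew_fraction(300e6, 6.0, ddr=True):.0%} of the half-cycle")
```

At 300 MHz this gives 30% of the cycle for single data rate and 60% of the window for DDR, before jitter, reflections or crosstalk are counted.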

Now, in the slightly more real world, you've got to throw in the jitter
present on a 300 MHz clock, impedance mismatches causing reflections, and
crosstalk on your board with all that data zipping around (because GHz and
even 300 MHz lines are really antennae), and you've got a lot to deal with.

Anyhow, synchronizing dataflow at those speeds on a PCB is not nearly as
simple as just plopping down a clock. It's a hard design, but you get to
choose where to place the burden. If you've got really good PCB people,
maybe they can match and terminate the lines really well. If you've got the
DCM/DLL hardware (or the Altera or other-vendor counterpart) to de-skew the
board clock, you could let the FPGA do it (though I don't recall at what
frequencies the DCMs top out). If you've got neither, you might want to
consider going to a single-chip serial interface, because you're going to
get into trouble otherwise.

--Josh


Hi Leroy,
Say you've got 4 FPGAs A, B, C & D. Each gets fed the 300 MHz clock, so
inside each FPGA's fabric you have CLK_A, CLK_B, etc. When you send data
from (say) FPGA B to FPGA D, send a clock with the data, generated by FPGA B
from its internal CLK_B, called (say) CLK_B_TO_D. Use this source-synchronous
clock with a DCM in FPGA D to get the data into a BRAM FIFO inside FPGA D.
Read the data out of this FIFO into the fabric of FPGA D using CLK_D. Repeat
for all the other paths. Any good?
Cheers, Syms.
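Symon's scheme hinges on the BRAM FIFO that sits between the CLK_B_TO_D and CLK_D domains. One common way to build such a dual-clock FIFO (an assumption here, not something the post specifies) is to Gray-code the read and write pointers before passing them across the clock boundary, so only one bit changes per increment. The Python sketch models just that pointer logic; the depth and helper names are hypothetical, and the flip-flop synchronizers themselves are not modeled.

```python
# Pointer logic of a dual-clock FIFO using Gray-coded pointers.
# Assumptions: this is a generic technique, not taken from the post;
# the depth and all names are hypothetical.

DEPTH_BITS = 4                  # hypothetical 16-entry FIFO
DEPTH = 1 << DEPTH_BITS
PTR_MOD = DEPTH << 1            # pointers carry one extra "wrap" bit


def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    b = 0
    while g:                    # XOR of all right-shifts of g
        b ^= g
        g >>= 1
    return b


def is_empty(wgray: int, rgray: int) -> bool:
    # Read side: its own pointer vs. the synchronized write pointer.
    return wgray == rgray


def is_full(wgray: int, rgray: int) -> bool:
    # Write side: occupancy computed from the synchronized read pointer.
    used = (gray_to_bin(wgray) - gray_to_bin(rgray)) % PTR_MOD
    return used == DEPTH


if __name__ == "__main__":
    # Every increment (including wrap-around) flips exactly one bit, so a
    # pointer sampled while changing reads as either the old or new value.
    for i in range(PTR_MOD):
        diff = bin_to_gray(i) ^ bin_to_gray((i + 1) % PTR_MOD)
        assert bin(diff).count("1") == 1
    print(is_empty(bin_to_gray(5), bin_to_gray(5)))
    print(is_full(bin_to_gray(16), bin_to_gray(0)))
    print(is_full(bin_to_gray(3), bin_to_gray(19)))  # writer has wrapped
```

The extra wrap bit is what lets "full" (pointers DEPTH apart) be told apart from "empty" (pointers equal).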

On Wed, 22 Sep 2004 11:14:39 +0200, Leroy Tanner wrote:

> For a while I have been confronted with the following task which I find
> quite challenging but unfortuantely didn't manage to solve it, yet.
> What I want to do is to use 2-4 FPGAs (Xilinx Virtex 2 Pro) together on one
> printed circuit board (PCB). They are used to process a large amount of
> incoming serial data (data rates of several GHz's). My idea is to handle
> that data parallel by the 2-4 FPGAs. But now there arises the problem how to
> adequately split the data and how to synchronize the FPGAs among one
> another, in particular?
There are two ways to approach this problem: (1) have each FPGA perform part of the process on the entire data stream, or (2) have each FPGA perform the entire process on part of the data stream.

We once implemented (2) for a bandwidth expander where each chip did the complete process (one-clock-cycle Huffman decoding, translation of the code to a value, then arithmetic processing) for a portion of the incoming data stream. Each chip was given a chunk of the incoming data (e.g., in a two-chip system, chip one processed chunks 1, 3, 5, ... of the data and chip two processed chunks 2, 4, 6, ...). We actually used two chips on the board because of I/O bandwidth limitations, but the chip was designed to allow 1-, 2-, 4-, or 8-chip operation.

-=Dave=-
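Dave's option (2) amounts to a round-robin split of the chunk stream, with the results reassembled in order afterwards. A minimal Python sketch, assuming chunks are simply list elements and that each chip returns one result per chunk it was given; the function names and the merge step are hypothetical, not a description of his design.

```python
# Round-robin work split: every chip runs the whole pipeline on its own
# share of the chunks. Names and the merge step are hypothetical.

SUPPORTED_CHIP_COUNTS = (1, 2, 4, 8)


def split_round_robin(chunks: list, n_chips: int) -> list[list]:
    """Chip i (0-based) gets chunks i, i + n, i + 2n, ..."""
    if n_chips not in SUPPORTED_CHIP_COUNTS:
        raise ValueError(f"unsupported chip count: {n_chips}")
    return [chunks[i::n_chips] for i in range(n_chips)]


def merge_round_robin(per_chip: list[list]) -> list:
    """Reassemble the per-chip results in original chunk order."""
    out = []
    rounds = max(len(c) for c in per_chip)
    for k in range(rounds):
        for chip_results in per_chip:
            if k < len(chip_results):  # last round may be partial
                out.append(chip_results[k])
    return out


if __name__ == "__main__":
    chunks = list(range(1, 8))  # chunks numbered 1..7, as in the post
    shares = split_round_robin(chunks, 2)
    print(shares)               # chip one: 1, 3, 5, 7; chip two: 2, 4, 6
    assert merge_round_robin(shares) == chunks
```

Because no chip depends on another's results, the only inter-chip timing requirement is getting each chunk to the right chip, which sidesteps the fabric-to-fabric synchronization problem.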