Reply by Weng Tianxiang September 14, 2006
Daniel S. wrote:
> Weng Tianxiang wrote: > >> Flexibility, scalability and routability are what makes ring buses so > >> popular in modern large-scale, high-bandwidth ASICs and systems. It is > >> all a matter of trading some up-front complexity and latency for > >> long-term gain. > >> > >> Since high-speed parallel buses end up needing pipelining to meet > >> high-speed timings, the complexity and area delta between multiple > >> parallel buses and ring-bus topologies is shrinking. > >> > > > > Hi Daniel, > > It is very interesting to learn there is a ring bus structure over > > there. > > > > "Flexibility, scalability and routability are what makes ring buses so > > popular in modern large-scale, high-bandwidth ASICs and systems" > > > > Can you please me some reference papers about ring bus applications in > > ASIC or FPGA? > > > > Normally what a designer is concerns most about is data latency in a > > bus structure > > Thank you. > > > > Weng > > Real-world ring-buses: > - IBM Power4 Multi-Chip-Module core-to-core interconnect > - IBM Power4 MCM-to-MCM interconnect > - IBM Power4 system-to-system interconnect > - ATI X1600/X1800 memory ring-bus > > IBM made lots of noise about its ring bus architecture a few years ago > but I am pretty sure I read about something similar many years earlier. > I am guessing Power5 must be using much of the same even though IBM did > not make as much noise about it. > > -- > Daniel Sauvageau > moc.xortam@egavuasd > Matrox Graphics Inc. > 1155 St-Regis, Dorval, Qc, Canada > 514-822-6000
Hi Daniel,

1. IBM uses ring buses for its core-to-core interconnect. It is a good option there because latency between cores is not critical, and it saves designing a fast switch among multiple CPUs. You may imagine how difficult it would be to design the fastest possible switch among 16 cores! IBM also has long experience with ring network systems.

2. I guess you cannot name a second example that uses ring buses inside a CPU among its tens of registers. Why? Internally, nobody can afford the clock latency a ring adds between registers.

3. Your choice to use ring buses in your application is justified: you are not concerned about the latency.

Thank you for the ring bus information.

Weng
Reply by Weng Tianxiang September 13, 2006
David Ashley wrote:
> Weng Tianxiang wrote: > > David Ashley wrote: > > > >>Weng Tianxiang wrote: > >> > >>>Hi Daniel, > >>>It is very interesting to learn there is a ring bus structure over > >>>there. > >> > >>Weng, > >> > >>It occured to me that your circuit was identical to the ring > >>buffer one. N users each had a fifo going to the DDR. > >>Then there was one stream coming out of the DDR, so > >>it's (N+1) interfaces. But then you said each user needs > >>its own fifo so it can store + forward data from the DDR. > >>So you've got 2N interfaces effectively. The new fifos are > >>just moved into the user's realm and not part of the DDR > >>controller. > >> > >>My point is the same circuitry exists in both cases. You've > >>just exercised some creative accounting :). > >> > >>-Dave > >> > >>-- > >>David Ashley http://www.xdr.com/dash > >>Embedded linux, device drivers, system architecture > > > > > > HI David, > > I am really surprised to what you recognized. > > > > They are totally different. > > > > This is ring topology: > > > > A --> B --> C --> D > > ^ -----------------------| > > > > This is my topoloty: > > > > A --> E --> | --> A > > B --> E --> | --> B > > C --> E --> | --> C > > D --> E --> | --> D > > > > Weng > > > > Right, ok I didn't understand what ring topology was, sorry. > Snip out my reference to ring topology then but my observation > still goes. E is the DDR, right? Anyway you don't show the > fifos associated with ABCD on the right side. > > -Dave > > -- > David Ashley http://www.xdr.com/dash > Embedded linux, device drivers, system architecture
Hi David,

Yes, every component has a read fifo internally. I don't have to show them in the interface; they are not part of the interface.

1. We are talking about how to implement a multi-component DDR controller interface;
2. We are talking about its performance efficiency;
3. We are talking about the minimum number of wires in the DDR controller interface.

An individual read fifo for each component gives designers the freedom, or a buffer, to isolate the many differences between the DDR and the components. For example, they may use different clocks, different data rates, and so on.

Weng
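To illustrate the point, here is a rough VHDL sketch of the read side of one component: it listens on the shared DDR output bus, captures only the words addressed to it, and crosses them into its own clock domain through a dual-clock FIFO, which is what lets each component run at its own clock and data rate. The generic FIFO name (async_fifo), the command flag, the 3-bit target field and all widths are assumptions for illustration, not part of Weng's actual design.

library ieee;
use ieee.std_logic_1164.all;

-- One of these sits inside every component: it watches the shared DDR
-- output bus, captures only the bursts addressed to this component, and
-- crosses them into the component's own clock domain through a dual-clock
-- FIFO so the component can drain data at its own rate.
entity ddr_read_capture is
  generic (
    MY_ID : std_logic_vector(2 downto 0) := "000"  -- this component's tag (assumed)
  );
  port (
    bus_clk   : in  std_logic;                      -- DDR controller output-bus clock
    bus_rst   : in  std_logic;
    bus_data  : in  std_logic_vector(31 downto 0);  -- broadcast payload
    bus_cmd   : in  std_logic;                      -- '1' = command word, '0' = data word
    bus_valid : in  std_logic;
    -- component-side (user clock domain) read interface
    usr_clk   : in  std_logic;
    usr_rden  : in  std_logic;
    usr_data  : out std_logic_vector(31 downto 0);
    usr_empty : out std_logic
  );
end entity;

architecture rtl of ddr_read_capture is
  -- Hypothetical dual-clock FIFO; in practice this would be a vendor core
  -- (e.g. a FIFO generated from the tool's IP library).
  component async_fifo is
    port (
      wr_clk : in  std_logic;
      rd_clk : in  std_logic;
      rst    : in  std_logic;
      din    : in  std_logic_vector(31 downto 0);
      wr_en  : in  std_logic;
      rd_en  : in  std_logic;
      dout   : out std_logic_vector(31 downto 0);
      empty  : out std_logic
    );
  end component;

  signal selected : std_logic := '0';  -- '1' while the current burst targets us
  signal wr_en    : std_logic;
begin
  -- A command word opens or closes our "window": compare its target field
  -- (assumed to sit in bits 2..0) against MY_ID. Data words that follow a
  -- matching command are pushed into the FIFO; everything else is ignored.
  process (bus_clk)
  begin
    if rising_edge(bus_clk) then
      if bus_rst = '1' then
        selected <= '0';
      elsif bus_valid = '1' and bus_cmd = '1' then
        if bus_data(2 downto 0) = MY_ID then
          selected <= '1';
        else
          selected <= '0';
        end if;
      end if;
    end if;
  end process;

  wr_en <= bus_valid and (not bus_cmd) and selected;

  u_fifo : async_fifo
    port map (
      wr_clk => bus_clk,
      rd_clk => usr_clk,
      rst    => bus_rst,
      din    => bus_data,
      wr_en  => wr_en,
      rd_en  => usr_rden,
      dout   => usr_data,
      empty  => usr_empty
    );
end architecture;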
Reply by David Ashley September 12, 2006
Weng Tianxiang wrote:
> David Ashley wrote: > >>Weng Tianxiang wrote: >> >>>Hi Daniel, >>>It is very interesting to learn there is a ring bus structure over >>>there. >> >>Weng, >> >>It occured to me that your circuit was identical to the ring >>buffer one. N users each had a fifo going to the DDR. >>Then there was one stream coming out of the DDR, so >>it's (N+1) interfaces. But then you said each user needs >>its own fifo so it can store + forward data from the DDR. >>So you've got 2N interfaces effectively. The new fifos are >>just moved into the user's realm and not part of the DDR >>controller. >> >>My point is the same circuitry exists in both cases. You've >>just exercised some creative accounting :). >> >>-Dave >> >>-- >>David Ashley http://www.xdr.com/dash >>Embedded linux, device drivers, system architecture > > > HI David, > I am really surprised to what you recognized. > > They are totally different. > > This is ring topology: > > A --> B --> C --> D > ^ -----------------------| > > This is my topoloty: > > A --> E --> | --> A > B --> E --> | --> B > C --> E --> | --> C > D --> E --> | --> D > > Weng >
Right, OK, I didn't understand what ring topology was, sorry. Snip out my reference to ring topology then, but my observation still stands. E is the DDR, right? In any case, you don't show the fifos associated with A, B, C and D on the right side.

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
Reply by Daniel S. September 12, 2006
Weng Tianxiang wrote:
>> Flexibility, scalability and routability are what makes ring buses so
>> popular in modern large-scale, high-bandwidth ASICs and systems. It is
>> all a matter of trading some up-front complexity and latency for
>> long-term gain.
>>
>> Since high-speed parallel buses end up needing pipelining to meet
>> high-speed timings, the complexity and area delta between multiple
>> parallel buses and ring-bus topologies is shrinking.
>>
>
> Hi Daniel,
> It is very interesting to learn there is a ring bus structure over
> there.
>
> "Flexibility, scalability and routability are what makes ring buses so
> popular in modern large-scale, high-bandwidth ASICs and systems"
>
> Can you please me some reference papers about ring bus applications in
> ASIC or FPGA?
>
> Normally what a designer is concerns most about is data latency in a
> bus structure
> Thank you.
>
> Weng
Real-world ring-buses:
- IBM Power4 Multi-Chip-Module core-to-core interconnect
- IBM Power4 MCM-to-MCM interconnect
- IBM Power4 system-to-system interconnect
- ATI X1600/X1800 memory ring-bus

IBM made lots of noise about its ring bus architecture a few years ago but I am pretty sure I read about something similar many years earlier. I am guessing Power5 must be using much of the same even though IBM did not make as much noise about it.

--
Daniel Sauvageau
moc.xortam@egavuasd
Matrox Graphics Inc.
1155 St-Regis, Dorval, Qc, Canada
514-822-6000
Reply by Weng Tianxiang September 12, 2006
David Ashley wrote:
> Weng Tianxiang wrote:
> > Hi Daniel,
> > It is very interesting to learn there is a ring bus structure over
> > there.
>
> Weng,
>
> It occured to me that your circuit was identical to the ring
> buffer one. N users each had a fifo going to the DDR.
> Then there was one stream coming out of the DDR, so
> it's (N+1) interfaces. But then you said each user needs
> its own fifo so it can store + forward data from the DDR.
> So you've got 2N interfaces effectively. The new fifos are
> just moved into the user's realm and not part of the DDR
> controller.
>
> My point is the same circuitry exists in both cases. You've
> just exercised some creative accounting :).
>
> -Dave
>
> --
> David Ashley http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture
Hi David,

I am really surprised by what you concluded. They are totally different.

This is a ring topology:

A --> B --> C --> D
^------------------|

This is my topology:

A --> E --> | --> A
B --> E --> | --> B
C --> E --> | --> C
D --> E --> | --> D

Weng
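Purely as an illustration, the word format implied by the star scheme above (a flag bit marking command words, with the target component carried inside the command word, as described in the quoted message) could be captured in a small VHDL package like the one below. The 33-bit width and the field positions are my assumptions, not something taken from the thread.

library ieee;
use ieee.std_logic_1164.all;

-- One possible layout for the words travelling on each point-to-point bus
-- of the star topology above: 32 payload bits plus one flag bit. A command
-- word carries the target component ID and a burst length; the data words
-- that follow it belong to that target. Widths and positions are assumed.
package bus_word_pkg is
  constant WORD_W   : natural := 33;   -- 1 flag bit + 32-bit payload
  constant FLAG_BIT : natural := 32;   -- '1' = command word, '0' = data word

  subtype bus_word_t is std_logic_vector(WORD_W - 1 downto 0);

  -- command-word fields (positions are hypothetical)
  constant TGT_HI : natural := 31;     -- target component ID, 3 bits
  constant TGT_LO : natural := 29;
  constant LEN_HI : natural := 28;     -- burst length in words, 13 bits
  constant LEN_LO : natural := 16;

  function is_command(w : bus_word_t) return boolean;
  function target_of (w : bus_word_t) return std_logic_vector;
end package;

package body bus_word_pkg is

  function is_command(w : bus_word_t) return boolean is
  begin
    return w(FLAG_BIT) = '1';
  end function;

  function target_of(w : bus_word_t) return std_logic_vector is
  begin
    return w(TGT_HI downto TGT_LO);
  end function;

end package body;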
Reply by David Ashley September 12, 2006
Weng Tianxiang wrote:
> Hi Daniel,
> It is very interesting to learn there is a ring bus structure over
> there.
Weng,

It occurred to me that your circuit was identical to the ring buffer one. N users each had a fifo going to the DDR. Then there was one stream coming out of the DDR, so it's (N+1) interfaces. But then you said each user needs its own fifo so it can store and forward data from the DDR. So you've got 2N interfaces effectively. The new fifos are just moved into the user's realm and not made part of the DDR controller.

My point is that the same circuitry exists in both cases. You've just exercised some creative accounting :).

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
Reply by Weng Tianxiang September 12, 2006
Daniel S. wrote:
> Weng Tianxiang wrote: > > Daniel S. wrote: > >> David Ashley wrote: > >> Since routing multiple 32+bits buses consumes a fair amount of routing > >> and control logic which needs tweaking whenever the design changes, I > >> have been considering ring buses for future designs. As long as latency > >> is not a primary issue, the ring-bus can also be used for data > >> streaming, with the memory controller simply being one more possible > >> target/initiator node. > >> > >> Using dual ring buses (clockwise + counter-clockwise) to link critical > >> nodes can take care of most latency concerns by improving proximity. For > >> large and extremely intensive applications like GPUs, the memory > >> controller can have multiple ring bus taps to further increase bandwidth > >> and reduce latency - look at ATI's X1600 GPUs. > >> > >> Ring buses are great in ASICs since they have no a-priori routing > >> constraints, I wonder how well this would apply to FPGAs since these are > >> optimized for linear left-to-right data paths, give or take a few > >> rows/columns. (I did some preliminary work on this and the partial > >> prototype reached 240MHz on V4LX25-10, limited mostly by routing and 4:1 > >> muxes IIRC.) > > > > Hi Daniel, > > Here is my suggestion. > > For example, there are 5 components which have access to DDR controller > > module. > > What I would like to do is: > > 1. Each of 5 components has an output buffer shared by DDR controller > > module; > > 2. DDR controller module has an output bus shared by all 5 components > > as their input bus. > > > > Each data has an additional bit to indicate if it is a data or a > > command. > > If it is a command, it indicates which the output bus is targeting at. > > If it is a data, the data belongs to the targeted component. > > > > Output data streams look like this: > > Command; > > data; > > ... > > data; > > Command; > > data; > > ... > > data; > > > > In the command data, you may add any information you like. > > The best benefit of this scheme is it has no delays and no penalty in > > performance, and it has minimum number of buses. > > > > I don't see ring bus has any benefits over my scheme. > > > > In ring situation, you must have (N+1)*2 buses for N >= 2. In my > > scheme, it must have N+1 buses, where N is the number of components, > > excluding DDR controller module. > > > > Weng > > In a basic ring, there needs to be only N segments to create a closed > loop with N nodes, memory-controller included. Double that for a fully > doubly-linked bus. > > Why use a ring bus? > - Nearly immune to wire delays since each node inserts bus pipelining > FFs with distributed buffer control (big plus for ASICs) > - Low signal count (all things being relative) memory controller: > - 36bits input (muxed command/address/data/etc.) > - 36bits output (muxed command/address/data/etc.) > - Same interface regardless of how many memory clients are on the bus > - Can double as a general-purpose modular interconnect, this can be > useful for node-to-node burst transfers like DMA > - Bandwidth and latency can be tailored by shuffling components, > inserting extra memory controller taps or adding rings as necessary > - Basic arbitration is provided for free by node ordering > > The only major down-side to ring buses is worst-case latency. Not much > of an issue for me since my primary interest is video > processing/streaming - I can simply preload one line ahead and pretty > much forget about latency. 
> > Flexibility, scalability and routability are what makes ring buses so > popular in modern large-scale, high-bandwidth ASICs and systems. It is > all a matter of trading some up-front complexity and latency for > long-term gain. > > Since high-speed parallel buses end up needing pipelining to meet > high-speed timings, the complexity and area delta between multiple > parallel buses and ring-bus topologies is shrinking. > > -- > Daniel Sauvageau > moc.xortam@egavuasd > Matrox Graphics Inc. > 1155 St-Regis, Dorval, Qc, Canada > 514-822-6000
Hi Daniel,

It is very interesting to learn that there is a ring bus structure over there.

"Flexibility, scalability and routability are what makes ring buses so popular in modern large-scale, high-bandwidth ASICs and systems"

Can you please point me to some reference papers about ring bus applications in ASICs or FPGAs?

Normally, what a designer is most concerned about is the data latency in a bus structure.

Thank you.

Weng
Reply by Daniel S. September 12, 2006
Weng Tianxiang wrote:
> Daniel S. wrote:
>> David Ashley wrote:
>> Since routing multiple 32+bits buses consumes a fair amount of routing
>> and control logic which needs tweaking whenever the design changes, I
>> have been considering ring buses for future designs. As long as latency
>> is not a primary issue, the ring-bus can also be used for data
>> streaming, with the memory controller simply being one more possible
>> target/initiator node.
>>
>> Using dual ring buses (clockwise + counter-clockwise) to link critical
>> nodes can take care of most latency concerns by improving proximity. For
>> large and extremely intensive applications like GPUs, the memory
>> controller can have multiple ring bus taps to further increase bandwidth
>> and reduce latency - look at ATI's X1600 GPUs.
>>
>> Ring buses are great in ASICs since they have no a-priori routing
>> constraints, I wonder how well this would apply to FPGAs since these are
>> optimized for linear left-to-right data paths, give or take a few
>> rows/columns. (I did some preliminary work on this and the partial
>> prototype reached 240MHz on V4LX25-10, limited mostly by routing and 4:1
>> muxes IIRC.)
>
> Hi Daniel,
> Here is my suggestion.
> For example, there are 5 components which have access to DDR controller
> module.
> What I would like to do is:
> 1. Each of 5 components has an output buffer shared by DDR controller
> module;
> 2. DDR controller module has an output bus shared by all 5 components
> as their input bus.
>
> Each data has an additional bit to indicate if it is a data or a
> command.
> If it is a command, it indicates which the output bus is targeting at.
> If it is a data, the data belongs to the targeted component.
>
> Output data streams look like this:
> Command;
> data;
> ...
> data;
> Command;
> data;
> ...
> data;
>
> In the command data, you may add any information you like.
> The best benefit of this scheme is it has no delays and no penalty in
> performance, and it has minimum number of buses.
>
> I don't see ring bus has any benefits over my scheme.
>
> In ring situation, you must have (N+1)*2 buses for N >= 2. In my
> scheme, it must have N+1 buses, where N is the number of components,
> excluding DDR controller module.
>
> Weng
In a basic ring, there needs to be only N segments to create a closed loop with N nodes, memory-controller included. Double that for a fully doubly-linked bus.

Why use a ring bus?
- Nearly immune to wire delays since each node inserts bus pipelining FFs with distributed buffer control (big plus for ASICs)
- Low signal count (all things being relative) memory controller:
  - 36bits input (muxed command/address/data/etc.)
  - 36bits output (muxed command/address/data/etc.)
- Same interface regardless of how many memory clients are on the bus
- Can double as a general-purpose modular interconnect, this can be useful for node-to-node burst transfers like DMA
- Bandwidth and latency can be tailored by shuffling components, inserting extra memory controller taps or adding rings as necessary
- Basic arbitration is provided for free by node ordering

The only major down-side to ring buses is worst-case latency. Not much of an issue for me since my primary interest is video processing/streaming - I can simply preload one line ahead and pretty much forget about latency.

Flexibility, scalability and routability are what makes ring buses so popular in modern large-scale, high-bandwidth ASICs and systems. It is all a matter of trading some up-front complexity and latency for long-term gain.

Since high-speed parallel buses end up needing pipelining to meet high-speed timings, the complexity and area delta between multiple parallel buses and ring-bus topologies is shrinking.

--
Daniel Sauvageau
moc.xortam@egavuasd
Matrox Graphics Inc.
1155 St-Regis, Dorval, Qc, Canada
514-822-6000
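For readers unfamiliar with the structure, here is a rough VHDL sketch of one node of such a singly-linked ring, using the 36-bit word width mentioned above. The valid/ready handshake, the empty-slot injection rule and the port names are simplifying assumptions, and the logic that extracts words addressed to the node is omitted to keep the sketch short.

library ieee;
use ieee.std_logic_1164.all;

-- One node of a singly-linked ring bus. Each node inserts a pipeline
-- register stage (which is what makes the topology tolerant of wire delay)
-- and may inject its own word whenever the slot arriving from upstream is
-- empty, so basic arbitration falls out of node ordering as described above.
entity ring_node is
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;
    -- upstream ring segment
    ring_in    : in  std_logic_vector(35 downto 0);
    ring_in_v  : in  std_logic;
    -- downstream ring segment
    ring_out   : out std_logic_vector(35 downto 0);
    ring_out_v : out std_logic;
    -- local injection port (from this node's client logic)
    inj_data   : in  std_logic_vector(35 downto 0);
    inj_valid  : in  std_logic;
    inj_ready  : out std_logic
  );
end entity;

architecture rtl of ring_node is
  signal r_data  : std_logic_vector(35 downto 0) := (others => '0');
  signal r_valid : std_logic := '0';
begin
  -- The slot is free for local injection only when nothing is passing by.
  inj_ready <= not ring_in_v;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r_valid <= '0';
      elsif ring_in_v = '1' then
        -- forward upstream traffic (one register of ring pipelining)
        r_data  <= ring_in;
        r_valid <= '1';
      elsif inj_valid = '1' then
        -- empty slot: inject our own word into the ring
        r_data  <= inj_data;
        r_valid <= '1';
      else
        r_valid <= '0';
      end if;
    end if;
  end process;

  ring_out   <= r_data;
  ring_out_v <= r_valid;
end architecture;

A real node would also decode the word header and pull traffic addressed to itself off the ring instead of forwarding it; that part is left out of the sketch.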
Reply by David Ashley September 11, 2006
Weng Tianxiang wrote:
> <big cut>
> 1. My design never use module design methodology. I use a big file to
> contain all logic statements except modules from Xilinx core.
>
> If a segment is to be used for other project, just a copy and paste to
> do the same things as module methodology does, but all signal names
> never change cross all function modules.
This is an interesting point. I just finished "VHDL for Logic Synthesis" by Andrew Rushton, a book recommended in an earlier post a few weeks ago, so I bought a copy. Rushton goes to great pains to say multiple times:

"The natural form of hierarchy in VHDL, at least when it is used for RTL design, is the component. Do not be tempted to use subprograms as a form of hierarchical design! Any entity/architecture pair can be used as a component in a higher level architecture. Thus, complex circuits can be built up in stages from lower level components."

I was convinced by his arguments and examples. I'd think a modular component approach wouldn't harm you, because during synthesis redundant interfaces, wires and logic would likely get optimized away. So the overriding factor is choosing whichever is easiest to implement, understand, maintain, share, etc. - i.e. human factors.

Having said that, as a C programmer I almost never create libraries. I have source code that does what I want for a specific task. Later, if I have to do something similar, I go look at what I've already done and copy sections of code out as needed. A perfect example is the Berkeley sockets layer: the library calls are so obscure that all you want to do is cut and paste something you managed to get working before, to do the same thing again... The alternative would be to wrap the sockets interface in something else, supposedly simpler. But then it wouldn't have all the functionality...

-Dave

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
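As a minimal illustration of the entity/architecture-as-component style Rushton describes, the following shows a small building block instantiated twice from a higher-level architecture; all names are made up, and direct entity instantiation is used so no separate component declaration is needed.

library ieee;
use ieee.std_logic_1164.all;

-- Low-level building block: a registered 2:1 mux.
entity mux_reg is
  port (
    clk : in  std_logic;
    sel : in  std_logic;
    a   : in  std_logic_vector(7 downto 0);
    b   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of mux_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if sel = '1' then
        q <= b;
      else
        q <= a;
      end if;
    end if;
  end process;
end architecture;

library ieee;
use ieee.std_logic_1164.all;

-- Higher-level design reusing the entity/architecture pair as a component.
entity top is
  port (
    clk    : in  std_logic;
    sel    : in  std_logic;
    x0, x1 : in  std_logic_vector(7 downto 0);
    y0, y1 : in  std_logic_vector(7 downto 0);
    q0, q1 : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of top is
begin
  u0 : entity work.mux_reg port map (clk => clk, sel => sel, a => x0, b => y0, q => q0);
  u1 : entity work.mux_reg port map (clk => clk, sel => sel, a => x1, b => y1, q => q1);
end architecture;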
Reply by Weng Tianxiang September 11, 2006
Weng Tianxiang wrote:
> KJ wrote: > > Weng Tianxiang wrote: > > > KJ wrote: > > > > "David Ashley" <dash@nowhere.net.dont.email.me> wrote in message > > > > news:4505047b$1_1@x-privat.org... > > > > > Weng Tianxiang wrote: > > > > >> Hi Daniel, > > > > >> Here is my suggestion. > > > > >> For example, there are 5 components which have access to DDR controller > > > > >> module. > > > > >> What I would like to do is: > > > > >> 1. Each of 5 components has an output buffer shared by DDR controller > > > > >> module; > > > > Not sure what is being 'shared'. If it is the actual DDR output pins then > > > > this is problematic....you likely won't be able to meet DDR timing when > > > > those DDR signals are coming and spread out to 5 locations instead of just > > > > one as it would be with a standard DDR controller. Even if it did work for > > > > 5 it wouldn't scale well either (i.e. 10 users of the DDR). > > > > > > > > If what is 'shared' is the output from the 5 component that feed in to the > > > > input of the DDR controller, than you're talking about internal tri-states > > > > which may be a problem depending on which target device is in question. > > > > > > > > <snip> > > > > >> In the command data, you may add any information you like. > > > > >> The best benefit of this scheme is it has no delays and no penalty in > > > > >> performance, and it has minimum number of buses. > > > > You haven't convinced me of any of these points. Plus how it would address > > > > the pecularities of DDRs themselves where there is a definite performance > > > > hit for randomly thrashing about in memory has not been addressed. > > > > >> > > > > >> Weng > > > > >> > > > > > > > > > > Weng, > > > > > > > > > > Your strategy seems to make sense to me. I don't actually know what a > > > > > ring buffer is. Your design seems appropriate for the imbalance built > > > > > into the system -- that is, any of the 5 components can initiate a > > > > > command at any time, however the DDR controller can only respond > > > > > to one command at a time. So you don't need a unique link to each > > > > > component for data coming from the DDR. > > > > A unique link to an arbitrator though allows each component to 'think' that > > > > it is running independently and addressing DDR at the same time. In other > > > > words, all 5 components can start up their own transaction at the exact same > > > > time. The arbitration logic function would buffer up all 5, selecting one > > > > of them for output to the DDR. When reading DDR this might not help > > > > performance much but for writing it can be a huge difference. > > > > > > > > > > > > > > However thinking a little more on it, each of the 5 components must > > > > > have logic to ignore the data that isn't targeted at themselves. Also > > > > > in order to be able to deal with data returned from the DDR at a > > > > > later time, perhaps a component might store it in a fifo anyway. > > > > > > > > > > The approach I had sort of been envisioning involved for each > > > > > component you have 2 fifos, one goes for commands and data > > > > > from the component to the ddr, and the other is for data coming > > > > > back from the ddr. The ddr controller just needs to decide which > > > > > component to pull commands from -- round robin would be fine > > > > > for my application. If it's a read command, it need only stuff the > > > > > returned data in the right fifo. > > > > That's one approach. 
If you think some more on this you should be able to > > > > see a way to have a single fifo for the readback data from the DDR (instead > > > > of one per component). > > > > > > > > KJ > > > > > > Hi, > > > My scheme is not only a strategy, but a finished work. The following is > > > more to disclose. > > > > > > 1. What means sharing between 1 component and DDR controller system is: > > > The output fifo of one component are shared by one component and DDR > > > controller module, one component uses write half and DDR uses another > > > read half. > > > > > > 2. The output fifo uses the same technique as what I mentioned in the > > > previous email: > > > command word and data words are mixed, but there are more than that: > > > The command word contains either write or read commands. > > > > > > So in the output fifo, data stream looks like this: > > > Read command, address, number of bytes; > > > Write command, address, number of bytes; > > > Data; > > > ... > > > Data; > > > Write command, address, number of bytes; > > > Data; > > > ... > > > Data; > > > Read command, address, number of bytes; > > > Read command, address, number of bytes; > > > ... > > > > > > 3. In DDR controller side, there is small logic to pick read commands > > > from input command/data stream, then put them into a read command queue > > > that is used by DDR module to access read commands. You don't have to > > > worry why read command is put behind a write command. For all > > > components, if a read command is issued after a write command, the read > > > command cannot be executed until write data is fully written into DDR > > > system to avoid interfering the write/read order. > > > > > > 4. The DDR has its output fifo and a different output bus. The output > > > fifo plays a buffer that separate coupling between DDR its own > > > operations and output function. > > > > > > DDR read data from DDR memory and put data into its output fifo. There > > > is output bus driver that picks up data from the DDR output buffer, > > > then put it in output bus in a format that target component likes best. > > > Then the output bus is shared by 5 components which read their own > > > data, like a wireless communication channel: they only listen and get > > > their own data on the output bus, never inteference with others. > > > > > > 5. All components work at their full speeds. > > > > > > 6. Arbitor module resides in DDR controller module. It doesn't control > > > which component should output data, but it controls which fifo should > > > be read first to avoid its fullness and determine how to insert > > > commands into DDR command streams that will be sent to DDR chip. In > > > that way, all output fifo will work in full speeds according to their > > > own rules. > > > > > > 7. Every component must have a read fifo to store data read from DDR > > > output bus. One cannot skip the read fifo, because you must have a > > > capability to adjust read speed for each component and read data from > > > DDR output bus will disappear after 1 clock. > > > > > > In short, each component has a write fifo whose read side is used by > > > DDR controller and a read fifo that picks data from DDR controller > > > output bus. > > > > > > In the result, the number of wires used for communications between DDR > > > controller and all components are dramatically reduced at least by more > > > than 100 wires for a 5 component system. > > > > > > What is the other problem? 
> > > > > Weng, > > > > OK, I'm a bit clearer now on what you have now. What you've described > > is (I think) also functionally identical to what I was suggesting > > earlier (which is also a working, tested and shipping design). > > > > >From a design reuse standpoint it is not quite as good as what I > > suggested though. A better partioning would be to have the fifos and > > control logic in a standalone module. Each component would talk point > > to point with this new module on one side (equivalent to your > > components writing commands and data into the fifo). The function of > > this module would be to select (based on whatever arbitration algorithm > > is preferable) and output commands over a point to point connection to > > a standard DDR Controller (this is equivalent to your DDR Controller > > 'read' side of the fifo). This module is essentially the bus > > arbitration module. > > > > Whether implemented as a standalone module (as I've done) or embedded > > into a customized DDR Controller (as you've done) ends up with the same > > functionality and should result in the same logic/resource usage and > > result in a working design that can run the DDRs at the best possible > > rate. > > > > But in my case, I now have a standalone arbitration module with > > standardized interfaces that can be used to arbitrate with totally > > different things other than DDRs. In my case, I instantiated three > > arbitrators that connected to three separate DDRs (two with six > > masters, one with 12) and a fourth arbitrator that connected 13 bus > > masters to a single PCI bus. No code changes are required, only change > > the generics when instantiating the module to essentially 'tune' it to > > the particular usage. > > > > One other point: you probably don't need a read data fifo per > > component, you can get away with just one single fifo inside the > > arbitration module. That fifo would not hold the read data but just > > the code to tell the arbitrator who to route the read data back to. > > The arbitor would write this code into the fifo at the point where it > > initiates a read to the DDR controller. The read data itself could be > > broadcast to all components in parallel once it arrives back. Only one > > component though would get the signal flagging that the data was valid > > based on a simple decode of the above mentioned code that the arbitor > > put into the small read fifo. In other words, this fifo would only > > need to be wide enough to handle the number of users (i.e. 5 masters > > would imply a 3 bit code is needed) and only deep enough to handle > > whatever the latency is between initiating a read command to the DDR > > controller and when the data actually comes back. > > > > KJ > > Hi KJ, > 1. My design never use module design methodology. I use a big file to > contain all logic statements except modules from Xilinx core. > > If a segment is to be used for other project, just a copy and paste to > do the same things as module methodology does, but all signal names > never change cross all function modules. > > 2. Individual read fifo is needed for each component. The reason is > issuing a read command and the data read back are not synchronous and > one must have its own read fifo to store its own read data. After > reading data falled into its read fifo, each components can decide what > next to do on its own situation. > > If only one read buffer is used, big problems would arise. 
For example, > if you have PCI-x/PCI bus, if their modules have read data, they cannot > immediately transfer the read data until they get PCI-x/PCI bus > control. That process may last very long, for example 1K clocks, > causing other read data blocked by one read buffer design. > > 3. Strategically, by using my method one has a great flexibility to do > anything you want in the fastest speed and with minimum wire > connections among DDR controller and all components. > > Actually in my design there is no arbitor, because there is no common > bus to arbitrate. There is onle write-fifo select logic to decide which > write fifo should be picked first to write its data into DDR chip, > based on many factors, not only because one write fifo has data. > > The many write factors include: > a. write priority; > b. write address if it falls into the same bank+column of current write > command; > c. if write fifo is approaching to be full, depending on the source > date input rate; > d. ... > > 4. Different components have different priority to access to DDR > controller. You may imagine, for example, there are 2 PowerPC, one > PCI-e, one PCI-x, one Gigabit stream. You may put priority table as > like this to handle read commands: > a. two PowerPC has top priority and they have equal rights to access > DDR; > b. PCI-e may the lowest one in priority, because it is a package > protocol, any delays do few damages to the performance if any. > c. ... > > Weng
Hi KJ,

If you like, please post your module interface to the group and I will point out which wires would be redundant if my design were implemented.

"In my case, I instantiated three arbitrators that connected to three separate DDRs (two with six masters, one with 12) and a fourth arbitrator that connected 13 bus masters to a single PCI bus."

What you did is extend the PCI bus arbiter idea to the DDR input bus. In my design the DDR doesn't need a bus arbiter at all. The components connected to the DDR controller have no common bus to share, so they deliver better performance than yours. From this point of view, my DDR controller interface has nothing in common with yours. Both work, but with different strategies.

My strategy is more complex than yours, but it gives the best performance. It saves a middle write fifo in the DDR controller: the DDR controller has no dedicated write fifo of its own, it uses all of the component write fifos as its write fifo, saving clocks and memory space and getting the best performance out of the DDR controller.

Weng
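To make the contrast concrete, here is a rough sketch of the kind of write-fifo select logic such a no-shared-bus scheme implies: the DDR side owns the read port of every component's write fifo and, each time the previous burst finishes, picks the next fifo to drain. A real design would also weigh priority, near-full status and same-bank/row address hints as described earlier in the thread; this sketch uses a simple fixed priority, and all port names, widths and the 5-fifo default are my assumptions, not Weng's actual design.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch of the "write-fifo select" idea: no shared request bus to
-- arbitrate, just a choice of which component write fifo the DDR side
-- drains next. Fixed priority (lowest index wins) stands in for the
-- priority table described in the thread. sel width assumes at most 8 fifos.
entity wr_fifo_select is
  generic (
    N : natural := 5  -- number of component write fifos
  );
  port (
    clk           : in  std_logic;
    rst           : in  std_logic;
    fifo_has_data : in  std_logic_vector(N - 1 downto 0); -- not-empty flags
    burst_done    : in  std_logic;                        -- current burst finished
    sel           : out unsigned(2 downto 0);             -- which fifo to drain next
    sel_valid     : out std_logic
  );
end entity;

architecture rtl of wr_fifo_select is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        sel_valid <= '0';
        sel       <= (others => '0');
      elsif burst_done = '1' then
        -- pick the highest-priority fifo that has data waiting
        sel_valid <= '0';
        for i in 0 to N - 1 loop
          if fifo_has_data(i) = '1' then
            sel       <= to_unsigned(i, sel'length);
            sel_valid <= '1';
            exit;
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture;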