FPGARelated.com
Forums

How to use the DDR SDRAM instead of Block RAM?

Started by Ken Soon March 16, 2007
Hi,
I have a design that is implemented on the Virtex-4 and it consumes 69
FIFO16/RAMB16s. I hope to be able to implement it on a Spartan-3E evaluation
board, but clearly that is not enough, as the 3E only has 36 block RAMs.
My evaluation board has 4 DDR SDRAMs of 512 Mbit each; can I use those instead
of the insufficient block RAMs?

Cheers,
Ken


It depends on how much bandwidth you need and whether or not you can spare the logic for the necessary memory controller.

I am working on a high-performance memory controller of my own to interface a PC3200 DDR DIMM to an XC2VP30, one that can sustain anything from 100% reads to 100% writes, and my preliminary data says this is pretty expensive: my final memory controller will consume about 40 BRAMs out of 136. About half of the BRAM cost comes from the quad x36 internal memory ring bus controllers' FIFOs, and most of the other half comes from the dual-ported back-end's 128-bit-wide FIFOs. This design is overkill for a VP30, but I am currently unemployed, so I am simply killing time by doing something that may be useful later, or at the very least instructive, since I have never interfaced DRAMs directly before.

If your DRAMs give you only a 32-bit-wide data bus and you can operate your internal data path at twice the DDR clock, things get much simpler and smaller. A zero-BRAM memory controller would be possible under these conditions... but I'd still slap on a BRAM FIFO or three (local RX, local TX, pass-through), with extra logic to 'ringify' it, if more than one or two function blocks are going to share access.
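As a rough sanity check on the figures in this reply: the theoretical peak of a DDR interface is bus width times clock rate times two, since data moves on both clock edges. The sketch below is only illustrative. The function names are made up, PC3200 is taken as a 64-bit bus at a 200 MHz clock, and the 100 MHz figure for the 32-bit case is an assumption.

```python
def ddr_peak_mbytes_per_s(bus_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a DDR bus in MB/s: two transfers per clock."""
    return bus_bits / 8 * clock_mhz * 2


def internal_path_options(bus_bits: int, clock_mhz: float):
    """Two ways for the FPGA fabric to keep pace with a DDR bus.

    Returns (width_bits, clock_mhz) pairs that carry the same bandwidth:
    a double-width path at the DDR clock, or a same-width path at 2x clock.
    """
    return [(2 * bus_bits, clock_mhz), (bus_bits, 2 * clock_mhz)]


# PC3200 DIMM: 64-bit bus, 200 MHz clock
print(ddr_peak_mbytes_per_s(64, 200))   # 3200.0
print(internal_path_options(64, 200))   # [(128, 200), (64, 400)]

# 32-bit DDR bus at an assumed 100 MHz clock
print(internal_path_options(32, 100))   # [(64, 100), (32, 200)]
```

The first option for the DIMM is where 128-bit-wide FIFOs come from; the second option for a 32-bit bus is the "twice the DDR clock" case, where the internal path stays 32 bits wide.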

Thanks a lot, Daniel. Actually, I am still quite new to FPGAs, but I have just been tasked with this project (video scaling), which is very difficult, at least to me. Somehow I have to solve the block RAM shortage. (In addition, I didn't mention that I would also have to convert the DSP48s used in the Virtex into CLB logic in the Spartan.)

Well, anyway, that was a big help; at least I understand there is a fair chance of using the DDR SDRAM as a block RAM replacement. (Though I suspect there will also be speed problems in doing so.)

Oh, great attitude there, imo, on continuing to do useful things!

Spartan 3 has hardware multipliers, so you are not entirely stuck with doing everything in CLBs... only the accumulate, multiplexing and a few other unique features you might not have used. If you inferred those DSPs, you should not need to do anything in particular for the switch to Spartan.

For the DRAMs, first estimate the peak bandwidth you need and the best case your memory can deliver. If your requirement is less than 75% of your memory's theoretical maximum, it should be doable with a relatively simple memory controller, but you will most likely have to rethink your pipeline to accommodate the extra latency: you will at least have to add logic to prefetch data and BRAMs to buffer both the input(s) and output(s). You can start with pipelined bursts of eight words to do full-row transfers; video applications can usually be made very compatible with full-row transfers. For more random memory access patterns, bank interleaving and hiding precharge/activate behind burst transfers to other banks are practically mandatory to achieve better than 50% of that maximum.

Experience with DRAMs is often listed as an asset on jobs I am looking at, so I am trying to complete a relevant design in case I get an interview for one of these. Things are progressing slowly and I am expecting lots of grief from the IOB side, in part due to inexperience and in part because the V2Pro's IOBs appear to be quirkier than those of the V4s I worked on last year.
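The bandwidth estimate suggested above can be sketched in a few lines. Everything numeric here is an assumption for illustration only: the video format (640x480 at 60 fps, 3 bytes per pixel), the traffic model (each frame written to DRAM once and read back once), and the memory interface (a 16-bit DDR bus at 133 MHz). Substitute the real figures for your design and board.

```python
def video_stream_mbytes_per_s(width, height, fps, bytes_per_pixel):
    """Bandwidth of one pass over a video stream, in MB/s (1 MB = 1e6 bytes)."""
    return width * height * fps * bytes_per_pixel / 1e6


def ddr_peak_mbytes_per_s(bus_bits, clock_mhz):
    """Theoretical peak of a DDR bus in MB/s: two transfers per clock."""
    return bus_bits / 8 * clock_mhz * 2


def fits_simple_controller(required, peak, threshold=0.75):
    """Apply the 75%-of-theoretical-maximum rule of thumb from the post."""
    return required < threshold * peak


# Assumed traffic: one write plus one read of every frame.
required = 2 * video_stream_mbytes_per_s(640, 480, 60, 3)
# Assumed memory interface: 16-bit DDR at 133 MHz.
peak = ddr_peak_mbytes_per_s(16, 133)

print(f"required {required:.1f} MB/s of {peak:.1f} MB/s peak "
      f"({required / peak:.0%})")
print("simple controller plausible:", fits_simple_controller(required, peak))
```

A scaler that reads several source lines per output line, or that buffers more than one frame, multiplies the required figure accordingly, so the traffic model deserves more care than the arithmetic.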

Yeah, for the hardware multipliers, I guess it automatically changed the DSP48s to the multipliers, but alas, shortage problems again: 60 multipliers needed for 36 available.

Hmm, for the time being, I shall try to find the information about the peak bandwidth first. Then later on, I could move on to the logic for prefetching data and the BRAMs as I/O buffers.

Heh, I guess everyone has their own problems too. IOBs... huh. Can't help you much on that, though. Good luck to you! Thanks a lot!
Ken,

   The Spartan and Virtex parts are VERY different... you will
certainly encounter a handful of problems trying to port this design.
First, the Spartan has multipliers, but the DSP slices include MUCH
more than that.  You will find that if the old design used the
multiply-accumulate portion of the DSP slices, each of those is
going to eat up a bunch of FPGA fabric (slices) in the ported
version.  As for using the DRAM... you will need to either design a
controller or use an existing one.  Xilinx has a memory interface
generator that will generate a controller for you.  There are also
probably a couple on opencores.org.  If you are brand new to FPGA
design, I would definitely not suggest trying to design your own -
DRAM controllers can be tricky.  But you will need to rewrite parts
of your code to properly interface to the controller, as it will
likely not look exactly like a block RAM.  Also, if the design made
use of the dual ports on the BRAMs, then you're going to have to come
up with a way to get around the fact that you no longer have the
luxury of dual ports.  Good luck, you've got a hefty project ahead of
you.
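
One common way around the missing dual ports is to time-multiplex the two former ports onto the controller's single user port. The following is a behavioural sketch in Python, not HDL, and not anything taken from the Xilinx controller; the function name `arbitrate` and the request labels are hypothetical.

```python
from collections import deque


def arbitrate(port_a, port_b):
    """Serialise two request streams onto one memory port, round-robin.

    port_a and port_b are iterables of pending requests. The result is the
    order in which a single-ported memory would service them: the ports
    alternate while both have work, and an idle port is skipped.
    """
    queues = [deque(port_a), deque(port_b)]
    order = []
    turn = 0
    while queues[0] or queues[1]:
        if not queues[turn]:
            turn ^= 1              # this port is idle, serve the other one
        order.append(queues[turn].popleft())
        turn ^= 1                  # the other port gets the next slot
    return order


# Port A (say, the write side) has three requests, port B (read side) has one.
print(arbitrate(["A0", "A1", "A2"], ["B0"]))  # ['A0', 'B0', 'A1', 'A2']
```

Each port now sees at most half the memory's throughput plus a variable wait, which is why the surrounding logic usually needs FIFOs on both sides.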



Ken Soon wrote:
> ...
> Yeh for the hardware multipliers, I guessed it automatically changed the
> DSP48s to the multipliers, but alas, shortage problems again. 60 multipliers
> needed for 36 available.
Analyze the design a bit; 60 multipliers sounds like a lot, though I have not done video work. Maybe you don't need all of them, or maybe some are being used to multiply small numbers that could be handled in LUTs. If some of the multipliers are only doing a multiply every 2 or 3 or 4 clocks, maybe some could be shared.
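
To put a number on the sharing idea: if a logical multiplier only needs a product every N clocks, it occupies 1/N of a physical multiplier, so summing those fractions gives a lower bound on the hardware count. The sketch below is an estimate only. It ignores scheduling conflicts and the cost of the input multiplexers, and the 20/40 split is a hypothetical mix, not data from the design under discussion.

```python
from fractions import Fraction
from math import ceil


def multipliers_needed(intervals):
    """Lower bound on physical multipliers when time-multiplexing.

    intervals[i] is the number of clocks between products required by
    logical multiplier i (1 means a new product every clock).
    """
    load = sum(Fraction(1, n) for n in intervals)
    return ceil(load)


# Hypothetical mix: 20 multipliers busy every clock,
# 40 that only need a product every 4 clocks.
print(multipliers_needed([1] * 20 + [4] * 40))  # 30
```

Under that assumed mix, 60 logical multipliers would fit within 36 physical ones; the real answer depends on how often each multiplier in the design actually fires.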
Hi,

Sure, DSP48 has more stuff grafted to the multiplier but the multiplier is 
by far the most costly and performance-critical function. Adders and muxes 
are not particularly expensive by comparison.

Since the OP said the application was video scaling, the access pattern 
should be highly linear and require loading only a few consecutive lines of 
video data for highly efficient processing. The (usually) highly linear 
structure of scaling functions and data usually lends itself quite nicely 
to pipelining so sharing some multipliers, assuming the overall count 
cannot be reduced in the first place, should not be too problematic at 
medium/low resolutions.

Building a custom DRAM controller is certainly not a novice's job: most 
companies that use DRAMs in high-end applications have at least one skilled 
engineer working full-time on optimizing, updating and testing memory 
controllers to achieve the highest throughput, lowest and steadiest 
latencies possible or making sure outsourced/third-party controllers work 
as advertised. Reworking any design to move from BRAMs to DRAMs (or even 
SRAMs, to a lesser extent) will be pretty far from trivial in all but a few 
rare cases since DRAM R/W control is of such a drastically different nature 
from BRAMs'.

Hours/days/weeks... possibly months of fun ahead.
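
For the "few consecutive lines" point above, a rough count of block RAMs needed for on-chip line buffers is total bits divided by bits per block. This is a sketch under assumptions: 16 Kbit of data per block RAM with parity bits left unused, a 4-tap vertical filter, 640-pixel lines and 24 bits per pixel. It also ignores the port-width configurations of the block RAM, which can push the real count higher.

```python
from math import ceil

# Assumed usable data bits per block RAM (parity bits not counted).
BRAM_DATA_BITS = 16 * 1024


def line_buffer_brams(pixels_per_line, bits_per_pixel, lines):
    """Rough number of block RAMs needed to hold `lines` video lines."""
    total_bits = pixels_per_line * bits_per_pixel * lines
    return ceil(total_bits / BRAM_DATA_BITS)


# Assumed: 4 lines buffered for a 4-tap vertical filter, 640 px, 24 bpp.
print(line_buffer_brams(640, 24, 4))   # 4
# Same filter at 1024 pixels per line.
print(line_buffer_brams(1024, 24, 4))  # 6
```

If the line buffers turn out to be this small, they can stay in block RAM, and the DRAM only has to hold whole frames.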

That's a good idea, sharing the multipliers. Hmm, I just have to figure out how, though.

I will be using the DRAM controller found on the Xilinx website since I am working on the Spartan 3E; it makes things a little less tricky, ya.

Well, I am spanking brand new to FPGA design, unless you want to count simply grabbing simple designs from the net to configure the Spartan 3E starter kit.

Yep, definitely months of "fun" ahead of me.

"Ken Soon" <csoon@xilinx.com> writes:

> I will be using the DRAM controller found on the Xilinx website
> since I am working on the Spartan 3E, make things a little less
> tricky, ya.

<snip>

Could you provide a pointer to that DRAM controller?

thutt