FPGARelated.com
Forums

RAM in Altera EABs and Xilinx Block Rams

Started by rickman June 12, 2004
I am using RAM in a processor design and I am having trouble
understanding exactly how best to use these functions for my design.  I
will be using them to implement stacks, program memory and data memory. 
Ideally the write function will look like an addressable register where
the address, data and enables are set up prior to the clock and the write
happens on the clock edge.  The read should be async so that I can
provide an address and get data after a delay.  

The Altera part is an EP1K50 where the EAB read can be async.  The write
however is only shown as either fully async or fully registered.  I
recall that I was warned when reading and writing the same address the
data out has a longer delay.  But I can't seem to find a reference to
that.  I am also unclear if I can use the write the way I want or if it
requires input registers.  

The Xilinx part is an XC3S400 with dual port block rams.  It seems like
the read path must be registered as well as the write path.  I think I
could live with that if I could read the data that is being written (top
of stack) in the same clock cycle.  But I believe the docs say that the
other port can either read the old data or is invalid.  But then I may
be able to use a single port ram for a stack.  The address would always
be pointing to the current TOS and as soon as a new value were pushed,
the next clock edge would read the new data as it is written to the new
address.  

I don't want to pipeline anything in this design to keep it very
simple.  Right now the design is pretty clean and the delay paths are
pretty short.  

Can anyone clarify how these rams work without pipelining?  

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
rickman <spamgoeshere4@yahoo.com> wrote in message news:<40CB4C4A.9FB9CD47@yahoo.com>...
> I am using RAM in a processor design and I am having trouble
> understanding exactly how best to use these functions for my design.  I

Rick,

I wish I had something more constructive to offer... I have a Stratix
design and I use a read latency of 2 cycles everywhere (one for address in,
one for data out).  While one can eliminate the data output register, it
adds enough ns that it's just not worth it.

I can't help noticing the (huge?) disparity between the 1K50 and the
3S400, and am surprised that you're still using the ACEX parts.  In that
vein, I'm carrying around the notion that _all_ newer FPGAs are or will
require registered ports... so why not bite the bullet and go synchronous?

<snip>

> I don't want to pipeline anything in this design to keep it very
> simple.  Right now the design is pretty clean and the delay paths are
> pretty short.

I'm also not sure from your post whether "pipelined" is synonymous with
"registered", i.e. you're trying to do something like one instruction per
clock cycle and/or you can't tolerate the 2 ticks of latency.

Also, what's your desired clock speed?

Regards,
-rajeev-
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:40CB4C4A.9FB9CD47@yahoo.com...
> I am using RAM in a processor design and I am having trouble
> understanding exactly how best to use these functions for my design.  I
> will be using them to implement stacks, program memory and data memory.
> Ideally the write function will look like an addressable register where
> the address, data and enables are setup prior to the clock and the write
> happens on the clock edge.  The read should be async so that I can
> provide an address and get data after a delay.
>
> The Altera part is an EP1K50 where the EAB read can be async.  The write
> however is only shown as either fully async or fully registered.  I
> recall that I was warned when reading and writing the same address the
> data out has a longer delay.  But I can't seem to find a reference to
> that.  I am also unclear if I can use the write the way I want or if it
> requires input registers.
>
> The Xilinx part is an XC3S400 with dual port block rams.  It seems like
> the read path must be registered as well as the write path.  I think I
> could live with that if I could read the data that is being written (top
> of stack) in the same clock cycle.  But I believe the docs say that the
> other port can either read the old data or is invalid.  But then I may
> be able to use a single port ram for a stack.  The address would always
> be pointing to the current TOS and as soon as a new value were pushed,
> the next clock edge would read the new data as it is written to the new
> address.
>

I don't know exactly how the Spartan 3 is related to the Spartan 2, but
this might help you. Check this out:

http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr5.html

It says that when you write data, one of the ports reads what you're
writing. From the Coregen options I'd guess that you can also set it up as
read-after-write (this one) or write-after-read (which would read the
previous contents, and then write).

> I don't want to pipeline anything in this design to keep it very
> simple.  Right now the design is pretty clean and the delay paths are
> pretty short.
>
> Can anyone clarify how these rams work without pipelining?
>

Coregen asks you about that too, but the link I gave you doesn't mention
anything. Though, if I recall correctly, I also read (somewhere on the
Xilinx site) that the latency depends on the size of the RAM: bigger ones
get 2 cycles of latency, but smaller ones can get 1 cycle, I think.
(Sorry, I don't have a link.)
Xilinx (Virtex2 or Spartan3) BlockRAM reading while writing:
Any write operation also performs a read, and outputs it on the Do output.
The user can choose: write before read (= output the data that is being
written), or read before write (= output the previous content that is now
being overwritten), or "no change" (= keep the old data on the Do lines).


Peter Alfke
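
The three modes Peter describes can be sketched as a small behavioral
model. This is a minimal Python sketch, not vendor code: the class
BlockRamPort, its clock() method and the argument names are hypothetical,
and it assumes a single port whose output latch changes only on the
active clock edge.

```python
class BlockRamPort:
    """Behavioral model of one synchronous block RAM port (hypothetical names)."""

    MODES = ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE")

    def __init__(self, depth, mode="WRITE_FIRST"):
        if mode not in self.MODES:
            raise ValueError(f"unknown write mode: {mode}")
        self.mem = [0] * depth
        self.mode = mode
        self.do = 0  # output latch, updated only on a clock edge

    def clock(self, addr, we=False, di=0):
        """Apply one active clock edge and return the new DO value."""
        if we:
            old = self.mem[addr]
            self.mem[addr] = di
            if self.mode == "WRITE_FIRST":
                self.do = di   # output the data being written
            elif self.mode == "READ_FIRST":
                self.do = old  # output the content being overwritten
            # NO_CHANGE: DO keeps whatever it held before this edge
        else:
            self.do = self.mem[addr]  # plain synchronous read
        return self.do


# Write 0xAA, then overwrite it with 0xBB, and show DO after the second edge.
for mode in BlockRamPort.MODES:
    ram = BlockRamPort(depth=16, mode=mode)
    ram.clock(addr=3, we=True, di=0xAA)
    print(mode, hex(ram.clock(addr=3, we=True, di=0xBB)))
```

In every mode the memory itself ends up holding the new data; only the
value presented on DO during the write cycle differs.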

Hi Rick,
I can offer my experiences with Xilinx blockram. You're correct that both
the read and write are synchronous. There are three write options,
WRITE_FIRST, READ_FIRST and NO_CHANGE. Carefully (!) read about these in the
data sheet. I use WRITE_FIRST almost exclusively, where the "same clock edge
that writes the data input (DI) into the memory also transfers DI into the
output registers DO".
When I did my processor design, I also used one as a stack. Like your design
I didn't use pipelining. This was to keep the design small and simple. On
the BlockRAM I used one port for PUSHING/POPPING registers, and the other
for CALL/RETURN subroutine addresses. The catch with these blockrams is
that, if you read from one port whilst you're writing to the *same* address
on the other port, the read data is indeterminate. This makes sense if you
think about what the BlockRAM is doing. Check out 'Conflict Resolution' in
the user guide (I'm looking at ug012 for V2PRO). This means for me that I
can't do a POP instruction immediately after doing a CALL subroutine, and I
can't do a RETURN immediately after doing a PUSH. No problem to avoid this
in the code, of course. It's a weird thing to do anyway.
The ModelSim simulator also warns if conflicts occur and, of course,
simulates the RAM accurately.
Good luck!
Cheers, Syms.


Here is the official Xilinx text (I just rewrote this for the new User
Guide). 
Conflict Avoidance.
Virtex-2 BlockRAM is a true dual-port RAM where both ports can access any
memory location at any time. When accessing the SAME MEMORY LOCATION from
both ports, the user must, however, observe certain restrictions, specified
by the clock-to-clock set-up time window. See the following:

There are two fundamentally different situations:
The two ports either have a common clock ("Synchronous Clocking"), or the
clock frequency or phase is different for the two ports ("Asynchronous
Clocking").

Asynchronous Clocking is the more general case, where the active edges of
both clocks do not occur simultaneously:
There are no timing constraints when both ports perform a read operation on
the same location.
When one port performs a write operation, the other port must not read- or
write-access the same memory location by using a clock edge that falls
within the specified forbidden clock-to-clock set-up time window. (If this
restriction is ignored, a read operation might read unreliable data, perhaps
a mixture of old and new data in this location; a write operation might
result in wrong data stored in this location. There is, however, no risk of
physical damage to the device.)

Synchronous Clocking is the special case, where the active edges of both
port clocks occur simultaneously:
There are no timing constraints when both ports perform a read operation.
When one port performs a write operation, the other port must not write into
the same location, unless both ports write identical data.
When one port performs a write operation, the other port can reliably read
data from the same location if the write port is in READ_FIRST mode.
DATA_OUT will then reflect the previously stored data.

If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then the
DATA_OUT on the read port would become invalid (unreliable). Obviously, the
read-port's mode setting does not affect this.

June 2004   Peter Alfke ( this text has not yet been posted on xilinx.com)
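
The synchronous-clocking rules above can be captured in a short behavioral
model. This is a minimal Python sketch under stated assumptions: both
ports share one clock, the name DualPortRam and the (addr, we, di) tuple
interface are hypothetical, and "unreliable" data is represented by None.
The text does not say what a writing port outputs during a write-write
collision, so the sketch simply applies that port's own mode there.

```python
INVALID = None  # stands in for "unreliable data"


class DualPortRam:
    """Common-clock dual-port RAM with same-address conflict rules."""

    def __init__(self, depth, mode_a="WRITE_FIRST", mode_b="WRITE_FIRST"):
        self.mem = [0] * depth
        self.mode = {"A": mode_a, "B": mode_b}
        self.do = {"A": 0, "B": 0}

    def clock(self, a, b):
        """a and b are (addr, we, di) tuples applied on the same clock edge."""
        ops = {"A": a, "B": b}
        old = list(self.mem)  # contents before this edge
        for p, q in (("A", "B"), ("B", "A")):
            addr, we, di = ops[p]
            q_addr, q_we, q_di = ops[q]
            collide = q_we and q_addr == addr
            if we:
                # Two writes to one location are only safe with identical data.
                self.mem[addr] = di if (not collide or q_di == di) else INVALID
                if self.mode[p] == "WRITE_FIRST":
                    self.do[p] = di
                elif self.mode[p] == "READ_FIRST":
                    self.do[p] = old[addr]
                # NO_CHANGE: this port's output keeps its previous value
            elif collide and self.mode[q] != "READ_FIRST":
                self.do[p] = INVALID  # read while the other port writes
            else:
                self.do[p] = old[addr]  # reliable read of the stored data
        return self.do["A"], self.do["B"]


for mode in ("READ_FIRST", "WRITE_FIRST"):
    ram = DualPortRam(depth=8, mode_a=mode)
    ram.clock((2, True, 11), (0, False, 0))  # port A stores 11 at address 2
    # Port A overwrites address 2 while port B reads the same address.
    print(mode, ram.clock((2, True, 22), (2, False, 0)))
```

Note that the outcome on the read port depends only on the mode of the
writing port, which matches the last sentence of the text above.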

Quoting Peter's text from below, "When one port performs a write operation,
the other port must not write into the same location, unless both ports
write identical data."

For a one-port dedicated read and one-port dedicated write configuration
that I *believe* rickman is pursuing, a little trick could be used:  feed
the data to *both* write ports and enable the write to the normally read-only
port when a RdAddr==WrAddr compare is valid.  This increases the effective
address setup time but gives the desired WRITE_FIRST functionality without
increasing the Clk-to-out time.
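
A behavioral sketch of this trick, in Python. It is only an illustration
under assumptions: the class name ReadWriteRam and its arguments are
hypothetical, both ports share one clock, the dedicated write port is
assumed not to be in READ_FIRST mode (so an unprotected same-address read
is unreliable, shown as None), and the read port is assumed to be in
WRITE_FIRST mode when its write enable is asserted.

```python
class ReadWriteRam:
    """One dedicated write port and one dedicated read port, common clock."""

    def __init__(self, depth, use_trick):
        self.mem = [0] * depth
        self.use_trick = use_trick
        self.rd_do = 0  # registered output of the read port

    def clock(self, wr_addr, wr_en, wr_data, rd_addr):
        """Apply one clock edge and return the read port output."""
        # Address compare done ahead of the clock edge (adds address setup time).
        match = wr_en and rd_addr == wr_addr
        if match and self.use_trick:
            # The read port also writes the identical data, which is allowed,
            # and its WRITE_FIRST mode passes that data to its output.
            self.rd_do = wr_data
        elif match:
            self.rd_do = None  # plain read during a write: unreliable
        else:
            self.rd_do = self.mem[rd_addr]
        if wr_en:
            self.mem[wr_addr] = wr_data
        return self.rd_do


for use_trick in (False, True):
    ram = ReadWriteRam(depth=8, use_trick=use_trick)
    # Write 99 to address 4 while the read port points at address 4.
    print(use_trick, ram.clock(wr_addr=4, wr_en=True, wr_data=99, rd_addr=4))
```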


"Peter Alfke" <peter@xilinx.com> wrote in message
news:BCF3748F.69FD%peter@xilinx.com...
> Here is the official Xilinx text (I just rewrote this for the new User
> Guide).
> Conflict Avoidance.
> Virtex-2 BlockRAM is a true dual-port RAM where both ports can access any
> memory location at any time. When accessing the SAME MEMORY LOCATION from
> both ports, the user must, however, observe certain restrictions, specified
> by the clock-to-clock set-up time window. See the following:
>
> There are two fundamentally different situations:
> The two ports either have a common clock ("Synchronous Clocking"), or the
> clock frequency or phase is different for the two ports ("Asynchronous
> Clocking").
>
> Asynchronous Clocking is the more general case, where the active edges of
> both clocks do not occur simultaneously:
> There are no timing constraints when both ports perform a read operation on
> the same location.
> When one port performs a write operation, the other port must not read- or
> write-access the same memory location by using a clock edge that falls
> within the specified forbidden clock-to-clock set-up time window. (If this
> restriction is ignored, a read operation might read unreliable data, perhaps
> a mixture of old and new data in this location; a write operation might
> result in wrong data stored in this location. There is, however, no risk of
> physical damage to the device.)
>
> Synchronous Clocking is the special case, where the active edges of both
> port clocks occur simultaneously:
> There are no timing constraints when both ports perform a read operation.
> When one port performs a write operation, the other port must not write into
> the same location, unless both ports write identical data.
> When one port performs a write operation, the other port can reliably read
> data from the same location if the write port is in READ_FIRST mode.
> DATA_OUT will then reflect the previously stored data.
>
> If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then the
> DATA-OUT on the read port would become invalid (unreliable). Obviously, the
> read-port's mode setting does not affect this.
>
> June 2004 Peter Alfke ( this text has not yet been posted on xilinx.com)
Rajeev wrote:
>
> I wish I had something more constructive to offer... I have a Stratix
> design and I use read latency of 2 cycles everywhere (one for address in,
> one for data out.) While one can eliminate the data output register it
> adds enough ns that it's just not worth it.
>
> I can't help noticing the (huge?) disparity between the 1K50 and the
> 3S400, and am surprised that you're still using the ACEX parts. In that
> vein, I'm carrying around the notion that _all_ newer FPGAs are or will
> require registered ports... so why not bite the bullet and go synchronous ?

In my design it adds a clock cycle delay to have a register on the data
out side of the RAM.  So that slows things down a lot.  I am using the
ACEX parts because I need the 5 volt tolerance that has been left behind
by the newer parts.  For that function, they work very well.

> I'm also not sure from your post whether "pipelined" is synonymous with
> "registered", ie you're trying to do something like one instruction per
> clock cycle and/or you can't tolerate the 2 ticks latency.

Yes, if you have more than one register in the fetch-decode-execute
cycle, then more than one clock cycle is needed, and if you want to start
a new instruction on every clock (as I do) it would have to be pipelined.
Non-pipelined MCUs are *much* simpler and not necessarily slower in the
time to execute any given instruction.  Pipelining only lets you add more
hardware to overlap execution of multiple instructions.  You also don't
have to deal with throwing away prefetches if you don't pipeline.

After looking at the structure of the Xilinx Spartan 3 block rams, I see
that I can't escape the output register.  But seeing the mode where the
read is done post-write, I realized that I can add a mux and an output
register which will always reflect the top of the stack without a read
delay!  I am still not certain it will work ok in the Xilinx part, but
this works great in the Altera parts and it speeds up the cycle time a
lot.  I can decode and execute the current instruction and fetch the next
instruction in no more than two levels of logic and one RAM delay per
clock cycle.

I expect this to run at 60 to 80 MHz without too much trouble.  If I work
on optimizing the placement and routing, I might even get 100 MHz out of
this.

-- 

Rick "rickman" Collins
roller wrote:
>
> i dont know exactly how the spartan3 is related to the spartan2, but it
> might help you, check this out
>
> http://toolbox.xilinx.com/docsan/xilinx4/data/docs/lib/dsgnelpr5.html
>
> it says that when you write data, one of the ports reads what you're
> writting. From Coregen options i'd guess that you can also set it up as
> read-after-write (this one) or write-after-read (which would read the
> previous contents, and then write)

Yes, I saw that.  It gave me an idea of how I can deal with the read
delay in the Altera part.  But I believe the Xilinx part still gives you a
two clock delay on reading the new data.

I am using the RAM for stacks among other things.  So I can use a
separate register to always hold the top of stack.  But if it pushes to
the stack on one clock cycle and pops on the next clock cycle, the data
on the output of the Xilinx RAM is still stale.  I guess I can use the
dual port and always have the read one address below the write.

-- 

Rick "rickman" Collins
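
One way to picture a top-of-stack register combined with a synchronous
RAM is the Python sketch below. It is an assumption-laden illustration,
not the design described in the post: it uses a single RAM port in
WRITE_FIRST mode instead of the dual-port arrangement mentioned above,
and the names TosStack, tos, do and sp are hypothetical. The TOS register
holds the top item, the RAM holds everything below it, and the RAM output
latch always holds the next item down, so a pop right after a push never
sees stale data.

```python
class TosStack:
    """Stack with a top-of-stack register over a WRITE_FIRST synchronous RAM."""

    def __init__(self, depth):
        self.mem = [0] * depth  # RAM contents: everything below the TOS
        self.do = 0             # RAM output latch: next-on-stack value
        self.tos = 0            # top-of-stack register
        self.sp = 0             # number of items currently held in the RAM

    def clock(self, op, data=0):
        """Apply one clock edge; op is 'push', 'pop' or 'nop'. Returns TOS."""
        if op == "push":
            if self.sp == len(self.mem):
                raise OverflowError("stack overflow")
            # Write the old TOS at address sp; WRITE_FIRST also puts it on DO.
            self.mem[self.sp] = self.tos
            self.do = self.tos
            self.tos = data
            self.sp += 1
        elif op == "pop":
            if self.sp == 0:
                raise IndexError("stack underflow")
            # TOS reloads from DO with no read delay, while the RAM reads
            # the item that becomes next-on-stack after this pop.
            self.tos = self.do
            self.sp -= 1
            self.do = self.mem[self.sp - 1] if self.sp > 0 else 0
        return self.tos


stack = TosStack(depth=16)
for op, value in (("push", 1), ("push", 2), ("pop", 0), ("push", 5), ("pop", 0)):
    print(op, value, "->", stack.clock(op, value))
```

The cost in hardware terms is that the RAM address must be chosen from
sp or sp - 2 before the clock edge, depending on the operation, which
adds to the address setup path.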