True dual-port RAM in VHDL: XST question

Started by Jonathan Bromley June 24, 2009
Summary

Jonathan Bromley discusses a discrepancy in VHDL RAM inference where synthesis tools like XST and Quartus correctly infer dual-port RAM from signal assignments, despite such code being semantically invalid for simulation due to multiple drivers. While using shared variables is the theoretically correct VHDL method to model this behavior, it is often considered cumbersome or "ugly" by developers.

The thread explores the challenges of creating a single VHDL template that is both accurate for simulation and recognizable by synthesis engines to target hardware block RAM (BRAM).

  • Xilinx XST and Altera Quartus can infer dual-port RAM from signal assignments in two processes, even though this creates simulation conflicts.
  • Shared variables are necessary in VHDL to correctly model the dual-port memory behavior for simulation when using two independent clocks.
  • Asynchronous memory models usually fail to synthesize into efficient FPGA block RAM and instead consume excessive flip-flop resources.
  • Synthesis tools rely heavily on specific template matching, making it difficult to write portable, simulation-accurate VHDL for complex components like dual-clock RAM.
  • Verilog avoids this specific VHDL issue because all variables are inherently shared across procedural blocks.
Tags: VHDL, RAM Inference, Xilinx ISE, FPGA Synthesis

hi all,

As promised many weeks ago, I'm building what I
hope will be a comprehensive summary of how to do
RAM inference from VHDL and Verilog code for all
the common synthesis tools and FPGAs.  It will
go on our website some time this summer (sorry,
it's not a high-priority project).

I've encountered what seems to me to be a bug
in XST (all versions from 8 to 11 inclusive)
and I would value your opinion before I start
to give Xilinx a hard time about it.  By the
way, exactly the same bug appears to be present
in Quartus but I haven't yet done enough detailed
investigation to comment on that properly.

To create true (dual-clock) dual-port RAM,
I need to create two clocked processes.  This
requires me to use a shared variable for
the memory itself (ugly but possible, works
correctly in XST):

  type t_mem is array (0 to 2**ABITS-1) of 
              std_logic_vector(DBITS-1 downto 0);
  shared variable mem: t_mem;  -- the memory storage
begin -- the architecture
  process (clock0) -- manages port A
  begin
    if rising_edge (clock0) then
      if we0 = '1' then  -- write to port A
        mem(to_integer(unsigned(a0))) := wd0;
        rd0 <= wd0;
      else
        rd0 <= mem(to_integer(unsigned(a0)));
      end if;
    end if;
  end process;
  --
  process (clock1) -- manages port B
  begin
    if rising_edge (clock1) then
      if we1 = '1' then
        mem(to_integer(unsigned(a1))) := wd1;
        rd1 <= wd1;
      else
        rd1 <= mem(to_integer(unsigned(a1)));
      end if;
    end if;
  end process;

That, I believe, is the right way to do it.

However, both XST and Quartus give THE SAME SYNTHESIS
RESULTS if I change "shared variable" to "signal", and
make signal assignments instead of variable assignments
to the mem() array.  This is just plain WRONG!  Writing
to a signal from two processes represents two resolved
drivers on the signal, and does not correctly model a
dual-port memory in simulation.
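
To make the failure concrete, here is a minimal sketch of the signal-based variant. The entity name dpram_signal, the generic defaults and the port list are assumptions (only the architecture body is shown above); the two processes are the ones above with ":=" changed to "<=".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: name, generics and ports are assumed for illustration.
entity dpram_signal is
  generic (ABITS : positive := 4; DBITS : positive := 8);
  port (
    clock0, clock1 : in  std_logic;
    we0, we1       : in  std_logic;
    a0, a1         : in  std_logic_vector(ABITS-1 downto 0);
    wd0, wd1       : in  std_logic_vector(DBITS-1 downto 0);
    rd0, rd1       : out std_logic_vector(DBITS-1 downto 0)
  );
end entity dpram_signal;

architecture rtl of dpram_signal is
  type t_mem is array (0 to 2**ABITS-1) of
                std_logic_vector(DBITS-1 downto 0);
  signal mem : t_mem;  -- a signal instead of a shared variable
begin
  -- Because the index is not static, each process owns a driver for
  -- EVERY element of mem, not just the element it happens to write.
  process (clock0)  -- manages port A
  begin
    if rising_edge(clock0) then
      if we0 = '1' then
        mem(to_integer(unsigned(a0))) <= wd0;
        rd0 <= wd0;
      else
        rd0 <= mem(to_integer(unsigned(a0)));
      end if;
    end if;
  end process;

  process (clock1)  -- manages port B
  begin
    if rising_edge(clock1) then
      if we1 = '1' then
        mem(to_integer(unsigned(a1))) <= wd1;
        rd1 <= wd1;
      else
        rd1 <= mem(to_integer(unsigned(a1)));
      end if;
    end if;
  end process;
end architecture rtl;
```

In simulation a location written only through port A still has port B's driver sitting at 'U', and std_logic resolution makes 'U' win, so the location reads back as 'U'. If mem is given an initial value instead, the two drivers disagree after a write and the written bits that differ resolve to 'X'.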

Given that the whole point of memory inference from
HDL code is that you get a convenient, readable,
accurate simulation model as part of your design
code, this behaviour by the synthesis tools is
incomprehensible to me.  Can anyone clarify?  Has
anyone fallen foul of this problem?  Best of all,
could Brian Philofsky, who has written so clearly
and helpfully about XST in the past, please speak
up and tell us what the blazes is going on here?

Thanks
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:prs345lln7bmpp71hrk9p33ehfkq8231gj@4ax.com...
> [...]
Your knowledge of VHDL is greater than mine, but I assumed that

>      if we1 = '1' then
>        mem(to_integer(unsigned(a1))) := wd1;
>      end if;

was equivalent to:

      if we1 = '1' then
        mem(to_integer(unsigned(a1))) := wd1;
      else
        mem(to_integer(unsigned(a1))) := mem(to_integer(unsigned(a1)));
      end if;

If you used something like:

      if we1 = '1' then
        mem(to_integer(unsigned(a1))) := wd1;
      else
        mem(to_integer(unsigned(a1))) := (others => 'Z');
      end if;

would this then give more consistent results, where both processes wouldn't
be fighting against each other?

Happy to be told I'm wrong.

On Wed, 24 Jun 2009 11:37:24 +0100, "Fredxx" wrote:

>if you used something like;
>      if we1 = '1' then
>        mem(to_integer(unsigned(a1))) := wd1;
>      else
>        mem(to_integer(unsigned(a1))) := (others => 'Z');
>      end if;
>
>Would this then give more consistent results where both processes wouldn't
>be fighting against each other?

Sadly, no.  I see what you're getting at, but I don't think you could
ever get the memory to have the correct contents if both ports are
doing that all the time.  Each process may overwrite locations it's
already correctly written, using Zs, for no good reason.

Suppose you could get it right somehow, and arrange that each process
is driving Z to all locations it's never written, but appropriate
values to locations it has written.  What then happens if the second
process writes to a location that previously was written by the other?
How can it tell the first process now to put Z on that location?

In truth the "correct" solution would be to write the whole thing
as a single process with two clocks:

  process (clock0, clock1)
    variable mem: t_mem;
  begin
    if rising_edge(clock0) then
      if we0 = '1' then
        mem(a0) := wd0;
      end if;
    end if;
    if rising_edge(clock1) then
      if we1 = '1' then
        mem(a1) := wd1;
      end if;
    end if;
    ...

But I suspect synthesis tools would chuck that overboard
without a second thought.
-- 
Jonathan Bromley
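
A fuller sketch of that single-process idea, intended as a simulation model only. The entity name dpram_one_process, the generic defaults, the port list and the read behaviour (copied from the two-process version earlier in the thread) are all assumptions; the fragment above elides everything after the writes, and no claim is made that any synthesis tool accepts this form.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: name, generics and ports are assumed for illustration.
entity dpram_one_process is
  generic (ABITS : positive := 4; DBITS : positive := 8);
  port (
    clock0, clock1 : in  std_logic;
    we0, we1       : in  std_logic;
    a0, a1         : in  std_logic_vector(ABITS-1 downto 0);
    wd0, wd1       : in  std_logic_vector(DBITS-1 downto 0);
    rd0, rd1       : out std_logic_vector(DBITS-1 downto 0)
  );
end entity dpram_one_process;

architecture sim of dpram_one_process is
begin
  process (clock0, clock1)
    type t_mem is array (0 to 2**ABITS-1) of
                  std_logic_vector(DBITS-1 downto 0);
    -- An ordinary process variable: one storage array, no drivers to resolve.
    variable mem : t_mem;
  begin
    -- Port A
    if rising_edge(clock0) then
      if we0 = '1' then
        mem(to_integer(unsigned(a0))) := wd0;
        rd0 <= wd0;
      else
        rd0 <= mem(to_integer(unsigned(a0)));
      end if;
    end if;
    -- Port B. If both clocks rise in the same delta cycle, this branch
    -- runs after the port A branch, so port B sees port A's write.
    if rising_edge(clock1) then
      if we1 = '1' then
        mem(to_integer(unsigned(a1))) := wd1;
        rd1 <= wd1;
      else
        rd1 <= mem(to_integer(unsigned(a1)));
      end if;
    end if;
  end process;
end architecture sim;
```

Because the storage is a plain variable inside one process, there is exactly one copy of each location and the last write wins, which is the behaviour the shared-variable version gets across two processes.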
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:bn0445lbemmf3qnl87te1hps2l3nc9novv@4ax.com...
> [...]
I perhaps am making the (erroneous) assumption that two statements will be
or'd together and the Z's will be overdriven by the signals.  But as you
say, I would be replacing the RAM locations with Z's or something that the
synthesiser concocts.

To be honest, I think it isn't good practice to have signals driven by 2
clocks, and I'd probably use clock switching primitives instead so the
memory would be written in one process with just one clock.

On Wed, 24 Jun 2009 12:44:27 +0100, "Fredxx" wrote:

>I perhaps am making the (erroneous) assumption that two statements will be
>or'd together and the Z's will be overdriven by the signals.

That's more-or-less correct.  Each process represents a driver
on any signal it writes.  If multiple processes write to a signal,
then the actual signal value is determined by resolving the
various driven values.  Of course, anything else overdrives Z.

The hard-to-solve problem:  suppose process A writes a value
to a memory location at some time; clearly, you want that
value to remain in the location and not to be overwritten
to Z on the next clock, so you can't allow process A to change
its mind about that value.  Some time later, suppose process B
writes to the same location.  Now you have two non-Z drivers
on the same set of bits.  How can process B tell process A
that it's time for its driver to lapse back to Z?  Shared
variables, for all their ugliness, solve this problem
neatly (which is why my problem simply doesn't exist in
Verilog, where all variables are shared).
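
The two-driver situation can be reproduced on a single std_logic bit. The following is a small self-checking sketch; the entity name, process labels and timings are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: names and delays are chosen only for this demo.
entity two_driver_demo is
end entity two_driver_demo;

architecture sim of two_driver_demo is
  signal s : std_logic;  -- resolved type, so two drivers are legal
begin
  -- "Process A" wrote '1' long ago and keeps driving it.
  proc_a : process
  begin
    s <= '1';
    wait;
  end process;

  -- "Process B" starts out at Z, then writes '0' to the same bit.
  proc_b : process
  begin
    s <= 'Z';
    wait for 10 ns;
    s <= '0';
    wait;
  end process;

  check : process
  begin
    wait for 5 ns;
    -- '1' against 'Z': the non-Z driver wins.
    assert s = '1' report "expected '1' while B drives Z" severity failure;
    wait for 10 ns;
    -- '1' against '0': B has no way to retract A's driver, so the bit is 'X'.
    assert s = 'X' report "expected 'X' after B writes '0'" severity failure;
    report "two-driver demo finished";
    wait;
  end process;
end architecture sim;
```

The second assertion is the dual-port RAM problem in miniature: once both processes have written the same location with different values, the stored data is lost to 'X' rather than replaced by the newer write.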

>To be honest, I think it isn't good practice to have signals driven by 2
>clocks, and I'd probably use clock switching primitives instead so the
>memory would be written in one process with just one clock.

In normal logic I would 100% agree, but here I'm talking about
modeling and synthesizing the FPGAs' built-in RAM blocks, which
have the option of independent clocks on the two ports.  So
it is important to write VHDL corresponding to that behavior.
You could mux the clocks onto a single port, but that would
be a totally different design.
-- 
Jonathan Bromley
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:e76445hlo55p6dki0oi764adcblv5oloul@4ax.com...
> On Wed, 24 Jun 2009 12:44:27 +0100, "Fredxx" wrote:
>
> In normal logic I would 100% agree, but here I'm talking about
> modeling and synthesizing the FPGAs' built-in RAM blocks, which
> have the option of independent clocks on the two ports.  So
> it is important to write VHDL corresponding to that behavior.
> You could mux the clocks onto a single port, but that would
> be a totally different design.

Ah - I see - that does sound rather tricky and can see where you're coming from.
On Jun 24, 7:44 am, "Fredxx" <fre...@spam.com> wrote:
> [...]
> To be honest, I think it isn't good practice to have signals driven by 2
> clocks, and I'd probably use clock switching primitives instead so the
> memory would be written in one process with just one clock.

That would be a truly bizarre circuit design.  I don't know how they
actually construct memory to use separate clocks, but I expect it uses
an async memory with two independent synchronous interfaces.  FPGA
reps have posted here that there is a lot of "magic" in the logic
between the sync interfaces and the async memory inside the block
ram.  All of this would be very hard to describe using an HDL.

But driving a signal with 'z' or switching clocks is not the way to go
at all...

Rick
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:e76445hlo55p6dki0oi764adcblv5oloul@4ax.com...
> [...]
What's wrong with an asynchronous memory, where the appropriate clocks latch
the control signals to create synchronous RAM?

Then we can do something like:

process (a0, we0, wd0, a1, we1, wd1)
begin
  if we0 = '1' then  -- write to port A
    mem(conv_integer(a0)) <= wd0;
  end if;
  if we1 = '1' then  -- write to port B
    mem(conv_integer(a1)) <= wd1;
  end if;
  rd0 <= mem(conv_integer(a0));
  rd1 <= mem(conv_integer(a1));
end process;

It works in simulation!!
On Wed, 24 Jun 2009 15:45:37 +0100, "Fredxx" wrote:

>What's wrong with an asynchronous memory[...]
>It works in simulation!!

Nothing wrong with them, except that they don't exist in
real FPGAs.  By contrast, dual-ported dual-clock synchronous
RAMs most certainly do :-)
-- 
Jonathan Bromley
On Jun 24, 10:45 am, "Fredxx" <fre...@spam.com> wrote:
> [...]
> What's wrong with an asynchronous memory, where the appropriate clocks latch
> the control signals to create synchronous RAM.
>
> Then we can do something like:
> [...]
> It works in simulation!!

I doubt that it will synthesize.  Synthesis is largely a matter of
template matching.  You can describe a behavior any way you want in
simulation.  But if the synthesis tool does not recognize that form,
it won't synthesize to anything useful.  Often memory that is not
recognized as a block ram is synthesized as distributed memory using
many of the FFs on a chip.  Not only that, but it takes forever to
complete just to find out you don't have a workable design.

Rick