Here is new definition for keyword "if_2", version 2.

Started by Weng Tianxiang September 27, 2019

It was developed from the many discussions that followed my first post, "New keyword "if_2" is suggested for dealing with 2-write port memory."

The new keyword "if_2" puts the m-write and n-read memory modules from chip manufacturers' toolboxes behind the HDL. With "if_2", any m-write and n-read memory module can be fully specified in HDL with very simple code. Circuit designers need no special techniques, no special knowledge about memory modules, and no instantiated memory modules. All the related complex work is left to the synthesizer manufacturers.

  If_2-statement ::=
    [ if_2_label : ]
      if_2 condition then
        sequence_of_statements
      { elsif condition then
        sequence_of_statements }
      [ else
        sequence_of_statements ]
      end if [ if_2_label ] ;

1. Any assignment statement whose target is an array, in the sequence_of_statements under an if_2-statement, is an independent write to a memory that must be executed. It does not obey the statement sequence in a process, regardless of how many writes to the target array are coded before or after it.

2. Any assignment statement whose target is a non-array signal, in the sequence_of_statements under an if_2-statement, obeys the statement sequence in a process.

3. An if-statement under an if_2-statement is treated as an if_2-statement.

4. An if_2-statement can only exist within a clocked process.


Here is a code example to specify a 3-write and 2-read memory module:

p1: process(CLK) is
begin
  if CLK'event and CLK = '1' then
    if C1 then
      An_Array(a) <= D1; -- it is first write to array An_Array
    end if;
    if_2 C2 then
      An_Array(b) <= D2; -- it is the second write to array An_Array
    end if;
    if_2 C3 then
      An_Array(c) <= D3; -- it is the third write to array An_Array
    end if;
    X <= An_Array(j); -- first read from array An_Array
    Y <= An_Array(k); -- second read from array An_Array
  end if;
end process;

Special thanks to the creative responders who mentioned the keyword "if_3", who gave me the Cyclone specification and had deep discussions with me, and to Hans from HT-Lab, whose implementation of an 8-write and 8-read memory for a CPU chip made a deep impression on me long before the new idea was born.

Weng
On Thursday, September 26, 2019 at 11:42:35 PM UTC-4, Weng Tianxiang wrote:
> Here is new definition for keyword "if_2", version 2.

Same comments as with the first 'definition': it provides no benefit to anyone who uses VHDL, and it expands the standard's keyword list without providing any benefit. Good luck with that. Now that VHDL-2019 is near the finish line, VHDL-2030 won't be far behind; that will be your next opportunity.

Each instance of 'if_2' in your example process can be replaced with today's 'if', and the example process works with every VHDL standard that has been released to date. So, if 'if_2' ever became part of the standard, anyone who used it would be locking themselves into a particular standard when it is not needed. That is poor design practice.

I guess users will just have to muddle through by typing the exact same thing except for the needless '_2'.

Kevin Jennings
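The point about replacing 'if_2' with 'if' can be sketched as a complete design unit. This is only a minimal sketch, not code from the thread: the entity name mem_3w2r, the generics, the port types, and the array type array_t are assumptions added so that the process compiles. The process body is the example from the first post with each 'if_2' written as 'if'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper; names and widths are assumptions for illustration.
entity mem_3w2r is
  generic (
    ADDR_BITS : positive := 4;
    DATA_BITS : positive := 8
  );
  port (
    CLK        : in  std_logic;
    C1, C2, C3 : in  boolean;                              -- write enables
    a, b, c    : in  natural range 0 to 2**ADDR_BITS - 1;  -- write addresses
    j, k       : in  natural range 0 to 2**ADDR_BITS - 1;  -- read addresses
    D1, D2, D3 : in  std_logic_vector(DATA_BITS - 1 downto 0);
    X, Y       : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity mem_3w2r;

architecture rtl of mem_3w2r is
  type array_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);
  signal An_Array : array_t;
begin
  p1 : process (CLK) is
  begin
    if CLK'event and CLK = '1' then
      if C1 then
        An_Array(a) <= D1;  -- first write
      end if;
      if C2 then
        An_Array(b) <= D2;  -- second write; overrides the first if b = a
      end if;
      if C3 then
        An_Array(c) <= D3;  -- third write; overrides earlier ones on a match
      end if;
      X <= An_Array(j);     -- first read (value from before this clock edge)
      Y <= An_Array(k);     -- second read (value from before this clock edge)
    end if;
  end process p1;
end architecture rtl;
```

In simulation, when two write addresses are equal, the assignment that appears later in the process wins. Whether a given synthesizer maps this onto RAM blocks, registers, or rejects it is a separate question from whether the language can describe it.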
I downloaded someone's key code from pastebin.com and copied it here for easy discussion. He claims that the following code describes a 10-write and 10-read memory module, but he admitted that a synthesizer does not handle it well.

architecture example of memtest is
...
  shared variable memory : memory_t; -- !!!
begin
  blks : for i in 0 to PORT_COUNT-1 generate
    memport : process(clocks(i))
    begin
      if rising_edge(clocks(i)) then
        if stbs(i) = '1' then
          memory(addrs(i)) := writes(i);
        end if;
        reads(i) <= memory(addrs(i));
      end if;
    end process;
  end generate;
end architecture;

Does it mean a 10-write and 10-read port memory module?

I really don't understand what the code means or how a synthesizer handles it, and I hope some experts can explain it to me further.

In fact, he says: "The fact that this probably wouldn't go so well when you ran it comes down to the synthesizer not the language." That is not as good as what was promised before, namely that the method for generating an n-write and m-read memory module is well established in HDL grammar. If well-defined code based on the grammar cannot be handled well by a synthesizer, can I believe what you say?

If it really is a 10-port memory module, why does a synthesizer not handle it well?

Thank you.

Weng
On Friday, September 27, 2019 at 7:25:33 PM UTC-4, Weng Tianxiang wrote:
> [...]
> Does it mean a 10-write and 10-read port memory module?
>
> I really don't understand what the code means, and how synthesizer executes it, and hope some experts explain it to me further.
> [...]
Yes, it describes a single memory with N ports where N is defined by PORT_COUNT. What part of the code do you find confusing?

There is no shortcoming in the language. This code describes the memory properly. If a synthesizer can't synthesize this code for 10 ports that is a problem with the synthesizer, not the language. I'm willing to bet you will have a hard time finding a library module for a 10 port memory.

If you don't understand the language enough to know this code describes an N port memory, you really are not in a position to tell the rest of us how the language should be changed to accommodate your lack of understanding.

--
Rick C.
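To make the quoted snippet easier to read, here is a sketch of how it might look with the elided declarations filled in. Everything outside the architecture body is an assumption, not the pastebin author's code: the package name memtest_pkg, the constants PORT_COUNT and DEPTH, the 8-bit word, and the port list are hypothetical. Note also that a plain (non-protected) shared variable is VHDL-93 style; VHDL-2002 and later require shared variables to be of a protected type, so a tool may need its VHDL-93 mode or a relaxed setting to accept this.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical declarations; the original post elides these with "...".
package memtest_pkg is
  constant PORT_COUNT : positive := 10;
  constant DEPTH      : positive := 16;
  subtype word_t is std_logic_vector(7 downto 0);
  subtype addr_t is natural range 0 to DEPTH - 1;
  type memory_t     is array (addr_t) of word_t;
  type word_array_t is array (0 to PORT_COUNT - 1) of word_t;
  type addr_array_t is array (0 to PORT_COUNT - 1) of addr_t;
end package memtest_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.memtest_pkg.all;

entity memtest is
  port (
    clocks : in  std_logic_vector(0 to PORT_COUNT - 1);  -- one clock per port
    stbs   : in  std_logic_vector(0 to PORT_COUNT - 1);  -- write strobes
    addrs  : in  addr_array_t;
    writes : in  word_array_t;
    reads  : out word_array_t
  );
end entity memtest;

architecture example of memtest is
  -- One storage array, visible to every generated process.
  shared variable memory : memory_t;
begin
  -- The generate loop creates PORT_COUNT independent processes.
  -- Each one is a read/write port onto the same shared array.
  blks : for i in 0 to PORT_COUNT - 1 generate
    memport : process (clocks(i))
    begin
      if rising_edge(clocks(i)) then
        if stbs(i) = '1' then
          memory(addrs(i)) := writes(i);  -- variable: updated immediately
        end if;
        reads(i) <= memory(addrs(i));     -- returns new data after a write
      end if;
    end process memport;
  end generate blks;
end architecture example;
```

Because each port is its own process, nothing in the code orders two ports that write the same address on the same edge. In simulation the result then depends on the order in which the simulator happens to run the processes.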
On Friday, September 27, 2019 at 5:05:55 PM UTC-7, Rick C wrote:
> Yes, it describes a single memory with N ports where N is defined by PORT_COUNT. What part of the code do you find confusing?
>
> There is no shortcoming in the language. This code describes the memory properly. If a synthesizer can't synthesize this code for 10 ports that is a problem with the synthesizer, not the language.
> [...]
Rick,

I think your conclusion was reached too early.

Here is how the code's author responded:

The synthesizer doesn't do well because any additional port to a memory makes it exponentially harder to implement.

You can implement a multi-ported memory in 3 ways:

1. Real, physically designed hard memory blocks. These are the dual-ported memories of FPGAs.

2. Flip-flop or latch based arrays. These are very area inefficient.

3. Weird architectures that use dual-ported memories to build memories with a larger number of ports. That's the paper that you linked to. It is extremely area inefficient as well.

In practice, designers avoid multi-ported memories like the plague because they are very costly. It has nothing to do with language features. As shown above, writing the RTL for a 10-ported memory is trivial. You don't need new keywords for it.

For example, a 10-ported read/write memory would require on the order of 100 RAMs using the paper that you linked to.

That is why synthesis tools don't infer them: you'd give designers a lot of rope to hang themselves with, for a feature for which there is no demand.

Rick, after seeing the code author's response, do you have any new ideas?

Weng
On Friday, September 27, 2019 at 9:21:02 PM UTC-4, Weng Tianxiang wrote:
> Rick,
> I think your conclusion is made too earlier.
>
> Here is what the code author responses:
> [...]
>
> Rick,
> After seeing the code author's response do you have any new idea?
I'm not clear on what your points are. I don't see anything in this post that contradicts anything I've said. What did I say that you are addressing?

BTW, it is hard to follow the conversation when you keep starting new threads on the same topic.

--
Rick C.
On 28/09/2019 00:25, Weng Tianxiang wrote:
> [...]
> If it is really a 10-port memory module, why a synthesizer does not do well?
Because there are no suitable primitives for the synthesis tool to map to. This is not to say that a synthesis vendor couldn't infer a decuple (had to look this up) port memory block using existing techniques like templates, attributes, synthesis directives, etc., but I suspect the number of configurations would be too large for very little return.

As many others have told you, adding a new keyword to the language will not make this any easier!

I would be interested to find out what circuit needs a true decuple port memory block. Processor register files and network controllers require a large number of read/write ports, but I am sure it is not as high as 10.

Regards,
Hans
www.ht-lab.com
On Saturday, September 28, 2019 at 1:24:26 AM UTC-7, HT-Lab wrote:
> [...]
> I would be interested to find out what circuit needs a true decuple port
> memory block. Processor register files and network controllers require a
> large number of read/write ports but I am sure it is not as high as 10.
Hi Hans,

I remember you mentioning that you implemented an 8*8 port memory module using a technique based on the paper "Efficient Multi-Ported Memories for FPGAs".

Can you disclose more details about your implementation and your experience with it? And what is the best technique for designing a CPU register file, in your opinion?

In my project, I need multiple 2-write and 2-read port memories; true dual-port memory does not meet my requirement. I estimate that I need 4 RAMs, each with 1 write port and 1 read port.

My project is still in the logic design stage, and I have no problem simulating the logic. In the current logic design, an array can be read n times and written m times. When there are multiple writes to an array in a process, I guess a simulator writes data to the addressed location only when it reaches each assignment statement, which would guarantee that the last write is the valid one if the write addresses are the same.

The technique based on the paper needs n*m RAM blocks if each RAM block has one write and one read port. What role may a dual-port memory block play?

Thank you.

Weng
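The estimate of 4 RAMs for a 2-write, 2-read memory follows from the n*m count with n = m = 2. The sketch below shows one way those four 1-write, 1-read banks could be arranged, using a "live value table" that records which write port last wrote each address, which is how I understand the approach in the cited paper. It is a behavioral sketch under assumptions, not the paper's code or anyone's implementation from this thread: the entity name lvt_2w2r, the generics, and all port names are hypothetical, and no claim is made that a particular tool infers block RAM from it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-write, 2-read memory built from four 1W1R banks.
entity lvt_2w2r is
  generic (
    ADDR_BITS : positive := 4;
    DATA_BITS : positive := 8
  );
  port (
    clk            : in  std_logic;
    we0, we1       : in  std_logic;
    waddr0, waddr1 : in  natural range 0 to 2**ADDR_BITS - 1;
    wdata0, wdata1 : in  std_logic_vector(DATA_BITS - 1 downto 0);
    raddr0, raddr1 : in  natural range 0 to 2**ADDR_BITS - 1;
    rdata0, rdata1 : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity lvt_2w2r;

architecture sketch of lvt_2w2r is
  constant DEPTH : positive := 2**ADDR_BITS;
  subtype word_t is std_logic_vector(DATA_BITS - 1 downto 0);
  type ram_t is array (0 to DEPTH - 1) of word_t;
  type lvt_t is array (0 to DEPTH - 1) of natural range 0 to 1;

  -- bank_WR is written only by write port W and read only by read port R,
  -- so each bank needs just one write port and one read port.
  signal bank_00, bank_01 : ram_t;  -- copies of write port 0's data
  signal bank_10, bank_11 : ram_t;  -- copies of write port 1's data

  -- Live value table: which write port wrote each address most recently.
  signal lvt : lvt_t := (others => 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we0 = '1' then
        bank_00(waddr0) <= wdata0;
        bank_01(waddr0) <= wdata0;
        lvt(waddr0)     <= 0;
      end if;
      if we1 = '1' then
        bank_10(waddr1) <= wdata1;
        bank_11(waddr1) <= wdata1;
        lvt(waddr1)     <= 1;  -- port 1 wins if both write the same address
      end if;

      -- Each read port picks the bank of the last writer for its address.
      if lvt(raddr0) = 0 then
        rdata0 <= bank_00(raddr0);
      else
        rdata0 <= bank_10(raddr0);
      end if;
      if lvt(raddr1) = 0 then
        rdata1 <= bank_01(raddr1);
      else
        rdata1 <= bank_11(raddr1);
      end if;
    end if;
  end process;
end architecture sketch;
```

A read in the same cycle as a write to the same address returns the old data in this sketch. The table itself still has two write ports and two read ports, so it would have to be built from registers, which is part of the area cost discussed above.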
On Saturday, September 28, 2019 at 10:02:42 AM UTC-4, Weng Tianxiang wrote:
> [...]
> Even though my project is still in logic design stage and there is no problem for me to simulate the logic, based on current logic design: an array can be read n times and written m times: when multiple writing to an array in a process I guess a simulator would only write any data at the written address once it meets an assignment statement that would guarantee the last write is valid if their writing addresses are same.
>
> The technique based on the paper needs n*m RAM blocks if each RAM block has one write and one read port. What role may a dual port memory block play?

You think too much in terms of the HDL you have written. There is no way for the HDL to know the two addresses are equal, so the first/last thing doesn't enter into the matter. That is also why the suggested code to infer a multiple write port memory is with a shared variable and separate processes.

Remember that an HDL is a hardware description language. Exactly what hardware are you trying to describe? That is, how do you expect the tools to implement your multiple write port memory?

The fact that your code simulated means nothing if the code can't be synthesized to working hardware.

--
Rick C.
On 28/09/2019 15:02, Weng Tianxiang wrote:
> Hi Hans,
>
> I remember that you mentioned that you implemented a 8*8 port memory module using technique based on paper "Efficient Multi-Ported Memories for FPGAs".
Hi Weng,

I actually used the XOR variant (not multipumped) to implement a 4W8R port. You can find the paper here:

http://fpgacpu.ca/multiport/FPGA2012-LaForest-XOR-Paper.pdf

and more papers on the main page:

http://fpgacpu.ca/multiport/
> Can you disclose more details and your experiences about your implementation? And what is the best technique to design a CPU register file in your opinion?

That all depends on your design. In my case I could use the XOR variant because I have a pipelined design where I could latch the register file's read request early in the pipeline and then, in a later stage, XOR it with the new results for the write request. The XOR variant is the most area efficient but was the most complicated to add to my design (due to data hazards and the fact that each write request also needs a read request).
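The remark that each write request also needs a read request can be seen in a small behavioral sketch of the XOR idea, reduced here to 2 write ports and 1 read port. This is an illustration under assumptions, not the 4W8R design described above and not the paper's code: the entity name xor_2w1r and all port names are hypothetical, the pipelining is left out, and the banks are modelled as plain arrays rather than as replicated 1W1R RAM blocks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-write, 1-read memory using XOR encoding of two banks.
entity xor_2w1r is
  generic (
    ADDR_BITS : positive := 4;
    DATA_BITS : positive := 8
  );
  port (
    clk            : in  std_logic;
    we0, we1       : in  std_logic;
    waddr0, waddr1 : in  natural range 0 to 2**ADDR_BITS - 1;
    wdata0, wdata1 : in  std_logic_vector(DATA_BITS - 1 downto 0);
    raddr          : in  natural range 0 to 2**ADDR_BITS - 1;
    rdata          : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity xor_2w1r;

architecture sketch of xor_2w1r is
  subtype word_t is std_logic_vector(DATA_BITS - 1 downto 0);
  type ram_t is array (0 to 2**ADDR_BITS - 1) of word_t;

  -- Invariant: bank0(x) xor bank1(x) is the value stored at address x.
  -- Both banks start at zero so the invariant holds before any write.
  signal bank0, bank1 : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- A write stores the data XORed with the OTHER bank's word at the
      -- same address, so every write first needs a read of the other bank.
      if we0 = '1' then
        bank0(waddr0) <= wdata0 xor bank1(waddr0);
      end if;
      if we1 = '1' then
        bank1(waddr1) <= wdata1 xor bank0(waddr1);
      end if;

      -- A read XORs the two banks; the other bank's contribution cancels
      -- out and leaves the most recently written data.
      rdata <= bank0(raddr) xor bank1(raddr);
    end if;
  end process;
end architecture sketch;
```

After a write through port 0 to address x, bank0(x) xor bank1(x) equals wdata0 xor bank1(x) xor bank1(x), which is wdata0. If both ports write the same address on the same edge, the stored value is not meaningful in this sketch, so that case has to be prevented or resolved outside the memory.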
> In my project, I need multiple 2-write and 2 read port memory, true dual port memory does not meet my requirement. I estimate that I need 4 RAM with each having 1-write and 1-read port.

In that case, forget about the LaForest et al. paper and simply use one of the core wizards, such as Intel's MegaWizard or Xilinx's Coregen. You get a 2W2R area/speed-optimised design with lots of configurable options.
> Even though my project is still in logic design stage and there is no problem for me to simulate the logic, based on current logic design: an array can be read n times and written m times: when multiple writing to an array in a process I guess a simulator would only write any data at the written address once it meets an assignment statement that would guarantee the last write is valid if their writing addresses are same.

The core wizards give you the option to choose what should happen if you read and write to the same address.
> The technique based on the paper needs n*m RAM blocks if each RAM block has one write and one read port. What role may a dual port memory block play?

Not sure what you are asking; you need DPRAMs as the basic building block for a multi-port design. If you have the time, I would suggest implementing the various versions and seeing how they behave. I learned a lot from it.

Good luck,
Hans
www.ht-lab.com