comp.arch.fpga | My invention: Coding wave-pipelined circuits with buffering function in HDL

Hi,

A wave-pipelined circuit has the same logic as its pipelined counterpart, except that the wave-pipelined circuit has only one stage: a critical path from the input register through a piece of computational logic to the output register, with no intermediate registers.

The kernel idea of my invention is this: a designer provides the least possible information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

All coding has 3 steps:
1. Write a Critical Path Component (CPC) with defined interface;

2. Call a Wave-Pipelining Component (WPC) provided by a system library;

3. Call one of 3 link statements to link a CPC instantiation with a paired WPC instantiation to specify what your target is.

Here is all the code for a 64*64-bit signed integer multiplier C <= A*B.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.wave_pipeline_package.all;

-- CPC code for wave-pipelined 64-bit signed integer multiplier C <= A*B
-- CPC_1_2 is linked with SMB by link1() / link2() if "wave" is accepted in VHDL
-- link1(): generation would fail if the circuit cannot accept 1 data per cycle
-- link2(): generation never fails and the circuit is capable of accepting 1 data per
-- INPUT_CLOCK_NUMBER cycles

entity CPC_1_2 is
   generic (
      input_data_width  : positive  := 64;                  -- optional
      output_data_width : positive  := 128                  -- optional
   );
   port (
      CLK   :  in std_logic;
      WE_i  :  in std_logic;     -- '1': write enable to input registers A & B
      Da_i  :  in signed(input_data_width-1 downto 0);      -- input data A
      Db_i  :  in signed(input_data_width-1 downto 0);      -- input data B
      WE_o_i:  in std_logic;  -- '1': write enable to output register C
      Dc_o  :  out unsigned(output_data_width -1 downto 0)  -- output data C
   );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
   signal   Ra :  signed(input_data_width-1 downto 0);  -- input register A
   signal   Rb :  signed(input_data_width-1 downto 0);  -- input register B
   signal   Rc :  signed(output_data_width-1 downto 0); -- output register C
   signal   Cl :  signed(output_data_width-1 downto 0); -- combinational logic

begin
   Cl    <= Ra * Rb;             -- combinational logic output, key part of CPC
   Dc_o  <= unsigned(Rc);        -- output through output register

   p_1 : process(CLK)
   begin
      if Rising_edge(CLK) then
         if WE_i = '1' then      -- WE_i = '1' : latch input data
            Ra <= Da_i;
            Rb <= Db_i;
         end if;

         if WE_O_I = '1' then    -- WE_O_I = '1': latch output data
            Rc <= Cl;
         end if;
      end if;
   end process;

--------------------------------------------------------------------------------

end A_CPC_1_2;

In summary, once HDLs adopt my system, writing a wave-pipelined circuit will be as simple as writing a one-cycle logic circuit.

Thank you.

Weng

Reply by Weng Tianxiang ● January 12, 2018

Hi,

The following information is from Wikipedia:

1. The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor for the 8086 line of microprocessors.

2. MMX is a single instruction, multiple data (SIMD) instruction set designed by Intel, introduced in 1997 with its P5-based Pentium line of microprocessors, designated as "Pentium with MMX Technology".[1] It developed out of a similar unit introduced on the Intel i860,[2] and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on recent IA-32 processors by Intel and other vendors.

MMX has subsequently been extended by several programs by Intel and others: 3DNow!, Streaming SIMD Extensions (SSE), and ongoing revisions of Advanced Vector Extensions (AVX).

The 8087's 64-bit floating-point multiplier needs 5 cycles to process a data item and accepts one input data per cycle.

The MMX 64-bit floating-point multiplier needs 4 cycles to process a data item and accepts one set of input data per 2 cycles.

Because each multiplication needs one multiplicand A and one multiplier B to get the result C, many benchmarks naturally claim that the MMX 64-bit floating-point multiplier is 20% faster than the 8087's (4 cycles vs. 5 cycles).

With my invention, any college student with knowledge of HDL could write an MMX-style wave-pipelined 64-bit floating-point multiplier within half an hour under the following conditions:

1. My invented system is fully accepted into HDL;

2. Synthesizer manufacturers have updated their products to handle the generation of the related wave-pipelined circuits.
All related technology and algorithms are available off the shelf.

3. It needs time.

One wonderful application of wave-pipelined circuits, I think, may be a 16-channel FFT processor built with wave-pipelining technology: the benefits are a faster running frequency and large savings in logic area and power consumption.

Thank you.

Weng

Reply by Weng Tianxiang ● January 13, 2018

On Saturday, January 13, 2018 at 1:31:17 PM UTC-8, Rick C. Hodgin wrote:
> Do you have a YouTube example?  And an example that will
> synthesize in Icarus?  So we can see your method compares to a
> standard example.
> 
> -- 
> Rick C. Hodgin

Hi Rick,

Actually, I have had 3 patents issued on the subject:

1. 9,747,252: Systematic method of coding wave-pipelined circuits in HDL.
2. 9,734,127: Systematic method of synthesizing wave-pipelined circuits in HDL.
3. 9,575,929: Apparatus of wave-pipelined circuits.

All 3 patents have the same specification, drawings and abstract, with different claims.

Here is my new non-provisional patent application 15,861,093 (the application, hereafter), "Coding wave-pipelined circuits with buffering function in HDL", filed with the USPTO on 2018/01/03.

The non-provisional patent application 15,861,093 has a *.txt (*.vhd) file attached, so its contents are not secret. Anyone interested in the subject can email me and I will send the file set, even though the full application will only be published 18 months later.

The following is part of my sales-promotion file sent to some big companies:

"The new application can be viewed in some extents as the continuation of the 3 patents logically, but legally it is a brand new invention devoting the main attention to coding buffering function for wave-pipelined circuits in HDL, a topic never mentioned in the 3 patents, while it is still paying great attention to improve the 3 patents to make them more robust, friendlier and more complete in point of view from coding designers."

In the 3 previous patents a first version of the source code was attached; the new application provides the second version. With the second version of the VHDL source code you can use a VHDL-2002 or later simulator to simulate all the workings and generate waveforms. The source file is also well commented and includes inserted debugging code.

Please email me what you want me to send.

For the 3 patents:
1.1. Specification.
1.2. 3 sets of claims.
1.3. Drawings.
1.4. Source code.
1.5. ZIP file of all the above.

For the new application:
2.1. Specification.
2.2. Claims.
2.3. Drawings.
2.4. Abstract.
2.5. Source code.
2.6. ZIP file of all the above.

For the new application, the specification has 81 pages, the 48 claims take 15 pages and the drawings take 24 pages.

If you lack time, the quickest way to learn all the working structures is to read only 2.1 Specification, 2.3 Drawings and 2.4 Abstract.

Because the goal of my patents and new application is a) to make my invented system part of HDL (not only VHDL, but all HDLs), and b) to make the source code part of the HDL system library, I am willing to distribute my code and all related files to anyone who is really interested in how I did it.

Through CPC_1_2 you can see that my scheme needs the least logic information and coding from a designer to resolve a very difficult problem, one that has been open for almost 50 years.

My Email address is wtx wtx @ gmail . com (please remove spaces between characters)

Thank you.

Weng

Reply by Weng Tianxiang ● January 16, 2018

Hi,

Here is more information on the WPC (Wave-Pipelining Component) provided by a system library (which I wrote).

1. There are only 2 WPCs to cover all wave-pipelined circuits:
  a) One is used when only one critical path is used.
  b) The other is used when more than one copy of the same critical path is used.

2. Based on my classification, every wave-pipelined circuit has one of 5 types of structure:
  a) A one-cycle non-pipelined circuit: it is coded as a wave-pipelined circuit, but finally turns out to be a regular 1-cycle circuit.

  b) A wave-pipelined circuit that can accept one input data per cycle with one critical path.

  c) A wave-pipelined circuit that can accept one input data per multiple cycles with one critical path.

  d) A wave-pipelined circuit that can accept one input data per cycle with more than one critical path, each critical path having its own input register and output register.

  e) A wave-pipelined circuit that can accept one input data per cycle with more than one critical path, each critical path having its own input register and all sharing a single output register.

3. The method guarantees a 100% success rate for generating a specific wave-pipelined circuit.
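
Structures (d) and (e) above steer successive inputs to different copies of the critical path. As a minimal sketch of that steering only, assuming exactly two copies, the following block alternates an incoming write enable between two outputs. It is not the WPC from the system library; the entity name `we_alternator` and all of its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical illustration, not the author's WPC:
-- routes each incoming write enable to path A or path B in turn.
entity we_alternator is
   port (
      CLK   : in  std_logic;
      RESET : in  std_logic;   -- synchronous, active high
      WE_i  : in  std_logic;   -- '1': new input data this cycle
      WE_a  : out std_logic;   -- write enable for critical path copy A
      WE_b  : out std_logic    -- write enable for critical path copy B
   );
end we_alternator;

architecture rtl of we_alternator is
   signal sel : std_logic := '0';   -- '0': next data goes to A, '1': to B
begin
   WE_a <= WE_i when sel = '0' else '0';
   WE_b <= WE_i when sel = '1' else '0';

   p_sel : process(CLK)
   begin
      if rising_edge(CLK) then
         if RESET = '1' then
            sel <= '0';
         elsif WE_i = '1' then
            sel <= not sel;          -- switch copies after every accepted data
         end if;
      end if;
   end process;
end rtl;
```

With one input per cycle, each copy then sees one input per 2 cycles.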

Thank you.

Weng

Reply by Jan Coombs ● January 17, 2018

On Sat, 13 Jan 2018 13:31:14 -0800 (PST)
"Rick C. Hodgin" <rick.c.hodgin@gmail.com> wrote:

> Do you have a YouTube example?  And an example that will
> synthesize in Icarus?  So we can see your method compares to a
> standard example.

There is perhaps some explanation in "Wave-Pipelining: A
Tutorial and Research Survey"[1], and "DESIGN AND TIMING
ANALYSIS OF WAVE PIPELINED CIRCUITS"[2].

Jan Coombs
-- 

[1] IEEE  Transactions on VLSI Systems 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf

[2] Recep Ozgun's MSc thesis
https://soar.wichita.edu/bitstream/handle/10057/383/t06064.pdf?sequence=3

Reply by Weng Tianxiang ● January 17, 2018

Hi Jan,

I appreciate your efforts to dig deep into my inventions. I would like to patiently answer all reasonable technical questions.

Your reference [1] is none other than the paper that inspired me to resolve the open problem: design both a coding method and a synthesizing method so that any logic design engineer, including college students with basic knowledge of HDL, can code and generate a wave-pipelined circuit.

All the published materials I have read center on how to eliminate data contamination, a special issue that is never heard of in any non-wave-pipelined circuit design.

Data contamination is defined as later-entered data catching up with earlier-entered data and damaging it.

What my inventions do is build a bridge between code designers and synthesizers in order to code and generate a wave-pipelined circuit in the easiest way:

If a code designer provides all the necessary and sufficient information to a synthesizer, the synthesizer should and can generate a wave-pipelined circuit as specified.

Your reference [1] (1998), on page 142 below Table 1, states: "Last, due to a lack of commercial tools that are directly applicable to designs using wave-pipelining, each group has more or less developed in-house design analysis and optimization tools which enable VLSI design using wave-pipelining."

So I assumed at the beginning of my project that if a new part of the HDL standard on wave-pipelined circuits is well designed and laid out, any synthesizer manufacturer has the ability to generate a wave-pipelined circuit. The assumption was also based on Table 1 on page 142 of your reference [1] (1998), which lists 30 wave-pipelined circuits (from 20 years ago), none of whose authors had any relationship with a professional synthesizer manufacturer.

Furthermore, during the development period I found that no matter how many types of wave-pipelined circuits there have been or will be, every wave-pipelined circuit comprises two parts: one is the critical path, represented by a CPC (Critical Path Component); all the remaining logic, the WPC (Wave-Pipelining Component), is always the same for a group of wave-pipelined circuits, depending on what target a designer wants for his circuit.

In my design no timing related to a wave-pipelined circuit ever appears, because timing is within the scope of the synthesizer's operation and has nothing to do with the coding.

There is no commercial synthesizer in the world that can directly generate a wave-pipelined circuit. To prove my WPCs are correct, I coded a CPC that does nothing but pass data along the critical path while obeying critical-path behavior: if the critical path needs 5 cycles for signals to travel, its output is available in 6 cycles, and if the critical path is blocked, later-entered data has a chance to damage earlier-entered data if the design is not right. So essentially I used no very sophisticated tools and no timing analysis.
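
To make the idea of a pass-through test CPC concrete, here is a minimal simulation-only sketch. It is not the code attached to the patents; the entity name `cpc_delay_model`, its ports and the 45 ns figure are assumptions chosen for illustration. It models the critical path with a VHDL `transport` delay, so several values can be in flight between the input and output registers at once. It does not model a blocked path or data contamination.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only stand-in for a critical path.
-- Assumes a 10 ns clock, so the 45 ns default delay lies between
-- 4 and 5 clock periods.
entity cpc_delay_model is
   generic (
      data_width : positive := 8;
      path_delay : time     := 45 ns
   );
   port (
      CLK    : in  std_logic;
      WE_i   : in  std_logic;   -- '1': write enable to input register
      D_i    : in  std_logic_vector(data_width-1 downto 0);
      WE_o_i : in  std_logic;   -- '1': write enable to output register
      D_o    : out std_logic_vector(data_width-1 downto 0)
   );
end cpc_delay_model;

architecture sim of cpc_delay_model is
   signal Ri : std_logic_vector(data_width-1 downto 0);  -- input register
   signal Cl : std_logic_vector(data_width-1 downto 0);  -- delayed path output
   signal Ro : std_logic_vector(data_width-1 downto 0);  -- output register
begin
   -- Transport delay keeps every value in flight; not synthesizable.
   Cl  <= transport Ri after path_delay;
   D_o <= Ro;

   p_1 : process(CLK)
   begin
      if rising_edge(CLK) then
         if WE_i = '1' then
            Ri <= D_i;
         end if;
         if WE_o_i = '1' then
            Ro <= Cl;
         end if;
      end if;
   end process;
end sim;
```

With these defaults, data written into `Ri` at one rising edge reaches `Cl` 45 ns later and can be captured into `Ro` at the fifth rising edge after that, provided `WE_o_i` is asserted then.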

Thank you.

Weng

Reply by Weng Tianxiang ● January 19, 2018

Hi,

I have said that the kernel idea of my invention is: a designer provides the least information and logic code about the critical path, and leaves all the complex logic design to a synthesizer and a system library, which is what an HDL should do.

Here are the key technical points that I used to fully develop my technique, assuming that you are an experienced HDL code designer.

Even though the technique is tricky, it is easy to understand once you fully understand the concepts in this post and the next; each should take 20 minutes or so for 80% of the engineers here.

Here I am using a 64*64-bit signed multiplier as the example target circuit.

1. If my CPC_1_2 code is presented to a synthesizer, the first question you may ask is how to code the WPC (Wave-Pipelining Component). For clarity, I have copied the CPC_1_2 code here again.

By the way, I claim that nobody can further simplify the CPC_1_2 code and still deliver full information about a critical path to a synthesizer for generating a wave-pipelined circuit! If you can, please challenge my claim.

entity CPC_1_2 is 
   generic (   
      input_data_width  : positive  := 64;                  -- optional 
      output_data_width : positive  := 128                  -- optional 
   ); 
   port ( 
      CLK   :  in std_logic; 
      WE_i  :  in std_logic;     -- '1': write enable to input registers A & B 
      Da_i  :  in signed(input_data_width-1 downto 0);      -- input data A 
      Db_i  :  in signed(input_data_width-1 downto 0);      -- input data B 
      WE_o_i:  in std_logic;  -- '1': write enable to output register C 
      Dc_o  :  out unsigned(output_data_width -1 downto 0)  -- output data C 
   ); 
end CPC_1_2; 

architecture A_CPC_1_2 of CPC_1_2 is 
   signal   Ra :  signed(input_data_width-1 downto 0);  -- input register A 
   signal   Rb :  signed(input_data_width-1 downto 0);  -- input register B 
   signal   Rc :  signed(output_data_width-1 downto 0); -- output register C 
   signal   Cl :  signed(output_data_width-1 downto 0); -- combinational logic 
    
begin 
   Cl    <= Ra * Rb;             -- combinational logic output, key part of CPC 
   Dc_o  <= unsigned(Rc);        -- output through output register 

   p_1 : process(CLK) 
   begin 
      if Rising_edge(CLK) then 
         if WE_i = '1' then      -- WE_i = '1' : latch input data 
            Ra <= Da_i; 
            Rb <= Db_i; 
         end if; 
          
         if WE_O_I = '1' then    -- WE_O_I = '1': latch output data 
            Rc <= Cl; 
         end if; 
      end if; 
   end process; 
end A_CPC_1_2; 

2. Assume 3 situations:
a) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per cycle, you should know how to code the WPC for the circuit, because we have already assumed that the synthesizer is capable of generating the wave-pipelined circuit, leaving the most difficult task to the synthesizer. By definition a WPC contains all the remaining logic of the circuit except CPC_1_2.

b) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per 2 cycles, you should know how to code the WPC for the circuit.

c) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and that the circuit can accept one data item per 2 cycles, but the designer wants the circuit to accept one data item per cycle rather than one per 2 cycles, you should know how to code the WPC for the circuit with 2 copies of the critical path, each alternately accepting input data once per 2 cycles. Actually all CPCs follow one of 2 code patterns: CPC_1_2 is one of them, and the other, CPC_3, is slightly more complex but is also an off-the-shelf coding pattern. In this situation the CPC_3 code would replace CPC_1_2 with the same input and output interfaces.
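
For situation (a), once the 5-cycle figure is known, the core of the remaining logic is a delayed copy of the input write enable that drives WE_o_i. The following is a minimal sketch under that assumption, not the WPC from the system library; the entity name `we_delay_line`, its ports and the `latency` generic are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch, not the author's WPC: delays the input write
-- enable so the output register is written LATENCY rising edges after
-- the edge that wrote the input registers.
entity we_delay_line is
   generic (
      latency : integer range 2 to 64 := 5  -- assumed cycles through the path
   );
   port (
      CLK   : in  std_logic;
      RESET : in  std_logic;   -- synchronous, active high
      WE_i  : in  std_logic;   -- '1': input registers are written this cycle
      WE_o  : out std_logic    -- connect to WE_o_i of the CPC
   );
end we_delay_line;

architecture rtl of we_delay_line is
   -- One bit per clock cycle that a data item spends in flight.
   signal valid : std_logic_vector(latency-1 downto 0) := (others => '0');
begin
   WE_o <= valid(latency-1);

   p_shift : process(CLK)
   begin
      if rising_edge(CLK) then
         if RESET = '1' then
            valid <= (others => '0');
         else
            valid <= valid(latency-2 downto 0) & WE_i;  -- shift toward the MSB
         end if;
      end if;
   end process;
end rtl;
```

The same block would serve situation (b) if the upstream logic asserts WE_i at most once every 2 cycles; it does nothing to enforce that spacing.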

Now the problem comes: how do you know all 3 unknown parameters before you code the WPC for the 64*64-bit signed multiplier? I think this is the key reason why so many wave-pipelined circuits have been built, yet none of their designers could resolve the 50-year-old open problem.

And the circuit may, should and can be any type of pipelined circuit!

To be continued.

I would like to listen to your questions and comments!

Weng

Reply by rickman ● January 19, 2018

Weng Tianxiang wrote on 1/10/2018 8:56 PM:
> -- CPC_1_2 is linked with SMB by link1() / link2() if "wave" is accepted in VHDL

What is SMB?

I think I understand the concept of wave pipelining.  It is just eliminating 
the intermediate registers of a pipeline circuit and designing the 
combinational logic so that the delays are even enough across the many paths 
so the output can be clocked at a given time and will receive a stable 
result from the input N clocks earlier.  In other words, the logic is 
designed so that the changes rippling through the logic never catch up to 
the changes created by the data entered 1 clock cycle earlier.  Nice if you 
can do it.
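
For reference, this condition is often written as a bound on the clock period. A commonly quoted simplified form is sketched below; the symbol names are chosen here for illustration, with D_max and D_min the longest and shortest delays through the logic and Delta_clk the clock uncertainty.

```latex
T_{clk} \;\ge\; (D_{max} - D_{min}) + T_{setup} + T_{hold} + 2\,\Delta_{clk}
```

In an ordinary pipeline stage the period is bounded roughly by D_max alone; here it is bounded by the spread between the slowest and fastest paths, which is why the delays have to be "even enough".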

I can see where this would be useful in an ASIC.  In ASICs FFs and logic 
compete for space within the chip.  In FPGAs the ratio between FFs and logic 
are fixed and predetermined.  So using logic without using the FFs that are 
already there is not of much value.

-- 

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Reply by Jan Coombs ● January 20, 2018

On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

   ...

> I think I understand the concept of wave pipelining.  It is
> just eliminating the intermediate registers of a pipeline
> circuit and designing the combinational logic so that the
> delays are even enough across the many paths so the output can
> be clocked at a given time and will receive a stable result
> from the input N clocks earlier.  In other words, the logic is
> designed so that the changes rippling through the logic never
> catch up to the changes created by the data entered 1 clock
> cycle earlier.  Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation. 

> I can see where this would be useful in an ASIC.  In ASICs FFs
> and logic compete for space within the chip.  In FPGAs the
> ratio between FFs and logic are fixed and predetermined.  So
> using logic without using the FFs that are already there is
> not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted. 

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Jan Coombs

Reply by rickman ● January 20, 2018

Jan Coombs wrote on 1/20/2018 2:20 PM:
>> I can see where this would be useful in an ASIC.  In ASICs FFs
>> and logic compete for space within the chip.  In FPGAs the
>> ratio between FFs and logic are fixed and predetermined.  So
>> using logic without using the FFs that are already there is
>> not of much value.
>
> Generally true, but
>
> 1) You might be able to combine three stages that require 2/3 of
> a clock cycle for maximum propagation delay, and get the result
> in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to 
make it 1 clock each in two stages.  There is supposed to be software to 
handle that for you although I've never used it.


> 2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
> each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs.  They long 
for the day they are as big as Lattice, lol!


> Either way there must be a great deal of complex floor planning
> and/or timing constraints needed to make this work. Automating
> this would be amazing?

Isn't that what the OP is claiming?  I'm surprised he could make this work 
over PVT.  The actual stable time has to be on a clock edge, the same clock 
edge under all conditions.  I wouldn't want to try that manually in a simple 
circuit.

-- 

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
