# My invention: Coding wave-pipelined circuits with buffering function in HDL

Started by Weng Tianxiang on January 10, 2018
```Hi,

A wave-pipelined circuit has the same logic as its pipelined counterpart, except that the wave-pipelined circuit has only one stage: a critical path from the input register, through a piece of computational logic, to the output register, with no intermediate registers.

The kernel idea of my invention is: a designer provides the least information and logic code about the critical path, and leaves all complex logic design to a synthesizer and a system library, which is what an HDL should do.

Coding takes 3 steps:
1. Write a Critical Path Component (CPC) with a defined interface;

2. Call a Wave-Pipelining Component (WPC) provided by a system library;

3. Call one of 3 link statements to link a CPC instantiation with a paired WPC instantiation, specifying what your target is.

Here is all the code for a 64*64-bit signed integer multiplier, C <= A*B.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.wave_pipeline_package.all;

-- CPC code for wave-pipelined 64-bit signed integer multiplier C <= A*B
-- CPC_1_2 is linked with SMB by link1() / link2() if "wave" is accepted in VHDL
-- link1(): generation would fail if the circuit cannot accept 1 data per cycle
-- link2(): generation never fails and the circuit is capable of accepting 1 data per
-- INPUT_CLOCK_NUMBER cycles

entity CPC_1_2 is
   generic (
      input_data_width  : positive  := 64;   -- optional
      output_data_width : positive  := 128   -- optional; must equal 2*input_data_width
   );
   port (
      CLK    :  in std_logic;
      WE_i   :  in std_logic;                                -- '1': write enable to input registers A & B
      Da_i   :  in signed(input_data_width-1 downto 0);      -- input data A
      Db_i   :  in signed(input_data_width-1 downto 0);      -- input data B
      WE_o_i :  in std_logic;                                -- '1': write enable to output register C
      Dc_o   :  out unsigned(output_data_width-1 downto 0)   -- output data C
   );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
   signal   Ra :  signed(input_data_width-1 downto 0);  -- input register A
   signal   Rb :  signed(input_data_width-1 downto 0);  -- input register B
   signal   Rc :  signed(output_data_width-1 downto 0); -- output register C
   signal   Cl :  signed(output_data_width-1 downto 0); -- combinational logic

begin
   Cl    <= Ra * Rb;             -- combinational logic output, key part of CPC
   Dc_o  <= unsigned(Rc);        -- output through output register

   p_1 : process(CLK)
   begin
      if rising_edge(CLK) then
         if WE_i = '1' then      -- WE_i = '1': latch input data
            Ra <= Da_i;
            Rb <= Db_i;
         end if;

         if WE_o_i = '1' then    -- WE_o_i = '1': latch output data
            Rc <= Cl;
         end if;
      end if;
   end process;

--------------------------------------------------------------------------------

end A_CPC_1_2;

In summary, once HDL adopts my system, writing a wave-pipelined circuit is as simple as writing a one-cycle logic circuit.

Thank you.

Weng

```
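The WPC and the link1()/link2() statements live in the author's wave_pipeline_package, which is not shown in the thread. To make the division of labour concrete, here is a minimal sketch of the kind of control logic a WPC has to supply around CPC_1_2. It throttles inputs to one every INPUT_INTERVAL cycles and raises WE_o_i exactly LATENCY cycles after each WE_i. This is a sketch under stated assumptions, not the author's library. The entity name `wave_ctrl_sketch`, its generics and the valid/ready ports are hypothetical, and LATENCY and INPUT_INTERVAL are assumed to come from timing analysis that a synthesizer would have to perform.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-in for a library WPC (not the author's wave_pipeline_package).
-- Accepts at most one input every INPUT_INTERVAL cycles and enables the output
-- register LATENCY cycles after each accepted input.
entity wave_ctrl_sketch is
  generic (
    LATENCY        : positive := 5;  -- assumed cycles from input-register edge to output-register edge
    INPUT_INTERVAL : positive := 1   -- assumed minimum cycles between accepted inputs
  );
  port (
    CLK       : in  std_logic;
    RST       : in  std_logic;  -- synchronous reset, active high
    valid_in  : in  std_logic;  -- upstream offers a new (A, B) pair
    ready     : out std_logic;  -- '1' when a pair offered this cycle will be accepted
    WE_i      : out std_logic;  -- connect to CPC_1_2.WE_i
    WE_o_i    : out std_logic;  -- connect to CPC_1_2.WE_o_i
    valid_out : out std_logic   -- '1' in the cycle after Rc was loaded
  );
end entity;

architecture rtl of wave_ctrl_sketch is
  -- wave(k) = '1': a datum launched from Ra/Rb will have travelled k+1 cycles at the next edge
  signal wave   : std_logic_vector(LATENCY-1 downto 0) := (others => '0');
  signal wait_c : natural range 0 to INPUT_INTERVAL-1 := 0;  -- cycles before the next accept
  signal accept : std_logic;
  signal vout   : std_logic := '0';
begin
  ready     <= '1' when wait_c = 0 else '0';
  accept    <= valid_in when wait_c = 0 else '0';
  WE_i      <= accept;
  WE_o_i    <= wave(LATENCY-1);  -- output register loads after LATENCY cycles of travel
  valid_out <= vout;

  p_ctrl : process(CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        wave   <= (others => '0');
        wait_c <= 0;
        vout   <= '0';
      else
        -- move every wave marker one cycle closer to the output register
        for k in LATENCY-1 downto 1 loop
          wave(k) <= wave(k-1);
        end loop;
        wave(0) <= accept;
        vout    <= wave(LATENCY-1);
        -- throttle inputs to one per INPUT_INTERVAL cycles
        if accept = '1' then
          wait_c <= INPUT_INTERVAL - 1;
        elsif wait_c /= 0 then
          wait_c <= wait_c - 1;
        end if;
      end if;
    end if;
  end process;
end architecture;
```

To use it, connect WE_i and WE_o_i to the ports of the same name on CPC_1_2. In a zero-delay RTL simulation, Cl always reflects the newest Ra/Rb. So when INPUT_INTERVAL is smaller than LATENCY, the captured products are only right in hardware whose path delays actually meet the wave-pipelining constraints, or in a simulation that models the path delay explicitly.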
```On Wednesday, January 10, 2018 at 5:56:45 PM UTC-8, Weng Tianxiang wrote:
> ...

Hi,

The following information is from Wikipedia:

1. The Intel 8087, announced in 1980, was the first x87 floating-point coprocessor
for the 8086 line of microprocessors.

2. MMX is a single instruction, multiple data (SIMD) instruction set designed by
Intel, introduced in 1997 with its P5-based Pentium line of microprocessors,
designated as "Pentium with MMX Technology". It developed out of a similar unit
introduced on the Intel i860, and earlier the Intel i750 video pixel processor.
MMX is a processor supplementary capability that is supported on recent IA-32
processors by Intel and other vendors.

MMX has subsequently been extended by several programs by Intel and others: 3DNow!,
Streaming SIMD Extensions (SSE), and ongoing revisions of Advanced Vector Extensions
(AVX).

The 8087's 64-bit floating-point multiplier needs 5 cycles to process one set of data, and it accepts one set of input data per cycle.

The MMX 64-bit floating-point multiplier needs 4 cycles to process one set of data, and it accepts one set of input data every 2 cycles.

Because each multiplication needs one multiplicand A and one multiplier B to produce the result C, many benchmarks naturally claim that the MMX 64-bit floating-point multiplier is 20% faster than the 8087's (4 cycles vs 5 cycles).

With my invention, any college student with knowledge of HDL could write an MMX wave-pipelined 64-bit floating-point multiplier within half an hour under the following conditions:

1. My invented system is fully accepted into HDL;

2. Synthesizer manufacturers have updated their products to handle the generation of the related wave-pipelined circuits. All the related technology and algorithms are available off the shelf.

3. It takes time.

One wonderful wave-pipelined circuit, I think, would be a 16-channel FFT processor built with wave-pipelining: the benefits are a higher running frequency and considerable savings in logic area and power consumption.

Thank you.

Weng

```
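Taking the cycle counts in the post at face value, the two comparisons differ because one measures latency and the other throughput. For a unit with a latency of $L$ cycles that accepts one input every $I$ cycles, $K$ back-to-back operations take

$$
T(K) = L + (K-1)\,I
$$

With $L = 5,\ I = 1$ for the 8087 figures and $L = 4,\ I = 2$ for the MMX figures:

$$
T_{8087}(1) = 5,\quad T_{\mathrm{MMX}}(1) = 4,\qquad T_{8087}(100) = 5 + 99 = 104,\quad T_{\mathrm{MMX}}(100) = 4 + 2\cdot 99 = 202.
$$

The 20% figure is $(5-4)/5$ for one isolated operation. Over a long stream, the input interval dominates.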
```On Saturday, January 13, 2018 at 1:31:17 PM UTC-8, Rick C. Hodgin wrote:
> Do you have a YouTube example?  And an example that will
> synthesize in Icarus?  So we can see how your method compares to a
> standard example.
>
> --
> Rick C. Hodgin

Hi Rick,

Actually, I have been issued 3 patents on the subject:

1. 9,747,252: Systematic method of coding wave-pipelined circuits in HDL.
2. 9,734,127: Systematic method of synthesizing wave-pipelined circuits in HDL.
3. 9,575,929: Apparatus of wave-pipelined circuits.

All 3 patents have the same specification, drawings and abstract, with different claims.

Here is my new non-provisional patent application 15,861,093 (hereafter, the application), "Coding wave-pipelined circuits with buffering function in HDL", filed with the USPTO on 2018/01/03.

The non-provisional patent application 15,861,093 has a *.txt (*.vhd) file attached, so the files are not secret. Anyone interested in the subject can email me for what they want and I will send them the file set, even though the full application set will only be published 18 months later.

The following is part of my promotional file sent to some big companies:

"The new application can, to some extent, be viewed logically as the continuation of the 3 patents, but legally it is a brand-new invention that devotes its main attention to coding the buffering function for wave-pipelined circuits in HDL, a topic never mentioned in the 3 patents, while still paying great attention to improving the 3 patents to make them more robust, friendlier and more complete from the point of view of code designers."

The 3 previous patents had a first version of the source code attached; the new application provides the second version. With the 2nd version of the VHDL source code you can use a VHDL-2002 or later simulator to simulate all the workings and generate waveforms. The source file is also well commented, with inserted debugging code.

Please email me with what you want me to send.

For the 3 patents:
1.1. Specification.

1.2. 3 sets of claims.

1.3. Drawings.

1.4. Source code.

1.5. ZIP file of all of the above.

For the new application:
2.1. Specification.

2.2. Claims.

2.3. Drawings.

2.4. Abstract.

2.5. Source code.

2.6. ZIP file of all of the above.

For the new application, the specification has 81 pages, the 48 claims take 15 pages and the drawings take 24 pages.

If you are short of time, 2.1 Specification, 2.3 Drawings and 2.4 Abstract are all you need to learn all the working structures.

Because the goals of my patents and the new application are a) to make my invented system part of HDL (not only VHDL, but all HDLs), and b) to make the source code part of the system library in HDL, I am willing to distribute my code and all related files to anyone who is really interested in how I did it.

Through CPC_1_2 you can see that my scheme needs the least logic information and coding from a designer to resolve a very difficult problem, one that has been open for almost 50 years.

My Email address is wtx wtx @ gmail . com (please remove spaces between characters)

Thank you.

Weng

```
```On Wednesday, January 10, 2018 at 5:56:45 PM UTC-8, Weng Tianxiang wrote:
> ...

Hi,

Here is more information on the WPC (Wave-Pipelining Component) provided by a system library (which I wrote).

1. There are only 2 WPCs, which together cover all wave-pipelined circuits:
a) one for the situation in which only one critical path is used;
b) one for the situation in which more than one copy of the same critical path is used.

2. Based on my classification, there are 5 types of structures for all wave-pipelined circuits:
a) A circuit coded as a wave-pipelined circuit that finally turns out to be a regular 1-cycle, non-pipelined circuit.

b) A wave-pipelined circuit that can accept one input data per cycle with one
critical path.

c) A wave-pipelined circuit that can accept one input data per multiple cycles
with one critical path.

d) A wave-pipelined circuit that can accept one input data per cycle with more
than one critical path, each critical path having an input register and an output
register.

e) A wave-pipelined circuit that can accept one input data per cycle with more
than one critical path, each critical path having an input register and sharing a
sole output register.

3. The method guarantees a 100% success rate in generating a specific wave-pipelined circuit.

Thank you.

Weng

```
```On Sat, 13 Jan 2018 13:31:14 -0800 (PST)
"Rick C. Hodgin" <rick.c.hodgin@gmail.com> wrote:

> Do you have a YouTube example?  And an example that will
> synthesize in Icarus?  So we can see how your method compares to a
> standard example.

There is perhaps some explanation in "Wave-Pipelining: A
Tutorial and Research Survey", and "DESIGN AND TIMING
ANALYSIS OF WAVE PIPELINED CIRCUITS".

Jan Coombs
--

 IEEE  Transactions on VLSI Systems
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf

 Recep Ozgun's MSc thesis
https://soar.wichita.edu/bitstream/handle/10057/383/t06064.pdf?sequence=3

```
```On Tuesday, January 16, 2018 at 11:40:55 PM UTC-8, Jan Coombs wrote:
> On Sat, 13 Jan 2018 13:31:14 -0800 (PST)
> "Rick C. Hodgin" <rick.c.hodgin@gmail.com> wrote:
>
> > Do you have a YouTube example?  And an example that will
> > synthesize in Icarus?  So we can see how your method compares to a
> > standard example.
>
> There is perhaps some explanation in "Wave-Pipelining: A
> Tutorial and Research Survey", and "DESIGN AND TIMING
> ANALYSIS OF WAVE PIPELINED CIRCUITS".
>
> Jan Coombs
> --
>
>  IEEE  Transactions on VLSI Systems
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.1783&rep=rep1&type=pdf
>
>  Recep Ozgun's MSc thesis
> https://soar.wichita.edu/bitstream/handle/10057/383/t06064.pdf?sequence=3

Hi Jan,

I appreciate your efforts to dig deep into my inventions. I would like to patiently answer all reasonable technical questions.

Your reference  is none other than what inspired me to resolve the open problem: designing both a coding method and a synthesizing method so that any logic design engineer, including college students with basic knowledge of HDL, can code and generate a wave-pipelined circuit.

All the published materials I have read center on how to eliminate data contamination, a problem never encountered in non-wave-pipelined circuit design.

Data contamination is defined as later-entered data catching up with earlier-entered data and corrupting it.

What my inventions do is build a bridge between code designers and synthesizers so that a wave-pipelined circuit can be coded and generated in the easiest way:

If a code designer provides all necessary and sufficient information to a synthesizer, the synthesizer should, and can, generate a wave-pipelined circuit as specified.

Your reference  (1998), on page 142 below Table 1, states: "Last, due to a lack of commercial tools that are directly applicable to designs using wave-pipelining, each group has more or less developed in-house design analysis and optimization tools which enable VLSI design using wave-pipelining."

So at the beginning of my project I assumed that if a new part of the HDL standard covering wave-pipelined circuits were well designed and laid out, any synthesizer manufacturer would be able to generate wave-pipelined circuits. The assumption was also based on your reference  (1998), Table 1 on page 142, which lists 30 wave-pipelined circuits (20 years ago); none of their authors had any relationship with a professional synthesizer manufacturer.

Furthermore, during development I found that no matter how many types of wave-pipelined circuits there have been or will be, every wave-pipelined circuit comprises two parts. One is the critical path, represented by a CPC (Critical Path Component). All the remaining logic, the WPC (Wave-Pipelining Component), is always the same for a group of wave-pipelined circuits, depending on what target a designer wants for the circuit.

In my design, no timing related to a wave-pipelined circuit ever appears, because timing is within the scope of synthesizer operation and has nothing to do with coding.

There is no commercial synthesizer in the world that can directly generate a wave-pipelined circuit. To prove my WPCs are correct, I coded a CPC that does nothing but pass data through the critical path while obeying critical-path behavior: if the critical path needs 5 cycles for signals to travel, its output is available in 6 cycles. And if the critical path is blocked, later-entered data would have a chance to damage earlier-entered data if the design were not right. So essentially I used no very sophisticated tools, nor timing analysis.

Thank you.

Weng
```
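Weng describes a test CPC that simply passes data along with the critical path's delay. One way to get that behaviour in a VHDL simulator is a transport-delayed assignment, sketched here as an assumption rather than the author's actual test code. Unlike the default inertial delay, which would swallow every value that is replaced within the delay window, transport delay keeps every scheduled value, so several data waves can travel through Cl at once. The entity name `cpc_delay_model` and the `PATH_DELAY` generic are hypothetical. Synthesis tools ignore `after` clauses, so this model is for simulation only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only model with the same registers as CPC_1_2, but the
-- product reaches Cl a fixed PATH_DELAY after Ra/Rb change (output kept signed).
entity cpc_delay_model is
  generic (
    W          : positive := 64;    -- input width; product width is 2*W
    PATH_DELAY : time     := 45 ns  -- assumed critical-path delay (4.5 periods of a 10 ns clock)
  );
  port (
    CLK    : in  std_logic;
    WE_i   : in  std_logic;                -- '1': latch Da_i, Db_i into Ra, Rb
    Da_i   : in  signed(W-1 downto 0);
    Db_i   : in  signed(W-1 downto 0);
    WE_o_i : in  std_logic;                -- '1': latch Cl into Rc
    Dc_o   : out signed(2*W-1 downto 0)
  );
end entity;

architecture sim of cpc_delay_model is
  signal Ra, Rb : signed(W-1 downto 0)   := (others => '0');
  signal Rc, Cl : signed(2*W-1 downto 0) := (others => '0');
begin
  -- transport keeps every pending value, so several waves can be in flight;
  -- a single delay value means this model has D_max = D_min
  Cl   <= transport Ra * Rb after PATH_DELAY;
  Dc_o <= Rc;

  p_1 : process(CLK)
  begin
    if rising_edge(CLK) then
      if WE_i = '1' then      -- launch a new wave
        Ra <= Da_i;
        Rb <= Db_i;
      end if;
      if WE_o_i = '1' then    -- capture whichever wave is at Cl now
        Rc <= Cl;
      end if;
    end if;
  end process;
end architecture;
```

With PATH_DELAY = 45 ns and a 10 ns clock, a product launched at one edge appears on Cl 4.5 periods later. WE_o_i therefore has to be high for the 5th edge after the launching edge, in line with the "5 cycles to travel, output available in 6" description. Capturing one edge late returns the next wave's product instead. A real path spreads between a minimum and a maximum delay, and that spread is what limits the clock period.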
```Hi,

I have said that the kernel idea of my invention is: a designer provides the least information and logic code about the critical path, and leaves all complex logic design to a synthesizer and a system library, which is what an HDL should do.

Here are the key technical points I used to fully develop my technique, assuming you are an experienced HDL code designer.

Although the technique is tricky, it is easy to understand if you fully grasp the concepts in this post and the next, each taking about 20 minutes or more for 80% of the engineers here.

Here I am using the 64*64-bit signed multiplier as the target circuit example.

1. If my CPC_1_2 code is presented to a synthesizer, the first question you may ask is how to code your WPC (Wave-Pipelining Component). For clarity, I have copied the CPC_1_2 code here again.

By the way, I claim that nobody can further simplify the CPC_1_2 code and still deliver full information about a critical path to a synthesizer for generating a wave-pipelined circuit! If you can, please challenge my claim.

entity CPC_1_2 is
   generic (
      input_data_width  : positive  := 64;                  -- optional
      output_data_width : positive  := 128                  -- optional
   );
   port (
      CLK    :  in std_logic;
      WE_i   :  in std_logic;                                -- '1': write enable to input registers A & B
      Da_i   :  in signed(input_data_width-1 downto 0);      -- input data A
      Db_i   :  in signed(input_data_width-1 downto 0);      -- input data B
      WE_o_i :  in std_logic;                                -- '1': write enable to output register C
      Dc_o   :  out unsigned(output_data_width-1 downto 0)   -- output data C
   );
end CPC_1_2;

architecture A_CPC_1_2 of CPC_1_2 is
   signal   Ra :  signed(input_data_width-1 downto 0);  -- input register A
   signal   Rb :  signed(input_data_width-1 downto 0);  -- input register B
   signal   Rc :  signed(output_data_width-1 downto 0); -- output register C
   signal   Cl :  signed(output_data_width-1 downto 0); -- combinational logic

begin
   Cl    <= Ra * Rb;             -- combinational logic output, key part of CPC
   Dc_o  <= unsigned(Rc);        -- output through output register

   p_1 : process(CLK)
   begin
      if rising_edge(CLK) then
         if WE_i = '1' then      -- WE_i = '1': latch input data
            Ra <= Da_i;
            Rb <= Db_i;
         end if;

         if WE_o_i = '1' then    -- WE_o_i = '1': latch output data
            Rc <= Cl;
         end if;
      end if;
   end process;
end A_CPC_1_2;

2. Assume 3 situations:
a) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item per cycle, you should know how to code the WPC for the circuit, because we have already assumed that the synthesizer is capable of generating the wave-pipelined circuit, leaving the most difficult task to the synthesizer. By definition, a WPC contains all the remaining logic for the circuit except CPC_1_2.

b) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item every 2 cycles, you should know how to code the WPC for the circuit.

c) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item every 2 cycles, but the designer wants the circuit to accept one data item per cycle rather than one every 2 cycles, you should know how to code the WPC for the circuit with 2 copies of the critical path, each alternately accepting an input every 2 cycles. Actually, all CPCs follow one of 2 code patterns: CPC_1_2 is one of them, and the other, CPC_3, is slightly more complex but is also an off-the-shelf coding pattern. In this situation, CPC_3 code would replace CPC_1_2 with the same input and output interfaces.

Now the problem arises: how do you know all 3 unknown parameters before you code the WPC for the 64*64-bit signed multiplier? I think this is the key reason why so many wave-pipelined circuits have been built, yet none of their designers could resolve the 50-year-old open problem.

And the circuit may, should and can be any type of pipelined circuit!

To be continued.

I would like to listen to your questions and comments!

Weng

```
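Situation c) reduces to a small piece of arithmetic. This is one way to see it, offered as an illustration rather than as the author's CPC_3 or WPC. Suppose one copy of the critical path can safely accept an input every $k$ cycles, and the designer wants one input every $d$ cycles. Feeding $n$ copies round-robin then requires

$$
n = \left\lceil \frac{k}{d} \right\rceil, \qquad \text{input } j \text{ goes to copy } (j \bmod n),
$$

because each copy then sees an input only every $n\,d \ge k$ cycles. With $k = 2$ and $d = 1$, as in situation c), $n = 2$: even-numbered inputs go to one copy and odd-numbered inputs to the other.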
```Weng Tianxiang wrote on 1/10/2018 8:56 PM:
> ...
> -- CPC_1_2 is linked with SMB by link1() / link2() if "wave" is accepted in VHDL
> ...

What is SMB?

I think I understand the concept of wave pipelining.  It is just eliminating
the intermediate registers of a pipeline circuit and designing the
combinational logic so that the delays are even enough across the many paths
so the output can be clocked at a given time and will receive a stable
result from the input N clocks earlier.  In other words, the logic is
designed so that the changes rippling through the logic never catch up to
the changes created by the data entered 1 clock cycle earlier.  Nice if you
can do it.

I can see where this would be useful in an ASIC.  In ASICs FFs and logic
compete for space within the chip.  In FPGAs the ratio between FFs and logic
is fixed and predetermined.  So using logic without using the FFs that are
already there is not of much value.

--

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998
```
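Rick's two conditions can be written as a pair of inequalities: the result must be stable at the capture edge N clocks later, and the next datum must never catch up. This is a first-order sketch under simplifying assumptions. It uses a single skew-free clock of period $T_{clk}$, path delays between $D_{\min}$ and $D_{\max}$ (clock-to-output delay folded in), and register setup and hold times $T_s$ and $T_h$. A datum launched at time 0 is captured at $N\,T_{clk}$, and the next one is launched at $T_{clk}$.

$$
\begin{aligned}
D_{\max} + T_s &\le N\,T_{clk} && \text{(result stable before the capture edge)}\\
T_{clk} + D_{\min} &\ge N\,T_{clk} + T_h && \text{(next wave arrives after the hold window)}\\
\Rightarrow\quad T_{clk} &\ge (D_{\max} - D_{\min}) + T_s + T_h
\end{aligned}
$$

So the clock period is bounded by the spread of path delays rather than by $D_{\max}$ itself. If inputs are accepted only every $k$ cycles, the next launch happens at $k\,T_{clk}$ and the bound becomes $k\,T_{clk} \ge (D_{\max} - D_{\min}) + T_s + T_h$. Fuller treatments, such as the survey Jan cited, also include clock skew.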
```On Fri, 19 Jan 2018 17:42:57 -0500
rickman <gnuarm.deletethisbit@gmail.com> wrote:

...

> I think I understand the concept of wave pipelining.  It is
> just eliminating the intermediate registers of a pipeline
> circuit and designing the combinational logic so that the
> delays are even enough across the many paths so the output can
> be clocked at a given time and will receive a stable result
> from the input N clocks earlier.  In other words, the logic is
> designed so that the changes rippling through the logic never
> catch up to the changes created by the data entered 1 clock
> cycle earlier.  Nice if you can do it.

Thanks, interesting, but sounds complex to get reliable
operation.

> I can see where this would be useful in an ASIC.  In ASICs FFs
> and logic compete for space within the chip.  In FPGAs the
> ratio between FFs and logic are fixed and predetermined.  So
> using logic without using the FFs that are already there is
> not of much value.

Generally true, but

1) You might be able to combine three stages that require 2/3 of
a clock cycle for maximum propagation delay, and get the result
in the time of two clock cycles.

2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
each tile can be a latch or a LUT, so flops are not wasted.

Either way there must be a great deal of complex floor planning
and/or timing constraints needed to make this work. Automating
this would be amazing?

Jan Coombs

```
```Jan Coombs wrote on 1/20/2018 2:20 PM:
> On Fri, 19 Jan 2018 17:42:57 -0500
> rickman <gnuarm.deletethisbit@gmail.com> wrote:
>
>    ...
>
> Generally true, but
>
> 1) You might be able to combine three stages that require 2/3 of
> a clock cycle for maximum propagation delay, and get the result
> in the time of two clock cycles.

If your stages are only using 2/3 of a clock, you can regroup the logic to
make it 1 clock each in two stages.  There is supposed to be software to
handle that for you although I've never used it.

> 2) If the Microsemi/Actel Igloo/Smartfusion FPGAs are used then
> each tile can be a latch or a LUT, so flops are not wasted.

There's your first mistake, no one uses Actel/Microsemi FPGAs.  They long
for the day they are as big as Lattice, lol!

> Either way there must be a great deal of complex floor planning
> and/or timing constraints needed to make this work. Automating
> this would be amazing?

Isn't that what the OP is claiming?  I'm surprised he could make this work
over PVT.  The actual stable time has to be on a clock edge, the same clock
edge under all conditions.  I wouldn't want to try that manually in a simple
circuit.

--

Rick C

```