comp.arch.fpga | My invention: Coding wave-pipelined circuits with buffering function in HDL| page 3

Reply by rickman ● January 22, 2018

Weng Tianxiang wrote on 1/22/2018 1:39 PM:
>> The multiplier is not a good example to use as many FPGAs contain multiplier
>> blocks.  But then they are pipelined and so won't work in a non-pipelined
>> solution, so maybe you can show your technique even if it has little
>> practical value in this case.
>>
>> Rick C
>>
>
> What I patented in my patents is a method on how to code a wave-pipelined circuit in HDL (not only in VHDL, but all HDLs) by a circuit designer, nothing else. If you slightly change the code, a 64x64 bits floating multiplier can be generated!!!

I'm not sure what you are talking about.  The code example you gave is 
exactly the same code anyone would write for a multiplier.  The HDL is no 
different.  So how can the patent be about how to code the wave-pipelined 
circuit?  Or did I miss something in the code?


> If anybody uses HDL to code, he has nothing to do with PVT, never put PVT into consideration, not me, not you, nobody does it!!! That is other ones' business.

I have no idea what you are talking about.  HDL is used to design FPGAs and 
ASICs.  Part of that design process is meeting timing.  Someone, somewhere 
has to make the timing work.  The tool vendor provides timing data 
accessible through a timing analysis tool that can be applied to your 
synthesized design.  But if you wish to do wave-pipelined design the logic 
has to be constructed in a way to balance the timing delays so the 
uncertainty at the output of the combinatorial circuit fits within a clock 
cycle *including the variation in timing from PVT*!!!  So it is impossible 
to do wave-pipelined design without considering PVT effects on the timing.

I have no idea what you mean by it "is other ones' business".  This has to 
be defined for a wave-pipelined design to work.  THAT is where the work is, 
not in talking about the HDL which is the same as for a non pipelined design.
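
The constraint described above can be written down roughly as follows. This is a simplified sketch with assumed symbols, none of which appear in the thread: D_max and D_min are the longest and shortest delays through the combinational logic over all paths and all PVT corners, T_s and T_h are the register setup and hold times, and clock skew is ignored.

```latex
T_{clk} \;>\; \left(D_{\max} - D_{\min}\right) + T_{s} + T_{h}
```

With purely illustrative numbers (not measurements from any device), D_max = 20 ns, D_min = 8 ns and T_s + T_h = 1 ns give T_clk > 13 ns, so fewer than two waves fit in the path at once. The delay spread, which PVT variation widens, sets the limit, and nothing in the HDL changes it.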


> Based on my method what you need to do is that you just describe the logic for the critical path, and call a library to finish your job, nothing else,  all others are left to Xilinx or Altera to do!

Then you have done nothing...


> If you are really interested in a real good FPGA example, I recommend you reading following one paper on website:
> Wave-pipelined intra-chip signaling for on-FPGA communications
> http://www.doc.ic.ac.uk/~wl/papers/10/integration10tm.pdf
>
> There are numerous circuits in FPGA that are worth being the wave-pipelined circuits.

I would like to read your patent to see just what you are patenting.

-- 

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Reply by Weng Tianxiang ● January 22, 2018

On Friday, January 19, 2018 at 8:45:26 AM UTC-8, Weng Tianxiang wrote:
> 
> 1. If my CPC_1_2 code is presented to a synthesizer, the first question you may ask is how to code your WPC (Wave-Pipelining Component). For clarity, I have copied the CPC_1_2 code here again.
> 
> By the way, I claim that nobody can further simplify the CPC_1_2 code to deliver full information about a critical path to a synthesizer for generating a wave-pipelined circuit! If you can, please challenge my claim. 
> 
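> library ieee;
> use ieee.std_logic_1164.all;   -- std_logic
> use ieee.numeric_std.all;      -- signed, unsigned, "*"
>
> entity CPC_1_2 is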
>    generic (   
>       input_data_width  : positive  := 64;                  -- optional 
>       output_data_width : positive  := 128                  -- optional 
>    ); 
>    port ( 
>       CLK   :  in std_logic; 
>       WE_i  :  in std_logic;     -- '1': write enable to input registers A & B 
>       Da_i  :  in signed(input_data_width-1 downto 0);      -- input data A 
>       Db_i  :  in signed(input_data_width-1 downto 0);      -- input data B 
>       WE_o_i:  in std_logic;  -- '1': write enable to output register C 
>       Dc_o  :  out unsigned(output_data_width -1 downto 0)  -- output data C 
>    ); 
> end CPC_1_2; 
> 
> architecture A_CPC_1_2 of CPC_1_2 is 
>    signal   Ra :  signed(input_data_width-1 downto 0);  -- input register A 
>    signal   Rb :  signed(input_data_width-1 downto 0);  -- input register B 
>    signal   Rc :  signed(output_data_width-1 downto 0); -- output register C 
>    signal   Cl :  signed(output_data_width-1 downto 0); -- combinational logic 
>     
> begin 
>    Cl    <= Ra * Rb;             -- combinational logic output, key part of CPC 
>    Dc_o  <= unsigned(Rc);        -- output through output register 
> 
>    p_1 : process(CLK) 
>    begin 
>       if Rising_edge(CLK) then 
>          if WE_i = '1' then      -- WE_i = '1' : latch input data 
>             Ra <= Da_i; 
>             Rb <= Db_i; 
>          end if; 
>           
>          if WE_O_I = '1' then    -- WE_O_I = '1': latch output data 
>             Rc <= Cl; 
>          end if; 
>       end if; 
>    end process; 
> end A_CPC_1_2; 
> 
> 2. Assume 3 situations:
> a) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item per cycle, you should know how to code the WPC for the circuit, because we have already assumed that the synthesizer is capable of generating the wave-pipelined circuit for it, leaving the most difficult task to the synthesizer. By definition, a WPC contains all the remaining logic for the circuit except the CPC_1_2.
>
> b) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item per 2 cycles, you should know how to code the WPC for the circuit.
>
> c) If you know that each data item needs 5 cycles to pass through the 64*64-bit signed multiplier and the circuit can accept one data item per 2 cycles, but the designer wants the circuit to accept one data item per cycle, not one per 2 cycles, you should know how to code the WPC for the circuit with 2 copies of the critical path, each alternately accepting an input data item every 2 cycles. Actually all CPCs have 2 types of code patterns: CPC_1_2 is one of them, and the other, CPC_3, is slightly more complex but is also an off-the-shelf coding pattern. In this situation CPC_3 code would replace CPC_1_2 with the same input and output interfaces.
>
> Now the problem comes: how do you know all 3 unknown parameters before you code the WPC for the 64*64-bit signed multiplier? I think that this is the key reason why so many wave-pipelined circuits have been generated, yet no circuit designer has been able to resolve this 50-year-old open problem.
> 
> And the circuit may, should and can be any type of pipelined circuits!
> 
> To be continued.
> 
> Weng

All coding for a wave-pipelined circuit has 3 steps: 
1. Write a Critical Path Component (CPC) with defined interface; 

2. Call a Wave-Pipelining Component (WPC) provided by a system library; 

3. Call one of 3 link statements to link a CPC instantiation with a paired WPC instantiation to specify what your target is.

Now first it is assumed that CPC_1_2 coding is finished, because when coding a new circuit it is very clear what is to code and the coding of a CPC for a wave-pipelined circuit never has any problem, leaving all special features related to critical path to a synthesizer to do, including correcting critical path unbalance logically. 

Now second I am trying to code its WPC. 

When coding the 64*64-bit signed multiplier, I have listed 3 situations as shown above, in each of which there is an unknown constant before coding its WPC.

In case a) each data item needs 5 cycles to pass through the 64*64-bit signed multiplier. The 5 cycles isn't known until a synthesizer has analyzed the critical path.

In case b) the circuit can accept one data item per 2 cycles.

In case c) the number of copies of the same critical path is 2.
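
Case c) can be pictured with a minimal sketch of the input steering only. It assumes the CPC_1_2 entity from the quoted post is compiled into library `work` with its default generics (64-bit inputs, 128-bit output) and the usual ieee library clauses; the wrapper name `two_copy_sketch`, the `phase` toggle and the per-copy enable port `WE_out_i` are hypothetical names chosen for illustration, and this is not the CPC_3 pattern mentioned in the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper, not code from the patent: two copies of CPC_1_2
-- load their input registers on alternating clock cycles, so each copy
-- sees a new operand pair only once every 2 cycles.
entity two_copy_sketch is
   port (
      CLK      : in  std_logic;
      Da_i     : in  signed(63 downto 0);            -- operand A, new every cycle
      Db_i     : in  signed(63 downto 0);            -- operand B, new every cycle
      WE_out_i : in  std_logic_vector(1 downto 0);   -- per-copy output enables (generated elsewhere)
      Dc0_o    : out unsigned(127 downto 0);         -- result register of copy 0
      Dc1_o    : out unsigned(127 downto 0)          -- result register of copy 1
   );
end two_copy_sketch;

architecture rtl of two_copy_sketch is
   signal phase : std_logic := '0';                  -- toggles every clock cycle
   signal we    : std_logic_vector(1 downto 0);      -- per-copy input write enables
begin
   we(0) <= not phase;                               -- copy 0 loads when phase = '0'
   we(1) <= phase;                                   -- copy 1 loads when phase = '1'

   p_phase : process(CLK)
   begin
      if rising_edge(CLK) then
         phase <= not phase;
      end if;
   end process;

   copy0 : entity work.CPC_1_2
      port map (CLK => CLK, WE_i => we(0), Da_i => Da_i, Db_i => Db_i,
                WE_o_i => WE_out_i(0), Dc_o => Dc0_o);

   copy1 : entity work.CPC_1_2
      port map (CLK => CLK, WE_i => we(1), Da_i => Da_i, Db_i => Db_i,
                WE_o_i => WE_out_i(1), Dc_o => Dc1_o);
end rtl;
```

The sketch leaves out the generation of the output enables and the merging of the two result streams, both of which depend on the path latency that is unknown at coding time.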

Conventional method is:
fix the number as 3, 4, 5, 6, ..., then write the WPC code and synthesize the circuit; if the assumed value fails to generate a wave-pipelined circuit, repeat with another value until it succeeds.

I introduce a new concept WAVE CONSTANT:
A constant is defined as a wave constant in a WPC if its constant initial value is unknown and undetermined when the WPC is being coded, and will be assigned by a synthesizer after the synthesizer has analyzed the critical path. 

In contrast with a wave constant, a regular constant must be defined with a fixed initial value.

And it also requires that the synthesizer must first analyze the CPC, then analyze its paired WPC.

By doing so coding a wave-pipelined circuit will never have any problem!

The strange thing here is that a wave constant does not appear in its CPC, but the CPC's structure determines its initial value, then the synthesizer assigns the determined initial value to the wave constant which appears only in the paired WPC.

Based on above 3 situations, I introduced 3 wave constants:
a) Series_clock_number is the number of cycles for signals to travel the critical path.

b) Input_clock_number is the number of cycles, under which the circuit can accept one data per Input_Clock_number cycles.

c) Multiple_copy_number is the number of copies of a same critical path in order to meet a requirement for the circuit to accept one data per cycle. The requirement is required by a code designer.
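
To make the role of Series_clock_number concrete, here is a minimal sketch of one piece of logic a WPC might contain: a shift register that delays the input write enable so that the output register captures the result that many clock edges later. It is a conventional delay line under stated assumptions, not the WPC from the patent; the entity name `we_delay_sketch` is hypothetical, and the generic is an ordinary VHDL generic with an assumed default of 5, whereas the post proposes that the synthesizer assign the value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper, not the patented WPC: delays the input write enable
-- by SERIES_CLOCK_NUMBER clock cycles to form the output write enable.
entity we_delay_sketch is
   generic (
      SERIES_CLOCK_NUMBER : positive := 5   -- assumed value, fixed by hand here
   );
   port (
      CLK    : in  std_logic;
      WE_i   : in  std_logic;   -- '1': operands are loaded at this clock edge
      WE_o_i : out std_logic    -- '1': the matching result is captured at the next edge
   );
end we_delay_sketch;

architecture rtl of we_delay_sketch is
   -- One flag per cycle of latency; bit 0 is the most recent.
   signal valid : std_logic_vector(SERIES_CLOCK_NUMBER-1 downto 0) := (others => '0');
begin
   p_delay : process(CLK)
   begin
      if rising_edge(CLK) then
         valid(0) <= WE_i;
         -- The loop range is empty when SERIES_CLOCK_NUMBER = 1.
         for i in 1 to SERIES_CLOCK_NUMBER-1 loop
            valid(i) <= valid(i-1);
         end loop;
      end if;
   end process;

   WE_o_i <= valid(SERIES_CLOCK_NUMBER-1);
end rtl;
```

Connected to the WE_i and WE_o_i ports of CPC_1_2, this makes Rc load SERIES_CLOCK_NUMBER clock edges after Ra and Rb. Whether the data is actually stable at that edge is a timing question that this logic does not answer.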

By introducing a wave constant concept, code designers can smoothly and fully describe a wave-pipelined circuit in HDL without manual involvement.

Finally after the 3 WPC were coded, I found that all wave-pipelined circuits are divided into 2 categories and each of 2 categories shares a same WPC component without exception. Then the 2 types of WPC can form a system library and each of wave-pipelined circuits can call the library, avoiding coding same logic again and again.

To be continued.

Your comments are welcome.

Thank you.

Weng

Reply by Weng Tianxiang ● January 22, 2018

> But if you wish to do wave-pipelined design the logic 
> has to be constructed in a way to balance the timing delays so the 
> uncertainty at the output of the combinatorial circuit fits within a clock 
> cycle *including the variation in timing from PVT*!!!  So it is impossible 
> to do wave-pipelined design without considering PVT effects on the timing.
> 
> Rick C
> 

The introduction of 3 link statements is used to inform a synthesizer that the CPC must be analyzed and generated as a wave-pipelined circuit instead of a regular one-cycle circuit. So the synthesizer would help you to generate logic that would balance the timing among all paths.

Can you do better than what a synthesizer can do?

When inventing, you must be smarter, not limited by what your experience tells you.

Weng

Reply by rickman ● January 22, 2018

Weng Tianxiang wrote on 1/22/2018 6:19 PM:
>> But if you wish to do wave-pipelined design the logic
>> has to be constructed in a way to balance the timing delays so the
>> uncertainty at the output of the combinatorial circuit fits within a clock
>> cycle *including the variation in timing from PVT*!!!  So it is impossible
>> to do wave-pipelined design without considering PVT effects on the timing.
>>
>> Rick C
>>
>
> The introduction of 3 link statements is used to inform a synthesizer that the CPC must be analyzed and generated as a wave-pipelined circuit instead of a regular one-cycle circuit. So the synthesizer would help you to generate logic that would balance the timing among all paths.
>
> Can you do better than what a synthesizer can do?

I haven't seen what a synthesizer can do.  Does anyone make a synthesizer 
that does this?


> When inventing, you must be smarter, not limited by what your experience tells you.

Uh, if experience tells me something can't be done, why would I try to do 
it?  That's the utility of experience, you don't have to go down every blind 
alley.

I've yet to see the utility in this idea.  I would expect the speed 
improvements to be small, if any and as I've mentioned, unless you get an 
FPGA vendor to modify their chip designs along with the synthesis vendors to 
modify their software, all at no small cost, this will not offer any 
improvement in FPGAs.

Since you have not even taken a look at the issue of making this work over 
PVT variations, I'm pretty sure it is not possible to even make it work in 
today's FPGAs.  There is just too much variation in timing of a single path 
to wave-pipeline even a row of inverters.

-- 

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Reply by Weng Tianxiang ● January 22, 2018

> I'm pretty sure it is not possible to even make it work in today's FPGAs.

If you don't know something, never say that it's impossible, in my point of view, it is beyond your specialty.

> I've yet to see the utility in this idea.  I would expect the speed
> improvements to be small, if any.

Intel has two versions of the 64x64-bit floating-point multiplier: one used for compatibility with the previous 8087 version, and another from MMX technology, a wave-pipelining technology. Based on web benchmarks and Intel's literature, the wave-pipelined version is 20% faster than its 8087 counterpart (4 cycles vs. 5 cycles). Additionally, power consumption is dramatically reduced. A 64x64-bit floating-point multiplier has a maximum of 151 bits in one of its 4 middle stages, so you can calculate how many register bits have been saved!

> I've mentioned, unless you get an FPGA vendor to modify their chip designs along with the synthesis vendors to modify their software, all at no small cost, this will not offer any improvement in FPGAs. 

Defining a new part of HDL specially for wave-pipelined circuit in ASIC and FPGA and letting code designers own a corresponding simple and reliable designing method is one thing, and letting synthesizers implement the new part in HDL is another thing, as Jim, the chairman in VHDL, always asks people here to push the synthesizers to implement new part in 2008-VHDL. 

> I haven't seen what a synthesizer can do.  Does anyone make a synthesizer
> that does this?

A synthesizer, as software, needs an algorithm to do something fast and accurately, and there are more than 20-30 effective algorithms out there, many of which were granted patents, as a simple Google search will show. So it is reasonable that I assumed from the beginning of my project that the technique for synthesizing and generating a wave-pipelined circuit is fully mature now and might have been mature 10-20 years ago.

Weng

Reply by rickman ● January 23, 2018

Weng Tianxiang wrote on 1/22/2018 10:33 PM:
>> I'm pretty sure it is not possible to even make it work in today's FPGAs.
>
> If you don't know something, never say that it's impossible, in my point of view, it is beyond your specialty.

FPGAs *are* my specialty.  I think you are showing you know little about 
actually working with FPGAs.  You don't seem to understand that the ratio of 
logic to FFs is fixed in any given FPGA so saving registers is not of great 
value.  You also don't seem to understand that you have too much speed 
variation over PVT to even use wave-pipelining in an FPGA.

Do you understand either of these two issues?  Do you have a way around 
these limitations?


>> I've yet to see the utility in this idea.  I would expect the speed
>> improvements to be small, if any.
>
> Intel has two versions of the 64x64-bit floating-point multiplier: one used for compatibility with the previous 8087 version, and another from MMX technology, a wave-pipelining technology. Based on web benchmarks and Intel's literature, the wave-pipelined version is 20% faster than its 8087 counterpart (4 cycles vs. 5 cycles). Additionally, power consumption is dramatically reduced. A 64x64-bit floating-point multiplier has a maximum of 151 bits in one of its 4 middle stages, so you can calculate how many register bits have been saved!

This has nothing to do with "FPGAs" which is the target I was referring to.


>> I've mentioned, unless you get an FPGA vendor to modify their chip designs along with the synthesis vendors to modify their software, all at no small cost, this will not offer any improvement in FPGAs.
>
> Defining a new part of HDL specially for wave-pipelined circuit in ASIC and FPGA and letting code designers own a corresponding simple and reliable designing method is one thing, and letting synthesizers implement the new part in HDL is another thing, as Jim, the chairman in VHDL, always asks people here to push the synthesizers to implement new part in 2008-VHDL.

Adding this to the HDL is trivial.  The ENTIRE hard part is getting a 
synthesizer to support this by doing all the hard work.


>> I haven't seen what a synthesizer can do.  Does anyone make a synthesizer
>> that does this?
>
> A synthesizer, as software, needs an algorithm to do something fast and accurately, and there are more than 20-30 effective algorithms out there, many of which were granted patents, as a simple Google search will show. So it is reasonable that I assumed from the beginning of my project that the technique for synthesizing and generating a wave-pipelined circuit is fully mature now and might have been mature 10-20 years ago.

For FPGAs???

I notice you completely snipped the part about PVT variations in timing 
which very likely will be the stake driven through the heart of this approach.

Bottom line - if wave-pipelining were an advantage in FPGAs or even 
practical with benefit, one of the FPGA companies would be promoting it.  If 
they could get a 20% speed improvement, they would be jumping through hoops 
as it would give them a *huge* competitive advantage over the other FPGA 
companies.

-- 

Rick C

Viewed the eclipse at Wintercrest Farms,
on the centerline of totality since 1998

Reply by Weng Tianxiang ● January 23, 2018

> Bottom line - if wave-pipelining were an advantage in FPGAs or even 
> practical with benefit, one of the FPGA companies would be promoting it.  If 
> they could get a 20% speed improvement, they would be jumping through hoops 
> as it would give them a *huge* competitive advantage over the other FPGA 
> companies.
> 
> Rick C

I appreciate your above paragraph, at least a small step forward!

There are many Indian professors' papers on wave-pipelined circuits for FPGA. Here is one of them: Some Experiments about Wave Pipelining on FPGAs

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.42.2942&rep=rep1&type=pdf

Weng

Reply by HT-Lab ● January 23, 2018

On 22/01/2018 12:18, rickman wrote:
> HT-Lab wrote on 1/21/2018 1:19 PM:
..
>>>
>>> There's your first mistake, no one uses Actel/Microsemi FPGAs.  They long
>>> for the day they are as big as Lattice, lol!
>>
>> Microsemi has been at the number 3 spot for as long as I have used FPGAs
>> (+/- 28 years, starting with Actel's A1010). They are twice as large as Lattice.
>>
>> Here is a reference:
>>
>> https://www.eetimes.com/author.asp?doc_id=1331443
> 
> There's some BS somewhere...
> 
> http://www.fpgadeveloper.com/2011/07/list-and-comparison-of-fpga-companies.html 
> 
Comparing a 2011 blog article that took some NASDAQ values against an
EE Times piece that uses a marketing survey company's results from 2017, good job!

> 
> More importantly, look at the numbers in your link.  The
> Actel/Microsemi numbers are going in the wrong direction!  X, A and L
> are headed upward year-to-year and Actel is headed down!
> 
Sure, but Microsemi is still number 3, which was my point. In my day job
I speak to more companies using Microsemi than Lattice (both are dwarfed
by Xilinx and Intel, though). That is not to say that Lattice makes
bad FPGAs; it is just that they haven't carved out a particular niche
area like Microsemi or, say, Achronix.

> While looking this up I found a link indicating the JTAG interface of 
> the ProASIC3 devices has a back door which would allow their security to 
> be bypassed.  Security was their claim to fame and this could be a major
> blow to the company.
> 

It made no impact on their business; they are still the number 1 company for
secure/space/avionics FPGAs. Also note that ProASIC3 is nearly 10 years old.

Hans
www.ht-lab.com

Reply by Mark Curry ● January 23, 2018

In article <93899eff-01a7-4e78-8076-17febc2c8f0c@googlegroups.com>,
Weng Tianxiang <wtxwtx@gmail.com> wrote:
>Hi,
>
>A wave-pipelined circuit has the same logic as its pipelined counterpart except that the wave-pipelined circuit has only one stage, a critical path
>from the input register passing through a piece of computational logic to the output register, and no intermediate registers.

Weng,

I've read along and not commented here. But I find it's harder to ignore..
I've read up on some of the references you've posted. I think I've now got
a fairly good idea as to what this wave-pipelining thing is. So
thanks for the references.

But you've repeatedly claimed that your patent doesn't need to deal with PVT
variations - that's a problem for the synthesizer...

PVT variations is the elephant in the room. It's why wave-pipelining (and
other asynchronous design techniques) have failed to grab any hold outside
research facilities. It's a very difficult problem to solve. And it's
only getting worse at each lower technology node.

Now some of the papers you cite offer some fairly clever solutions that FPGA
manufacturers COULD use to try and enable more wave-pipelined solutions - the
one paper cited referred to small inline PLLs along the switch network to
allow delays to be more matched. Interesting solutions. But ones that
the FPGA manufacturers would have to take and implement. There's nothing
for us end FPGA users to use. The underlying technology just doesn't enable
end users to use wave-pipelining solutions in today's FPGAs. Because of the PVT
variation problem. (and simply calling it "PVT" variation falsely
implies that just those three variables matter. There's many, many more
variables that affect the variation distribution)

Now as to your patent claim. I'm unsure at all what you're trying to claim.
You list as an example a straightforward, and very basic pipelined datapath
example. One that matches thousands of others already in existence and prior
art. It's an input pipeline register, a large combinational cloud, and an
output pipeline register. Described in VHDL. I fail to see anything at all
novel there. But you claim that an undescribed, unbuilt tool could then
take such code and implement a wave-pipeline with it? That someone else
would have to build?

In another thread you seem to be claiming that the tool could automatically
determine the latency, and/or clock rate and/or "how many waves" are in
flight along the wave-pipeline circuit? That belies how design is done - it's
putting the cart before the horse. And is a common misconception of new users to
FPGAs designing even traditional pipelined design.

A common question that new users ask is "how fast can I make this pipelined
design run?". The experienced designer then answers - that's not how it's done.
The experienced designer has a specific problem that they are trying to solve - not
trying to see "how fast it can run". The designer must guide the tool towards a
solution with a latency / clock rate / "how many waves" as a design goal up front.
Not a derived solution output from the tool. The designer must know these up front
so as to design the entire solution. How does the designer know what
values are realistic goals? Experience. That's engineering.

So, my 3 cents. Wave-pipelines are in a class of asynchronous design techniques
that's of no use to current FPGA users. Perhaps if Xilinx or Altera (er, Intel)
or even some up and coming startup decides to utilize some of the techniques in the
cited papers, we may see something in the next couple of decades. Personally, I doubt it.

Regards,

Mark

Reply by Weng Tianxiang ● January 23, 2018

Hi Mark,

Good question!

PVT is a very complex problem beyond ordinary people's reach, and I agree with your opinion that PVT variation is the elephant in the room!

Intel MMX technology has been a huge success, achieving 20% faster speed against its pipelined circuit for 64*64 bit floating multiplier!

What is a technology Intel can do while Xilinx and Altera (Intel) cannot do?

An FPGA is nothing but an ASIC.

Especially Altera is now part of Intel, if there is a bridge built between a code designer and a synthesizer, Altera absolutely can do it for its FPGA without doubt! That is my opinion.

All coding of my scheme has 3 steps:

1) Write a Critical Path Component (CPC) with defined interface;

2) Call a Wave-Pipelining Component (WPC) provided by a system library;

3) Call one of 3 link statements to link a CPC instantiation with a paired WPC instantiation to specify what your target is.

My scheme separates the critical path from 1-cycle regular logic. WPC is a regular 1-cycle component, link statement is a traditional statement used for special meanings and both of WPC and link statement have nothing to do with PVT.

Now only the CPC component remains in dispute between me and your skeptical group.

What a synthesizer needs for generating a wave-pipelined circuit is the critical path logic, that is all, in any situation.

My scheme makes things simpler: of a wave-pipelined circuit's 3 parts, only the CPC has to be wave-pipelined.

As I said before, if anybody uses HDL to code, he has nothing to do with PVT, never put PVT into consideration, not me, not you, nobody does it and full HDL has no grammar on PVT!!! That is other ones' business.

Here is my opinion:
1. PVT is the elephant in the room.
2. Intel MMX technology has been a huge success since 1997.
3. Altera now is part of Intel.
4. An FPGA is nothing but a special ASIC with greater flexibility.
5. I built a bridge between a code designer and a synthesizer.

Now there are 2 suspicions:
1. Can a wave-pipelined circuit be reliably generated and used in FPGA for all FPGA users?

2. Is wave-pipelined circuit useless in FPGA?

I like to tell you 2 stories:
1. Intel is one of the companies that own a large number of patents, but Intel has never applied for any patents on the subject, and has never published any articles on it.

2. I learn that Chinese Huawei designs their cell phone chips embedded with the wave-pipelined circuits.

Based on Huawei situation, I think wave-pipelined circuits are broadly used in big companies for their ASIC.

I am trying to find someone in Altera and Xilinx to contact with me and hope he/she gives me an email.

Thank you.

Weng
