Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:

    reg [68:0] x;
    reg x_neq_0;
    always@(posedge clk) x_neq_0 <= x!=0; // version 1

Then I rephrased the logic:

    reg [68:0] x;
    reg x_neq_0;
    always@(posedge clk) x_neq_0 <= |x; // version 2

These should be the same, right?

Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.

Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.

Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.

This is supposed to be the era of high-level synthesis...
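One way to phrase the structure asked for above (a first level of 6-input ORs, then a reduction of the 12 partial results) is to write the two stages out explicitly. This is only a sketch: the module name `neq0_two_stage`, the parameters, and the signal names `x_pad` and `grp` are made up for illustration. The `keep` attribute is a real Vivado synthesis attribute, but nothing here guarantees that the second stage lands on CARRY4s rather than LUTs, so the netlist still has to be checked.

```verilog
// Hypothetical sketch: two-stage "x != 0" for a 69-bit input.
module neq0_two_stage #(
    parameter WIDTH = 69,
    parameter GROUP = 6                     // bits per first-level LUT6
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] x,
    output reg              x_neq_0
);
    // Number of first-level groups, rounded up: 12 for 69 bits.
    localparam NGRP = (WIDTH + GROUP - 1) / GROUP;

    // The assignment zero-extends x to a whole number of groups.
    wire [NGRP*GROUP-1:0] x_pad;
    assign x_pad = x;

    // First level: one OR per group of GROUP bits.
    // "keep" asks synthesis to preserve these nets instead of re-flattening.
    (* keep = "true" *) wire [NGRP-1:0] grp;

    genvar i;
    generate
        for (i = 0; i < NGRP; i = i + 1) begin : g_or
            assign grp[i] = |x_pad[i*GROUP +: GROUP];
        end
    endgenerate

    // Second level: reduce the partial results and register the flag.
    // Whether this maps to a carry chain is the tool's decision.
    always @(posedge clk)
        x_neq_0 <= |grp;
endmodule
```

Forcing the second stage onto the carry chain would presumably mean instantiating CARRY4 primitives by hand, which this sketch does not attempt.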
Phrasing!
Started by ●November 19, 2016
Reply by ●November 20, 2016
On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:
> Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:
>
> reg [68:0] x;
> reg x_neq_0;
> always@(posedge clk) x_neq_0 <= x!=0; // version 1
>
> Then I rephrased the logic:
>
> reg [68:0] x;
> reg x_neq_0;
> always@(posedge clk) x_neq_0 <= |x; // version 2
>
> These should be the same, right?
>
> Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.
>
> Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.
>
> Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
>
> This is supposed to be the era of high-level synthesis...

I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.

Clearly, that's not yet the case, at least for that particular synthesis tool. It's a pity.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Reply by ●November 20, 2016
On 11/20/2016 5:43 PM, Tim Wescott wrote:
> On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:
>
>> Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:
>>
>> reg [68:0] x;
>> reg x_neq_0;
>> always@(posedge clk) x_neq_0 <= x!=0; // version 1
>>
>> Then I rephrased the logic:
>>
>> reg [68:0] x;
>> reg x_neq_0;
>> always@(posedge clk) x_neq_0 <= |x; // version 2
>>
>> These should be the same, right?
>>
>> Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.
>>
>> Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.
>>
>> Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
>>
>> This is supposed to be the era of high-level synthesis...
>
> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>
> Clearly, that's not yet the case, at least for that particular synthesis tool. It's a pity.

'tis true, 'tis pity,
And pity 'tis 'tis true

--
Rick C
Reply by ●November 21, 2016
On 20/11/16 22:43, Tim Wescott wrote:
> On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:
>
>> Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:
>>
>> reg [68:0] x;
>> reg x_neq_0;
>> always@(posedge clk) x_neq_0 <= x!=0; // version 1
>>
>> Then I rephrased the logic:
>>
>> reg [68:0] x;
>> reg x_neq_0;
>> always@(posedge clk) x_neq_0 <= |x; // version 2
>>
>> These should be the same, right?
>>
>> Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.
>>
>> Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.
>>
>> Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
>>
>> This is supposed to be the era of high-level synthesis...
>
> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>
> Clearly, that's not yet the case, at least for that particular synthesis tool. It's a pity.

Of course sometimes you don't want optimisation. Consider, for example, bridging terms in an asynchronous circuit.
Reply by ●November 21, 2016
> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.

I know! I often feel like I'm a software guy, but stuck in the 80s, poring over every line generated by the assembler to make sure it's optimized.
Reply by ●November 21, 2016
On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:
> On 20/11/16 22:43, Tim Wescott wrote:
>> On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:
>>
>>> Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:
>>>
>>> reg [68:0] x;
>>> reg x_neq_0;
>>> always@(posedge clk) x_neq_0 <= x!=0; // version 1
>>>
>>> Then I rephrased the logic:
>>>
>>> reg [68:0] x;
>>> reg x_neq_0;
>>> always@(posedge clk) x_neq_0 <= |x; // version 2
>>>
>>> These should be the same, right?
>>>
>>> Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.
>>>
>>> Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.
>>>
>>> Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
>>>
>>> This is supposed to be the era of high-level synthesis...
>>
>> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>>
>> Clearly, that's not yet the case, at least for that particular synthesis tool. It's a pity.
>
> Of course sometimes you don't want optimisation. Consider, for example, bridging terms in an asynchronous circuit.

OK. I give up -- what do you mean by "bridging terms"?

In general, I would say that if this is an issue, then (as with the 'volatile' and 'mutable' keywords in C++), there should be a way in the language to express your intent to the synthesizer -- either a way to say "don't optimize this section", or a way to say "keep this signal no matter what", or a syntax that lets you lay down literal hardware, etc.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
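For the "keep this signal no matter what" case, Vivado synthesis does accept attributes written in the Verilog source. The following is a minimal sketch, assuming Vivado as the tool; the module `keep_example` and its signals are invented for illustration, and other synthesizers may spell or interpret these attributes differently.

```verilog
// Hypothetical sketch: asking Vivado synthesis to preserve intermediate logic.
module keep_example (
    input  wire a,
    input  wire b,
    input  wire c,
    output wire y
);
    // KEEP asks synthesis not to absorb or remove this net.
    (* keep = "true" *) wire ab = a & b;

    // DONT_TOUCH is the stronger request; it is also meant to be
    // respected by the implementation steps after synthesis.
    (* dont_touch = "true" *) wire ab_or_c = ab | c;

    assign y = ab_or_c;
endmodule
```

Attributes like these say what must not be changed. They do not tell the tool which of several equivalent structures to build, which is the problem in the original post.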
Reply by ●November 21, 2016
Tim Wescott wrote:
> On Mon, 21 Nov 2016 10:07:41 +0000, Tom Gardner wrote:
>
>> On 20/11/16 22:43, Tim Wescott wrote:
>>> On Sat, 19 Nov 2016 14:15:18 -0800, Kevin Neilson wrote:
>>>
>>>> Here's an interesting synthesis result. I synthesized this with Vivado for Virtex-7:
>>>>
>>>> reg [68:0] x;
>>>> reg x_neq_0;
>>>> always@(posedge clk) x_neq_0 <= x!=0; // version 1
>>>>
>>>> Then I rephrased the logic:
>>>>
>>>> reg [68:0] x;
>>>> reg x_neq_0;
>>>> always@(posedge clk) x_neq_0 <= |x; // version 2
>>>>
>>>> These should be the same, right?
>>>>
>>>> Version 1 uses 23 3-input LUTs on the first level followed by a 23-long carry chain (6 CARRY4 blocks). This is twice as big as it should be.
>>>>
>>>> Version 2 is 3 levels of LUTs, 12 6-input LUTs on the first level, 15 total.
>>>>
>>>> Neither is optimal. What I really want is a combination, 12 6-input LUTs followed by 3 CARRY4s.
>>>>
>>>> This is supposed to be the era of high-level synthesis...
>>>
>>> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>>>
>>> Clearly, that's not yet the case, at least for that particular synthesis tool. It's a pity.
>>
>> Of course sometimes you don't want optimisation. Consider, for example, bridging terms in an asynchronous circuit.
>
> OK. I give up -- what do you mean by "bridging terms"?
>
> In general, I would say that if this is an issue, then (as with the 'volatile' and 'mutable' keywords in C++), there should be a way in the language to express your intent to the synthesizer -- either a way to say "don't optimize this section", or a way to say "keep this signal no matter what", or a syntax that lets you lay down literal hardware, etc.

Bridging terms are the logically redundant terms that cover transitions in an asynchronous sequential circuit. Xilinx tools specifically do not honor this sort of logic, and it really has no business in their FPGAs. However, if you insist on generating asynchronous sequential logic in a Xilinx FPGA, you will need to instantiate LUTs to get the coverage you're looking for.

--
Gabor
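A small example may make this concrete. The sketch below, with a made-up module name `mux_lut3`, takes a 2:1 mux whose bridging term is `a & b` and pins the function to one instantiated LUT3 primitive from the Xilinx library. It only shows how the mapping can be fixed by hand. Whether the LUT output is actually glitch-free across a given input transition is a device property this code does not establish.

```verilog
// Hypothetical sketch: a 2:1 mux pinned to a single LUT3 primitive.
// Sum-of-products form:   f = (s & a) | (~s & b)
// With the bridging term: f = (s & a) | (~s & b) | (a & b)
// Both forms have the same truth table, so RTL synthesis is free to
// drop (a & b) when it optimizes the expression.
module mux_lut3 (
    input  wire a,
    input  wire b,
    input  wire s,
    output wire f
);
    // INIT bit index is {I2, I1, I0} = {s, b, a}; the bit value is f.
    //   s b a | f
    //   0 0 0 | 0
    //   0 0 1 | 0
    //   0 1 0 | 1
    //   0 1 1 | 1
    //   1 0 0 | 0
    //   1 0 1 | 1
    //   1 1 0 | 0
    //   1 1 1 | 1     -> INIT = 8'b1010_1100 = 8'hAC
    (* dont_touch = "true" *)
    LUT3 #(.INIT(8'hAC)) u_mux (.O(f), .I0(a), .I1(b), .I2(s));
endmodule
```

Because the INIT value is a plain truth table, the bridging term cannot even be expressed inside a single LUT. Instantiation only controls which inputs share a LUT and keeps the tool from re-partitioning the logic.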
Reply by ●November 21, 2016
In article <9ae86fdc-dc6a-4d3f-b201-594fe2f6a3cd@googlegroups.com>, Kevin Neilson <kevin.neilson@xilinx.com> wrote:
>> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>>
> I know! I often feel like I'm a software guy, but stuck in the 80s, poring over every line generated by the assembler to make sure it's optimized.

But, but "HLS", and "IP Integrator"... ;)

I actually came back a bit let down from a recent Xilinx user's meeting at just how much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that Xilinx is sinking a ton of resources into something that a small minority will ever use. (And will probably not last long either). To Xilinx, RTL design is dead...

--Mark
Reply by ●November 21, 2016
On Mon, 21 Nov 2016 21:19:50 +0000, Mark Curry wrote:
> In article <9ae86fdc-dc6a-4d3f-b201-594fe2f6a3cd@googlegroups.com>, Kevin Neilson <kevin.neilson@xilinx.com> wrote:
>>> I'm not enough of an FPGA guy to make really deep comments, but this looks like the state of C compilers about 20 or so years ago. When I started coding in C one had to write the code with an eye to the assembly that the thing was spitting out. Now, if you've got a good optimizer (and the gnu C optimizer is better than I am on all but a very few of the processors I've worked with recently), you just express your intent and the compiler makes it happen most efficiently.
>>>
>> I know! I often feel like I'm a software guy, but stuck in the 80s, poring over every line generated by the assembler to make sure it's optimized.
>
> But, but "HLS", and "IP Integrator"... ;)
>
> I actually came back a bit let down from a recent Xilinx user's meeting at just how much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that Xilinx is sinking a ton of resources into something that a small minority will ever use. (And will probably not last long either). To Xilinx, RTL design is dead...
>
> --Mark

If that small minority is the one with the most dollars behind it, then they win. Dunno if that's the case or not, but it seems like there's a lot of design of high-volume, cost-sensitive stuff that's done mostly by applications engineers these days.

Or, Xilinx is wrong, and they'll spend a lot of money on uselessness. That's never happened before in the history of semiconductors, now has it? ;)

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Reply by ●November 21, 2016
> I actually came back a bit let down from a recent Xilinx user's meeting at just how much focus Xilinx is putting on their 'high level' tools. I'm of the opinion that Xilinx is sinking a ton of resources into something that a small minority will ever use. (And will probably not last long either). To Xilinx, RTL design is dead...
>
> --Mark

I wish they would just focus all their effort on the synthesizer and placer. The chips get better and better, but the software seems stuck. I think the high-level tools are not for serious users. You can only use them if you don't care about clock speed, and if you don't care about clock speed, you should be using a processor or something.






