comp.arch.fpga | Picking the best synthesis result before implementation

Out of curiosity, I wrote a script to explore different options in the Vivado software (2014.4), especially the synthesis options under SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after synthesis, just enough to get the timing estimate. I explore everything except the directive, because it seems that if you use the directive, you cannot manually set the options.

My goal is to see if it will give me a better result before I move on to implementation. However, out of the 50 different results I see that a lot of the estimated worst slacks and timing scores are the same. About 40% of the results report the same values. I ran on 3 sample designs and it gave me the same thing.

So my question is, is there a way to differentiate what is a better synthesis result? What should I look at in the report?
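
One way to see how many of the 50 runs are genuinely distinct is to group them by a signature of the reported numbers. Below is a minimal sketch in Python, assuming the per-run figures have already been pulled out of the reports into dictionaries. The field names (`wns_ns`, `luts`, and so on) and the sample values are hypothetical, not actual Vivado output.

```python
from collections import defaultdict

# Hypothetical per-run summaries; in practice these would be extracted
# from each run's post-synthesis timing and utilization reports.
runs = [
    {"name": "run01", "wns_ns": -0.412, "luts": 18230, "ffs": 21011, "bram": 40, "dsp": 12},
    {"name": "run02", "wns_ns": -0.412, "luts": 18230, "ffs": 21011, "bram": 40, "dsp": 12},
    {"name": "run03", "wns_ns": -0.412, "luts": 18190, "ffs": 21011, "bram": 42, "dsp": 12},
    {"name": "run04", "wns_ns": -0.275, "luts": 18544, "ffs": 21102, "bram": 40, "dsp": 16},
]


def signature(run):
    """Tuple of every reported number; equal tuples suggest the same netlist."""
    return (run["wns_ns"], run["luts"], run["ffs"], run["bram"], run["dsp"])


def group_by_signature(runs):
    """Map each signature to the list of run names that produced it."""
    groups = defaultdict(list)
    for run in runs:
        groups[signature(run)].append(run["name"])
    return dict(groups)


groups = group_by_signature(runs)
for sig, names in groups.items():
    print(sig, names)
```

Runs that land in the same group are candidates for "the option made no difference to the netlist"; runs with equal slack but different utilization (run01 and run03 here) are still distinct results.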

Reply by Brian Drummond ● July 31, 2015

On Thu, 30 Jul 2015 20:23:12 -0700, James07 wrote:

> Out of curiosity, I wrote a script to explore with different options in
> the Vivado software (2014.4), especially on the synthesis options under
> SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after
> synthesis, just enough to get the timing estimate. I explore everything
> except the directive because it seems like you use the directive, you
> cannot manually set the options
> 
> My goal is to see if it will give me a better result before I move on to
> implementation. However, out of the 50 different results I see that a
> lot of the estimated worst slacks and timing scores are the same. About
> 40% of the results report the same values. I ran on 3 sample designs and
> it gave me the same thing.
> 
> So my question is, is there a way to differentiate what is a better
> synthesis result? What should I look at in the report?

Did you also differentiate by resource usage? Same timing result and 
lower usage would count as better, but sometimes different settings will, 
after optimisation, yield the same result.
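
The "same timing, lower usage is better" rule can be written as a sort with a tie-break. This is only a sketch under assumed inputs: the run names, field names, and numbers are made up for illustration, and the choice of LUTs before BRAM as the tie-break order is arbitrary.

```python
# Hypothetical results; field names and values are illustrative only.
results = [
    {"name": "fsm_onehot", "wns_ns": -0.412, "luts": 18230, "bram": 40},
    {"name": "fsm_gray", "wns_ns": -0.412, "luts": 17950, "bram": 40},
    {"name": "max_bram_32", "wns_ns": -0.275, "luts": 19100, "bram": 32},
]


def rank(results):
    """Best first: larger worst slack wins, then fewer LUTs, then fewer BRAMs."""
    return sorted(results, key=lambda r: (-r["wns_ns"], r["luts"], r["bram"]))


ranked = rank(results)
for r in ranked:
    print(r["name"], r["wns_ns"], r["luts"], r["bram"])
```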

It's also worth trying ISE, with both the old and new VHDL parser (though 
switching parsers is more likely to dance round bugs than improve synth 
results). 

While Vivado is relatively new, ISE has been heavily tuned across the 
years and I wouldn't be surprised to find it sometimes gives better 
results.

If you try it, I'd be interested to see your conclusions.

-- Brian

Reply by GaborSzakacs ● July 31, 2015

Brian Drummond wrote:
> On Thu, 30 Jul 2015 20:23:12 -0700, James07 wrote:
> 
>> Out of curiosity, I wrote a script to explore with different options in
>> the Vivado software (2014.4), especially on the synthesis options under
>> SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after
>> synthesis, just enough to get the timing estimate. I explore everything
>> except the directive because it seems like you use the directive, you
>> cannot manually set the options
>>
>> My goal is to see if it will give me a better result before I move on to
>> implementation. However, out of the 50 different results I see that a
>> lot of the estimated worst slacks and timing scores are the same. About
>> 40% of the results report the same values. I ran on 3 sample designs and
>> it gave me the same thing.
>>
>> So my question is, is there a way to differentiate what is a better
>> synthesis result? What should I look at in the report?
> 
> Did you also differentiate by resource usage? Same timing result and 
> lower usage would count as better, but sometimes different settings will, 
> after optimisation, yield the same result.
> 
> It's also worth trying ISE, with both the old and new VHDL parser (though 
> switching parsers is more likely to dance round bugs than improve synth 
> results). 

You can't use the old parser on 6 or 7 series parts.  It's OK to
use the newer parser for older parts, but the use_new_parser
switch is ignored for 6 or 7 series.  So in effect there's only
one XST implementation to try if you are using 7-series parts.
ISE does allow you to use SmartXplorer to investigate different
canned sets of options, though.  I usually find that you need to
individually tune the settings to get the best results.

> 
> While Vivado is relatively new, ISE has been heavily tuned across the 
> years and I wouldn't be surprised to find it sometimes gives better 
> results.
> 
> If you try it, I'd be interested to see your conclusions.
> 
> -- Brian

-- 
Gabor

Reply by Sharad ● August 2, 2015

On Friday, July 31, 2015 at 11:23:18 AM UTC+8, James07 wrote:
> Out of curiosity, I wrote a script to explore with different options in the Vivado software (2014.4), especially on the synthesis options under SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after synthesis, just enough to get the timing estimate. I explore everything except the directive because it seems like you use the directive, you cannot manually set the options
> 
> My goal is to see if it will give me a better result before I move on to implementation. However, out of the 50 different results I see that a lot of the estimated worst slacks and timing scores are the same. About 40% of the results report the same values. I ran on 3 sample designs and it gave me the same thing.
> 
> So my question is, is there a way to differentiate what is a better synthesis result? What should I look at in the report?

1. Lower area utilization with similar timing results would be considered good. However, it will be even better to take a look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to see the synthesis results, especially area, with respect to expected feature changes in the future.

2. Power is another factor that you may consider when deciding which is a better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate less dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.

3. Have you analysed your results with respect to pin assignment? If pin assignment is critical to how your FPGA will be placed on the board, you may want to look at the synthesis results from that perspective. With no pin assignment constraints, the tool automatically assigns pins to the design. Pin assignment constraints are not applied by the tool during a "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.

4. If a large percentage of synthesis results give similar results, it also means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.
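
Point 1 above (looking at LUT, BRAM and DSP usage individually rather than as one area figure) can be sketched as a per-resource headroom check. The device capacities and run utilizations below are hypothetical placeholders; substitute the numbers for your own part and reports.

```python
# Hypothetical device capacities and per-run utilization counts.
DEVICE = {"luts": 203800, "bram": 445, "dsp": 840}

runs = {
    "A": {"luts": 150000, "bram": 400, "dsp": 100},
    "B": {"luts": 160000, "bram": 200, "dsp": 300},
}


def headroom(used, capacity=DEVICE):
    """Fraction of each resource type that is still free."""
    return {k: 1.0 - used[k] / capacity[k] for k in capacity}


def tightest(used, capacity=DEVICE):
    """Resource with the least free fraction, i.e. the likely limit on future features."""
    h = headroom(used, capacity)
    return min(h, key=h.get)


for name, used in runs.items():
    print(name, tightest(used), headroom(used))
```

With these made-up numbers, run A is nearly out of BRAM while run B is limited by LUTs, so which one is "better" depends on whether the planned features need memory or logic.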

Reply by kt8128 ● August 2, 2015

On Friday, July 31, 2015 at 6:41:47 PM UTC+8, Brian Drummond wrote:
> Did you also differentiate by resource usage? Same timing result and
> lower usage would count as better, but sometimes different settings will,
> after optimisation, yield the same result.
>

As far as I can tell, the resource usage is almost the same. I am taking another look. At first glance, for the 40% I mentioned, they look almost the same, which is also partly why I can't tell these clone troopers apart.

> It's also worth trying ISE, with both the old and new VHDL parser (though
> switching parsers is more likely to dance round bugs than improve synth
> results).
>
> While Vivado is relatively new, ISE has been heavily tuned across the
> years and I wouldn't be surprised to find it sometimes gives better
> results.
>
> If you try it, I'd be interested to see your conclusions.
>
> -- Brian

Yes, I am intending to try it on ISE. The latest (and last!) ISE version 14.7 works on one of the older V7 devices. I will try that and see what the result is, although I am not so sure if it gives estimated timing scores after synthesis. Need to look into it.

Reply by kt8128 ● August 2, 2015

On Sunday, August 2, 2015 at 12:41:43 PM UTC+8, Sharad wrote:
> 1. Lower area utilization with similar timing results would be considered good. However, it will be even better to take a look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to see the synthesis results, especially area, with respect to expected feature changes in the future.
> 

> 2. Power is another factor that you may consider when deciding which is a better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate lesser dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.
> 
> 3. Have you analysed your results with respect to pin assignment? If pin assignment is critical to how your FPGA will be placed on the board, you may want to see the synthesis results with that perspective. Under no pin assignment constraint, the tool automatically assigns pins to the design. Pin assignment constraint is not applied by the tool during "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.
> 

This is a good point. No, I haven't got to that step. Based on what I understand of the Vivado flow, that happens during the place_design phase. Hmm... so perhaps the next step is to take that 40% of the results, continue running them until the end of place_design, and check the timing estimates. I guess the later it is in the flow, the more accurate the estimate becomes.

> 4. If a large percentage of synthesis results give similar results, it also means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.

I wouldn't say it is already well-architected. Sometimes my hands are tied and I can't change the code. So I am exploring ways to work the tools to my advantage. Thanks for the helpful comments.

Reply by rickman ● August 2, 2015

On 8/2/2015 6:56 AM, kt8128@gmail.com wrote:
> On Sunday, August 2, 2015 at 12:41:43 PM UTC+8, Sharad wrote:
>> 1. Lower area utilization with similar timing results would be
>> considered good. However, it will be even better to take a look at
>> the individual utilization of resources like LUTs, BRAM and DSP
>> blocks. You may want to choose a synthesis result that allows you
>> to add more features to your design in the future. Such features
>> may require BRAM or DSP in different proportions.. So, it might be
>> good to see the synthesis results, especially area, with respect to
>> expected feature changes in the future.
>>
>
>> 2. Power is another factor that you may consider when deciding
>> which is a better synthesis result. If you have two synthesis
>> results, where one uses a lot of LUTs while the other uses a lot of
>> DSP blocks, it is very likely that the one with DSP blocks will
>> dissipate lesser dynamic power. This is because DSP blocks are
>> optimized hard IP blocks on the device.
>>
>> 3. Have you analysed your results with respect to pin assignment?
>> If pin assignment is critical to how your FPGA will be placed on
>> the board, you may want to see the synthesis results with that
>> perspective. Under no pin assignment constraint, the tool
>> automatically assigns pins to the design. Pin assignment constraint
>> is not applied by the tool during "synthesis-only" run. But the
>> default pin assignment and corresponding synthesis results can be
>> analyzed with respect to your planned pin assignment.
>>
>
> This is a good point. No, I haven't got to that step. Based on what I
> understand from the Vivado flow, that happens during place_design
> phase. Hmm... so perhaps the next step is to take that 40% results
> and continue running them till end of place_design, and check out the
> timing estimates. I guess the later it is in the flow, the more
> accurate it becomes..

My experience is the timing numbers from synthesis are totally bogus. 
You need to do a place and route if you want to compare timing data. 
Even then you can get noticeable improvements in timing by running more 
than one route with different settings.  So the connection back to your 
synthesis parameters is hard to explore without a lot of work.  Using 
one pass on place and route may show synthesis option A to be the best 
by 4% but when you explore the routing options you may find synthesis 
option B is now 7% better.

I think this problem space is very chaotic with small changes in initial 
conditions giving large changes in results.

I worked on a project once where the timing analysis tools were broken 
saying the project met timing when it didn't.  The design would fail on 
the bench until we hit it with cold spray.  I tried using manual 
placement to improve the routing, but everything I did to improve this 
feature made some other feature worse or even unroutable.

We automated a process of tweaking the initial seed parameter to get 
multiple runs each night.  The next day we would test those runs on the 
bench with a chip warmer.  Eventually we found a good design and shipped 
it.  Ever since then I have treated the entire compile-place-route 
process like an exploration of the Mandelbrot set.

>> 4. If a large percentage of synthesis results give similar results,
>> it also means that the tool is not finding many opportunities to
>> perform various optimizations. It could be because your design is
>> already very well architected or it could be that it needs to be
>> re-architected if you are aiming for certain specific performance
>> measures. As the designer, you know better which is the case with
>> the design.
>
> I wouldn't say it is already well-architected. Sometimes my hands are
> tied and I can't change the code. So I am exploring ways to work the
> tools to my advantage. Thanks for the helpful comments.

Is there a particular problem you are having with the results?  Is the 
design larger than you need?  If you haven't done a place-route I guess 
it can't be that it is too slow.  If you are just trying to "optimize" I 
suggest you don't bother and just move on to the place and route.  See 
what sorts of results you get before you spend time trying to optimize a 
design that may be perfectly good.

There is a rule about optimization.  It says *don't* unless you have to. 
  Optimizing for "this" can make it harder to get "that" working or at 
very least result in spending a lot of time on something that isn't 
important in the end.

-- 

Rick

Reply by kt8128 ● August 2, 2015

On Monday, August 3, 2015 at 12:59:14 AM UTC+8, rickman wrote:
> 
> My experience is the timing numbers from synthesis are totally bogus. 
> You need to do a place and route if you want to compare timing data. 
> Even then you can get noticeable improvements in timing by running more 
> than one routes with different settings.  So the connection back to your 
> synthesis parameters is hard to explore without a lot of work.  Using 
> one pass on place and route may show synthesis option A to be the best 
> by 4% but when you explore the routing options you may find synthesis 
> option B is now 7% better.
> 
> I think this problem space is very chaotic with small changes in initial 
> conditions giving large changes in results.

Yes, I understand that and have seen that myself. That is partly why I am struggling to define what a "good" synthesis result is, with meeting timing as the end goal. For example, let's say synthesis set "A" meets timing in 10% of runs across various P&R settings, and synthesis set "B" in only 5%. *Something* has got to account for that difference.
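
The comparison described here (what fraction of P&R runs meet timing for each synthesis set) is easy to tabulate once the post-route results are collected. A minimal sketch, assuming each run has been reduced to a tuple of synthesis set, P&R setting and final worst slack; all names and numbers below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical post-route outcomes: (synthesis set, P&R setting, final WNS in ns).
outcomes = [
    ("A", "default", -0.120), ("A", "explore", 0.035),
    ("A", "seed2", -0.060), ("A", "seed3", 0.010),
    ("B", "default", -0.200), ("B", "explore", -0.015),
    ("B", "seed2", 0.005), ("B", "seed3", -0.090),
]


def pass_rates(outcomes):
    """Fraction of P&R runs per synthesis set whose final WNS is non-negative."""
    total = defaultdict(int)
    met = defaultdict(int)
    for synth_set, _setting, wns in outcomes:
        total[synth_set] += 1
        if wns >= 0.0:
            met[synth_set] += 1
    return {s: met[s] / total[s] for s in total}


rates = pass_rates(outcomes)
print(rates)
```

With only a handful of runs per set, a gap like this can easily be noise, so the pass rate is only meaningful when each synthesis set has been through a reasonable number of P&R variations.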

> 
> I worked on a project once where the timing analysis tools were broken 
> saying the project met timing when it didn't.  The design would fail on 
> the bench until we hit it with cold spray.  

This is hilarious! 

> Is there a particular problem you are having with the results?  Is the 
> design larger than you need?  If you haven't done a place-route I guess 
> it can't be that it is too slow.  If you are just trying to "optimize" I 
> suggest you don't bother and just move on to the place and route.  See 
> what sorts of results you get before you spend time trying to optimize a 
> design that may be perfectly good.

I have done place-route a couple of times and it takes around 8 hours. (1 hour for synthesis) I tried different directives as well and it gave me a variety of results. 

I understand that the way I am approaching this may not be practical in the grand scheme of things. BUT I got curious when I read in the V design methodology that if you get -300ps post-synthesis, you can definitely meet timing. I also vaguely remember an illustration showing synthesis has a 10x effect on end results. I wonder how these estimates were made, and by whom.

> 
> There is a rule about optimization.  It says *don't* unless you have to. 
>   Optimizing for "this" can make it harder to get "that" working or at 
> very least result in spending a lot of time on something that isn't 
> important in the end.

> 
> -- 
> 
> Rick

Reply by rickman ● August 3, 2015

On 8/2/2015 10:14 PM, kt8128@gmail.com wrote:
> On Monday, August 3, 2015 at 12:59:14 AM UTC+8, rickman wrote:
>>
>> My experience is the timing numbers from synthesis are totally
>> bogus. You need to do a place and route if you want to compare
>> timing data. Even then you can get noticeable improvements in
>> timing by running more than one routes with different settings.  So
>> the connection back to your synthesis parameters is hard to explore
>> without a lot of work.  Using one pass on place and route may show
>> synthesis option A to be the best by 4% but when you explore the
>> routing options you may find synthesis option B is now 7% better.
>>
>> I think this problem space is very chaotic with small changes in
>> initial conditions giving large changes in results.
>
> Yes, I understand that and have seen that myself. Part of it is why I
> am struggling to qualify what is a "good" synthesize result, with
> meeting timing as the end goal. For example, let say "A" synthesis
> set has 10% of meeting timing with various P&R settings. "B"
> synthesis set has only 5%. *Something* has got to be that
> difference.

I think there is little about your synthesis result that can be easily 
measured in a meaningful way to predict the timing result of routing. 
That is what I mean about it being "chaotic".  It is much like 
predicting the weather more than a week out.  You can see general 
trends, but hard to predict any details with any accuracy.  So the 
weather man just doesn't try.

In FPGAs the synthesis result has no insight into routing so they just 
measure the logic delays and then add a standard factor for routing. 
Routing can be impacted by the logic partitioning in ways that are hard 
to predict.  I'd be willing to speculate it is a bit like the way they 
proved in general the task of predicting the run time of a computer 
algorithm will take as much run time as the algorithm itself.  So the 
best way to estimate run time is to run the task.  Best way to estimate 
routing result is to run routing.  Routing is often half the total path 
time, so without good info on that there is no decent guess to timing.
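
To illustrate why a fixed routing allowance can mislead, here is a toy model in Python. It is not how any particular tool computes its estimate; it simply assumes the estimate is logic delay plus routing equal to a fixed proportion of logic delay, and compares that with invented post-route numbers. Every value and name below is hypothetical.

```python
# Hypothetical critical-path numbers in ns; not taken from any real report.
ROUTING_FACTOR = 1.0  # assumption: estimated routing delay equals logic delay

candidates = {
    "A": {"logic_ns": 2.0, "routed_ns": 2.9},
    "B": {"logic_ns": 2.2, "routed_ns": 1.9},
}


def estimated_delay(c, factor=ROUTING_FACTOR):
    """Synthesis-style estimate: logic delay plus a fixed-proportion routing allowance."""
    return c["logic_ns"] * (1.0 + factor)


def actual_delay(c):
    """Post-route delay: logic delay plus the routing delay actually achieved."""
    return c["logic_ns"] + c["routed_ns"]


best_estimated = min(candidates, key=lambda k: estimated_delay(candidates[k]))
best_actual = min(candidates, key=lambda k: actual_delay(candidates[k]))
print("best by estimate:", best_estimated)
print("best after routing:", best_actual)
```

In this made-up case the estimate prefers A because its logic is faster, but B wins after routing because its logic happened to place and route more compactly, which the fixed factor cannot see.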

>> I worked on a project once where the timing analysis tools were
>> broken saying the project met timing when it didn't.  The design
>> would fail on the bench until we hit it with cold spray.
>
> This is hilarious!

This was also some time ago, using the Altera Max+II tools when Quartus 
was the "current" tool.  Trouble was Altera didn't support the older 
devices with the new Quartus tool.  We were adding features to an 
existing product so we didn't have the luxury of using the new tools 
with new parts.  Eventually they relented and did support the older 
parts with Quartus, but it was well after our project was done.  I 
expect we weren't the only customer to want support for older products.

>> Is there a particular problem you are having with the results?  Is
>> the design larger than you need?  If you haven't done a place-route
>> I guess it can't be that it is too slow.  If you are just trying to
>> "optimize" I suggest you don't bother and just move on to the place
>> and route.  See what sorts of results you get before you spend time
>> trying to optimize a design that may be perfectly good.
>
> I have done place-route a couple of times and it takes around 8
> hours. (1 hour for synthesis) I tried different directives as well
> and it gave me a variety of results.

Must be a large project.  The project we were on would load up multiple 
runs on many CPUs overnight.  This would give us many trials to sort 
through the next day.  Best if this is done on a design that has passed 
all logic checks and even runs in the board with a reduced clock or cold 
spray.

> I understand how I am approaching this may not be practical in the
> grand scheme of things. BUT I got curious when I read in the V design
> methodology that if you get -300ps after post-synthesis, you can
> definitely meet timing. I also vaguely remember an illustration
> showing synthesis has a 10x effect on end results. I wonder how and
> who did these estimations.

I'm not sure what a "10x effect" means.  But sure, a bad synthesis will 
give you a bad timing result.  On large projects it is hard to deal with 
timing issues sometimes.  You might try breaking the project down to 
smaller pieces to see if they will meet timing separately.  Perhaps you 
will find a given module that is a problem and you can focus on code 
changes to improve the synthesis?  I don't think you can do tons just 
using tweaks to tool parameters.

Are your modules partitioned in a way that lets each one be checked for 
timing without lots of paths that cross?

-- 

Rick

Reply by Brian Drummond ● August 3, 2015

On Sun, 02 Aug 2015 03:28:07 -0700, kt8128 wrote:

> On Friday, July 31, 2015 at 6:41:47 PM UTC+8, Brian Drummond wrote:

>> While Vivado is relatively new, ISE has been heavily tuned across the
>> years and I wouldn't be surprised to find it sometimes gives better
>> results.

> Yes, I am intending to try it on ISE. The latest (and last!) ISE version
> 14.7 works on one of the older V7 devices. I will try that and see what
> is the result, although I am not so sure if it gives estimated timing
> scores after synthesis. Need to look into it.

It does. If you can't see what you want in the summary, read the .syr 
(Synth report) file.

-- Brian
