comp.arch.fpga | a clueless bloke tells Xilinx to get a move on

The following is an informal letter to Xilinx requesting their
continued efforts to increase the speed of their software tools. If
there are incorrect or missing statements, please correct me!

Dear Xilinx:

As many of us spend numerous hours of our life waiting for
Map/Par/Bitgen to finish, I hereby petition Xilinx, Inc., to consider
this issue (of their tool speed) to be of the highest priority. I am
now scared to purchase newer chips because I fear that their increased
size and complexity will only further delay my company's development
times. Please, please, please invest the time and money to make the
tools execute faster.

Have you considered the following ideas for speeding up the process?

1.	The largest benefit to speed would be obtained through making the
tools multithreaded. Upcoming multi-core processors will soon be
available on all workstation systems. Why is it taking Xilinx years on
end to make their tools multithreaded? There is no excuse for
this. I assume the tools are written in C/C++. Cross platform C/C++
threading libraries make thread management and synchronization easy
(see boost.org).
2.	Use a different algorithm. I understand that the tools currently
rely on simulated annealing algorithms for placement and routing. This
is probably a fine method historically, but we are arriving at the
point where all paths are constrained and the paths are complex (not
just vast in number). If there is no value in approximation, then the
algorithm loses its value.  Perhaps it is time to consider a Branch and
Bound algorithm instead. This has the advantage of being easily
threadable.
3.	SIMD instructions are available on most modern processors. Are we
taking full advantage of them? MMX, SSE1/2/3/4, etc.
4.	Modern compilers have much improved memory management and
compilation over those of previous years. Also, the underlying
libraries for memory management and file IO can have a huge impact on
speed. Which compiler are you using? Which libraries are you using?
Have you tried the latest PathScale or Intel compilers?
5.	In recent discussions about the speed of the map tool, I learned
that it took an unearthly five minutes to simply load and parse a 40MB
binary file on what is considered a fairly fast machine. It is
obviously doing a number of sanity checks on the file that are likely
unnecessary. It is also loading the chip description files at the same
time. Even still, that seems slow to me. Can we expand the file format
to include information about its own integrity? Can we increase the
file caches? Are we using good, modern parser technology? Can we add
command line parameters that will cause higher speed at the cost of
more memory usage and vice versa? Speaking of command line parameters,
the software takes almost three seconds to show them. Why does it take
that long to simply initialize?
6.	Xilinx's chips are supposedly useful for acceleration. If so, make
a PCIe x4 board that accelerates the tools using some S3 chips and
SRAM. I'd pay $1000 for a board that gave a 5x improvement. (okay, so
that is way less than decent simulation tools, I confess I'm not
willing to pay big dolla....)
7.	Is Xilinx making its money on software or hardware? If it is not
making money on software, then consider making it open source. More
eyes on the code mean more speed. 
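
For anyone unfamiliar with point 2, here is a minimal sketch of simulated
annealing on a toy placement problem. It is not how Map/Par actually work;
the grid of sites, the two-pin nets, the Manhattan wirelength cost, and
every function and parameter name below are assumptions made purely for
illustration.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over two-pin nets (a stand-in cost function)."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def anneal(cells, nets, width, height, seed=0,
           t_start=5.0, t_end=0.01, alpha=0.95, moves_per_temp=200):
    """Toy annealer: swap two cells, keep worse results with probability exp(-delta/t)."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(sites)
    # Random legal start: one cell per site. Unused sites stay empty in this toy.
    placement = dict(zip(cells, sites))
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            a, b = rng.sample(cells, 2)
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = wirelength(placement, nets)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost  # accept the move
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                # Reject the move: swap the two cells back.
                placement[a], placement[b] = placement[b], placement[a]
        t *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    cells = list("abcdefgh")
    nets = [(cells[i], cells[i + 1]) for i in range(len(cells) - 1)]
    placement, cost = anneal(cells, nets, width=4, height=2)
    print(cost, placement)
```

The result is an approximation: nothing in the loop proves the final
placement is optimal, which is the trade-off point 2 is questioning.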

Sincerely,
An HDL peon

Reply by johnp ● October 5, 2006

Brannon -

Although I'd like the tools to run faster, I think it is *far* more
important for Xilinx to fix the numerous bugs and crashes.

Yet again I've had to completely re-build a project because Navigator
corrupted the .ise file and the backup version.

Make the tools work, then speed them up.

John Providenza



Reply by Ray Andraka ● October 5, 2006


Xilinx has already sped up time to completion at the cost of poorer end 
performance on some high performance designs.  I'll take PAR that takes 
longer to complete but gets closer to the timing previous versions got 
on hand-placed designs over having to run a faster PAR numerous times in 
order to get a routing solution that meets timing.

I have to wonder whether the writer of this letter looked at his own 
design for the reasons PAR was taking too long.  Did he keep the levels 
of logic to a reasonable number for his desired timing target?  Did he 
duplicate logic to reduce high fanout nets?  Did he try any 
floorplanning for critical parts of the design?  Somehow I doubt it, yet 
those things can make a difference of several orders of magnitude in the 
time to run PAR.
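
As a rough illustration of the first two questions, the sketch below
counts logic levels and fanout on a toy netlist. The dictionary format
(node name mapped to its list of input names) and all node names are
hypothetical, not any Xilinx file format; real numbers would come from
the timing and map reports.

```python
from collections import Counter


def logic_levels(netlist):
    """Return {node: levels of logic from the nearest register or input}.

    netlist maps each combinational node to the names of its inputs.
    Names that never appear as keys are treated as registers or primary
    inputs, which sit at level 0.
    """
    depth = {}

    def visit(node, stack=()):
        if node not in netlist:
            return 0  # register output or primary input
        if node in depth:
            return depth[node]
        if node in stack:
            raise ValueError("combinational loop through " + node)
        depth[node] = 1 + max(
            (visit(src, stack + (node,)) for src in netlist[node]),
            default=0,
        )
        return depth[node]

    for node in netlist:
        visit(node)
    return depth


def fanout(netlist):
    """Count how many node inputs each signal drives."""
    return Counter(src for inputs in netlist.values() for src in inputs)


if __name__ == "__main__":
    toy = {
        "n1": ["a", "b"],
        "n2": ["n1", "c"],
        "n3": ["n2", "n1"],
        "q_d": ["n3"],
    }
    print(logic_levels(toy))
    print(fanout(toy).most_common(3))
```

Paths with many levels, or signals with very high fanout, are the ones
worth restructuring or duplicating before blaming PAR run time.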

Reply by PeteS ● October 5, 2006

johnp wrote:
> Brannon -
> 
> Although I'd like the tools to run faster, I think it is *far* more
> important for Xilinx to fix the numerous bugs and crashes.
> 
> Yet again I've had to completely re-build a project because Navigator
> corrupted the .ise file and the backup version.
> 
> Make the tools work, then speed them up.
> 
> John Providenza
> 
> 
The crashes are, no doubt, due to the increasing complexity of each 
part of the process that the tools have to evaluate. The *nix way 
was always 'do one thing and do it well', which the Xilinx tools used to 
exemplify. As they have grown more complex, things have been added to 
each tool, such that each now does more than one thing. Adding such 
complexity adds exponentially more sources of problems.

I suggest each tool be completely re-evaluated - and if it's doing more 
than one thing, separate those things back out - to 'Do one thing and do 
it well'.

That would very probably deal with a lot of the crashes, and ultimately 
speed.

Cheers

PeteS

Reply by Thomas Entner ● October 5, 2006

I back up both requests:

Xilinx and Altera should make their tools faster, especially by making use of 
multi-cores. I hope that there is something coming soon, as this trend has 
been clear for over a year. E.g. I wrote on 1st March 2005 to this newsgroup:
> I think we should all encourage the FPGA- and EDA-tool-vendors to adapt
> their software for parallel algorithms (especially place and route), as
> the dual-cores are really coming soon and most of us will buy the fastest
> machine they can get for reasonable money. In fact, a parallel algorithm
> would already help a little bit today for P4s with hyper-threading.
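
Until the algorithms themselves are parallel, the coarse-grained version
of this is to run several independent place-and-route attempts at once
and keep the best. A minimal sketch, assuming a hypothetical run_one(seed)
callable that launches one attempt and returns a score where lower is
better; neither the callable nor the scoring is a real vendor API:

```python
from concurrent.futures import ThreadPoolExecutor


def best_of_seeds(run_one, seeds, max_workers=2):
    """Run run_one(seed) for every seed concurrently.

    Returns (score, seed) for the lowest score. Threads are enough when
    each worker is assumed to block on an external tool process; a
    pure-Python cost function would need separate processes instead to
    use more than one core.
    """
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(run_one, seeds))
    return min(zip(scores, seeds))


if __name__ == "__main__":
    # Stand-in for a real run: pretend the score depends on the seed.
    def fake_run(seed):
        return (seed * 7) % 5

    print(best_of_seeds(fake_run, [1, 2, 3, 4]))
```

This only uses extra cores to explore more starting points; it does not
make any single run finish sooner.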

Also, Xilinx should improve their software quality: with every new ISE 
release you get new errors in designs that worked fine with previous 
releases. It's frustrating... If in doubt, I would also recommend 
concentrating on the old bugs first before introducing new ones with 
multithreading... (my Xilinx designs are not that large ;-)

Thomas

Reply by ● October 5, 2006

Just wondering here...

What platform are you running the tools on?

I benchmarked several chip design tools on 'doze and 'nix.

If the tool stayed inside physical memory, they ran at about even
speeds.

I found that once physical memory was exhausted, the 'nix variant would
run at least 10x faster.  An hour-plus simulation would finish in under
5 minutes.

I traced the difference to the memory manager.  In the 'doze variant,
less than 10% of the processor was left for the application to use
while the swapper was running.

This may have changed in the last couple of years, but I doubt it.

GH

Reply by Brannon ● October 5, 2006

> I have to wonder whether the writer of this letter looked at his own
> design for the reasons PAR was taking too long.  Did he keep the levels
> of logic to a reasonable number for his desired timing target?  Did he
> duplicate logic to reduce high fanout nets?  Did he try any
> floorplanning for critical parts of the design?  Somehow I doubt it, yet
> those things can make a several orders of magnitude difference in the
> time to run PAR.

My logic and fanouts are fine. I confess, though, that I have never
done floorplanning. I wouldn't even know where to start with it. I
don't even know what level floorplanning is done at. I rarely use XST;
I use my own EDIF generation tools. The tools I use tile out vast
amounts of logic recursively; attaching location constraints to them is
quite difficult (their names change each compile, the flattened logic
does not always look the same, etc.). My impression was that
floorplanning required constraints at some level and that it is
difficult using XST as well. Is that not true? I don't doubt that my
top-level tool choice is hindering my ability to take full advantage of
the Xilinx tools. How much time would it take to do a floorplan on a
four million gate project? At what stage during the development process
do you do it? I'm trying to reduce my development time, not my
retargetability time.

Reply by Symon ● October 5, 2006

Hi Brannon,
So, I guess we'd all like the tools to run faster, and you make some good 
suggestions.
However, I wonder how often you _need_ to do a PAR cycle? Please excuse me 
if I'm teaching you to suck eggs, but I just want to check you've considered 
a development process where you simulate things before PAR. This way, your 
logic errors are found in the simulator, not the real hardware. If you like 
to try stuff out as you go, maybe you could run the PAR each evening before 
heading out to the pub, that's what I sometimes do. :-)
HTH, Syms.

Reply by Brannon ● October 5, 2006

When I was using VCS to simulate complicated stuff before, it took
several hours per run. I agree that the output was infinitely more
useful. However, have you seen the prices on such EDIF tools? You can
run a lot of 10-minute compiles before you pay for a $10k piece of
software. And that's a cheap one.


Reply by Anonymous ● October 5, 2006

I would settle for an incremental compile that worked without me spending
more time screwing with it than it takes to just do a fresh compile each
time. There's no reason that changing one pin assignment or one DCM
parameter should cost 10 minutes.
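
To make the idea concrete, here is a sketch of the simplest form of this:
hash each stage's inputs and rerun only the stages whose inputs changed,
plus everything downstream of them. This is plain content-hash caching,
not true incremental place and route, and the stage names and input names
are hypothetical stand-ins rather than a description of how ISE tracks
dependencies.

```python
import hashlib


def digest(parts):
    """SHA-256 over a sequence of byte strings."""
    h = hashlib.sha256()
    for part in parts:
        # Length prefix keeps the boundaries between parts unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def stale_stages(stages, inputs, previous):
    """Decide which stages of a linear flow must be rerun.

    stages:   ordered list of (stage name, [input names])
    inputs:   input name -> current contents as bytes
    previous: stage name -> digest recorded on the last run
    Returns (list of stale stage names, digests to record for next time).
    """
    stale, current = [], {}
    upstream_changed = False
    for name, needs in stages:
        current[name] = digest(inputs[n] for n in needs)
        if upstream_changed or previous.get(name) != current[name]:
            stale.append(name)
            upstream_changed = True  # later stages consume this stage's output
    return stale, current


if __name__ == "__main__":
    flow = [
        ("synth", ["hdl"]),
        ("map", ["constraints"]),
        ("par", ["constraints"]),
        ("bitgen", ["bitgen_opts"]),
    ]
    files = {"hdl": b"design v1", "constraints": b"pins v1", "bitgen_opts": b"opts v1"}
    _, seen = stale_stages(flow, files, {})
    files["bitgen_opts"] = b"opts v2"
    print(stale_stages(flow, files, seen)[0])
```

Note the limit of this approach: a changed pin assignment still reruns
map and PAR in full, so it saves time only for changes late in the flow.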

You can talk about simulation ahead of time but (a) that doesn't work too
well when you're doing signal processing, it would take me a month to build
a testbench that would tell me what five minutes of a live system tells me
and (b) when it comes to integration time there are always nits and nats
that have to be tweaked (wrong pin assignment, CE to a chip is wrong
polarity, etc.).

-Clark
