
a clueless bloke tells Xilinx to get a move on

Started by Brannon October 5, 2006
mk,

How well does it deal with memory?

Can you place 16 or 32 Gbytes in the box?

Austin


mk wrote:
> On Mon, 09 Oct 2006 08:55:58 -0700, Austin Lesea <austin@xilinx.com>
> wrote:
>
>> Rajeev,
>>
>> Xilinx takes seriously any suggestions.
>>
>> We, of all people, with the introduction of the Virtex 5 LX330, know
>> that we need to somehow make everything work better, and faster.
>>
>> Note that due to the memory required, the LX330 can ONLY be compiled on
>> a 64 bit Linux machine.... there are just too many logic cells, and too
>> much routing. 8 Gbytes is about what you need, and windoze can't handle
>> it (at all).
>
> Austin, you are mistaken. There has been a 64 bit version of Windows
> for more than a year now, and a new updated version is going to be
> released before the end of the year (Vista is in RC2 stage now).
> Looking forward to a 64 bit version of ISE on Vista 64, which I am
> running now.
Austin Lesea schrieb:
> mk,
>
> How well does it deal with memory?
>
> Can you place 16 or 32 Gbytes in the box?
A much more important question: how mature is it? The Xilinx software itself makes trouble enough. Adding a very beta-stage operating system will bring a lot of "fun" to the software guys.

Regards
Falk

P.S. Regarding the suggested partitioning: incremental design is a must at this level of size/complexity. You don't take a big staircase in one step, do you? Unless you fall down ;-)
On Mon, 09 Oct 2006 10:35:54 -0700, Austin Lesea <austin@xilinx.com>
wrote:

>mk,
>
>How well does it deal with memory?
>
>Can you place 16 or 32 Gbytes in the box?
Nope, when you install Vista x64, 4 of the 8 2 GB DIMMs pop out of the computer...

Seriously though, in my experience it's no different from Linux. If you have a two-CPU-socket machine with 8 DIMMs, you can make a quad-core machine with 16 GB, and all of it is available to the 64-bit processes.
On Mon, 09 Oct 2006 19:54:44 +0200, Falk Brunner <Falk.Brunner@gmx.de>
wrote:

>Austin Lesea schrieb:
>> mk,
>>
>> How well does it deal with memory?
>>
>> Can you place 16 or 32 Gbytes in the box?
>
>A much more important question: how mature is it? The Xilinx software
>itself makes trouble enough. Adding a very beta-stage operating system
>will bring a lot of "fun" to the software guys.
As I have mentioned, Vista is the second version of 64-bit Windows. It's stable enough for ISE 8.2 to run to completion where it wouldn't on win32, because there it gets only 2 GB of address space, whereas on win64 a 32-bit binary gets the full 4 GB address space.
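To see that 2 GB versus 4 GB ceiling for yourself, here is a minimal C sketch (my own illustration, nothing from ISE or any Xilinx code): it simply asks malloc() for 256 MB chunks until the request fails, which is a rough probe of how much address space one process can get.

/* Hedged sketch: probe a process's usable address space by leaking
 * 256 MB allocations until malloc() gives up. Compiled as a 32-bit
 * binary this stops near 2 GB on 32-bit Windows, but close to 4 GB
 * when the same (large-address-aware) 32-bit executable runs on a
 * 64-bit OS; a 64-bit build keeps going far beyond that. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t chunk = 256UL * 1024 * 1024;   /* 256 MB per request */
    size_t total_mb = 0;

    while (malloc(chunk) != NULL)               /* leak on purpose; this is only a probe */
        printf("reserved %zu MB so far\n", total_mb += 256);

    printf("malloc() failed after about %zu MB\n", total_mb);
    return 0;
}

(The exact numbers depend on OS overcommit behavior and link flags, but the 32-bit ceiling is the point.)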
Austin Lesea wrote:
> Note that due to the memory required, the LX330 can ONLY be compiled on
> a 64 bit Linux machine.... there are just too many logic cells, and too
> much routing. 8 Gbytes is about what you need, and windoze can't handle
> it (at all).
That says a lot about the computing required past Virtex-5, as things scale up by another factor of two or four. I assume the total CPU cycles required for P&R are scaling even faster, making both memory and total instruction counts serious bottlenecks. I assume you are supporting both Itanium IA64 and AMD x86_64 architectures?

Desktop 64-bit machines this size aren't exactly plentiful; server-class machines are a bit more common, but still frequently pretty expensive. I have two quad and three dual Itanium Linux servers here in an MPI cluster, each with at least 8 GB, plus a large MPI/PVM cluster farm of Intel P3 and P4 machines, but I suspect that's pretty rare in this readership given the stiff costs of building HPC clusters. None of them are easy to work near, as the required airflow makes them very noisy individually, let alone aggregated into a server room.

With 8 GB data sets, many algorithms fail to scale since there isn't enough data locality to make effective use of either L2 or L3 caches, which are at most in the 9 MB range. Relatively random access to an 8 GB data set generally brings the processor to a grinding halt on memory, with a net instruction rate about 1-2% of L1 cache performance. Generally, applications which have outgrown 32-bit CPU caches require nearly a complete rewrite, with new algorithms and new data structures, to gain enough locality to get good memory performance on a large-address-space 64-bit machine. Almost always a major restructuring based on some flavor of divide and conquer is required to bring the memory footprint back inside the L2/L3 caches. Generally this means rewriting most or all of the C++ code back into plain C to get rid of the dynamic allocation, randomly allocated linked lists, and other data structures, changes necessary to manage memory footprint and cache performance (see the sketch below).

Redesign for SMP and clusters becomes critical, as more CPUs and caches become available to concurrently process a larger active working set than a fast CPU/core with a large cache can handle alone. Once your people start considering divide-and-conquer attacks to split the operation up for SMP threads or MPI/PVM clusters, it's certainly worth taking a better look at partitioning the problem to run on an N-way SMP 32-bit machine AND on clusters too. Most of the newer high-end 32-bit Intel processors will also handle 8+ GB in a machine, but only 2 GB per process under Linux. Using divide and conquer, the application should easily function as a 4-8 process application with some modest restructuring. With multiple cores/CPUs now typical in high-end SMP 32-bit machines, this should be Xilinx's primary strategy for a target host environment.

I'm surprised that Xilinx didn't realize this before now, and roll out a Linux SMP/cluster version of its tools before getting backed into the 64-bit corner.
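To make the cache-locality argument concrete, here is a small, self-contained C sketch (purely illustrative, not taken from any P&R code): it does the same amount of summing work twice, once as repeated full passes over a ~256 MB array, and once restructured divide-and-conquer style into ~256 KB tiles that stay resident in L2 cache. Once the array no longer fits in cache, the tiled version is typically several times faster even though both compute the same result.

/* Illustrative only: same work, different memory access pattern. */
#include <stdio.h>
#include <stdlib.h>

#define N      (64 * 1024 * 1024)   /* 64M ints, ~256 MB working set */
#define TILE   (64 * 1024)          /* 64K ints, ~256 KB, sized for L2 */
#define PASSES 8

static long long sum_flat(const int *a)
{
    long long s = 0;
    for (int pass = 0; pass < PASSES; pass++)
        for (size_t i = 0; i < N; i++)          /* streams all 256 MB on every pass */
            s += a[i];
    return s;
}

static long long sum_tiled(const int *a)
{
    long long s = 0;
    for (size_t base = 0; base < N; base += TILE)     /* divide ... */
        for (int pass = 0; pass < PASSES; pass++)     /* ... and reuse each tile while it is hot */
            for (size_t i = base; i < base + TILE; i++)
                s += a[i];
    return s;
}

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    if (a == NULL) { perror("malloc"); return 1; }
    for (size_t i = 0; i < N; i++)
        a[i] = (int)(i & 0xFF);
    printf("flat:  %lld\n", sum_flat(a));      /* identical sums, very different run times */
    printf("tiled: %lld\n", sum_tiled(a));
    free(a);
    return 0;
}

The same restructuring idea, splitting the working set into independent pieces, is also what makes an SMP-thread or MPI/PVM split of the problem practical.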
>
> Regardless of the 'open source' debate, or any other postings, the FPGA
> software is much, much larger, and more complex than anyone seems to
> comprehend (one of the greatest barriers to entry for anyone thinking
> about competition). As such, I am hopeful that some of the musing here
> will get folks thinking, but realistically, I am doubtful as the people
> responsible are daily trying to make things faster, and better as part
> of their job.
>
> Austin
Sometimes the "we've always done it that way, it's the only way" problem becomes severe, as limited resources prevent considering solutions requiring a major investment, i.e. a major rewrite with new algorithms and internal architectures.

Data set and algorithmic scaling into HPC-class facilities isn't a typical skill set for average programmers. Maybe one in a few thousand programmers has experience at this level, and probably fewer with EDA experience. There are probably far more open source programmers with experience at this level than there are Xilinx EDA programmers who are comfortable architecting HPC SMP/cluster software solutions. Many of the same folks have an interest in high-end reconfigurable computing too ... which seriously needs tools capable of doing P&R in near real-time.

And then, there is always the NIH problem ....
Brannon wrote:
> 7. Is Xilinx making its money on software or hardware? If it is not
> making money on software, then consider making it open source. More
> eyes on the code mean more speed.
This is not just a Xilinx problem. Across the industry, slow, poor tools have resulted from tight-fisted IP policies regarding the FPGA internal design and routing databases needed to build bit streams. High cost, low performance. And it means the tools are limited by the creativity and (lack of) experience of the vendors' in-house tools programmers. A mix of NIH and paranoia over disclosure is self-defeating in selling FPGA chips in high volume.

We hear complaints by the vendor that they don't have unlimited resources and must focus on selected key customer needs (AKA high-volume customers' demands). This same lack of resources has prevented innovative redesign of the tools to take advantage of multicore processors and cluster technologies. More importantly, the vendor doesn't have a broad systems view of its own products, and has failed to capitalize on building low-cost design systems which are representative of the very market they are feeding ... FPGA-centric designs. Consider that a well-executed motherboard built around multiple FPGAs with PPC CPU cores could easily have far more place-and-route performance than any equivalently priced PC workstation, by using the FPGAs as high-speed parallel coprocessing routing engines. This isn't a new idea ... see http://www.cs.caltech.edu/research/ic/pdf/fastroute_fpga2003.pdf

That they block both 3rd parties and open source from having access to the FPGA internals and tools internals means their customers are limited to whatever tools their limited-resource development teams can cobble together. With more open disclosure, it would be interesting to see what both open source and for-profit 3rd parties could do to make a market out of providing high-performance FPGA tools and integrated development systems with FPGA-assisted routing.
Brannon wrote:
> 7. Is Xilinx making its money on software or hardware? If it is not
> making money on software, then consider making it open source. More
> eyes on the code mean more speed.
As a side note, it's not either/or between being a hardware or software company. Most major open source products are staffed with paid developers from multiple supporting for-profit companies, to leverage industry development dollars as far as possible. Linux exists as a viable commercial product because of hundreds of millions of dollars in salaries paid by many (MANY) large hardware and software corporations to develop the product. They did this to get out of the other extreme, which is everyone having a mediocre product due to limited development dollars, because everyone was reinventing the same wheel and claiming theirs was somehow better. A for-profit project (UNIX) changed that model and had everyone supporting a common UNIX development goal, which over time outgrew UNIX and became open source in a number of UNIX-clone forms.

Pooling paid labor from both FPGA/PLD companies and major end-user companies with in-house EDA programmers, plus educational and volunteer labor, does over time generate a better product, mostly because of the professional paid developers who are mutually committed to making it the best for THEIR companies' use and sale.
fpga_toys@yahoo.com wrote:

> That they block both 3rd parties and open source from having access to
> the FPGA internals and tools internals means their customers are
> limited to whatever tools their limited-resource development teams can
> cobble together.
>
> With more open disclosure, it would be interesting to see what both
> open source and for-profit 3rd parties could do to make a market out of
> providing high-performance FPGA tools and integrated development
> systems with FPGA-assisted routing.
To this day, I'm flabbergasted that silicon vendors actually charge money for their tools!?! If you want to push your silicon, then wouldn't giving away the tools only serve to sell more devices??? I'm *sure* more than one decision on Altera vs Xilinx has been made purely on the cost and strength (or otherwise) of the tools!?!

As for open source, I'd love to see it myself, but it would be a vendor's nightmare! "My design simulates fine but doesn't work in silicon... I'm using the red-hat distro of Quartus II v7.4.5 patch level 12 with the turbo annealing enhancements v2.1b4. Oh, and I found a bug so I patched the timing analyzer myself too..."

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Mark McDougall wrote:
> As for open source, I'd love to see it myself, but it would be a
> vendor's nightmare! "My design simulates fine but doesn't work in
> silicon... I'm using the red-hat distro of Quartus II v7.4.5 patch
> level 12 with the turbo annealing enhancements v2.1b4. Oh, and I found
> a bug so I patched the timing analyzer myself too..."
It's not any different for high-end server companies like IBM, HP, SGI, etc., where stability and crash-free operation are CRITICAL operational points for million-dollar hardware sales. There is a reason each of these companies has legions of developers supporting the open source products on salary.

On the other hand, if the vendors had absolutely crash-free, reliable products that were feature-rich and fast and met everyone's needs, there probably wouldn't be a discussion. So the reliability argument, I believe, is a red herring. IBM, SGI, HP, and RedHat all ship stable open source products, which some argue are significantly more stable and secure than proprietary alternatives in the same high-reliability server market, like Microsoft.

It's probably baseless to believe that any vendor would allow its in-house programming team supporting the open source EDA tools to have any less of a quality initiative than the tools they might augment or replace. With a much wider user base and multiple-vendor support, one would expect the broader testing and broader developer base to actually produce better tools, better tested and more stable in comparison to cash-strapped proprietary efforts. The common parts of the product should reach much better maturity and stability. As for the vendor-specific parts, that's no different from today, since the vendor's in-house team will be pretty much solely responsible for its chip support, just as we see IBM, SGI, HP, etc. all solely responsible for their systems architecture and device driver code.
On 2006-10-09 08:55:58 -0700, Austin Lesea <austin@xilinx.com> said:

> Rajeev,
>
> Xilinx takes seriously any suggestions.
>
> We, of all people, with the introduction of the Virtex 5 LX330, know
> that we need to somehow make everything work better, and faster.
>
> Note that due to the memory required, the LX330 can ONLY be compiled on
> a 64 bit Linux machine.... there are just too many logic cells, and too
> much routing. 8 Gbytes is about what you need, and windoze can't handle
> it (at all).
Well then, it's about time you ported it to the Mac, isn't it? A full 64-bit OS, with quad top-of-the-range processors in the desktop machine and gobs of RAM using this new weirdo serial-network RAM interface...

Given that you've got a Linux version (which uses X), and an X server on the Mac (which also runs a pretty plain-vanilla Unix for its OS), the only real barrier ought to be the QA overhead... Perhaps not *quite* as simple as typing 'make', but almost certainly within the confines of an intern's summer job :)

Well?

Simon :)