
a clueless bloke tells Xilinx to get a move on

Started by Brannon October 5, 2006
mk,

How well does it deal with memory?

Can you place 16 or 32 Gbytes in the box?

Austin


mk wrote:
> On Mon, 09 Oct 2006 08:55:58 -0700, Austin Lesea <austin@xilinx.com>
> wrote:
>
>> Rajeev,
>>
>> Xilinx takes seriously any suggestions.
>>
>> We, of all people, with the introduction of the Virtex 5 LX330, know
>> that we need to somehow make everything work better, and faster.
>>
>> Note that due to the memory required, the LX330 can ONLY be compiled on
>> a 64 bit Linux machine.... there are just too many logic cells, and too
>> much routing. 8 Gbytes is about what you need, and windoze can't handle
>> it (at all).
>
> Austin, you are mistaken. There has been a 64 bit version of Windows
> for more than a year now, and a new updated version is going to be
> released before the end of the year (Vista is in RC2 stage now).
> Looking forward to a 64 bit version of ISE on Vista 64, which I am
> running now.
Austin Lesea schrieb:
> mk,
>
> How well does it deal with memory?
>
> Can you place 16 or 32 Gbytes in the box?
A much more important question: how mature is it? The Xilinx software itself makes trouble enough. Adding a very beta-stage operating system will bring a lot of "fun" to the software guys.

Regards
Falk

P.S. Regarding the suggested partitioning: incremental design is a must at this level of size/complexity. You don't take a big staircase in one step, do you? Unless you fall down ;-)
On Mon, 09 Oct 2006 10:35:54 -0700, Austin Lesea <austin@xilinx.com>
wrote:

>mk,
>
>How well does it deal with memory?
>
>Can you place 16 or 32 Gbytes in the box?
Nope, when you install Vista x64, 4 of the 8 2 GB DIMMs pop out of the computer...

Seriously though, in my experience it's no different from Linux. If you have a two-CPU-socket machine with 8 DIMMs, you can make a quad-core machine with 16 GB, and all of it is available to the 64-bit processes.
On Mon, 09 Oct 2006 19:54:44 +0200, Falk Brunner <Falk.Brunner@gmx.de>
wrote:

>Austin Lesea schrieb:
>> mk,
>>
>> How well does it deal with memory?
>>
>> Can you place 16 or 32 Gbytes in the box?
>
>A much more important question: how mature is it? The Xilinx software
>itself makes trouble enough. Adding a very beta-stage operating system
>will bring a lot of "fun" to the software guys.
As I have mentioned, Vista is the second version of 64-bit Windows. It's stable enough for ISE 8.2 to run to completion where it wouldn't on win32, because there it gets only 2 GB of address space, whereas on win64 a 32-bit binary gets the full 4 GB address space.
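To see that 2 GB versus 4 GB ceiling for yourself, here is a minimal C sketch (my own illustration, nothing from ISE or any Xilinx code): it simply asks malloc() for 256 MB chunks until the request fails, which is a rough probe of how much address space one process can get.

/* Hedged sketch: probe a process's usable address space by leaking
 * 256 MB allocations until malloc() gives up. Compiled as a 32-bit
 * binary this stops near 2 GB on 32-bit Windows, but close to 4 GB
 * when the same (large-address-aware) 32-bit executable runs on a
 * 64-bit OS; a 64-bit build keeps going far beyond that. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const size_t chunk = 256UL * 1024 * 1024;   /* 256 MB per request */
    size_t total_mb = 0;

    while (malloc(chunk) != NULL)               /* leak on purpose; this is only a probe */
        printf("reserved %zu MB so far\n", total_mb += 256);

    printf("malloc() failed after about %zu MB\n", total_mb);
    return 0;
}

(The exact numbers depend on OS overcommit behavior and link flags, but the 32-bit ceiling is the point.)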
Austin Lesea wrote:
> Note that due to the memory required, the LX330 can ONLY be compiled on
> a 64 bit Linux machine.... there are just too many logic cells, and too
> much routing. 8 Gbytes is about what you need, and windoze can't handle
> it (at all).
That says a lot about the computing required past Virtex-5, as things scale up by another factor of two or four. I assume the total CPU cycles required for P&R are scaling even faster, making both memory and total instruction counts serious bottlenecks. I assume you are supporting both Itanium IA64 and AMD x86_64 architectures?

Desktop 64-bit machines this size aren't exactly plentiful; server-class machines are a bit more common, but still frequently pretty expensive. I have two quad and three dual Itanium Linux servers here in an MPI cluster, each with at least 8 GB, plus a large MPI/PVM cluster farm of Intel P3 and P4 machines, but I suspect that's pretty rare in this readership given the stiff costs of building HPC clusters. None of them are easy to work near, as the required airflow makes them very noisy individually, let alone aggregated into a server room.

With 8 GB data sets, many algorithms fail to scale since there isn't enough data locality to make effective use of either L2 or L3 caches, which are at most in the 9 MB range. Relatively random access to an 8 GB data set generally brings the processor to a grinding halt on memory, with a net instruction rate about 1-2% of L1 cache performance. Generally, applications which have outgrown 32-bit CPU caches require nearly a complete rewrite, with new algorithms and new data structures, to gain enough locality to get good memory performance on a large-address-space 64-bit machine. Almost always a major restructuring based on some flavor of divide and conquer is required to bring the memory footprint back inside the L2/L3 caches. Generally this means rewriting most or all of the C++ code back into plain C to get rid of the dynamic allocation, randomly allocated linked lists, and other data structures, changes necessary to manage memory footprint and cache performance (see the sketch below).

Redesign for SMP and clusters becomes critical, as more CPUs and caches become available to concurrently process a larger active working set than a fast CPU/core with a large cache can handle alone. Once your people start considering divide-and-conquer attacks to split the operation up for SMP threads or MPI/PVM clusters, it's certainly worth taking a better look at partitioning the problem to run on an N-way SMP 32-bit machine AND on clusters too. Most of the newer high-end 32-bit Intel processors will also handle 8+ GB in a machine, but only 2 GB per process under Linux. Using divide and conquer, the application should easily function as a 4-8 process application with some modest restructuring. With multiple cores/CPUs now typical in high-end SMP 32-bit machines, this should be Xilinx's primary strategy for a target host environment.

I'm surprised that Xilinx didn't realize this before now, and roll out a Linux SMP/cluster version of its tools before getting backed into the 64-bit corner.
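To make the cache-locality argument concrete, here is a small, self-contained C sketch (purely illustrative, not taken from any P&R code): it does the same amount of summing work twice, once as repeated full passes over a ~256 MB array, and once restructured divide-and-conquer style into ~256 KB tiles that stay resident in L2 cache. Once the array no longer fits in cache, the tiled version is typically several times faster even though both compute the same result.

/* Illustrative only: same work, different memory access pattern. */
#include <stdio.h>
#include <stdlib.h>

#define N      (64 * 1024 * 1024)   /* 64M ints, ~256 MB working set */
#define TILE   (64 * 1024)          /* 64K ints, ~256 KB, sized for L2 */
#define PASSES 8

static long long sum_flat(const int *a)
{
    long long s = 0;
    for (int pass = 0; pass < PASSES; pass++)
        for (size_t i = 0; i < N; i++)          /* streams all 256 MB on every pass */
            s += a[i];
    return s;
}

static long long sum_tiled(const int *a)
{
    long long s = 0;
    for (size_t base = 0; base < N; base += TILE)     /* divide ... */
        for (int pass = 0; pass < PASSES; pass++)     /* ... and reuse each tile while it is hot */
            for (size_t i = base; i < base + TILE; i++)
                s += a[i];
    return s;
}

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    if (a == NULL) { perror("malloc"); return 1; }
    for (size_t i = 0; i < N; i++)
        a[i] = (int)(i & 0xFF);
    printf("flat:  %lld\n", sum_flat(a));      /* identical sums, very different run times */
    printf("tiled: %lld\n", sum_tiled(a));
    free(a);
    return 0;
}

The same restructuring idea, splitting the working set into independent pieces, is also what makes an SMP-thread or MPI/PVM split of the problem practical.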
>
> Regardless of the 'open source' debate, or any other postings, the FPGA
> software is much, much larger, and more complex than anyone seems to
> comprehend (one of the greatest barriers to entry for anyone thinking
> about competition). As such, I am hopeful that some of the musing here
> will get folks thinking, but realistically, I am doubtful as the people
> responsible are daily trying to make things faster, and better as part
> of their job.
>
> Austin
Sometimes the "we've always done it that way, it's the only way" problem becomes severe, as limited resources prevent considering solutions requiring a major investment, i.e. a major rewrite with new algorithms and internal architectures.

Data set and algorithmic scaling into HPC-class facilities isn't a typical skill set for average programmers. Maybe one in a few thousand programmers has experience at this level, and probably fewer with EDA experience. There are probably far more open source programmers with experience at this level than there are Xilinx EDA programmers who are comfortable architecting HPC SMP/cluster software solutions. Many of the same folks have an interest in high-end reconfigurable computing too ... which seriously needs tools capable of doing P&R in near real-time.

And then, there is always the NIH problem ....
Brannon wrote:
> 7. Is Xilinx making its money on software or hardware? If it is not
> making money on software, then consider making it open source. More
> eyes on the code mean more speed.
This is not just a Xilinx problem. Across the industry, slow, poor tools have resulted from tight-fisted IP policies regarding the FPGA internal design and routing databases needed to build bit streams. High cost, low performance. And it means the tools are limited by the creativity and (lack of) experience of the vendors' in-house tools programmers. A mix of NIH and paranoia over disclosure is self-defeating in selling FPGA chips in high volume.

We hear complaints by the vendor that they don't have unlimited resources and must focus on selected key customer needs (AKA high-volume customers' demands). This same lack of resources has prevented innovative redesign of the tools to take advantage of multicore processors and cluster technologies. More importantly, the vendor doesn't have a broad systems view of its own products, and has failed to capitalize on building low-cost design systems which are representative of the very market they are feeding ... FPGA-centric designs. Consider that a well-executed motherboard built around multiple FPGAs with PPC CPU cores could easily have far more place-and-route performance than any equivalently priced PC workstation, by using the FPGAs as high-speed parallel coprocessing routing engines. This isn't a new idea ... see http://www.cs.caltech.edu/research/ic/pdf/fastroute_fpga2003.pdf

That they block both 3rd parties and open source from having access to the FPGA internals and tools internals means their customers are limited to whatever tools their limited-resource development teams can cobble together. With more open disclosure, it would be interesting to see what both open source and for-profit 3rd parties could do to make a market out of providing high-performance FPGA tools and integrated development systems with FPGA-assisted routing.
Brannon wrote:
> 7. Is Xilinx making its money on software or hardware? If it is not
> making money on software, then consider making it open source. More
> eyes on the code mean more speed.
As a side note, it's not either/or between being a hardware or software company. Most major open source products are staffed with paid developers from multiple supporting for-profit companies, to leverage industry development dollars as far as possible. Linux exists as a viable commercial product because of hundreds of millions of dollars in salaries paid by many (MANY) large hardware and software corporations to develop the product. They did this to get out of the other extreme, which is everyone having a mediocre product due to limited development dollars, because everyone was reinventing the same wheel and claiming theirs was somehow better. A for-profit project (UNIX) changed that model and had everyone supporting a common UNIX development goal, which over time outgrew UNIX and became open source in a number of UNIX-clone forms.

Pooling paid labor from both FPGA/PLD companies and major end-user companies with in-house EDA programmers, plus educational and volunteer labor, does over time generate a better product, mostly because of the professional paid developers who are mutually committed to making it the best for THEIR companies' use and sale.
fpga_toys@yahoo.com wrote:

> That they block both 3rd parties and open source from having access to
> the FPGA internals and tools internals means their customers are
> limited to whatever tools their limited-resource development teams can
> cobble together.
>
> With more open disclosure, it would be interesting to see what both
> open source and for-profit 3rd parties could do to make a market out of
> providing high-performance FPGA tools and integrated development
> systems with FPGA-assisted routing.
To this day, I'm flabbergasted that silicon vendors actually charge money for their tools!?! If you want to push your silicon, then wouldn't giving away the tools only serve to sell more devices??? I'm *sure* more than one decision on Altera vs Xilinx has been made purely on the cost and strength (or otherwise) of the tools!?!

As for open source, I'd love to see it myself, but it would be a vendor's nightmare! "My design simulates fine but doesn't work in silicon... I'm using the red-hat distro of Quartus II v7.4.5 patch level 12 with the turbo annealing enhancements v2.1b4. Oh, and I found a bug so I patched the timing analyzer myself too..."

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Mark McDougall wrote:
> As for open source, I'd love to see it myself, but it would be a
> vendor's nightmare! "My design simulates fine but doesn't work in
> silicon... I'm using the red-hat distro of Quartus II v7.4.5 patch
> level 12 with the turbo annealing enhancements v2.1b4. Oh, and I found
> a bug so I patched the timing analyzer myself too..."
It's not any different for high-end server companies like IBM, HP, SGI, etc., where stability and crash-free operation are CRITICAL operational points for million-dollar hardware sales. There is a reason each of these companies has legions of developers supporting the open source products on salary.

On the other hand, if the vendors had absolutely crash-free, reliable products that were feature-rich and fast and met everyone's needs, there probably wouldn't be a discussion. So the reliability argument, I believe, is a red herring. IBM, SGI, HP, and RedHat all ship stable open source products, which some argue are significantly more stable and secure than proprietary alternatives in the same high-reliability server market, like Microsoft.

It's probably baseless to believe that any vendor would allow its in-house programming team supporting the open source EDA tools to have any less of a quality initiative than the tools they might augment or replace. With a much wider user base and multiple-vendor support, one would expect the broader testing and broader developer base to actually produce better tools, better tested and more stable in comparison to cash-strapped proprietary efforts. The common parts of the product should reach much better maturity and stability. As for the vendor-specific parts, that's no different from today, since the vendor's in-house team will be pretty much solely responsible for its chip support, just as we see IBM, SGI, HP, etc. all solely responsible for their systems architecture and device driver code.
On 2006-10-09 08:55:58 -0700, Austin Lesea <austin@xilinx.com> said:

> Rajeev,
>
> Xilinx takes seriously any suggestions.
>
> We, of all people, with the introduction of the Virtex 5 LX330, know
> that we need to somehow make everything work better, and faster.
>
> Note that due to the memory required, the LX330 can ONLY be compiled on
> a 64 bit Linux machine.... there are just too many logic cells, and too
> much routing. 8 Gbytes is about what you need, and windoze can't handle
> it (at all).
Well then, it's about time you ported it to the Mac, isn't it? A full 64-bit OS, with quad top-of-the-range processors in the desktop machine and gobs of RAM using this new weirdo serial-network RAM interface...

Given that you've got a Linux version (which uses X), and an X server on the Mac (which also runs a pretty plain-vanilla Unix for its OS), the only real barrier ought to be the QA overhead... Perhaps not *quite* as simple as typing 'make', but almost certainly within the confines of an intern's summer job :)

Well?

Simon :)