>>I took the plunge and built up a 2nd PC using a Core2Duo.
>
>>Here are the specs:
>>Old PC: P4 3GHz HT, 2GB DDR2-533 RAM, Gigabyte GA81915 mobo, stock
>>cooler
>>New PC: Core2Duo E6600, 2GB DDR2-800 RAM, ASUS P5B Mobo, ArcticFreezer7
>>cooler
>
>>Using a Spartan3 design running clean from scratch in ISE 8.2.3i
>>Old PC: 82mins
>>New PC: 35mins
>>New PC (overclocked to 3.2GHz): 25mins
>
>>I'm really pleased with the Core2Duo and would recommend it.
>
> Conclusion dual cores (multiprocessor) benefits Xilinx ISE substantially?
>
No, cache size matters... As far as I know, neither ISE nor Quartus uses the second core, but both benefit from the huge cache.

Thomas
www.entner-electronics.com
Thomas Entner wrote:
> No, cache size matters.... As far as I know, neither ISE nor Quartus use the
> second core, but both benefit from the huge cache.

Not just the regular L2 cache: I suspect the TLB (the address translation cache) matters even more, but it is harder to characterize and explain. When the data set is still too big for even the larger combined cache of a dual-core part, the extra associative ways of a bigger TLB kick in to reduce how often the TLB has to be refilled from the MMU page tables, which can turn ns cache hits into accesses of several hundred ns on a full cache miss.

I ran a test on an older 2GHz Athlon XP2400 and a 2.6GHz D805, using a loop that reads ints at random from a 512MB array. A mask controls the spread of addresses, from 256 ints up to the 128M maximum, and for each mask setting the loop runs 1M times. I believe this represents the worst possible behaviour of any CAD application that must traverse huge graphs or trees that cannot fit in cache but easily fit in DRAM.

The D805 generally runs 30% faster, as the clock ratio suggests, while the tests are entirely cache bound. The Athlon has 256K of L2 with 256 ways in the TLB; the 805 has 1MB of L2 in each core, and I expect its TLB has 1k ways of associativity. Only one core is used. I expect the CoreDuo or 64-bit Athlons to perform somewhat better. With the data in cache, the loop iterates in 7ns on the D805 and 10ns on the XP2400.
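The masked random-access loop described above can be sketched in Python. This is a minimal illustration, not John's actual test code: `random_access_ns` and its parameters are hypothetical, it assumes a zero-filled array of C ints, and the Python interpreter adds a large fixed cost to every iteration, so absolute timings will sit well above the 7ns/10ns in-cache figures. Only the change in per-iteration time as the mask widens is meaningful.

```python
import array
import random
import time


def random_access_ns(total_ints, mask, iterations=1_000_000, seed=1):
    """Average wall-clock ns per random read over the first mask + 1 ints.

    Hypothetical sketch of the masked random-access test; `mask` must have the
    form 2**k - 1 (and fit in 32 bits) so that `r & mask` covers 2**k slots.
    """
    if mask < 0 or mask & (mask + 1) or mask + 1 > total_ints or mask >= 1 << 32:
        raise ValueError("mask must be 2**k - 1, within 32 bits and the array")
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    data = array.array("i", [0]) * total_ints  # zero-filled array of C ints
    rng = random.Random(seed)
    # Draw all indices up front so random-number cost stays out of the timing.
    indices = [rng.getrandbits(32) & mask for _ in range(iterations)]
    total = 0
    start = time.perf_counter_ns()
    for i in indices:
        total += data[i]  # one random read per iteration
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations
```

For example, a sweep with `total_ints = 1 << 27` (128M ints, 512MB) and `mask` stepping from `(1 << 8) - 1` up to `(1 << 27) - 1` mirrors the 256-int to 128M-int range of the original test.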
As the range of addresses increases past 64K, the Athlon staircases up to 60ns; out around 2M it degrades to 80ns-150ns, and at the 128M range it settles at 400ns per iteration, 40 times slower than the original 10ns, as it crawls through memory. The D805 fares somewhat better: it tolerates another 2 bits of address, then degrades to 60ns at the 256K level and reaches 130ns at the 128M level. In other words, when the L2 cache always misses, the D805 spends far less time patching up the TLB and MMU page tables. The D805 runs Windows 2000 with 1GB of DDR400 and the Athlon runs BeOS on 1GB of DDR266, but that's not really important.

My conclusion is that paying for bigger TLBs is probably far better than paying for more CPUs, since a bigger TLB keeps the uniprocessor closer to its ideal performance on code with poor locality of reference. Adding more cores probably makes things worse, as the quad core shows, unless the code is really multithreaded.

John Jakson
transputer guy
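Converting the ranges quoted above from ints to bytes shows where the knees sit relative to each cache. This worked calculation assumes 4-byte ints and 4KB pages, and reads "64K", "256K" and so on as counts of ints (the test array holds 128M ints in 512MB); treating the Athlon's "256 ways" as 256 TLB entries for the reach figure is also an assumption. Under those assumptions 64K ints span 256KB, the Athlon's L2 size, and 256K ints span 1MB, the D805's per-core L2.

```python
K = 1024
M = K * K
INT_BYTES = 4        # assumption: 32-bit ints (512MB / 128M ints)
PAGE_BYTES = 4 * K   # assumption: standard 4KB x86 pages


def working_set_bytes(range_ints):
    """Bytes spanned when random reads cover `range_ints` consecutive ints."""
    return range_ints * INT_BYTES


def tlb_reach_bytes(entries):
    """Memory a TLB can map without missing: one page per entry."""
    return entries * PAGE_BYTES


def pages_touched(range_ints):
    """Distinct pages the random reads can land on (rounded up)."""
    return -(-working_set_bytes(range_ints) // PAGE_BYTES)


# Sweep endpoints and the knees mentioned in the results above.
for label, ints in [("256", 256), ("64K", 64 * K), ("256K", 256 * K),
                    ("2M", 2 * M), ("128M", 128 * M)]:
    print(f"{label:>5} ints -> {working_set_bytes(ints) // K:>7} KB, "
          f"{pages_touched(ints):>7} pages")
```

With 4KB pages, the 256K-int (1MB) range also touches 256 distinct pages, so any TLB with fewer entries than that must miss on some of the random reads.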
Thomas Entner wrote:
> No, cache size matters.... As far as I know, neither ISE nor Quartus use the
> second core, but both benefit from the huge cache.

I'm sure the second core will make a difference: while the one long task is occupying one core, other minor tasks will run on the other core. While these other tasks might take only a tiny proportion of the processor time, you avoid the penalties of task switching (like losing your cache) on the working processor.
On Mon, 06 Nov 2006 09:50:02 +0100, David Brown <d...@westcontrol.removethisbit.com> wrote:
>
>I'm sure the second core will make a difference - while the one long
>task is occupying one core, other minor tasks will run on the other
>core. While these other tasks might only take a tiny proportion of the
>processor time, you avoid the penalties of task switching (like losing
>your cache) on the working processor.
>

That assumes you set the thread affinity for the long task. If you watch top on Linux or Task Manager on Windows XP or Vista, you will see that the task consuming 99.9% of the CPU is migrated from CPU to CPU quite frequently. I am not sure why the scheduler of either OS does this.
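On Linux, pinning the long-running task to one CPU can be sketched with Python's `os.sched_setaffinity` (Linux-only; on Windows the equivalent is Task Manager's "Set Affinity", which this does not cover). `pin_to_cpu` is a hypothetical helper name, not part of any tool mentioned in the thread.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 = the caller) to a single CPU; Linux only.

    Returns the affinity set the kernel reports after the change.
    """
    allowed = os.sched_getaffinity(pid)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Once pinned, the scheduler can no longer migrate the job between cores, while other processes remain free to use the remaining CPUs.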
Relevant to several recent threads, Altera just announced their Stratix III and with it Quartus 6.1, of which the first bullet item is:

"Multiprocessor support: Allowing parallel processing during compilation for computers with multiple processors results in a reduction in compile times. Quartus II software offers the first multiprocessor support from an FPGA vendor to take advantage of the new multiple-core processors."

The actual software is available *now* (according to the press release). Trying to get it reveals that *now* is really December 4th :-)

I look forward to seeing how it scales with multiple cores.

Tommy
Hi Tommy,

> I look forward to see how it scales with multiple cores.

On two cores we've seen between 1.6X and 1.9X the performance (depending on the algorithm) for the parallelized sections of code, yielding up to a 20% compile time reduction. Adding more cores gives you big speed-ups on those portions of code, but Amdahl's Law kicks in pretty fast: the remaining single-threaded algorithms become a larger portion of the run-time as you add processors, diminishing the overall returns.

FPGAs are getting bigger faster than CPUs are getting faster; this has been true for a long time. Without innovation in the software, compile times would grow with each generation. Thankfully, we've been able to close this gap, and even improve our run-time (and memory consumption) over time. Multi-core is just the next step in this evolution.

Modern CAD systems such as Quartus II contain numerous algorithms, all of which contribute significantly to the run-time of the system. Each algorithm presents its own challenges for parallelization (if that's a word). Over time, as we parallelize more and more of the tool, the benefits and scalability will increase.

Memory consumption is also a challenge as FPGAs continue to scale in size. Keeping memory use in check yields many benefits: cheaper machines, sticking with 32-bit OSes, and better cache locality (and hence run-time). You'll find QII 6.1 (even for Stratix III) performs well on this metric too.

> The actual software is available *now* (according to the press
> release). Trying to get it reveals that *now* is really December 4th
> :-)

Customers can get the software today via their local Altera sales representative or distributor sales office. General/full availability is December 4th, as you've indicated.

Regards,
Paul Leventis
Altera Corp.
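The figures above can be related through Amdahl's Law: if a fraction p of the run time is sped up by a factor s, the remaining time is (1 - p) + p/s of the original. Reading the best case, a 1.9X section speed-up together with a 20% shorter compile, as a single measurement is an assumption, so the numbers below are illustrative only; the function names are hypothetical.

```python
def amdahl_time(p, s):
    """Remaining run time, as a fraction of the original, when a fraction p
    of the work is sped up by a factor s and the rest stays serial."""
    return (1.0 - p) + p / s


def parallel_fraction(time_ratio, s):
    """Invert amdahl_time: the fraction p needed to reach `time_ratio` with speed-up s."""
    return (1.0 - time_ratio) / (1.0 - 1.0 / s)


p = parallel_fraction(0.8, 1.9)  # about 0.42 of the original run time
print(f"parallel fraction p = {p:.3f}")
for cores in (2, 4, 8):
    # Assumption: a perfect cores-fold speed-up on the parallel part.
    print(cores, "cores:", f"{amdahl_time(p, cores):.3f} of original time")
print("limit with unlimited cores:", f"{1.0 - p:.3f}")
```

Under these assumptions roughly 58% of the run time stays serial, so even unlimited cores could not cut the compile below about 58% of its original time, which is the diminishing return described above.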
I came across the posting for the Stratix III on their website the other day. Short of putting engineering samples in everybody's hands, you'd think they would want to coordinate the release of the new version of Quartus II with the announcement of the new devices, so that engineers can see how their designs fare in the new software and devices. I only had a few minutes to look at the website, but it looks like they have made the new devices more granular and doubled their frequency.

I am a Quartus II user, and my sales rep, Linda, has always done a good job of getting me a copy of the software. So I have one more thing to look forward to in December.

Derek

Tommy Thorn wrote:
> Relevant to several recent threads, Altera just announced their Stratix
> III and with it Quartus 6.1 of which the first bullet item is:
>
> "Multiprocessor support: Allowing parallel processing during
> compilation for computers with multiple processors results in a
> reduction in compile times. Quartus II software offers the first
> multiprocessor support from an FPGA vendor to take advantage of the new
> multiple-core processors."
>
> The actual software is available *now* (according to the press
> release). Trying to get it reveals that *now* is really December 4th
> :-)
>
> I look forward to see how it scales with multiple cores.
>
> Tommy
David Brown wrote:
> I'm sure the second core will make a difference - while the one long
> task is occupying one core, other minor tasks will run on the other
> core. While these other tasks might only take a tiny proportion of the
> processor time, you avoid the penalties of task switching (like losing
> your cache) on the working processor.

I started using a Mac Pro a few weeks ago: dual Core2Duo Xeons and 2GB RAM, running XP SP2. Although ISE isn't multi-threaded, I found a use for the second processor yesterday: I ran a second instance of ISE. I'm working on a multi-chip design, and I synthesized one project while routing a second one. I set the affinity so that they executed on different processors (at least I think they were on different processors). I didn't benchmark the execution speed, but the time didn't seem out of line.

---
Joe Samson
Pixel Velocity
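Running two independent tool jobs, each held on its own processor, can be sketched on Linux by pinning each child process before it starts. `run_pinned` is a hypothetical helper built on `subprocess.Popen`'s `preexec_fn` and `os.sched_setaffinity`; it is POSIX/Linux-only and does not apply to the Windows XP setup described above, and the commands it launches are placeholders.

```python
import os
import subprocess


def run_pinned(jobs):
    """Run (command, cpu) pairs concurrently, each child pinned to its own CPU.

    Linux-only sketch: preexec_fn runs in the child after fork and before exec,
    so the launched tool inherits the single-CPU mask for its whole run.
    """
    procs = []
    for cmd, cpu in jobs:
        procs.append(subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            text=True,
            # The default argument binds this iteration's cpu to the lambda.
            preexec_fn=lambda c=cpu: os.sched_setaffinity(0, {c}),
        ))
    # All children are already running; wait for each and collect its results.
    return [(p.communicate()[0], p.returncode) for p in procs]
```

For example, `run_pinned([(synth_cmd, 0), (route_cmd, 1)])`, with two hypothetical command lists, would synthesize one project on CPU 0 while routing another on CPU 1.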