
Need to speed up Stratix compiles.

Started by Pete Fraser February 27, 2004
Paul Leventis (at home) wrote:
>> Seems to be a common misconception that 64bits just increases the amount
>> of addressable memory. More importantly for most applications is that twice
>> the data is moved or operated on per clock cycle.
>
> 64-bitness _is_ mostly about addressable memory -- it is rare that 64-bit
> integers help reduce run-time. Please see my previous postings on the topic
> and some of the replies to it:
>
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=Paul+Leventis+64-bit
I think the OP was referring to the wider datapaths. I don't know the cycle-level details of the AMD or Intel 64-bit parts, but an obvious and simple speed gain can come from a wider hardware fetch (even when running < 64-bit opcodes), followed by a simple check of whether the next opcode or data value is already in that block. This works in systems where the CPU must wait for slower downstream memories, and even the smaller single-chip microcontrollers are starting to do it: the Philips ARM microcontrollers, for example, use a 128-bit fetch. Clearly, random code or data will not be helped, but a large percentage of code is sequential.

I'm not sure whether the AMD/Intel offerings match the SIMD (single instruction, multiple data) features of other cores, but even without that, some hardware gains would be expected.

-jg
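To make that "wide fetch plus sequential-hit check" idea concrete, here is a toy Python model of it. The 128-bit block size, the cycle costs, and the instruction traces are illustrative assumptions only, not figures for any actual AMD, Intel or Philips part:

import random

# Toy model (assumed numbers, not real hardware): fetch 128-bit blocks
# from slow memory and serve further sequential 32-bit opcodes from the
# block already sitting in the fetch buffer.
BLOCK_BYTES = 16        # 128-bit fetch width (assumed)
OPCODE_BYTES = 4        # 32-bit opcodes
SLOW_FETCH_CYCLES = 4   # cost of going out to slow memory (assumed)
HIT_CYCLES = 1          # cost when the opcode is already buffered (assumed)

def fetch_cost(addresses):
    """Return total cycles for a sequence of opcode addresses."""
    cycles = 0
    buffered_block = None
    for addr in addresses:
        block = addr // BLOCK_BYTES
        if block == buffered_block:
            cycles += HIT_CYCLES          # next opcode is in the wide block
        else:
            cycles += SLOW_FETCH_CYCLES   # must fetch a new 128-bit block
            buffered_block = block
    return cycles

# Mostly sequential code benefits; randomly scattered fetches do not.
random.seed(0)
sequential = [i * OPCODE_BYTES for i in range(64)]
scattered = [random.randrange(0, 4096, OPCODE_BYTES) for _ in range(64)]

print("sequential:", fetch_cost(sequential), "cycles")
print("scattered: ", fetch_cost(scattered), "cycles")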
Hi Jim,

> I think the OP was referring to the wider datapaths.
> I don't know the cycle level details of the AMD or Intel 64 bit
> but an obvious and simple speed gain can come from a wider HW fetch
> (even running < 64 bit opcodes) and then a simple check if the next
> opcode / next data value is in that block.
Yes, wider memory interfaces and cache lines can help, but as you say, this is independent of op-code size. If I recall correctly, AMD and Intel processors already fetch 64-bit blocks, though this may have been increased. The latest motherboard chipsets for both families of processors use dual-channel DDR (128 bits wide), so I would not be surprised if they've increased the size of fetches.

As vendors introduce 64-bit capable processors (such as the Opteron), they often also enhance various aspects of the CPU architecture in ways that help both 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much for speed, the doubling of the register file etc. could result in faster performance.

It's every computer engineer's dream to be a processor architect, isn't it? :-)

Regards,

- Paul
On Tue, 2 Mar 2004 19:57:58 -0600, Kenneth Land wrote:

> Seems to be a common misconception that 64bits just increases the amount of
> addressable memory.
The only common misconception is that swapping in a 64-bit processor on a desktop PC will lead to a large performance increase. It doesn't (other than any gain from a higher clock speed, of course). Care to guess at the extra overhead of a 64-bit version of current OSs, by the way?
> More importantly for most applications is that twice
> the data is moved or operated on per clock cycle.
Data is only data if it's meaningful. The use of 64-bit arithmetic variables is comparatively rare in most applications. Certain scientific and CAD packages do make heavy use of 64-bit floats, but I doubt that's the case here (and high-end processors tend to use 80-bit data paths around the FPU anyway).

There's not a lot to be gained from accessing memory in 64-bit chunks if you're only interested in 32 of them (there is an effect on cache hits with vectors, but it's not measurably worthwhile in practice). There will be some effect on prefetch, but it depends on the state of the L1 and L2 caches and the instruction pipeline(s) themselves. Tests I've seen suggest an increase in memory bandwidth efficiency of only around 1-2% at best.

If you want a 64-bitter to really earn its corn, use it in something like a database server with 64GB of RAM and a multi-TB disk farm. Give the poor thing something *meaningful* to do with the extra 32 bits. You'd still need 64-bit software though.

-- Max
On Tue, 2 Mar 2004 20:05:37 -0600, Kenneth Land wrote:

> On the disk speed issue I have one data point. I upgraded my 1GHz PIII-M
> laptop drive from a slow 4200 RPM to the fastest 7200 RPM available (for
> laptops) and my Nios system build went from about 16 min. to about 15 min.
> Not worth the pain and expense of swapping the drive.
Not in a low-spec machine like that, no. The options in a laptop are limited, and there's no way to increase the disk controller bandwidth. But on a powerful workstation, installing a RAID with a high-bandwidth controller and drives such as U-320 SCSI can have a dramatic impact. As always, though, it depends on the application.
> On memory, I upgraded the memory in my 3.2 GHz P4 from 512MB to 1GB and there
> was no noticeable difference until I set the memory from 333MHz to 400MHz
> dual channel. Then my system build went from 5 min. to 4 min. - 20%.
That doesn't mean a lot. You only need to add more memory if you're running out of it ;o)

-- Max
Max <mtj2@btopenworld.com> writes:

> If you want a 64-bitter to really earn its corn, use it in something
> like a database server with 64GB of RAM and a multi-TB disk farm. Give
Or running synthesis, place & route, static timing analysis etc. on an ASIC design requiring 6GB RAM.

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
"Paul Leventis (at home)" wrote:
> Hi Jim,
>
> > I think the OP was referring to the wider datapaths.
> > I don't know the cycle level details of the AMD or Intel 64 bit
> > but an obvious and simple speed gain can come from a wider HW fetch
> > (even running < 64 bit opcodes) and then a simple check if the next
> > opcode / next data value is in that block.
>
> Yes, wider memory interfaces/cache data lines can help, but as you say, this
> is independent of op-code size. If I recall correctly, AMD and Intel
> processors already fetch 64-bit blocks, but this may have been increased.
> The latest m/b chipsets for both families of processors use dual-channel DDR
> (128-bits wide) and so I would not be surprised if they've increased the
> size of fetches.
>
> As vendors introduce 64-bit capable processors (such as Opteron), they often
> also enhance various aspects of the CPU architecture in ways that help both
> 32- and 64-bit code. And while the 64-bitness of x86-64 may not matter much
> for speed, the doubling of the register files etc. could result in faster
> performance.
>
> It's every computer engineer's dream to be a processor architect, isn't it?
> :-)
We can all speculate about the relative merits of processor enhancements, but these machines are very complex and the only real way to tell what helps is to try it. Since we are not all ancient Greeks philosophizing in our armchairs, it would be a good idea to pick a design and run it on a few different workstations, hopefully including an AMD64.

I have always been surprised that the FPGA vendors don't put some effort into evaluating platforms and releasing the results. I know this can be a bit of a can of worms, but every time I look at buying a new machine, the first question I research is how fast it will run the FPGA design software. Then I am often left speculating on my own, since I don't have much info to go on.

I seem to recall that there at least used to be some info available on how much memory was needed to optimize run time as a function of part size, but I haven't seen anything new on that in quite a while.

--
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
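In the absence of vendor numbers, a rough comparison is easy to run yourself: time the same batch compile a few times on each candidate machine and compare the medians. A minimal Python timing wrapper along those lines; the compile command is a placeholder, to be pointed at whatever script drives your own design through the tools:

import statistics
import subprocess
import time

# Placeholder command -- substitute the script that runs your own design.
COMMAND = ["./run_my_design_compile.sh"]
RUNS = 3

times = []
for i in range(RUNS):
    start = time.perf_counter()
    subprocess.run(COMMAND, check=True)   # run one full compile
    times.append(time.perf_counter() - start)
    print(f"run {i + 1}: {times[-1]:.1f} s")

print(f"median of {RUNS} runs: {statistics.median(times):.1f} s")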
"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:404603D6.AA20F818@yahoo.com...

> We can all speculate about the relative merits of processor
> enhancements, but these machines are very complex and the only real way
> to tell what helps is to try it. Since we are not all ancient Greeks
> philosophizing in our armchairs, it would be a good idea to pick a
> design and to run it on a few different workstations, hopefully
> including an AMD64.
>
> I have always been surprised that the FPGA vendors don't put some effort
> into evaluating platforms and releasing the results.
I had assumed that had happened already. Silly me. Perhaps we'll just buy an AMD machine and see what it does, but I thought somebody might have tried that already. Anybody know how solid the Quartus II 4.0 Linux port is? I can't get an answer out of Altera.
On Wed, 03 Mar 2004 04:47:19 GMT, Paul Leventis (at home) wrote:

> Provided the peak memory consumption of Quartus for the compilation in
> question is less than the amount of physical memory in the system,
> increasing the amount of memory will not help compile time. For non-trivial
> designs, a Quartus compile will be most heavily influenced by CPU speed, and
> then by memory sub-system speed -- disk speed will have little influence.
I suspected that might be the case, but I wasn't quite sure. I'm more used to programming language tools that use library files extensively, where a fast disk system (or a big ramdisk) can give very worthwhile speed gains. Is there any possibility of making Quartus multi-threaded? That strikes me as the most likely way to get a dramatic performance increase, though I know it's not always easy to achieve with heuristic apps.
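Whether Quartus itself can be threaded is Altera's call, but one common way to get parallelism out of heuristic tools without touching the core algorithm is to run several independently seeded attempts at once and keep the best result. A minimal Python sketch of that idea only; the hill-climb and its cost function below are dummies standing in for a real placement metric, not anything Quartus exposes:

import random
from multiprocessing import Pool

def one_attempt(seed):
    """Run one independently seeded hill-climb and return (cost, seed)."""
    rng = random.Random(seed)
    state = [rng.random() for _ in range(100)]
    best = sum(x * x for x in state)       # dummy cost: sum of squares
    for _ in range(10_000):
        i = rng.randrange(len(state))
        old = state[i]
        state[i] = rng.random()            # propose a random move
        cost = sum(x * x for x in state)
        if cost < best:
            best = cost                    # accept the improving move
        else:
            state[i] = old                 # reject and restore
    return best, seed

if __name__ == "__main__":
    with Pool(processes=4) as pool:        # one worker per core (assumed 4)
        results = pool.map(one_attempt, range(8))   # 8 independent seeds
    best_cost, best_seed = min(results)
    print(f"best seed {best_seed} with cost {best_cost:.4f}")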
> CAD tools process a lot of data. I don't know if a Xeon (bigger cache) is
> much faster than a normal P4 (smaller cache), but I wouldn't be surprised if
> this were the case for the same reason that a Xeon processor is supposedly
> better for server applications -- bigger cache helps applications whose data
> set doesn't fit into the cache.
While the extra cache is important in itself, much of the Xeon's performance gain also comes from its greater degree of parallelism and deeper prefetch lookahead, which make better use of memory bandwidth throughout.

-- Max
Max <mtj2@btopenworld.com> writes:

> Is there any possibility of making Quartus multi-threaded? That
> strikes me as the most likely way to get a dramatic performance
> increase, though I know it's not always easy to achieve with heuristic
> apps.
I would like to see synthesis and place-and-route tools that I could run on a cluster of cheap PCs. I would be happy with less-than-linear speedups, e.g. using a 16-node cluster to get an 8x speedup.

Petter
--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
On 03 Mar 2004 18:46:34 +0100, Petter Gustad wrote:

> I would like to see synthesis and place and route tools I could
> run on a cluster of cheap PCs. I would be happy with less than linear
> speedups, e.g. using a 16-node cluster to get an 8x speedup.
I doubt you'd get anywhere near that. Trying to implement those algorithms efficiently on the sort of loosely-coupled architecture you propose would be nigh-on impossible. It's not easy on a single SMP box, but it's doable. A quad Xeon (8 x CPU) box would cost less than four single decent-spec machines anyway.

-- Max
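A rough Amdahl's-law check puts numbers on that: to get an 8x speedup on 16 nodes, about 93% of the run would have to parallelise perfectly, with zero communication cost. A small Python illustration; the parallel fractions are assumed values for illustration, not measurements of any P&R tool:

# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
# of the run that parallelises perfectly across n nodes.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Solving 8 = 1 / ((1 - p) + p/16) gives p = (1 - 1/8) / (1 - 1/16), about 0.933.
for p in (0.80, 0.90, 0.933, 0.99):
    print(f"p = {p:.3f}: {speedup(p, 16):.1f}x on 16 nodes")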