Reply by Vaughn Betz June 29, 2005
For those reading this thread, Richard sent us the t80 archive (thanks
Richard), so we could investigate this. The short answer is that the
core didn't have a timing constraint set, so recent versions of Quartus
(4.1 and later) just work to achieve routability, and do not fully
optimize its timing. Setting an aggressive clock period constraint
dramatically speeds up the core when run through Quartus.

Details:

I compiled the t80a core in Quartus II 5.0 SP1 and achieved 42.74 MHz,
which matches Richard's result. However, I noticed that there are no
timing requirements set -- in that case Quartus will try to finish the
compile as quickly as possible, and will not fully optimize the design
for timing.

I went to Assignments->Timing Settings and set "Default required Fmax"
to 100 MHz.  Then I recompiled.

With that assignment, Quartus achieves a frequency of 75.53 MHz for
this design.
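
As a rough illustration of what that requirement means in timing terms, the arithmetic can be sketched in Python. This is only a back-of-the-envelope calculation from the two figures quoted above (100 MHz required, 75.53 MHz achieved); the helper name `period_ns` is made up for the example and is not part of any Quartus tool.

```python
def period_ns(fmax_mhz: float) -> float:
    """Convert a clock frequency in MHz to a clock period in ns."""
    return 1000.0 / fmax_mhz


# Figures taken from the post: the requested Fmax and what Quartus achieved.
required_mhz = 100.0
achieved_mhz = 75.53

required_period = period_ns(required_mhz)
achieved_period = period_ns(achieved_mhz)

# Negative slack means the requirement was not met, which is the
# intent here: the constraint is deliberately unachievable.
slack = required_period - achieved_period

print(f"required period: {required_period:.2f} ns")
print(f"achieved period: {achieved_period:.2f} ns")
print(f"slack:           {slack:.2f} ns")
```

The requirement asks for a 10 ns period, and the compiled design needs a little over 13 ns, so the slack is negative. That is expected when the constraint is set beyond what the design can reach.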

It is a general rule that you should set an aggressive (unachievable)
timing assignment when you want to see how fast a design can go.
Alternatively, you can choose Settings->Fitter Settings->Standard Fit,
which essentially makes the most common type of unachievable timing
requirement (Fmax on all clocks) automatically for you.

To get even more speed, you can also turn on physical synthesis (all
options) under Assignments->Settings->Physical Synthesis Optimizations.
On this design, turning physical synthesis on (+ having a 100 MHz
default Fmax) yields a speed of 83.43 MHz.
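
To put the three compiles side by side, here is a small Python sketch that computes the relative gain of each run over the unconstrained baseline. The Fmax values are the ones reported in this post; the dictionary labels and the `gain_percent` helper are just names chosen for the example.

```python
# Fmax results reported in this post for the t80a core (Quartus II 5.0 SP1).
results_mhz = {
    "no timing constraint": 42.74,
    "100 MHz default Fmax": 75.53,
    "100 MHz default Fmax + physical synthesis": 83.43,
}


def gain_percent(fmax_mhz: float, baseline_mhz: float) -> float:
    """Percentage improvement of fmax_mhz over baseline_mhz."""
    return (fmax_mhz / baseline_mhz - 1.0) * 100.0


baseline = results_mhz["no timing constraint"]
gains = {name: gain_percent(f, baseline) for name, f in results_mhz.items()}

for name, fmax in results_mhz.items():
    print(f"{name:45s} {fmax:6.2f} MHz  ({gains[name]:+.1f}% vs. baseline)")
```

On these numbers the timing constraint alone accounts for most of the improvement (roughly 77% over the baseline), with physical synthesis adding a further step on top of that (roughly 95% over the baseline in total).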

Best regards,

Vaughn Betz
Altera
[v b e t z (at) altera.com]

Reply by Ben Popoola June 25, 2005
Paul Leventis (at home) wrote:
> Hi Ben,
>
>> The website I have discovered below has a comparison of opencore CPUs
>> implemented on Altera Cyclone, Lattice ECP and Actel ProASIC 3 devices
>>
>> http://www.fpga.ch/ipcores/results.php
>
> Nice results (speaking as an Altera guy :-)). I don't agree with the
> author's hypothesis that Synplify is the difference -- if Synplify were used
> for Cyclone as well, I don't think the conclusion would change. Synplify is
> a great synthesis tool.
>
> And the results should get even better if some or all of the various
> physical synthesis options were enabled in Quartus II.
>
> Neat link -- thanks.
>
> Paul Leventis
> Altera Corp.

The Lattice ispLEVER tool comes with 3 different synthesis tools:
Leonardo Spectrum, Synplify and Precision RTL Synthesis. Running
Synplify and Precision RTL Synthesis on the same VHDL code, without
applying rigorous timing constraints, shows a significant increase in
fmax in favour of Precision RTL Synthesis. So perhaps the author has a
point.

Reply by Paul Leventis (at home) June 25, 2005
Hi Ben,

> The website I have discovered below has a comparison of opencore CPUs
> implemented on Altera Cyclone, Lattice ECP and Actel ProASIC 3 devices
>
> http://www.fpga.ch/ipcores/results.php

Nice results (speaking as an Altera guy :-)). I don't agree with the
author's hypothesis that Synplify is the difference -- if Synplify were
used for Cyclone as well, I don't think the conclusion would change.
Synplify is a great synthesis tool.

And the results should get even better if some or all of the various
physical synthesis options were enabled in Quartus II.

Neat link -- thanks.

Paul Leventis
Altera Corp.

Reply by Ben Popoola June 25, 2005
Jedi wrote:
> Hello..
>
> Is this normal that same core which performs well
> for Altera Cyclone device can only run at half speed
> on a LFEC20-5 device?
>
> Tried with several CPU cores from opencores.org
> and Lattice LFEC20 shows mostly half the performance
> as Cyclone...
>
>
> rick

I cannot comment directly on your comparison as I have not performed
the tests myself.

As a comment, however, I would say that FPGA architectures are designed
with certain characteristics in mind that may benefit certain coding
styles and not others. This is the reason why most FPGA vendors have a
coding style guide to complement their silicon. Without modifying
off-the-shelf code to suit a particular FPGA it is very difficult to
make a fair, like-for-like comparison.

The website I have discovered below has a comparison of opencore CPUs
implemented on Altera Cyclone, Lattice ECP and Actel ProASIC 3 devices

http://www.fpga.ch/ipcores/results.php

Hope this helps

Ben

Reply by Paul Leventis (at home) June 22, 2005
Hi Rick,

> Hmm..actually t80 performance degraded continuously
> in Altera Quartus since version 4.1 with same
> standard settings (and no automatic RAM block placing).
> Same is true for other similar CPU cores as well...

We do not observe this degradation. Which family are you compiling to,
are you using (tight) timing constraints, and do you have any data you
can post?

Regards,

Paul

Reply by Jedi June 20, 2005
Jedi wrote:
> Luc wrote:
>
>> Rick,
>>
>> I can't speak for LatticeEC specifically, but I know that some
>> designers tend to write their VHDL very specifically for one family.
>> Then it will be hard to get the same performance from another device.
>>
>> I.e. does the compiled design make use of the IO cell? Switching this
>> option off can save quite some time (Clock to Out).
>>
>> Regards,
>>
>> Luc
>
> Actually I test with an out-of-the-box t80 design...
>
> I know that Altera Quartus does a good job in
> using RAM blocks instead of registers automatically
> since version 4.1 or 4.2 (and old 2.2 I think) whereas
> the backend tools in ispLever and Actel Libero don't.

Hmm..actually t80 performance degraded continuously in Altera Quartus
since version 4.1 with same standard settings (and no automatic RAM
block placing). Same is true for other similar CPU cores as well...

rick

Reply by Jedi June 20, 2005
Luc wrote:
> Rick,
>
> I can't speak for LatticeEC specifically, but I know that some
> designers tend to write their VHDL very specifically for one family.
> Then it will be hard to get the same performance from another device.
>
> I.e. does the compiled design make use of the IO cell? Switching this
> option off can save quite some time (Clock to Out).
>
> Regards,
>
> Luc

Actually I test with an out-of-the-box t80 design...

I know that Altera Quartus does a good job in using RAM blocks instead
of registers automatically since version 4.1 or 4.2 (and old 2.2 I
think) whereas the backend tools in ispLever and Actel Libero don't.

A simple comparison would be to use a small binary counter and see how
fast they can go...

rick

Reply by Luc June 20, 2005
Rick,

I can't speak for LatticeEC specifically, but I know that some
designers tend to write their VHDL very specifically for one family.
Then it will be hard to get the same performance from another device.

I.e. does the compiled design make use of the IO cell? Switching this
option off can save quite some time (Clock to Out).

Regards,

Luc

On Mon, 20 Jun 2005 18:57:31 GMT, Jedi <me@aol.com> wrote:

>cas7406@yahoo.com wrote:
>> have set the respective timing constraints? ispLEVER P&R tool is a
>> timing driven tool.
>>
>
>No...as I don't see the point in doing so when under default
>settings Lattice LFEC is at least 50 % slower...
>
>
>rick

Reply by Jedi June 20, 2005
cas7406@yahoo.com wrote:
> have set the respective timing constraints? ispLEVER P&R tool is a
> timing driven tool.
>

No...as I don't see the point in doing so when under default settings
Lattice LFEC is at least 50 % slower...

rick

Reply by cas7...@yahoo.com June 20, 2005
Have you set the respective timing constraints? The ispLEVER P&R tool
is a timing-driven tool.

rgds,

c

Jedi wrote:
> Hello..
>
> Is this normal that same core which performs well
> for Altera Cyclone device can only run at half speed
> on a LFEC20-5 device?
>
> Tried with several CPU cores from opencores.org
> and Lattice LFEC20 shows mostly half the performance
> as Cyclone...
>
>
> rick