FPGARelated.com

Cyclone Problem with Large Shift Register

Started by John December 3, 2004
I have been running a shift register design through a web version of Quartus
4.1 (SP2). Depending on the size of the shift register, either the tools don't
complete (I waited 30 minutes and gave up) or, on smaller shifts of 720, I get
a design that is large and takes a long time to implement.

Has anyone else seen this problem? Or does anyone know of any tool switches
that need to be set to solve this?

I have selected a large enough Cyclone part, and before anyone asks, I am
running a reasonable machine: an Athlon64 3000 with 512 MByte of memory, for
those who want the detail. I have run the same design (large version) on
Spartan-3 / ISE and it takes less than 3 minutes to do the same.

John


"John" <placename@remove_fpga_people.co.uk> wrote in message news:<1102069745.41459.0@iris.uk.clara.net>...
John,
We would like to investigate this further and help you. It would help if you would send me the source that you used or post it here.

Thanks
- Subroto Datta
Altera Corp.
A 720-stage shift register needs 45 Xilinx CLBs, but how many Altera LEs?

Walter.

I've slightly modified it and the 720 version isn't bad now, but run times on
the following setup are still bad. It looks a bit pointless as a design, but
as you can probably guess, it was aimed at measuring real, in-the-field
power consumption.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- STD_LOGIC_ARITH and STD_LOGIC_UNSIGNED dropped: no arithmetic is used.

entity TEST_POWER is
    Port ( OUT_LINES : out std_logic_vector(260 downto 1);
           CLOCK     : in  std_logic;
           RESET     : in  std_logic);
end TEST_POWER;

architecture a0 of TEST_POWER is

  CONSTANT VEC_SIZE : INTEGER := 7150;  -- number of chained stages
  CONSTANT IOSIZE   : INTEGER := 260;   -- must equal the OUT_LINES width

  SIGNAL XOR_INPUT : STD_LOGIC_VECTOR(VEC_SIZE DOWNTO 1);
  SIGNAL SHIFT_REG : STD_LOGIC_VECTOR(VEC_SIZE DOWNTO 0);

begin

  -- Each stage's next value is its own value XORed with its predecessor,
  -- so this is not a pure delay line and cannot be packed into RAM or
  -- SRL16s the way a plain shift register can; that suits a power test.
  XORGEN : FOR I IN 1 TO VEC_SIZE GENERATE
  BEGIN
    XOR_INPUT(I) <= SHIFT_REG(I) XOR SHIFT_REG(I-1);
  END GENERATE;

  PROCESS(RESET, CLOCK)
  BEGIN
    IF (RESET = '1') THEN
      SHIFT_REG <= (OTHERS => '1');
      OUT_LINES <= (OTHERS => '0');
    ELSIF (CLOCK'EVENT AND CLOCK = '1') THEN
      -- Stage 0 toggles every cycle; stages 1..VEC_SIZE load XOR_INPUT.
      SHIFT_REG <= XOR_INPUT & (NOT SHIFT_REG(0));
      -- Drive the first IOSIZE-1 stages plus the last stage to pins so
      -- that no part of the chain is optimized away.
      OUT_LINES((IOSIZE - 1) DOWNTO 1) <= SHIFT_REG((IOSIZE - 1) DOWNTO 1);
      OUT_LINES(IOSIZE) <= SHIFT_REG(VEC_SIZE);
    END IF;
  END PROCESS;

end a0;

John
It is only 45 CLBs if you don't use a reset. The point of this design was a
power test: to fill a device and then get a genuine, accurate power
reading.

Xilinx have a huge advantage on shift registers with SRL16s, which I believe
Altera can't easily mimic due to patent issues. Someone from Altera can tell
me if I am wrong about this.
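John's point that the CLB count depends on the reset can be illustrated with a plain delay line. This is a minimal sketch, assuming a 1-bit data path; the entity name `delay_line` and the generic `DEPTH` are hypothetical. Because the register vector has no reset, a synthesis tool is free to pack it into SRL16s on Xilinx parts or convert it to RAM on Altera parts; an asynchronous reset like the one in the TEST_POWER code typically forces one flip-flop per stage.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: a DEPTH-stage, 1-bit delay line.
entity delay_line is
  generic (
    DEPTH : positive := 720   -- number of register stages (assumed >= 2)
  );
  port (
    clk  : in  std_logic;
    din  : in  std_logic;
    dout : out std_logic
  );
end entity delay_line;

architecture rtl of delay_line is
  -- Deliberately no reset: a reset would stop SRL16 or RAM mapping.
  signal sr : std_logic_vector(DEPTH - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Shift toward the MSB; the new bit enters at index 0.
      sr <= sr(DEPTH - 2 downto 0) & din;
    end if;
  end process;

  -- A bit captured on one clock edge appears here DEPTH edges later
  -- (counting the capturing edge).
  dout <= sr(DEPTH - 1);
end architecture rtl;
```

Whether this actually lands in SRL16s, RAM, or plain registers is decided by the synthesis tool and its settings, not by the HDL alone.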

The 720 version was done with the help of Wizards, not the code posted
elsewhere, and came to about 1100 LEs. With a variation of the posted code
it started to get better.

John

> Xilinx have a huge advantage on shift registers with SRL16s, which I believe
> Altera can't easily mimic due to patent issues. Someone from Altera can tell
> me if I am wrong in this.
Hi John,

I'd disagree that Altera is at any disadvantage on shift registers -- we just have a different approach. Building large shift registers out of LE registers is very inefficient in both area and power, and isn't the way we build them in Altera devices. Instead, large shift registers are automatically converted to RAM-based FIFOs. If you don't like to rely on automatic conversion, you can instantiate the altshift_taps megafunction yourself (it implements all sorts of RAM-based shift registers).

I just coded up a 721-bit shift register in VHDL, and it takes 1 M4K RAM and 15 Logic Cells in Stratix -- vastly less area and power than the alternative of using 721 logic cell registers. This is also less area and power than the Xilinx SRL16 solution for large shift registers. For example, a 4096-bit shift register takes 1 M4K RAM and 17 Stratix Logic cells, which is a lot smaller than 256 SRL16s. In terms of power, the altshift_taps implementation results in only one entry in the RAM being read and one written each cycle (plus a small amount of switching in the FIFO pointer counters), instead of having each of 4096 registers toggle.

I know in your case you're trying to make a high-power design to make a power measurement, but that is definitely not what most of our customers are trying to do, so the power efficiency of FIFOs is pretty compelling.

Building clever structures like this out of RAM is why we have 3 different sizes of RAM, and lots of RAM, in our devices. The M512 lets us build moderate size shift registers efficiently, the M4K lets us build big ones efficiently, the MRAM lets us build very big shift registers efficiently, and the register cascade chain feature in the Stratix/StratixII lets us "recycle" unused registers (there are almost always a lot of registers left over in the FPGA devices, since most designs use more LUTs than registers) to build any small shift registers (e.g. 4-bit shift) needed.
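The RAM-based scheme described above (one RAM read and one RAM write per cycle, plus a pointer counter) can be sketched as an inferred circular buffer. This is a minimal behavioral sketch, not the altshift_taps megafunction: the entity name `ram_delay_line` and the generic `DEPTH` are hypothetical, and whether a given tool actually maps the array onto an M4K (or any block RAM) depends on the device and on its read-during-write support.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: a DEPTH-cycle delay built as a circular
-- buffer. Each clock reads one word and writes one word at the same
-- address, then advances a single pointer.
entity ram_delay_line is
  generic (
    DEPTH : positive := 721   -- total delay in clock cycles (>= 2)
  );
  port (
    clk  : in  std_logic;
    din  : in  std_logic;
    dout : out std_logic
  );
end entity ram_delay_line;

architecture rtl of ram_delay_line is
  -- DEPTH-1 RAM words plus the registered output give DEPTH cycles total.
  constant WORDS : positive := DEPTH - 1;
  type ram_t is array (0 to WORDS - 1) of std_logic;
  signal mem : ram_t := (others => '0');          -- no reset on the RAM
  signal ptr : natural range 0 to WORDS - 1 := 0; -- the pointer counter
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Signal semantics: dout receives the OLD word at ptr (written
      -- WORDS clocks ago) before that word is overwritten with din.
      dout     <= mem(ptr);
      mem(ptr) <= din;
      if ptr = WORDS - 1 then
        ptr <= 0;
      else
        ptr <= ptr + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

Only one memory word is read and one written per clock, so switching activity stays roughly constant as DEPTH grows, whereas a register chain of the same length shifts every stage each cycle.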
As for patents, Altera and Xilinx have cross-licensed their patent portfolios, so there's no patent barrier on this. However, for the reasons I've listed above we don't think SRL16 is compelling, so we've judged it not worth the area it adds to the LUT.

Vaughn
Altera
v b e t z (at) altera.com
[remove spaces and use proper @ to reach me]
It's amazing how everything can become a Xilinx vs Altera battle.
It seems to me that the original posting was not really looking for the
most compact solution. Both Altera and Xilinx can of course provide
RAM-based shift registers, and as long as you stay below 16K length,
the A and X solutions are indistinguishable.

But let me fix one bad misstatement:
It does of course take 45 SRL16s to implement a 720-bit shift
register, but these 45 SRL16s fit in less than six CLBs, since there are
eight LUTs in a CLB.
That takes less silicon area than any big RAM in either Altera or
Xilinx chips...
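Peter's correction can be checked with simple arithmetic, assuming (as he states) that each SRL16 holds 16 bits and that a CLB contains eight LUTs. The same arithmetic gives the SRL16 count quoted for the 4096-bit example earlier in the thread:

```latex
\[
\frac{720\ \text{bits}}{16\ \text{bits/SRL16}} = 45\ \text{SRL16s},
\qquad
\frac{45\ \text{SRL16s}}{8\ \text{LUTs/CLB}} = 5.625 < 6\ \text{CLBs}
\]
\[
\frac{4096\ \text{bits}}{16\ \text{bits/SRL16}} = 256\ \text{SRL16s}
\;\Rightarrow\;
\frac{256}{8} = 32\ \text{CLBs}
\]
```

So the figure of 45 in Walter's question counts SRL16s, not CLBs.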

Peter Alfke

> It's amazing how everything can become a Xilinx vs Altera battle.
> It seems to me that the original posting was not really looking for the
> most compact solution.
Peter,

If you check my post, you will see that I was replying to the poster's question of whether or not Altera was at a disadvantage on shift registers vs. Xilinx.
> Both Altera and Xilinx can of course provide
> RAM-based shift registers, and as long as you stay below 16K length,
> the A and X solutions are indistinguishable.
The area of a Xilinx BlockRAM (18 kbit RAM) is of course a lot larger than that of either an Altera M512 RAM (576 bits) or an Altera M4K RAM (4.5 kbit). All the extra RAM area beyond what the shift register needs is wasted, so having a variety of RAM sizes and a lot of smaller RAMs makes for a more efficient FIFO mapping of shift registers than having only a relatively small number of BlockRAMs of one size.

Now, of course Xilinx has the SRL16s to implement smaller shift registers, so it's less crucial to have small RAMs around to build FIFOs for moderate-size shift registers. The solutions are different, and you can argue about which is better, but they are certainly not indistinguishable.
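A rough way to see the wasted-RAM point, assuming a 1-bit-wide, 720-deep shift register and counting raw bits only (port-width modes and parity bits change the exact figures), using the sizes quoted above: 18 kbit = 18432 bits for a BlockRAM, 4.5 kbit = 4608 bits for an M4K, and 576 bits for an M512:

```latex
\[
\text{BlockRAM: } \frac{720}{18432} \approx 3.9\%\ \text{used},
\qquad
\text{M4K: } \frac{720}{4608} \approx 15.6\%\ \text{used},
\qquad
\text{M512: } 720 > 576\ \text{(does not fit)}
\]
```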
> But let me fix one bad misstatement:
> It does of course take 45 SRL16s to implement a 720 bit shift
> register, but these 45 SRL16s fit in less than six CLBs, since there are
> eight LUTs in a CLB.
> That takes less silicon area than any big RAM in either Altera or
> Xilinx chips...
An M4K RAM also takes less area than 6 CLBs, and can do a significantly bigger shift register than 720 bits. An M512 can't quite do 720 bits (it can do 512-bit shifts) and has an area that's somewhere in the 1 to 2 CLB range. So I don't agree with your argument that area is lower for SRL16s in this case.

Vaughn Betz
Altera
v b e t z (at) altera.com
[Remove spaces and put in proper @ to reach me]

Peter wrote:

> It's amazing how everything can become a Xilinx vs Altera battle.
> It seems to me that the original posting was not really looking for the
> most compact solution. Both Altera and Xilinx can of course provide
> RAM-based shift registers, and as long as you stay below 16K length,
> the A and X solutions are indistinguishable.
> But let me fix one bad misstatement:
> It does of course take 45 SRL16s to implement a 720 bit shift
> register, but these 45 SRL16s fit in less than six CLBs, since there are
> eight LUTs in a CLB.
> That takes less silicon area than any big RAM in either Altera or
> Xilinx chips...
For the user, though, it is more complicated. If the RAMs are otherwise unused, then they go to waste in the SRL16 case. If one is short on RAM resources, brand A may be at a disadvantage.

In the beginning, FPGAs had only one type of cell, and the only question was how many were required. Now one has to balance different designs based on the numbers of CLBs, RAMs, and anything else that may be added. Also, as pointed out in another post, different FPGAs may have different-sized RAMs, which may affect the optimal solutions to these problems.

-- glen
Agreed. Years ago, when we offered unstructured "sandboxes" full of
LUTs and flip-flops, it was easy to benchmark and compare. Now all
FPGAs offer many features that are both more powerful and more
dedicated. This gives the user higher performance at a lower cost, but
it makes comparisons more complicated.

Users should take benchmarks published by any one of the competitors
with a big grain of salt, especially when they claim a vast superiority
of their own product. That is often Marketing at its worst.

If you are serious about evaluating X vs A, then look beyond the LUTs
and memories, dig deeper into the architecture, and investigate the
systems-oriented functions...
Peter Alfke