Lars wrote:
> Anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I
> wanted to do this to see how the new technology performed, mainly to
> see if it was worth the trouble to upgrade some existing designs. We
> did this quite successfully some years back, stepping from Virtex-E to
> Virtex-II. The main obstacle then was the new size Block RAM going from
> 4kbit to 18 kbit apiece. If we left our Unisim and CoreLib components
> untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in
> the chip was sufficient, all we had to do was to update the
> LOC-constraints for pins and DCM's in the .ucf-file. ISE even managed
> to re-target the Virtex-E DLLs to Virtex-II DCMs. Brilliant!

For the most part, a Virtex-II design can be pretty much dropped into a
Virtex-4.  You hit on one of the places you will have trouble: the
SLICEM/SLICEL thing.  The V4 CLB structure is substantially similar to
the V2 structure, except that only the even columns have the logic for
LUT RAM.  Thus if you have an RPM with SRL16s or RAM16s placed in it,
those have to go in even columns.  There is also a bug in the mapper
that causes problems if an RPM with memory elements straddles a BRAM or
DSP column: it thinks that any memory elements to the right of the
DSP/BRAM column are in the wrong type of column even when they aren't.
The work-around is to break the RPM up into smaller sub-RPMs that fit
between the BRAM/DSP columns.
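
As a rough illustration of the even-column rule above, here is a minimal
Python sketch that flags memory elements in an RPM that would end up in
an odd column. It is not part of any Xilinx tool. The tuple format, the
function name and the origin_x parameter are all made up for the example,
and it assumes the RLOCs are written in the XmYn slice-coordinate form.

```python
import re

# Assumption: RLOCs use the XmYn form, and only even X columns can hold
# LUT RAM / shift registers, as described in the post above.
RLOC_RE = re.compile(r"^X(-?\d+)Y(-?\d+)$")
MEMORY_PRIMS = ("SRL16", "RAM16")  # prefix match also covers SRL16E, RAM16X1S, ...


def misplaced_memory_elements(rpm, origin_x=0):
    """Return the names of memory elements that would land in an odd column.

    rpm      -- list of (instance_name, primitive_type, rloc) tuples
                (hypothetical format, e.g. scraped from an EDIF netlist)
    origin_x -- absolute X column where the RPM's X0 ends up
    """
    bad = []
    for name, prim, rloc in rpm:
        m = RLOC_RE.match(rloc)
        if m is None:
            raise ValueError(f"unrecognised RLOC {rloc!r} on {name}")
        x = origin_x + int(m.group(1))
        # Only memory-type LUT primitives are restricted to even columns.
        if prim.upper().startswith(MEMORY_PRIMS) and x % 2 != 0:
            bad.append(name)
    return bad
```

Running it over the RLOCs of each macro with origin_x set to 0 and then
1 shows whether the macro can be placed at all: if both origins report
misplaced elements, the macro mixes memory elements across odd and even
relative columns and has to be regenerated or re-floorplanned.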

The other place you will have difficulty is if you have instantiated
MULT18X18 primitives in the design, as these have to be converted to
DSP48s.  A 1:1 replacement will work, but with only one register, as in
the MULT18X18S, you will be disappointed with the performance.

OK, so paying attention to these two issues will get your design into a
Virtex-4, but you won't reap the full benefit.  You'll find the fabric
carry chains are no faster than in a V2 of the same speed grade (and in
some cases are actually slower).  Also, the clock-to-output times of the
BRAM without the added output register, and of the unpipelined
multiplier, are no faster.  To get the performance promised, you need to
turn on the pipelining in these elements so that the multiplier has a
3-clock pipeline (input, middle and output registers) and the BRAM a
2-clock pipeline (there is an added output register).
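
To make the latency cost of that pipelining concrete, here is a small
behavioural sketch in Python. It models only the number of clocks
between presenting inputs and seeing the result, not timing, and the
Pipeline class is a made-up helper, not anything from the Xilinx
libraries.

```python
from collections import deque


class Pipeline:
    """Cycle-level model: a result appears `stages` clock edges after its inputs."""

    def __init__(self, func, stages):
        self.func = func
        # One slot per pipeline register; None means "nothing valid yet".
        self.regs = deque([None] * stages, maxlen=stages)

    def clock(self, *inputs):
        """Apply one clock edge and return the value visible at the output."""
        # The new result enters the first register; the oldest one falls off the end.
        self.regs.appendleft(self.func(*inputs))
        return self.regs[-1]


# Multiplier with input, middle and output registers: 3-clock pipeline.
mult = Pipeline(lambda a, b: a * b, stages=3)

# Block RAM read with the added output register: 2-clock pipeline.
memory = [10, 20, 30, 40]  # hypothetical contents
bram = Pipeline(lambda addr: memory[addr], stages=2)
```

The first valid product comes out on the third call to mult.clock() and
the first valid read data on the second call to bram.clock(). Any logic
around a drop-in V2 design that expects the shorter latency has to be
rebalanced to absorb those extra clocks.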

The big gains in V4 for signal-processing type stuff are had with the
DSP48 slice's adder, which is quite a bit faster than the fabric carry
chains.  Unfortunately, using it is basically a clean-sheet redesign,
because you also need to use the pipeline registers there to get the speed.

So in short, you can put your V2 design into V4 without a lot of effort, 
but you will likely be disappointed when it doesn't run any faster.  In 
order to get the speed advantages, you need to redesign to the architecture.

Aha! I knew that, but the access to that particular memory cell in my
decaying brain was not operating at the time. That would make it hard
to re-target CoreLib components I suppose...

Thanks for setting me straight!
/Lars

One problem you have is that in Virtex-4 only half of the slices can support
LUTs used as memory. In V2 all slices could be used. We have seen similar
things in Spartan-3, particularly if you have used elements such as 32x1 RAM.
Alternatively, you may have tried to use a memory-type LUT where there isn't
one, because of an RPM or constraint that simply isn't valid.

John Adair
Enterpoint Ltd. - Home of MINI-CAN. The Spartan-3 CAN Bus Development Board.
http://www.enterpoint.co.uk


"Lars" <larthe@gmail.com> wrote in message 
news:1136376416.676830.143490@g49g2000cwa.googlegroups.com...

Anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I
wanted to do this to see how the new technology performed, mainly to
see if it was worth the trouble to upgrade some existing designs. We
did this quite successfully some years back, stepping from Virtex-E to
Virtex-II. The main obstacle then was the new size Block RAM going from
4kbit to 18 kbit apiece. If we left our Unisim and CoreLib components
untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in
the chip was sufficient, all we had to do was to update the
LOC-constraints for pins and DCM's in the .ucf-file. ISE even managed
to re-target the Virtex-E DLLs to Virtex-II DCMs. Brilliant!

So I hoped it would be even better this time, since the Block RAMs are
the same size, but there seems to be more to this than meets the eye. I
commented out all the LOC-constraints in the ucf and had a go, after
resynthesizing to XC4VLX instead of XC2V. But alas, I get a fatal error
in MAP, complaining about SLICEL and SLICEM types of components. I
suspect that this has to do with some of our CoreLib components, since
they are the only place where there might be RLOC constraints in the
EDIF, but before I go and re-generate all these I am curious to know if
there is an easier way.

I am not out to squeeze the full performance out of the XC4VLX right
now, but would like a "ball-park" figure of what might be expected in
terms of utilization and speed, before we go ahead and commit to a
full-scale conversion. That is why I don't want to spend too much time.

Regards,
/Lars