Remapping from Virtex-II to Virtex-4

Started by Lars January 4, 2006
Anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I
wanted to do this to see how the new technology performed, mainly to
see if it was worth the trouble to upgrade some existing designs. We
did this quite successfully some years back, stepping from Virtex-E to
Virtex-II. The main obstacle then was the Block RAM size, which went from
4 kbit to 18 kbit apiece. If we left our Unisim and CoreLib components
untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in
the chip was sufficient, all we had to do was update the
LOC constraints for pins and DCMs in the .ucf file. ISE even managed
to re-target the Virtex-E DLLs to Virtex-II DCMs. Brilliant!

So I hoped it would be even better this time, since the Block RAMs are
the same size, but there seems to be more to this than meets the eye. I
commented out all the LOC-constraints in the ucf and had a go, after
resynthesizing to XC4VLX instead of XC2V. But alas, I get a fatal error
in MAP, complaining about SLICEL and SLICEM types of components. I
suspect that this has to do with some of our CoreLib components, since
they are the only place where there might be RLOC constraints in the
EDIF, but before I go and re-generate all these I am curious to know if
there is an easier way.

I am not out to squeeze the full performance out of the XC4VLX right
now, but would like a "ball-park" figure of what might be expected in
terms of utilization and speed, before we go ahead and commit to a
full-scale conversion. That is why I don't want to spend too much time.

Regards,
/Lars

One problem you have is that in Virtex-4 only half of the slices can support
LUTs used as memory. In V2 all slices could be used. We have seen similar
things in Spartan-3, particularly if you have used elements such as 32x1 RAM.
Alternatively, you may have tried to place a memory-type LUT where there isn't
one, due to an RPM or constraint that simply isn't valid in the new device.

John Adair
Enterpoint Ltd. - Home of MINI-CAN. The Spartan-3 CAN Bus Development Board.
http://www.enterpoint.co.uk


Aha! I knew that, but the access to that particular memory cell in my
decaying brain was not operating at the time. That would make it hard
to re-target CoreLib components I suppose...

Thanks for setting me straight!
/Lars

Lars wrote:
> Anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I
> wanted to do this to see how the new technology performed, mainly to
> see if it was worth the trouble to upgrade some existing designs. We
> did this quite successfully some years back, stepping from Virtex-E to
> Virtex-II. The main obstacle then was the new size Block RAM going from
> 4kbit to 18 kbit apiece. If we left our Unisim and CoreLib components
> untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in
> the chip was sufficient, all we had to do was to update the
> LOC-constraints for pins and DCM's in the .ucf-file. ISE even managed
> to re-target the Virtex-E DLLs to Virtex-II DCMs. Brilliant!

For the most part, a Virtex-II design can be pretty much dropped into a Virtex-4. You hit on one of the places you will have trouble: the SLICEM/SLICEL thing. The V4 CLB structure is substantially similar to the V2 structure, except that only even columns have the logic for LUT RAM. Thus if you have an RPM with SRL16s or RAM16s placed in it, those have to go in even columns.

There is also a bug in the mapper that causes problems if an RPM macro with memory elements straddles a BRAM or DSP column: it thinks that any memory elements to the right of the DSP/BRAM column are in the wrong type of column even if they aren't. The work-around is to break the RPM up into smaller sub-RPMs that fit between the BRAM/DSP columns.

The other place you will have difficulty is if you have instantiated MULT18X18 primitives in the design, as these have to be converted to DSP48s. With only one register, as in the MULT18X18S, you will be disappointed with the performance, but it will work with a 1:1 replacement.

OK, so paying attention to these two issues will get your design into a Virtex-4, but you won't reap the full benefit. You'll find the fabric carry chains are not any faster than in the same speed grade V2 (and in some cases are actually slower). Also, the clock-to-output times on the BRAM without the added output register, and on the unpipelined multiplier, are not any faster. To get the performance promised, you need to turn on the pipelining in these elements so that the multiplier has a 3-clock pipeline (input, middle and output registers) and the BRAM a 2-clock pipeline (there is an added output register).

The big gains in V4 for signal-processing type stuff are had with the DSP48 slice's adder, which is quite a bit faster than the fabric carry chains. Unfortunately, using it is basically a clean-sheet redesign, because you also need to use the pipeline registers there to get the speed.

So in short, you can put your V2 design into V4 without a lot of effort, but you will likely be disappointed when it doesn't run any faster. In order to get the speed advantages, you need to redesign to the architecture.
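
The even-column rule for LUT RAM described above can be checked before running MAP. Here is a rough Python sketch, not a Xilinx tool: it assumes you have already extracted (instance, primitive, location) triples from your netlist or constraints by some means of your own, and that the XnYm values are absolute slice coordinates (or RLOCs in an RPM whose origin lands on an even column). The function name, the primitive prefix list and the input format are all made up for illustration.

```python
import re

# Primitive name prefixes that need a LUT-RAM / shift-register capable slice.
# Illustrative assumption only, not an exhaustive list of library primitives.
MEMORY_PRIMS = ("SRL16", "RAM16", "RAM32")

# Slice coordinates in the usual XnYm form, e.g. "X4Y12".
LOC_RE = re.compile(r"X(-?\d+)Y(-?\d+)")


def misplaced_memory_elements(cells):
    """Return the instance names of memory primitives sitting in odd columns.

    cells is an iterable of (instance_name, primitive_type, location) tuples.
    Per the rule of thumb in this thread, only even slice columns can hold
    LUT RAM or SRL16 elements in Virtex-4.
    """
    bad = []
    for name, prim, loc in cells:
        if not prim.upper().startswith(MEMORY_PRIMS):
            continue  # plain logic and flip-flops can go in any column
        match = LOC_RE.fullmatch(loc.strip().upper())
        if match is None:
            raise ValueError(f"unrecognised location {loc!r} on {name}")
        x = int(match.group(1))
        if x % 2 != 0:  # odd column: no LUT RAM logic there
            bad.append(name)
    return bad
```

Anything this reports is a candidate for the SLICEL/SLICEM error in MAP. It does not catch the separate mapper bug with RPMs that straddle a BRAM or DSP column.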
Thank you Ray, that was a good summary! Seems like we have our work cut
out for us if we want the full potential, and I believe I have to
re-think the usefulness of my original intent of a quick "ball-park
figure"...
/Lars

Hi

One more thing.
As I noticed a moment ago, for the DCM in Virtex-4:
- you can't set CLK_FEEDBACK="2X"
- only "1X" or "NONE" are allowed

regards

Jerzy Gbur
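
A quick way to find DCMs affected by the CLK_FEEDBACK restriction is to grep the constraint and HDL sources before re-targeting. The Python sketch below is only an illustration: the regular expression covers the UCF, VHDL generic map and Verilog parameter spellings I would expect to see, and the function name is hypothetical. It works on text you pass in, so reading the files is left to you.

```python
import re

# Matches CLK_FEEDBACK assigned the value 2X in a few common spellings:
#   UCF:      INST "dcm0" CLK_FEEDBACK = 2X;
#   VHDL:     CLK_FEEDBACK => "2X",
#   Verilog:  .CLK_FEEDBACK("2X")
FEEDBACK_2X = re.compile(r'CLK_FEEDBACK\s*(?:=>|=|\()\s*"?2X"?', re.IGNORECASE)


def find_2x_feedback(text):
    """Return the 1-based line numbers where CLK_FEEDBACK is set to 2X."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if FEEDBACK_2X.search(line)
    ]
```

Each reported line needs to be changed to "1X" or "NONE", with the feedback routing adjusted to match, before the design will go into a Virtex-4.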