Hi all,

I have a design (written in VHDL) targeting the Spartan 6 series, and it's oversubscribed for LUTs. Can anyone recommend good resources to read? I've already spent a little time looking around the design in ISE's schematic viewer, but with tens of thousands of LUTs it's not exactly a fast process going from that angle, and if possible I'd rather avoid getting into a lot of explicit instantiation of primitives.

I've already read the ISE documentation on how to write expressions that the synthesizer can recognize as particular patterns, but unfortunately most of my design is just brute-force combinational logic (a lot of basic boolean operations and additions on fairly wide values) arranged into a pipeline, so the special patterns don't really apply (I don't have counters, or RAMs, or shift registers, or what have you).

This is with ISE 13.1, in case it matters, the most recent AFAIK.

I do have the option of moving to a larger chip if necessary, but would strongly prefer not to, as the one I'm using is the largest supported by WebPack. I've looked at chips in other families, and WebPack seems to top out at similar LUT counts in all the families.

Thanks!
Chris
Area Optimization
Started by ●June 11, 2011
Reply by ●June 11, 2011
Christopher Head <chead@is.invalid> wrote:

> I have a design (written in VHDL) targeting the Spartan 6 series, and it's oversubscribed for LUTs. Can anyone recommend good resources to read? I've already spent a little time looking around the design in ISE's schematic viewer, but with tens of thousands of LUTs it's not exactly a fast process going from that angle, and if possible I'd rather avoid getting into a lot of explicit instantiation of primitives.

> I've already read the ISE documentation on how to write expressions that the synthesizer can recognize as particular patterns, but unfortunately most of my design is just brute-force combinational logic (a lot of basic boolean operations and additions on fairly wide values) arranged into a pipeline, so the special patterns don't really apply (I don't have counters, or RAMs, or shift registers, or what have you).

One of the tricks, which I don't believe the tools will do automatically, is to use the BRAMs in place of logic. That is, use a BRAM as a big look-up table. Since BRAMs are synchronous, you have to fit it in with your pipeline logic, but that shouldn't be so hard to do.

-- glen
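A minimal sketch of the BRAM-as-look-up-table idea, not code from the thread. The entity name `lut_rom`, the 1024 x 4 table size, and the population-count function used to fill it are all hypothetical stand-ins for whatever block of combinational logic is being folded into the table. The registered read adds one pipeline stage. Whether a given synthesizer actually maps the constant array to a block RAM is tool dependent, so the synthesis report should be checked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_rom is
  port (
    clk  : in  std_logic;
    addr : in  std_logic_vector(9 downto 0);  -- inputs of the replaced logic
    data : out std_logic_vector(3 downto 0)   -- outputs of the replaced logic
  );
end entity lut_rom;

architecture rtl of lut_rom is
  type rom_t is array (0 to 1023) of std_logic_vector(3 downto 0);

  -- Fill the table at elaboration time. The function tabulated here
  -- (number of '1' bits in the address) is only a placeholder.
  function init_rom return rom_t is
    variable r : rom_t;
    variable v : unsigned(9 downto 0);
    variable n : natural;
  begin
    for i in rom_t'range loop
      v := to_unsigned(i, 10);
      n := 0;
      for b in v'range loop
        if v(b) = '1' then
          n := n + 1;
        end if;
      end loop;
      r(i) := std_logic_vector(to_unsigned(n, 4));
    end loop;
    return r;
  end function init_rom;

  constant ROM : rom_t := init_rom;
begin
  -- Synchronous read: this register is the pipeline stage that the
  -- block RAM imposes.
  process (clk)
  begin
    if rising_edge(clk) then
      data <= ROM(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

The trade only pays off when the replaced logic is expensive in LUTs relative to its input count, since the table doubles in size with every extra input bit.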
Reply by ●June 11, 2011
Chris,

There are a pile of techniques that can reduce size, and a lot depends on the original HDL design and coding style. We do this as one of our services, and I have seen designs reduced to 40% of the original size in some extreme cases. A 20% reduction, to 80% of the original, is more typical.

As with any engineering problem, the first thing to do is to identify where your problem might be. I would typically use Floorplanner to identify which modules in your design are the largest. The largest probably has the best chance of giving you the most.

On the simple level, try both speed-driven and area-driven synthesis. Area mode does not always give the smallest result. You can also try different synthesisers, if you have them available, to get different results. Typically you might get 5-10% out of these techniques, but I have seen some extreme synthesiser results giving a 3x variation on some logic.

One other thing on synthesis that can make a reasonable difference is the setting for your state machine encoding. Try playing with different settings. If the XST switch isn't broken again, try anything but One Hot encoding. XST programmers have a fixation on One Hot encoding, and it only gives the best results in less than 25% of designs.

Moving to the next level, and much more extreme, is to look at your HDL. Here you can look for shift registers that can go to SRL16/32 technology in Xilinx parts. That can save a lot. Old techniques like using illegal states in a state machine to reduce the logic decoded can also be beneficial. Other techniques, like using RAM for multiple related registers, may also get you a reduction.

John Adair
Enterpoint Ltd. - Home of Drigmorn4. The Spartan-6 FPGA Embedded processor Board.
Reply by ●June 11, 2011
On Jun 11, 1:54 am, Christopher Head <ch...@is.invalid> wrote:

> I have a design (written in VHDL) targeting the Spartan 6 series, and it's oversubscribed for LUTs. Can anyone recommend good resources to read?

Have you turned on the area optimization control? Most synthesizers have a trade-off between speed and area. Most of the time they seem to default to optimizing for speed. That can easily get you 10% in most designs.

As to techniques, first you need to find out where your LUTs are being used. Rather than using tools for that, compile your code one module at a time or in smaller groups of modules. I usually code from the bottom up and test every module in the simulator, so it is not hard to also do a compile and see how large each one is. Then you might be able to see which ones are larger than you expect and can look at how to improve them.

Rick
Reply by ●June 12, 2011
>Hi all,
>I have a design (written in VHDL) targeting the Spartan 6 series, and it's oversubscribed for LUTs. Can anyone recommend good resources to read?

Xilinx white paper WP231 is a good read. It is mainly about speed, but it shows why doing things like using an asynchronous reset is a really bad plan for both speed and area.

If you really don't care about speed, have you considered converting your parallel data paths into serial? Serial adders are really, really small.

John Eaton
---------------------------------------
Posted through http://www.FPGARelated.com
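To illustrate how small a serial adder is, here is a minimal sketch of a bit-serial adder, not code from the thread. The entity name `serial_adder` and the `start` strobe are assumptions made for the example. Operands are fed least significant bit first, one bit pair per clock, so an N-bit addition takes N clocks and the logic is one full adder plus a carry flip-flop. The cost of serializing and deserializing the operands is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity serial_adder is
  port (
    clk   : in  std_logic;
    start : in  std_logic;  -- high together with the first (least significant) bit pair
    a, b  : in  std_logic;  -- operand bits, LSB first
    sum   : out std_logic   -- sum bits, LSB first, one clock after the inputs
  );
end entity serial_adder;

architecture rtl of serial_adder is
  signal carry : std_logic := '0';
  signal cin   : std_logic;
begin
  -- A new word starts with no carry in; otherwise reuse the stored carry.
  cin <= '0' when start = '1' else carry;

  process (clk)
  begin
    if rising_edge(clk) then
      sum   <= a xor b xor cin;                           -- full-adder sum
      carry <= (a and b) or (a and cin) or (b and cin);   -- full-adder carry out
    end if;
  end process;
end architecture rtl;
```

This trades throughput for area: the serial version needs N clocks per result, so it only fits where the data rate is well below the clock rate.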
Reply by ●June 13, 2011
Christopher Head <chead@is.invalid> writes:

> Hi all,
> I have a design (written in VHDL) targeting the Spartan 6 series, and it's oversubscribed for LUTs. Can anyone recommend good resources to read? I've already spent a little time looking around the design in ISE's schematic viewer, but with tens of thousands of LUTs it's not exactly a fast process going from that angle, and if possible I'd rather avoid getting into a lot of explicit instantiation of primitives.

Have you first established which parts of your design are responsible for the most LUT usage?

If not, I wrote FPGAOptim when I was in a similar situation to help with just that:

http://www.conekt.co.uk/capabilities/49-fpga-optim

Drop me an email via that webpage and I'll get a download link to you.

Alternatively, these days PlanAhead can provide a view on LUT usage, and the logfiles also have some information.

Once you know which blocks to optimise, you've had good answers from others already. In my most recent case (a video processing application) there are sections of code which only have to update once per video line. They are prime targets for resource sharing.

As John Adair said, reducing by 20% is usually easily doable. With deep knowledge of what's going on and the tradeoffs that are acceptable, I've achieved 40-50% in the past.

Cheers,
Martin

--
martin.j.thompson@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware
Reply by ●June 13, 2011
On Jun 12, 7:22 pm, "jt_eaton" <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote:

> Xilinx white paper WP231 is a good read. It is mainly for speed but shows why doing things like using an asynchronous reset is a really bad plan for both speed and area.
>
> If you really don't care about speed then have you considered converting your parallel data paths into serial? Serial adders are really really small.
>
> John Eaton

Hi John,

Thanks for that pointer. I have always been a believer in using the async reset, and now I see that this may not always be the best way to reset a design. But the devil is in the details. I wonder if this still applies to non-Xilinx designs?

Rick
Reply by ●June 13, 2011
>On Jun 12, 7:22 pm, "jt_eaton" <z3qmtr45@n_o_s_p_a_m.n_o_s_p_a_m.gmail.com> wrote:
>
>Thanks for that pointer. I have always been a believer in using the async reset and now I see that this may not always be the best way to reset a design. But the devil is in the details. I wonder if this still applies to non-Xilinx designs?
>
>Rick

It applies to all designs. Designers who started their careers with asynchronous logic carried it with them when Design for Synthesis and synchronous design became a requirement, but it has never been the best choice.

Many designers make the mistake of thinking that because they need an asynchronous reset system, they must design it using asynchronous logic. That is simply not true. We design synchronous systems that are black-box equivalent to asynchronous systems all the time.

The main thing that you need to realize about reset system design is that the purpose of the reset system is not to reset the system when a trigger event occurs. Its purpose is to NOT reset the system when a trigger event is NOT occurring.

The same is true for airbag controllers. The job of an airbag controller is not to deploy the bag when the car is in an accident; its job is to not deploy the bag when the car is not having an accident.

Any system where the expected number of uses is small and the effects of the usage are large will follow this rule.

Remember the first Star Wars movie? They built the Death Star with an emergency exhaust port that provided a direct path from the reactor core to the surface. It was ray shielded but could not be particle shielded. Bad plan.

An asynchronous reset has a direct path from a pad into every flip-flop in the entire chip. It is analog shielded but not digitally shielded. Bad plan.

Resets in a real product (not a simulation) are really rare events. If a reset is delayed by 20 microseconds then nobody will notice. If a product that you are using suddenly resets itself then you will likely notice.

Spend a few hundred cycles on a digital filter before you do something drastic.

John Eaton
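One way the "digital filter" on the reset input might look, as a minimal sketch rather than the poster's actual design. The entity name `reset_filter`, the active-high polarity, and the `HOLD_CYCLES` default of 256 are assumptions. The pad is brought into the clock domain through two flip-flops, and the internal synchronous reset is asserted only after the pad has been seen high for `HOLD_CYCLES` consecutive clocks, so a short glitch on the pad never reaches the rest of the chip. The sketch relies on the FPGA loading the signal initial values at configuration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reset_filter is
  generic (
    HOLD_CYCLES : positive := 256  -- assumed threshold ("a few hundred cycles")
  );
  port (
    clk       : in  std_logic;
    reset_pad : in  std_logic;  -- raw reset request, active high, asynchronous to clk
    reset_out : out std_logic   -- filtered, synchronous reset for the rest of the design
  );
end entity reset_filter;

architecture rtl of reset_filter is
  signal sync  : std_logic_vector(1 downto 0) := "00";
  signal count : natural range 0 to HOLD_CYCLES := 0;
  signal rst_i : std_logic := '0';
begin
  reset_out <= rst_i;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Two-stage synchronizer for the asynchronous pad.
      sync <= sync(0) & reset_pad;

      if sync(1) = '0' then
        -- Request absent (or it was only a glitch): start over, no reset.
        count <= 0;
        rst_i <= '0';
      elsif count = HOLD_CYCLES then
        -- Request has been stable long enough: assert the reset.
        rst_i <= '1';
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

The flip-flops downstream then use `reset_out` as an ordinary synchronous reset, which keeps the pad off the asynchronous clear pins entirely.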
Reply by ●June 14, 2011
Reply by ●June 14, 2011
>John
>the best is to design to never reset !

You can create a design that will work with no resets at all. The problem is that the verification suite will take a few eons to finish.

John





