Hi everybody, after a long discussion about State Minimisation with ISE http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/35d66d3783725017/d282151c06a511ae?lnk=raot&hl=de#d282151c06a511ae I was looking for other tools supporting this feature. The result was disappointing. None of the FPGA synthesis tools seams to support this feature, while every tool was capable of extracting FSMs and creating safe-FSMs if desired. But none seems to support State Minimization. Anyone knows a good reason why? Or have I missed a tool? (exept design compiler, which is for ASICS) Best regards Eilert
FSM State Minimization on FPGAs
Started by ●June 20, 2006
Reply by ●June 20, 20062006-06-20
How about: other than in academia, there's not a whole lot of use for the capability, and identifying redundant states (or chains of states) is not an easy task (read: extends run-time). I'd rather they work on something that benefits more users... Andy backhus wrote:> Hi everybody, > > after a long discussion about State Minimisation with ISE > > http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/35d66d3783725017/d282151c06a511ae?lnk=raot&hl=de#d282151c06a511ae > > I was looking for other tools supporting this feature. > The result was disappointing. > None of the FPGA synthesis tools seams to support this feature, while > every tool was capable of extracting FSMs and creating safe-FSMs if > desired. But none seems to support State Minimization. > > > Anyone knows a good reason why? > Or have I missed a tool? (exept design compiler, which is for ASICS) > > Best regards > Eilert
Reply by ●June 20, 20062006-06-20
On 20 Jun 2006 05:56:06 -0700, "Andy" <jonesandy@comcast.net> wrote:>How about: other than in academia, there's not a whole lot of use for >the capability, and identifying redundant states (or chains of states) >is not an easy task (read: extends run-time).I'm glad someone else said that before me :-) After the long and intriguing thread on state minimisation, I had been thinking about various state machines - trivial and not so trivial - that I've designed over the past few years; and I can't think of even one where automatic state merging would have been helpful. The kind of "sequence recogniser" state machine that was used as an example isn't very realistic, is it? Have I missed something important? -- Jonathan Bromley, Consultant DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK jonathan.bromley@MYCOMPANY.com http://www.MYCOMPANY.com The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
Reply by ●June 20, 20062006-06-20
backhus <nix@nirgends.xyz> wrote:>after a long discussion about State Minimisation with ISE >I was looking for other tools supporting this feature. >The result was disappointing. >None of the FPGA synthesis tools seams to support this feature, while >every tool was capable of extracting FSMs and creating safe-FSMs if >desired. But none seems to support State Minimization. >Anyone knows a good reason why?About the only times I've had a state machine that could been minimized I would have been looking for the switch to turn off state minimization, if that was implemented. Suppose I had a statemachine that needed to run at close to full FPGA speed. This implies that the design needs to have a single level of logic between registers. With N-input LUTs, then a given state can have only N (input states + conditions). By splitting states, then a faster clock speed and functional identical state machine can be produced. As an example, suppose I was designing for 4-input LUTs, and I had two input states, and two condition bits for a state. One-hot is assumed. The input logic to state3 would be: state3 :: (state1, state2, (state3, c1, c2)) This is five inputs, which would require two levels of logic with 4-LUTs. I could improve speed by splitting state 3 into two states, state3a and state3b: state3a :: (state1, (state3a, c1, c2)) state3b :: (state2, (state3b, c1, c2)) Any output equations and next state equations would have state3 replaced with (state3a or state3b). A "state minimization feature" would collapse these states back into state3, making the FPGA run slower. There are some similar state duplication tactics that can be done with binary coded statemachines. Adding states can be useful with slower and more complex machines as well, by replacing a state that is too complex to work at the target clock speed with two states that are less complex, and (hopefully) will work at the target clock speed. -- Phil Hays (Xilinx, but speaking for himself)
Reply by ●June 20, 20062006-06-20
backhus wrote:> None of the FPGA synthesis tools seams to support this feature, while > every tool was capable of extracting FSMs and creating safe-FSMs if > desired. But none seems to support State Minimization. > Anyone knows a good reason why?1. Such minimization is risky, rarely applicable and can consume lots of time. 2. State machines consume a very small percentage of fpga resources in most designs. 3. Making Fmax is often a more compelling problem given the capacity trends in FPGAs. -- Mike Treseler> Or have I missed a tool? (exept design compiler, which is for ASICS)I don't think so. Is this reduction on by default in design compiler? -- Mike Treseler
Reply by ●June 20, 20062006-06-20
backhus wrote:> Hi everybody, > > after a long discussion about State Minimisation with ISE > > http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/35d66d3783725017/d282151c06a511ae?lnk=raot&hl=de#d282151c06a511ae > > > I was looking for other tools supporting this feature. > The result was disappointing. > None of the FPGA synthesis tools seams to support this feature, while > every tool was capable of extracting FSMs and creating safe-FSMs if > desired. But none seems to support State Minimization. > > > Anyone knows a good reason why? > Or have I missed a tool? (exept design compiler, which is for ASICS)You need to define what/where you expect Minimize ?. Johnathon makes a good point, that state merge minimize (which seems to be your main focus?) is not a common problem. I can see risks, and we already suffer from tools that are 'too clever by half' - if the tools missed that the merge states were being used somewhere, you would have a nasty bug. What about a time-keeper state : one that appears empty/redundant to a logic parser, but is actually needed for timing - should we let the tools loose on those ? There are also many different State Minimize yardsticks : This week, I have a state engine that uses a Johnson counter, and found (by the eyeball inspection optimize method) that extending (rather than merge) the state decode, reduced the decode logic overhead => higher packing density. In this case, that mattered - more often, it is a don't care. Johnson was also better overall packing, than Gray or Binary, but I'm not sure I'd expect the tools to be able to make those calls. I suppose a tool that took a state engine, and spawned a huge number of possible solutions and then reported the six best, could be usefull ? -jg
Reply by ●June 20, 20062006-06-20
Jonathan Bromley wrote:> On 20 Jun 2006 05:56:06 -0700, "Andy" <jonesandy@comcast.net> wrote: > >is not an easy task (read: extends run-time).Unless you're doing guided P/R, synthesis run-time is typically a smaller fraction ( < 20% ? ) of total build time for the bitstream. And to my way of thinking, if the synth tool implements state minimization properly, it would be a user selectable option for each FSM, just as XST currently allows selection of encoding algorithm for each FSM. So the effort need be expended only on those FSMs that benefit from it.> been helpful. The kind of "sequence recogniser" state machine > that was used as an example isn't very realistic, is it?I thought it was very plausible as a _simplified_ example. Sequence recognizers are important in all sorts of coding schemes that are used today (MPEG/JPEG etc.), which contain specific markers delineating starts of blocks. I took a good hard look at Eilert's example, and could not come up with a circuit that beats the 3 LUT + 3 FF solution (there's a challenge). And the technique is certainly not limited to sequence recognizers, but to any sort of FSM which may have equivalent states. If state collapsing provides the smallest/fastest circuit in some cases, I think it would be very nice to have it as a synthesis option. Would anyone _not_ want it as an option? Further ISE results: I stood Eilert's LSB-first circuit on its head, and tried the MSB-first version. Before state reduction: 15 states: Auto and Sequential (produced same netlist): FFs / LUTs / est Fmax: 4 / 9 / 400 MHz (2 LUTs in front of each FF, and 1 LUT decoding 2 FFs) Compact: FFs / LUTs / est Fmax: 4 / 5 / 536 MHz (1 LUT in front of each FF, and 1 LUT decoding all 4 FFs) After state reduction: 8 states: Auto: FFs / LUTs / est Fmax: 3 / 3 / 553 MHz ( 1 LUT in front of 2 FFs, direct input to 3rd FF, and Compact / Sequential / Gray: FFs / LUTs / est Fmax: 3 / 4 (!) / 553 MHz (1 LUT in front of each FF, and 1 LUT decoding all 3 FFs) The reason posted these ( and I hope they haven't spoiled any of Eilert's lab exercises) was because I was surprised to see the "Auto" encoding produce better results than the "Compact" encoding. Know your tools! Best Regards, John
Reply by ●June 20, 20062006-06-20
JustJohn wrote:> If state collapsing provides the smallest/fastest circuit in some > cases, I think it would be very nice to have it as a synthesis option. > Would anyone _not_ want it as an option?The option would be fine. Paying $10,000 for it wouldn't be. -- Mike Treseler
Reply by ●June 20, 20062006-06-20
JustJohn wrote: <snip>> If state collapsing provides the smallest/fastest circuit in some > cases, I think it would be very nice to have it as a synthesis option. > Would anyone _not_ want it as an option?That would be usefull, but it should also have the ability to include some defined state-decode in the 'optimise bucket', so you get smallest size/best speed of the machine, and critical decode paths. In my most recent FSM, I coded a johnson ctr ( not an option in XST ?), and there were decode savings on the optional mapping of the unused/unmapped states. What could be usefull, is a tool that simply reports wasted/merge-able states.> > Further ISE results: > I stood Eilert's LSB-first circuit on its head, and tried the MSB-first > version. > > Before state reduction: 15 states: > Auto and Sequential (produced same netlist): > FFs / LUTs / est Fmax: 4 / 9 / 400 MHz > (2 LUTs in front of each FF, and 1 LUT decoding 2 FFs) > Compact: > FFs / LUTs / est Fmax: 4 / 5 / 536 MHz > (1 LUT in front of each FF, and 1 LUT decoding all 4 FFs) > > After state reduction: 8 states: > Auto: > FFs / LUTs / est Fmax: 3 / 3 / 553 MHz > ( 1 LUT in front of 2 FFs, direct input to 3rd FF, and > Compact / Sequential / Gray: > FFs / LUTs / est Fmax: 3 / 4 (!) / 553 MHz > (1 LUT in front of each FF, and 1 LUT decoding all 3 FFs) > > The reason posted these ( and I hope they haven't spoiled any of > Eilert's lab exercises) was because I was surprised to see the "Auto" > encoding produce better results than the "Compact" encoding. Know your > tools!In one case, but not the other ? -jg
Reply by ●June 20, 20062006-06-20
Jim Granville wrote:> JustJohn wrote: > > The reason posted these ( and I hope they haven't spoiled any of > > Eilert's lab exercises) was because I was surprised to see the "Auto" > > encoding produce better results than the "Compact" encoding. Know your > > tools! > > In one case, but not the other ?Yes - for the LSB first "10-15 detector" FSM w/ pre-encoding state reduction, Compact encoding gave 3 LUTs Auto encoding gave 4 LUTs With the MSB first, Compact encoding gave 4 LUTs Auto encoding gave 3 LUTs I don't mind that the results are not exactly as advertised, as long as I know to "expect the unexpected". John






