FPGARelated.com
Forums

Forking in One-Hot FSMs

Started by Kevin Neilson May 2, 2008
Having two bits hot in a one-hot FSM would normally be a bad thing.  But 
I was wondering if anybody does this purposely, in order to fork, which 
might be a syntactically nicer way to have a concurrent FSM.  This would 
imply that multiple states in the FSM could be active at once.  This 
would be an example:

parameter STATE1=1, STATE2=2, STATE33,... // state defs
casex (state)
...
if (state[STATE1]) begin
   if (condition)
     begin
       m             <= a*b;
       state[STATE3] <= 1;    // fork into 2 new states
       state[STATE4] <= 1;
       state[STATE1] <= 0;    // leave current state
     end
end
if (state[STATE3]) begin     // DSP48 Adder Stage
       p             <= m+c;
       state[STATE3] <= 0;    // this fork dies
end
if (state[STATE4]) begin
       m             <= a2*b2;
       state[STATE3] <= 1;    // fork into 2 new states
       state[STATE5] <= 1;
       state[STATE4] <= 0;    // leave current state
end

In this case I have a pipeline (as in a DSP48) which I can keep 
continuously fed.  A separate fork of the SM runs the pipeline.  I can 
turn on two one-hot bits (essentially ORing the states) to fork into 
multiple states.  One fork eventually kills itself.  This might be nicer 
than having a separate concurrent FSM.  There may be a better syntax 
that still allows a case statement.  I just wondered if this is a common 
or useful technique.
-Kevin
Kevin Neilson wrote:
> > parameter STATE1=1, STATE2=2, STATE3=3,... // state defs
> reg [31:0] state;
> if (state[STATE1]) begin > if (condition) > begin > m <= a*b; > state[STATE3] <= 1; // fork into 2 new states > state[STATE4] <= 1; > state[STATE1] <= 0; // leave current state > end > end > if (state[STATE3]) begin // DSP48 Adder Stage > p <= m+c; > state[STATE3] <= 0; // this fork dies > end > if (state[STATE4]) begin > m <= a2*b2; > state[STATE3] <= 1; // fork into 2 new states > state[STATE5] <= 1; > state[STATE4] <= 0; // leave current state > end >
Sorry; there was not supposed to be a case statement here.
But why not combine these two states into one states?and let that
states to do the pipline stuff?
Your coding may let your design slower and may not be implemented as
state machine in the final design.

On May 2, 2:54=A0pm, Kevin Neilson <kevin_neil...@removethiscomcast.net>
wrote:
> Kevin Neilson wrote: > > > parameter STATE1=3D1, STATE2=3D2, STATE3=3D3,... // state defs > > =A0> reg [31:0] state; > > > if (state[STATE1]) begin > > =A0 if (condition) > > =A0 =A0 begin > > =A0 =A0 =A0 m =A0 =A0 =A0 =A0 =A0 =A0 <=3D a*b; > > =A0 =A0 =A0 state[STATE3] <=3D 1; =A0 =A0// fork into 2 new states > > =A0 =A0 =A0 state[STATE4] <=3D 1; > > =A0 =A0 =A0 state[STATE1] <=3D 0; =A0 =A0// leave current state > > =A0 =A0 end > > end > > if (state[STATE3]) begin =A0 =A0 // DSP48 Adder Stage > > =A0 =A0 =A0 p =A0 =A0 =A0 =A0 =A0 =A0 <=3D m+c; > > =A0 =A0 =A0 state[STATE3] <=3D 0; =A0 =A0// this fork dies > > end > > if (state[STATE4]) begin > > =A0 =A0 =A0 m =A0 =A0 =A0 =A0 =A0 =A0 <=3D a2*b2; > > =A0 =A0 =A0 state[STATE3] <=3D 1; =A0 =A0// fork into 2 new states > > =A0 =A0 =A0 state[STATE5] <=3D 1; > > =A0 =A0 =A0 state[STATE4] <=3D 0; =A0 =A0// leave current state > > end > > Sorry; there was not supposed to be a case statement here.
Aiken wrote:
> But why not combine these two states into one states?and let that > states to do the pipline stuff? > Your coding may let your design slower and may not be implemented as > state machine in the final design. >
The example might not show this well, but you may want to fork from several different states and the length of the fork, before it dies, could be several states. So if I just have the single state, I would have to manually branch out through the state tree and figure out all states I could possibly be in while the "fork" would be operating and then add the fork logic to all those states. If that makes sense. Anyway, it's completely unmaintainable, because when you add a new state to the machine you would have to figure out if the pipeline is supposed to be full at that time and remember to add in the logic for that. -Kevin
Kevin,

> Having two bits hot in a one-hot FSM would normally be a bad thing. But I > was wondering if anybody does this purposely, in order to fork, which > might be a syntactically nicer way to have a concurrent FSM.
I wonder about this too. I am currently doing a pipeline and some code is shown below. So I wrote out the states without an array so when the ModelSim comes up I don't have to expand the states to see them. I also group signals that I want to see associated with the states in the declarative region so I don't have to futz too much with the ModelSim. Some states stay on longer with the state_count condition. I read one header record on state31 and follow it with three of another type of data record with state32. Now the "branching" happens because these states address a NoBL SRAM and there is a two cycle lag between the address and the data. Not show below, I also have clock delays on these states, state32_1, state32_2, and so on so when the address goes out on state32, I then have data to process on state32_2. In my Zen thinking about this, I always have a When state associated with every What. It actually gets deeper than that because there are FIFOs involved as well. You'll need FIFOs in your design if you are going to tackle a Sobel function. Here the trick is to start thinking about your processses starting from the READ data and figure out how many delays you need to deliver an answer, then figure out where the WRITE data should marry into the flow. I have now states out to _7. Perhaps someone could suggest a better term than state machine "forking"? And if there is some guidelines on how to code ane debug pipelined architecture. I'm with Kevin, it get's real messy, real soon. Brad Smallridge AiVision inner_cell_state_machine:process(clk) begin if(clk'event and clk='1')then inner_cell_restart <='0'; if(reset='1')then state30<='0'; state31<='0'; state32<='0'; state33<='0'; state34<='0'; state35<='0'; state36<='0'; state37<='0'; state38<='0';state39<='0'; inner_cell_rd_ad <= std_logic_vector(to_unsigned(inner_cell_start,18)); inner_cell_wr_ad <= std_logic_vector(to_unsigned(inner_cell_start,18)); state_count <= (others=>'0'); elsif(state29='1')then -- state29 automatically turns off from init_state_machine state30<='1'; --State30 Initial inner cell state elsif(state30='1')then state30<='0'; state31<='1'; state_count <= (others=>'0'); -- State31 Read the inner cell elsif(state31='1')then state31 <='0'; state32<='1'; inner_cell_rd_ad <= inner_cell_rd_ad+1; -- State32 Read the inner cell connections elsif(state32='1')then if(state_count=2)then state32<='0'; state33<='1'; state_count <= (others=>'0'); else state_count <= state_count+1; end if; inner_cell_rd_ad <= inner_cell_rd_ad+1; -- State33 Wait for SRAM to deliver first connection elsif(state33='1')then state33<='0'; state34<='1'; state_count <= (others=>'0'); -- State34 Read connection elsif(state34='1')then if(state_count=2)then state34<='0'; state35<='1'; state_count <= (others=>'0'); else state_count <= state_count+1; end if; . . .
Kevin Neilson wrote:
> Having two bits hot in a one-hot FSM would normally be a bad thing. > But I was wondering if anybody does this purposely, in order to fork, > which might be a syntactically nicer way to have a concurrent FSM.
DEC used that style of design in the PDP-16 Register Transfer Modules. Possibly also in the control units of some of their asynchronous processors such as the PDP-6 and KA10.
"Brad Smallridge" <bradsmallridge@dslextreme.com> wrote in message 
news:Y9MSj.9245$sd4.3805@fe109.usenetserver.com...
> Kevin, > >> Having two bits hot in a one-hot FSM would normally be a bad thing.
'One hot' is a particular implementation of a FSM, but from a logic perspective (i.e. how you go about designing your state machine, the states needed, the branching, etc.) means absolutely nothing.
>> But I was wondering if anybody does this purposely, in order to fork, >> which might be a syntactically nicer way to have a concurrent FSM.
'Concurrent' state machines though is simply another way of saying state machines that either are totally independent of one another, or only loosely connected (i.e. there is some signalling going on between the two, but you can *usually* futz with one without breaking the other). As I mentioned in more detail in my response on 'Style for Highly-Pipelined State Machines', I only really see two basic approaches: - The first is some form of counting or adding states where you have a known fixed number of states between 'doing this' and 'doing that'. This method works but it really quickly leads to rather complicated code that is difficult to understand and (because of the complexity) probably has some logic holes as well that may take some time to surface. In certain designs though this method is just fine and the results are fine and easy to maintain. The problem though is when the realization sets in that the code is getting out of control and how to manage it (which was the point of the other thread). - The second method uses request/acknowledge handshaking between the 'concurrent' state machines. This method scales very nicely from a design perspective and is just as efficient from an implementation perspective as well. Bottom line here is to realize that a 'fork in an FSM' is really a call to think of it as two separate state machines that have a communication/signalling requirement and don't try to force your mental model as being 'one' state machine. After all, the entire design can be considered to be a single state machine...but it is generally of no use to think of it that way from a design perspective. <snip>
> Now the "branching" happens because these states address > a NoBL SRAM and there is a two cycle lag between the > address and the data. Not show below, I also have clock > delays on these states, state32_1, state32_2, and so on > so when the address goes out on state32, I then have data > to process on state32_2. > > In my Zen thinking about this, > I always have a When state associated with every What. > > It actually gets deeper than that because there are FIFOs > involved as well. You'll need FIFOs in your design if > you are going to tackle a Sobel function. Here the trick > is to start thinking about your processses starting from > the READ data and figure out how many delays you need to > deliver an answer, then figure out where the WRITE data > should marry into the flow. I have now states out to _7. >
Try looking at it now from a somewhat different perspective. Let's say you have one state machine who's sole purpose is to generate read and write commands and addresses to the memory but not to process the data at all. In addition there is a second state machine who's sole purpose is to process the data that gets read back from memory and produce some sort of result, that maybe goes to memory, maybe goes somewhere else, it doesn't matter. If the 'address generator' state machine needs the results from some computation in order to procede on, then it sends a read request to the 'data processor' state machine, and waits until it gets the acknowledge. It may sound queer, but what that does is allows you to design with any sort of latency and does not require any apriori knowledge of how many clock ticks it will take to get that result back, the address generator state machine is waiting for the acknowledge back from the data processor state machine (which in turn is waiting for the data to come back from the memory). Now you could argue that the data processor state machine can't just *process data*, it most likely needs to know what it is supposed to be doing with it, and that knowledge likely lives in the address generator state machine. Fair enough, but all that means is that the address generator needs to be able to send commands over to the data processor. This could be done in an ad hoc manner by setting some signal (or more likely multiple signals) that are inputs to the data processor. This method works fine. You can also view this interface somewhat more abstractly as the data processor having a command port that is written to by the address generator....and again, a simple request/acknowledge handshake is all you need here as well. In many cases though the simple signal(s) is sufficient, but in other cases, the two state machines interact a bit more closely and a more well defined communication channel between the two will clear up a lot from the perspective of getting to a functional design that is easy to maintain.
> Perhaps someone could suggest a better term than state > machine "forking"?
Separate state machines that signal each other in some fashion.
> And if there is some guidelines on how > to code ane debug pipelined architecture. I'm with Kevin, > it get's real messy, real soon. >
Ponder a bit on breaking up things as I've suggested starting with something small and see how it comes out. It may take a bit to get used to, but the end result is smaller easier to understand and debug state machines (albeit more of them) that communicate over well defined internal interfaces. You'll also find that changes (like switching the Nobl SRAM to DRAM as an example) can be accomodated without having to change *everything*. Kevin Jennings
Brad Smallridge wrote:

 > Perhaps someone could suggest a better term than state
 > machine "forking"? And if there is some guidelines on how
 > to code and debug pipelined architecture. I'm with Kevin,
 > it gets real messy, real soon.

There is no requirement that a process/block must update only
a single register named 'State'.

When I look at large, textbook style state machine
examples, like the ones in this thread, I imagine
a much simpler process that updates several smaller registers.
Maybe an input reg, output reg, a couple of counters
and a few well-named booleans.

    -- Mike Treseler
> You'll also find that changes (like switching the Nobl SRAM to DRAM as an > example) can be accomodated without having to change *everything*.
That has been on my mind because there is a DRAM on my board. Not only will the DRAM require more cycles but perhaps too a varying number of cycles depending on the sequentiality or randomness of the addressing. I have seen controllers on the Xilinx site, but nothing, that talks about several ports, and how the hand shaking is handled. My FAE has said that some multiport examples are availble. Brad Smallridge AiVision
On May 5, 12:13=A0pm, "Brad Smallridge" <bradsmallri...@dslextreme.com>
wrote:
> > You'll also find that changes (like switching the Nobl SRAM to DRAM as a=
n
> > example) can be accomodated without having to change *everything*. > > That has been on my mind because there is a DRAM on my board. Not only > will the DRAM require more cycles but perhaps too a varying number of > cycles depending on the sequentiality or randomness of the addressing. >
Except for the most special case examples, DRAM access will be a variable delay because of page changes and memory refresh. Trying to design a state machine that is simply trying to *access* memory for some algorithmic purpose would likely result in a difficult to maintain design. Designing a request/acknowledge interface to some other process or entity (in this case the 'other' being a DRAM controller) results in a much easier to maintain design. Using the exact same interface signal functionality whether one is talking to internal FPGA memory, NoBL or SDRAM or SPI results in a design that can be reused, retargeted and improved upon if necessary. Using the same signal naming functionality as an existing documented specification (i.e. Avalon, Wishbone) allows others to (re)use your design without getting bogged down in details that they are not currently interested in and allows them (and you when you re-use the design) to be more productive. Figure out where you are and where you want to be in the design productivity chain. The synthesis cost in terms of logic resource is zero, the upfront learning cost will start to pay back in the form of quicker debug and reusable designs. Kevin Jennings