FPGARelated.com

RISC implementation questions

Started by Patrick March 26, 2007
Hi there

I have some general questions about implementing a general RISC
architecture.
So far I have coded the fetch, decode, execute and writeback stages.

1) The next step is to implement forwarding. Do I put a 2:1
multiplexer in front of the ALU that takes as inputs the ALU output
from the previous cycle and the source register, with the decode stage
setting the multiplexer select signal?

2) How does a NOP instruction work? Does the ALU "execute", for
example, R0 = R0 + R0? As R0 is always zero this doesn't have any
effect. Or is there an additional signal from the decode stage that
tells the ALU to do nothing?

3) Normally the writeback is done in the first half of the clock cycle,
whereas the registers are read in the second half, in the decode
stage. Does this mean that the decode logic only works in the second
half of the clock cycle, or does it do some work in the first half and
then just read out the operands in the second half of the cycle?

I know these are easy questions, but the answers would help my
understanding ;)

Cheers,
Patrick

"Patrick" <grabherp23@yahoo.de> wrote

> 1) Next step is to implement forwarding. Do I have here a 2:1
> multiplexer in front of the alu that
> takes as input the output of the alu of the former cycle and the
> source register and the decode stage
> then sets the multiplexer select signal ?
Typically the 2:1 muxes are *ahead* of the ALU input operand registers. Call them A and B. Then one pipeline recurrence might be A -> ALU -> result mux -> A fwd mux -> A etc, assuming you have no MEM pipeline stage.
> 2) How is it working with a NOP instruction? Does there the alu
> "execute" for example a R0 = R0 + R0.
Yes.
> As R0 is always zero this doesn't
> have any effect. Or is there somehow an additional signal from the
> decode stage that tells the alu to do nothing?
No, rather it probably does an add of 0 + 0.
> 3) Normally the writeback is done in the first half of the clock cycle
> whereas the registers are read in the second half in the decode
> register. Does this mean that the decode logic just works in the
> second half of the clock cycle or does it do some stuff in the first
> clock cycle and then just read out the operands in the second half of
> the cycle?
It all depends upon your datapath and pipeline design. Once the IR is latched in FFs it is decoded "continuously".

See also:
http://fpgacpu.org/papers/xsoc-series-drafts.pdf
http://fpgacpu.org/papers/soc-gr0040-paper.pdf

Jan Gray
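For question 3, one common way to picture the "write in the first half, read in the second half" convention is a register file whose read in a given cycle already sees that cycle's writeback. The Python sketch below is only a behavioural illustration, not a description of any particular core. The class name `SplitCycleRegFile`, the 32-register size and the hardwired-zero r0 are assumptions made for the example.

```python
class SplitCycleRegFile:
    """Behavioural model: writeback happens before the operand read in a cycle."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def clock(self, writeback, read_a, read_b):
        """Model one clock cycle.

        writeback is a (dest, value) tuple, or None if nothing is written.
        Returns the two operands read in the same cycle.
        """
        # "First half": commit the writeback (r0 assumed hardwired to zero).
        if writeback is not None:
            dest, value = writeback
            if dest != 0:
                self.regs[dest] = value
        # "Second half": the read sees the value that was just written,
        # so no extra forwarding path is needed from the writeback stage.
        return self.regs[read_a], self.regs[read_b]


rf = SplitCycleRegFile()
operands_same_cycle = rf.clock((3, 7), read_a=3, read_b=0)
operands_next_cycle = rf.clock(None, read_a=3, read_b=3)
```

In this model the decode logic itself is not tied to either half of the cycle. Only the register read is ordered after the write, which fits the remark that the IR is decoded continuously once it is latched.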
Thanks for your answers Jan, one more issue

> Then one pipeline recurrence might be A -> ALU -> result mux -> A fwd mux ->
> A etc, assuming you have no MEM pipeline stage.
In the end there will be a MEM pipeline stage. In that case I have a 3:1 mux, and the decode logic then selects the corresponding value to use: the forwarded ALU result, the forwarded MEM result, or the register file value. Is that correct?

Cheers,
Patrick
"Patrick" <grabherp23@yahoo.de> wrote in message
> Thanks for your answers Jan, one more issue
>
>> Then one pipeline recurrence might be A -> ALU -> result mux -> A fwd
>> mux ->
>> A etc, assuming you have no MEM pipeline stage.
>
> In the end there will be a Mem Pipeline. In that case I have a 3:1 mux
> and the decode logic selects
> then the corresponding value to use, either the forward of the alu, of
> the mem or of the register file. Is that
> correct?
Yes, Patrick, that's exactly right. Note, if you have a shifter, or jump-and-link, or anything else that produces a value into a register, you may need to mux that result in and forward that as well.

Another good reference is Computer Organization and Design by Patterson and Hennessy.

Have fun!
Jan.
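The 3:1 selection discussed here can be sketched behaviourally in Python. This is a minimal illustration under stated assumptions, not the design from either poster: `StageResult` and `forward_operand` are hypothetical names, r0 is treated as hardwired to zero (as in Patrick's first post), and the EX-stage result is given priority over the MEM-stage result because it comes from the younger instruction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageResult:
    """Value held in a pipeline register after EX or MEM (hypothetical model)."""
    dest: Optional[int]  # destination register number, None if nothing is written
    value: int = 0


def forward_operand(src_reg, regfile_value, ex_result, mem_result):
    """3:1 forwarding mux for one ALU operand.

    Selects between the EX-stage result, the MEM-stage result and the
    value read from the register file.
    """
    # r0 is assumed hardwired to zero, so it is never forwarded.
    if src_reg == 0:
        return 0
    # The EX result belongs to the younger instruction, so it wins.
    if ex_result.dest == src_reg:
        return ex_result.value
    if mem_result.dest == src_reg:
        return mem_result.value
    # No hazard: use the register file value.
    return regfile_value


both_match = forward_operand(3, 10, StageResult(3, 99), StageResult(3, 55))
mem_only = forward_operand(3, 10, StageResult(None), StageResult(3, 55))
no_hazard = forward_operand(3, 10, StageResult(4, 1), StageResult(5, 2))
```

A second instance of the same function would serve operand B. A shifter or jump-and-link result, as Jan notes, would either share the EX result path or add another mux input.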
Patrick wrote:
> 2) How is it working with a NOP instruction? Does there the alu
> "execute" for example a R0 = R0 + R0. As R0 is always zero this doesn't
> have any effect. Or is there somehow an additional signal from the
> decode stage that tells the alu to do nothing?
That depends on your architecture. If you use flags like ZERO or CARRY which are set on every ALU operation, coding NOP as "ADD r0, r0, r0" might not be a good idea.

Otherwise, as your register r0 is read-only, you can do this and get your NOP for free in terms of required opcodes. Likewise you can emulate register moves with "ADD r_dest, r_source, r0".

If you can encode your NOP instruction as "0...0", life will be easier, as internal FPGA memory cells are typically set to 0 on configuration.

Best regards
Andreas
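To illustrate why a read-only r0 gives both NOP and register moves "for free", here is a small behavioural sketch in Python. The names `ZeroRegFile` and `add`, the 32-register size and the 32-bit width are assumptions for the example, and the model has no flags, so it covers only the flag-free case described above.

```python
class ZeroRegFile:
    """Register file with r0 hardwired to zero (assumed 32 x 32-bit)."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, idx):
        return 0 if idx == 0 else self.regs[idx]

    def write(self, idx, value):
        # Writes to r0 are silently discarded.
        if idx != 0:
            self.regs[idx] = value & 0xFFFFFFFF


def add(rf, rd, rs1, rs2):
    """ADD rd, rs1, rs2 with no flag side effects."""
    rf.write(rd, rf.read(rs1) + rf.read(rs2))


rf = ZeroRegFile()
rf.write(5, 42)

state_before_nop = list(rf.regs)
add(rf, 0, 0, 0)              # NOP encoded as ADD r0, r0, r0
state_after_nop = list(rf.regs)

add(rf, 7, 5, 0)              # register move encoded as ADD r7, r5, r0
```

If the ALU also updated a ZERO flag on every operation, the `add(rf, 0, 0, 0)` call would set it, which is the side effect the caveat above warns about.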
Have a look at "Logic and Computer Design Fundamentals" by Mano and Kime, 2nd edition, pages 542-562.

Ben
> Otherwise, as your register r0 is read-only, you can do this and get
> your NOP for free in terms of required opcodes. Likewise you can emulate
> register moves with "ADD r_dest, r_source, r0".
I am just implementing the back end of the processor, and some issues came up that I am not sure how to handle properly.

Let's assume I have two execution units (1-cycle delay), one memory pipeline (2-cycle delay) and one multiplier (n-cycle delay). Say I want two write ports to the register file, so normally they are occupied by the two integer execution units. If there is a load in the memory pipeline, it eventually needs one of the write ports, and then the two execution units cannot both write back a result in that cycle. In other words, I can't do a NOP with ADD r0, r0, r0, because then two units (if the memory pipeline has handled a load) try to use the same write port.

So I assume I need two additional signals: one that says the ALU output is valid and should be written into the register file, and one that says the load has finished and has fetched a valid value from the data cache that is ready to be written back into the register file.

Is this a good approach or complete nonsense?

Cheers
Patrick,

Why have a NOP instruction?  If NOP is a problem, take it out.

After all, it isn't exactly there to do anything.

I am sure that the compiler can create a "no function" set of
instructions, if these have to be supported.

For example, circular shift right, then circular shift left (result is
the same as before, including all flags).

Austin
Austin Lesea wrote:
> Patrick,
>
> Why have a NOP instruction? If NOP is a problem, take it out.
>
> After all, it isn't exactly there to do anything.
>
> I am sure that the compiler can create a "no function" set of
> instructions, if these have to be supported.
>
> For example, circular shift right, then circular shift left (result is
> the same as before, including all flags).
Yes, but commonly a NOP is one cycle, as one use is for simple timing delay patches. One could also support NOP2, NOP3 on some cores, which are more memory efficient for longer delays.

You can certainly alias onto any 'do nothing single cycle opcode', which may have been what Austin meant?

-jg
> You can certainly alias onto any 'do nothing single cycle opcode' which
> may have been what Austin was meaning ?
The problem is as follows: I have two write ports to the register file. In my architecture I have two execution units, a memory access unit and a multiplier. So in the worst case, all four units could want to write to the register file in the same cycle. This causes a structural hazard, as I only have two write ports. My question is: what is the best way to deal with this situation?

My approach was that if OP = 0, the ALU stage signals to the writeback stage that there is no meaningful output available and that it doesn't need the write port. So instead of a NOP that writes 0 into R0 in the WB stage, I use an additional control signal so that the WB stage doesn't use the write port, which is then available to the memory access pipeline, for instance. So either the ALU output is written back or the memory access unit uses the write port. I have to make sure that both units never want to access the same write port in the same clock cycle.

Is it okay to handle this with additional control signals, or is there another way to do it? Or can I use some kind of resolved signal, where either the output of the ALU or the output of the memory access unit determines the value to be written?
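One possible reading of the valid-signal approach described in this post, as a behavioural Python sketch. It is an illustration under assumptions rather than an answer from the thread: `WriteReq` and `arbitrate` are hypothetical names, and the fixed priority (list order, with the long-latency units placed first) is an arbitrary choice. A real design would also need to stall or buffer the units that lose arbitration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WriteReq:
    unit: str    # name of the producing unit
    valid: bool  # True only if the unit has a real result this cycle
    dest: int    # destination register number
    value: int   # result to write back


def arbitrate(requests: List[WriteReq],
              num_ports: int = 2) -> Tuple[List[WriteReq], List[WriteReq]]:
    """Grant write ports to valid requests in list (priority) order.

    Returns (granted, stalled). Requests with valid == False, such as a
    NOP in an execution unit, never occupy a port.
    """
    wanting = [r for r in requests if r.valid]
    return wanting[:num_ports], wanting[num_ports:]


# Cycle 1: alu0 holds a NOP, so the finished load can take its port.
cycle1 = [
    WriteReq("mem", True, 4, 0x1234),
    WriteReq("alu0", False, 0, 0),
    WriteReq("alu1", True, 6, 9),
]
granted1, stalled1 = arbitrate(cycle1)

# Cycle 2: worst case, all four units produce a result at once.
cycle2 = [
    WriteReq("mul", True, 8, 56),
    WriteReq("mem", True, 4, 1),
    WriteReq("alu0", True, 2, 3),
    WriteReq("alu1", True, 6, 9),
]
granted2, stalled2 = arbitrate(cycle2)
```

In the second cycle two requests are left over. That is the structural hazard itself: the valid signals free a port when a unit has nothing to write, but they do not remove the need to stall or schedule around the case where more than two units have results.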