FPGARelated.com
Forums

Advice to a newbie

Started by Cecil Bayona May 27, 2016
On 28/05/2016 17:41, Tim Wescott wrote:

<snip>

> I'm pretty sure I had not yet seen, nor independently conceived, the
> RISC-ish push & pop of multiple registers in one instruction.
Just an observation, but RISC instruction sets, and I'm largely basing my
assumption on ARM, generally require a few 'fast' instructions to do
anything useful.

If you want a single push or pop of multiple registers, then you would
probably need a CISC CPU.

YMMV

-- 
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
On 5/28/2016 12:41 PM, Tim Wescott wrote:
> On Sat, 28 May 2016 10:06:47 -0400, rickman wrote:
>>
>> I ended up dropping further work to a large extent.  I did play with
>> some ideas a few years ago regarding an architecture that included
>> fields in the instruction for small offsets from the stack pointer to
>> try to combine the advantages of register and stack machines.  I think
>> it has potential for working well at the hardware level.  I just don't
>> know enough about compiler writing to program this device from a
>> compiler.  Maybe I'll get back to this again some day.
>
> Ages ago I had a notion about combining the advantages of register and
> stack machines, which was to call the region of 16 addresses around the
> stack "registers", and to have the processor automagically cache them on
> a context switch.  The idea was that the code itself wouldn't have to
> specify registers to save on push and pop because the processor would do
> it automatically.
>
> I'm pretty sure I had not yet seen, nor independently conceived, the
> RISC-ish push & pop of multiple registers in one instruction.
The register stacking approach you describe is not much different from the
TI 990 mini and TMS9900 micro computers.  They didn't have general purpose
registers on chip; rather they had a pointer into memory which defined the
general registers.  Subroutine calls could be done by saving the workspace
pointer, status register and program counter in the new registers, allowing
context switches in a very minimal amount of time.  This was the BLWP
instruction.

It was also possible to use the simpler BL instruction, which did not change
the workspace pointer, and use other instructions to modify the workspace
pointer as if it were a stack pointer.  A bit slower than desired, but
workable, giving not just stacks, but stack frames from registers.

Of course the limitation of this approach is the speed of memory, which
started out not much slower than registers but quickly became a speed
burden.  This has come full circle in FPGAs where internal memory is not
significantly slower than registers.

-- 

Rick C
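[For readers who never used the TI parts, here is a minimal behavioral sketch in C of the BLWP/RTWP mechanism described above. The memory model, sizes, and names (cpu_t, mem, reg_word) are invented for illustration; it is not cycle-accurate, and it only shows that a "context switch" is three stores plus two pointer loads, with the register file itself never copied.]

    #include <stdint.h>
    #include <stdio.h>

    #define MEM_WORDS 32768
    static uint16_t mem[MEM_WORDS];              /* word-indexed model of main memory */

    typedef struct { uint16_t wp, pc, st; } cpu_t;  /* workspace ptr, program counter, status */

    /* Word index of register r in the workspace at byte address wp. */
    static unsigned reg_word(uint16_t wp, int r) { return wp / 2 + r; }

    /* BLWP @vector: the vector holds {new WP, new PC}.  The old WP/PC/ST are
     * saved into R13-R15 of the *new* workspace; no register file is copied. */
    static void blwp(cpu_t *cpu, uint16_t vector)
    {
        uint16_t new_wp = mem[vector / 2];
        uint16_t new_pc = mem[vector / 2 + 1];
        mem[reg_word(new_wp, 13)] = cpu->wp;     /* old WP -> new R13 */
        mem[reg_word(new_wp, 14)] = cpu->pc;     /* old PC -> new R14 */
        mem[reg_word(new_wp, 15)] = cpu->st;     /* old ST -> new R15 */
        cpu->wp = new_wp;
        cpu->pc = new_pc;
    }

    /* RTWP: restore the caller's context from R13-R15 of the current workspace. */
    static void rtwp(cpu_t *cpu)
    {
        uint16_t wp = cpu->wp;
        cpu->st = mem[reg_word(wp, 15)];
        cpu->pc = mem[reg_word(wp, 14)];
        cpu->wp = mem[reg_word(wp, 13)];
    }

    int main(void)
    {
        cpu_t cpu = { .wp = 0x8300, .pc = 0x0100, .st = 0 };
        mem[0x83E0 / 2]     = 0x8320;   /* vector word 0: new WP */
        mem[0x83E0 / 2 + 1] = 0x0200;   /* vector word 1: new PC */
        blwp(&cpu, 0x83E0);
        printf("after BLWP: wp=%04X pc=%04X\n", cpu.wp, cpu.pc);
        rtwp(&cpu);
        printf("after RTWP: wp=%04X pc=%04X\n", cpu.wp, cpu.pc);
        return 0;
    }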
On Sunday, May 29, 2016 at 7:16:51 AM UTC-5, rickman wrote:
> On 5/28/2016 12:41 PM, Tim Wescott wrote:
> > On Sat, 28 May 2016 10:06:47 -0400, rickman wrote:
>
> <snip>
]> > Ages ago I had a notion about combining the advantages of register and
]> > stack machines, which was to call the region of 16 addresses around the
]> > stack "registers", and to have the processor automagically cache them on
]> > a context switch.  The idea was that the code itself wouldn't have to
]> > specify registers to save on push and pop because the processor would do
]> > it automatically.

In the context of an FPGA high performance implementation (of a soft core
processor), there seem to be two/three cases:

1) "Small" embedded processor where stack requirements are known in advance,
so that LUT RAM can serve as a register file/stack(s), and the instruction
processing adds offsets to two or more register pointers.  Pops & pushes
modify the register pointers.

2) Larger applications that need larger stack(s).  One can either spill and
refill the register file from main memory, or one can use block RAM to hold
the entire stack(s), main memory being more distant than the block RAM.

A third approach could be to have an associative cache of the block RAM
stack(s) such that cache registers "automatically" spill and refill.  Not
sure how to implement this efficiently on an FPGA?

Jim Brakefield
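[Purely as an illustration of case 1 above, and of rickman's "small offsets from the stack pointer" idea earlier in the thread, here is a toy software model in C.  The array standing in for LUT RAM, the sizes, and the function names are all invented; real hardware would of course fold the pointer adds into the register-file address decode.]

    #include <stdint.h>
    #include <stdio.h>

    #define LUT_RAM_WORDS 64
    static uint32_t lut_ram[LUT_RAM_WORDS];     /* register file / stack storage */
    static unsigned sp = LUT_RAM_WORDS;         /* stack grows downward          */

    /* Instruction operands are small offsets from the stack pointer. */
    static uint32_t rd(unsigned offset)             { return lut_ram[sp + offset]; }
    static void     wr(unsigned offset, uint32_t v) { lut_ram[sp + offset] = v; }

    /* Pushes and pops just move the pointer; no block of data is copied. */
    static void push(uint32_t v) { lut_ram[--sp] = v; }
    static uint32_t pop(void)    { return lut_ram[sp++]; }

    int main(void)
    {
        push(7);
        push(5);
        /* An "ADD [sp+0],[sp+1] -> [sp+1]" style instruction: the operand
         * fields name stack-pointer-relative cells, much like register fields. */
        wr(1, rd(0) + rd(1));
        (void)pop();                   /* discard the consumed operand */
        printf("top = %u\n", pop());   /* prints 12 */
        return 0;
    }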
On 5/29/2016 7:07 PM, jim.brakefield@ieee.org wrote:
> On Sunday, May 29, 2016 at 7:16:51 AM UTC-5, rickman wrote:
>
> <snip>
>
> In the context of an FPGA high performance implementation (of a soft core
> processor), there seem to be two/three cases:
>
> 1) "Small" embedded processor where stack requirements are known in advance,
> so that LUT RAM can serve as a register file/stack(s), and the instruction
> processing adds offsets to two or more register pointers.  Pops & pushes
> modify the register pointers.
>
> 2) Larger applications that need larger stack(s).  One can either spill and
> refill the register file from main memory, or one can use block RAM to hold
> the entire stack(s), main memory being more distant than the block RAM.
>
> A third approach could be to have an associative cache of the block RAM
> stack(s) such that cache registers "automatically" spill and refill.  Not
> sure how to implement this efficiently on an FPGA?
I'm not sure what you are trying to address here.  "Large" applications can
still be implemented on an FPGA if it is big enough.  The larger FPGAs have
enormous amounts of RAM on chip, as much as 10's of MBs.  It would be a
large application that needed more than that.  Still, if you weren't using
one of the really large chips you might not have enough on-chip RAM for a
general stack for a C-programmed processor.  But it would be a really small
FPGA that didn't have enough RAM for a register stack.

When you talk about "spilling" the stack, I think we are talking about two
different things.  If your registers are in memory, "spilling" the stack is
just a matter of changing the pointer.  That's what they do in the TI
processor.  The only thing they do wrong is to load the register pointer
from a fixed address rather than using an offset to the present value.
Using this approach there is no need to copy data from registers to stack.
Even if it is automatic, it takes a long time to do all the memory accesses.

-- 

Rick C
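[A tiny sketch in C (invented names, not TI code) of the contrast being drawn: a BLWP-style call loads the workspace pointer from a fixed vector, so every call through that vector lands in the same workspace, while the suggested alternative just offsets the current pointer, so a nested call gets a fresh register frame and nothing is copied either way.]

    #include <stdio.h>

    #define WORKSPACE_REGS 16

    static unsigned mem[1024];        /* toy memory holding all workspaces      */
    static unsigned wp = 512;         /* current workspace pointer (word index) */

    /* BLWP-style: the new workspace comes from a fixed vector, so recursive
     * or nested calls through the same vector reuse the SAME 16 registers. */
    static unsigned fixed_vector_wp = 256;
    static void call_fixed(void)   { wp = fixed_vector_wp; }

    /* Offset-style: each call allocates the next 16 words below the current
     * workspace, giving stack-frame-like registers with no copying. */
    static void call_offset(void)  { wp -= WORKSPACE_REGS; }
    static void return_offset(void){ wp += WORKSPACE_REGS; }

    int main(void)
    {
        printf("wp before: %u\n", wp);
        call_offset();                 /* new frame of registers at wp-16 */
        mem[wp + 0] = 123;             /* use "R0" of the new frame       */
        printf("wp in callee: %u, R0=%u\n", wp, mem[wp + 0]);
        return_offset();
        printf("wp after: %u\n", wp);
        (void)call_fixed;              /* shown only for contrast */
        return 0;
    }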
On Sun, 29 May 2016 10:35:19 +0100, Mike Perkins wrote:

> On 28/05/2016 17:41, Tim Wescott wrote:
>
> <snip>
>
>> I'm pretty sure I had not yet seen, nor independently conceived, the
>> RISC-ish push & pop of multiple registers in one instruction.
>
> Just an observation, but RISC instruction sets, and I'm largely basing
> my assumption on ARM, generally require a few 'fast' instructions to do
> anything useful.
>
> If you want a single push or pop of multiple registers, then you
> would probably need a CISC CPU.
From the ARM architecture v7m reference manual, POP instruction:

"Pop Multiple Registers loads a subset, or possibly all, of the
general-purpose registers R0-R12 and the PC or the LR from the stack"

In Thumb, the instruction is 7 bits (1011110) followed by a nine-bit
bitfield specifying which registers to pop.  PUSH is similar.

-- 
Tim Wescott
Control systems, embedded software and circuit design

I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com
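[For the curious, a short C sketch of that encoding (not taken from the ARM manual; the helper function is made up for illustration): bits [15:9] identify 16-bit Thumb PUSH/POP, bit 8 adds LR (push) or PC (pop), and bits [7:0] are a bitmap of R0-R7.]

    #include <stdio.h>
    #include <stdint.h>

    static void print_push_pop(uint16_t insn)
    {
        uint16_t top7 = insn >> 9;               /* opcode field, bits [15:9]  */
        int is_push = (top7 == 0x5A);            /* 1011010x xxxxxxxx          */
        int is_pop  = (top7 == 0x5E);            /* 1011110x xxxxxxxx          */
        if (!is_push && !is_pop) { printf("not a 16-bit PUSH/POP\n"); return; }

        printf(is_push ? "PUSH {" : "POP {");
        const char *sep = "";
        for (int r = 0; r < 8; r++)              /* low registers from the bitmap */
            if (insn & (1u << r)) { printf("%sR%d", sep, r); sep = ", "; }
        if (insn & 0x100)                        /* bit 8: LR on push, PC on pop  */
            printf("%s%s", sep, is_push ? "LR" : "PC");
        printf("}\n");
    }

    int main(void)
    {
        print_push_pop(0xB5F0);   /* PUSH {R4-R7, LR} */
        print_push_pop(0xBDF0);   /* POP  {R4-R7, PC} */
        return 0;
    }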
On Saturday, May 28, 2016 at 7:37:10 PM UTC-4, Rick C. Hodgin wrote:
> On Friday, May 27, 2016 at 11:10:26 PM UTC-4, rickman wrote:
> > This does not directly address your stated issues, but there is a
> > workshop Saturday.  Notable is that it will use the same starter kit you
> > have.  I believe you can participate via the Internet.  It might be
> > interesting to you since it is about CPU design.  Here is a post I made
> > about this in another group.
> >
> > Dr. Ting will be leading a workshop on using a Lattice FPGA to implement
> > an emulation of the 8080 instruction set which will run Forth.
> >
> > http://www.meetup.com/SV-FIG/events/229926249/
> >
> > I believe you need to be a member of Meetup to see this page.  I'm not
> > sure but you may need to be a member of the SVFIG meetup group as well.
> > There is no charge to join either.
>
> Thank you for posting this information, Rick C.  I've watched some of
> the content that's available on YouTube from the event.  It's very
> interesting.
I went to Lattice's website and also bought a Brevia2 development kit.
I was able to download their Diamond software and get a license.dat file,
and I found that someone from the Meetup posted Ting's project files
online:

https://github.com/DRuffer/ep8080

I have been able to get the project loaded, but I haven't gotten to the
part where it synthesizes yet.  Still going through the videos:

Morning session, a lot of ISA and architecture review:
https://www.youtube.com/watch?v=rhgCrnF036Y

Afternoon session, development, design, and synthesis:
https://www.youtube.com/watch?v=vLzEFU2GvYc

DRuffer was able to get the Forth code to run as well, and he includes
his working JEDEC file:

https://github.com/DRuffer/ep8080/tree/master/ep80

Best regards,
Rick C. Hodgin
On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
> I went to Lattice's website and also bought a Brevia2 development kit.
> I was able to download their Diamond software and get a license.dat file,
> and I found that someone from the Meetup posted Ting's project files
> online:
>
> https://github.com/DRuffer/ep8080
>
> <snip>
I may be missing an obvious link, but if anybody knows where I can get
the PPT files used in these presentations, please post a link:

ep8080 architecture morning sessions:
Feb.27.2016: https://www.youtube.com/watch?v=-DYKuBmSGaE
Mar.26.2016: https://www.youtube.com/watch?v=XO0VqKhsPQE
Apr.23.2016: https://www.youtube.com/watch?v=s9cnnPiQtn8

Thank you in advance.

Best regards,
Rick C. Hodgin
On Monday, May 30, 2016 at 12:03:28 PM UTC-4, Rick C. Hodgin wrote:
> On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
>
> <snip>
>
> I may be missing an obvious link, but if anybody knows where I can get
> the PPT files used in these presentations, please post a link:
>
> <snip>
Also, if anyone has a block diagram or logical component layout of some
kind, one which shows the internal components and how they are all hooked
up through this ep8080 design, please post that info as well.

Best regards,
Rick C. Hodgin
On 5/30/2016 11:41 AM, Rick C. Hodgin wrote:
> On Monday, May 30, 2016 at 12:03:28 PM UTC-4, Rick C. Hodgin wrote:
>> On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
>>>
>>> https://github.com/DRuffer/ep8080/tree/master/ep80
>>
>> I may be missing an obvious link, but if anybody knows where I can get
>> the PPT files used in these presentations, please post a link:
>>
>> ep8080 architecture morning sessions:
>> Feb.27.2016: https://www.youtube.com/watch?v=-DYKuBmSGaE
>> Mar.26.2016: https://www.youtube.com/watch?v=XO0VqKhsPQE
>> Apr.23.2016: https://www.youtube.com/watch?v=s9cnnPiQtn8
>
> Also, if anyone has a block diagram or logical component layout of some
> kind, one which shows the internal components and how they are all hooked
> up through this ep8080 design, please post that info as well.
>
> Best regards,
> Rick C. Hodgin
>
I would also be interested in those items.  There are several nice-looking
soft CPUs available for use with Forth; the common thread among them is a
lack of documentation.

-- 
Cecil - k5nwa
On 5/30/2016 1:02 PM, Cecil Bayona wrote:
> On 5/30/2016 11:41 AM, Rick C. Hodgin wrote:
>
> <snip>
>
>> Also, if anyone has a block diagram or logical component layout of some
>> kind, one which shows the internal components and how they are all hooked
>> up through this ep8080 design, please post that info as well.
>
> I would also be interested in those items.  There are several nice-looking
> soft CPUs available for use with Forth; the common thread among them is a
> lack of documentation.
The best way to learn about the structure of the ep8080 would be to draw a
block diagram from the VHDL code.  I looked at the code when I debugged the
problem I found and it is not so complex.

There are separate registers for the user accessible registers as well as
the internal registers like the PSW.  There is a process for the control
signals enabling the registers and controlling the various other functions
in the CPU such as multiplexers and carry bits, etc.  There are the
multiplexers and the other data path logic.

To draw the block diagram, I would follow the data path from the registers
backwards to the sources.  I believe you will find there is a small
multiplexer on the input of each register and two larger muxes controlled
by the source and destination fields of the instruction opcode.  I can't
say much about the rest; I didn't dig in to understand it all.

Once you have mapped out the data path, you can trace the control flow
through the control logic to understand how the opcode is interpreted.

-- 

Rick C
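[Not the actual ep8080 VHDL, but as a rough behavioral picture of the "two larger muxes controlled by the source and destination fields" idea, here is a C sketch of how an 8080 MOV opcode's ddd/sss fields select the destination and source registers.  The register array, names, and structure are simplified for illustration only.]

    #include <stdio.h>
    #include <stdint.h>

    static uint8_t regs[8];   /* B, C, D, E, H, L, (M placeholder), A */
    static const char *names[8] = {"B", "C", "D", "E", "H", "L", "M", "A"};

    /* Execute MOV r,r for opcodes 01dddsss (ignoring the memory operand M). */
    static void exec_mov(uint8_t opcode)
    {
        if ((opcode & 0xC0) != 0x40) return;     /* not a MOV                     */
        unsigned dst = (opcode >> 3) & 7;        /* ddd field -> destination mux  */
        unsigned src = opcode & 7;               /* sss field -> source mux       */
        regs[dst] = regs[src];                   /* the selected-to-selected move */
        printf("MOV %s,%s\n", names[dst], names[src]);
    }

    int main(void)
    {
        regs[7] = 0x42;          /* A = 0x42            */
        exec_mov(0x47);          /* 01 000 111 = MOV B,A */
        printf("B = 0x%02X\n", regs[0]);
        return 0;
    }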