FPGARelated.com
Forums

Advice to a newbie

Started by Cecil Bayona May 27, 2016
On 28/05/2016 17:41, Tim Wescott wrote:

<snip>

> I'm pretty sure I had not yet seen, nor independently conceived, the
> RISC-ish push & pop of multiple registers in one instruction.
Just an observation, but RISC instruction sets, and I'm largely basing my
assumption on ARM, generally require a few 'fast' instructions to do
anything useful.

If you want a single push or pop of multiple registers, then you would
probably need a CISC CPU.

YMMV

-- 
Mike Perkins
Video Solutions Ltd
www.videosolutions.ltd.uk
On 5/28/2016 12:41 PM, Tim Wescott wrote:
> On Sat, 28 May 2016 10:06:47 -0400, rickman wrote:
>>
>> I ended up dropping further work to a large extent.  I did play with
>> some ideas a few years ago regarding an architecture that included
>> fields in the instruction for small offsets from the stack pointer to
>> try to combine the advantages of register and stack machines.  I think
>> it has potential for working well at the hardware level.  I just don't
>> know enough about compiler writing to program this device from a
>> compiler.  Maybe I'll get back to this again some day.
>
> Ages ago I had a notion about combining the advantages of register and
> stack machines, which was to call the region of 16 addresses around the
> stack "registers", and to have the processor automagically cache them on
> a context switch.  The idea was that the code itself wouldn't have to
> specify registers to save on push and pop because the processor would do
> it automatically.
>
> I'm pretty sure I had not yet seen, nor independently conceived, the
> RISC-ish push & pop of multiple registers in one instruction.
The register stacking approach you describe is not much different from the
TI 990 mini and TMS9900 micro computers.  They didn't have general purpose
registers on chip; rather they had a pointer into memory which defined the
general registers.  Subroutine calls could be done by saving the workspace
pointer, status register and program counter in the new registers, allowing
context switches in a very minimal amount of time.  This was the BLWP
instruction.

It was also possible to use the simpler BL instruction, which did not change
the workspace pointer, and use other instructions to modify the workspace
pointer as if it were a stack pointer.  A bit slower than desired, but
workable, giving not just stacks, but stack frames from registers.

Of course the limitation of this approach is the speed of memory, which
started out not much slower than registers but quickly became a speed
burden.  This has come full circle in FPGAs where internal memory is not
significantly slower than registers.

-- 

Rick C
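[For readers who never used the TI parts, here is a minimal behavioral sketch in C of the BLWP/RTWP mechanism described above. The memory model, sizes, and names (cpu_t, mem, reg_word) are invented for illustration; it is not cycle-accurate, and it only shows that a "context switch" is three stores plus two pointer loads, with the register file itself never copied.]

    #include <stdint.h>
    #include <stdio.h>

    #define MEM_WORDS 32768
    static uint16_t mem[MEM_WORDS];              /* word-indexed model of main memory */

    typedef struct { uint16_t wp, pc, st; } cpu_t;  /* workspace ptr, program counter, status */

    /* Word index of register r in the workspace at byte address wp. */
    static unsigned reg_word(uint16_t wp, int r) { return wp / 2 + r; }

    /* BLWP @vector: the vector holds {new WP, new PC}.  The old WP/PC/ST are
     * saved into R13-R15 of the *new* workspace; no register file is copied. */
    static void blwp(cpu_t *cpu, uint16_t vector)
    {
        uint16_t new_wp = mem[vector / 2];
        uint16_t new_pc = mem[vector / 2 + 1];
        mem[reg_word(new_wp, 13)] = cpu->wp;     /* old WP -> new R13 */
        mem[reg_word(new_wp, 14)] = cpu->pc;     /* old PC -> new R14 */
        mem[reg_word(new_wp, 15)] = cpu->st;     /* old ST -> new R15 */
        cpu->wp = new_wp;
        cpu->pc = new_pc;
    }

    /* RTWP: restore the caller's context from R13-R15 of the current workspace. */
    static void rtwp(cpu_t *cpu)
    {
        uint16_t wp = cpu->wp;
        cpu->st = mem[reg_word(wp, 15)];
        cpu->pc = mem[reg_word(wp, 14)];
        cpu->wp = mem[reg_word(wp, 13)];
    }

    int main(void)
    {
        cpu_t cpu = { .wp = 0x8300, .pc = 0x0100, .st = 0 };
        mem[0x83E0 / 2]     = 0x8320;   /* vector word 0: new WP */
        mem[0x83E0 / 2 + 1] = 0x0200;   /* vector word 1: new PC */
        blwp(&cpu, 0x83E0);
        printf("after BLWP: wp=%04X pc=%04X\n", cpu.wp, cpu.pc);
        rtwp(&cpu);
        printf("after RTWP: wp=%04X pc=%04X\n", cpu.wp, cpu.pc);
        return 0;
    }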
On Sunday, May 29, 2016 at 7:16:51 AM UTC-5, rickman wrote:
> On 5/28/2016 12:41 PM, Tim Wescott wrote:
> > On Sat, 28 May 2016 10:06:47 -0400, rickman wrote:
>
> <snip>
]> > Ages ago I had a notion about combining the advantages of register and
]> > stack machines, which was to call the region of 16 addresses around the
]> > stack "registers", and to have the processor automagically cache them on
]> > a context switch.  The idea was that the code itself wouldn't have to
]> > specify registers to save on push and pop because the processor would do
]> > it automatically.

In the context of an FPGA high performance implementation (of a soft core
processor), there seem to be two/three cases:

1) "Small" embedded processor where stack requirements are known in advance,
so that LUT RAM can serve as a register file/stack(s), and the instruction
processing adds offsets to two or more register pointers.  Pops & pushes
modify the register pointers.

2) Larger applications that need larger stack(s).  One can either spill and
refill the register file from main memory, or one can use block RAM to hold
the entire stack(s), main memory being more distant than the block RAM.

A third approach could be to have an associative cache of the block RAM
stack(s) such that cache registers "automatically" spill and refill.  Not
sure how to implement this efficiently on an FPGA?

Jim Brakefield
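[Purely as an illustration of case 1 above, and of rickman's "small offsets from the stack pointer" idea earlier in the thread, here is a toy software model in C.  The array standing in for LUT RAM, the sizes, and the function names are all invented; real hardware would of course fold the pointer adds into the register-file address decode.]

    #include <stdint.h>
    #include <stdio.h>

    #define LUT_RAM_WORDS 64
    static uint32_t lut_ram[LUT_RAM_WORDS];     /* register file / stack storage */
    static unsigned sp = LUT_RAM_WORDS;         /* stack grows downward          */

    /* Instruction operands are small offsets from the stack pointer. */
    static uint32_t rd(unsigned offset)             { return lut_ram[sp + offset]; }
    static void     wr(unsigned offset, uint32_t v) { lut_ram[sp + offset] = v; }

    /* Pushes and pops just move the pointer; no block of data is copied. */
    static void push(uint32_t v) { lut_ram[--sp] = v; }
    static uint32_t pop(void)    { return lut_ram[sp++]; }

    int main(void)
    {
        push(7);
        push(5);
        /* An "ADD [sp+0],[sp+1] -> [sp+1]" style instruction: the operand
         * fields name stack-pointer-relative cells, much like register fields. */
        wr(1, rd(0) + rd(1));
        (void)pop();                   /* discard the consumed operand */
        printf("top = %u\n", pop());   /* prints 12 */
        return 0;
    }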
On 5/29/2016 7:07 PM, jim.brakefield@ieee.org wrote:
> On Sunday, May 29, 2016 at 7:16:51 AM UTC-5, rickman wrote:
>
> <snip>
>
> In the context of an FPGA high performance implementation (of a soft core
> processor), there seem to be two/three cases:
>
> 1) "Small" embedded processor where stack requirements are known in advance,
> so that LUT RAM can serve as a register file/stack(s), and the instruction
> processing adds offsets to two or more register pointers.  Pops & pushes
> modify the register pointers.
>
> 2) Larger applications that need larger stack(s).  One can either spill and
> refill the register file from main memory, or one can use block RAM to hold
> the entire stack(s), main memory being more distant than the block RAM.
>
> A third approach could be to have an associative cache of the block RAM
> stack(s) such that cache registers "automatically" spill and refill.  Not
> sure how to implement this efficiently on an FPGA?
I'm not sure what you are trying to address here.  "Large" applications can
still be implemented on an FPGA if it is big enough.  The larger FPGAs have
enormous amounts of RAM on chip, as much as 10's of MBs.  It would be a
large application that needed more than that.  Still, if you weren't using
one of the really large chips you might not have enough on-chip RAM for a
general stack for a C-programmed processor.  But it would be a really small
FPGA that didn't have enough RAM for a register stack.

When you talk about "spilling" the stack, I think we are talking about two
different things.  If your registers are in memory, "spilling" the stack is
just a matter of changing the pointer.  That's what they do in the TI
processor.  The only thing they do wrong is to load the register pointer
from a fixed address rather than using an offset to the present value.
Using this approach there is no need to copy data from registers to stack.
Even if it is automatic, it takes a long time to do all the memory accesses.

-- 

Rick C
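[A tiny sketch in C (invented names, not TI code) of the contrast being drawn: a BLWP-style call loads the workspace pointer from a fixed vector, so every call through that vector lands in the same workspace, while the suggested alternative just offsets the current pointer, so a nested call gets a fresh register frame and nothing is copied either way.]

    #include <stdio.h>

    #define WORKSPACE_REGS 16

    static unsigned mem[1024];        /* toy memory holding all workspaces      */
    static unsigned wp = 512;         /* current workspace pointer (word index) */

    /* BLWP-style: the new workspace comes from a fixed vector, so recursive
     * or nested calls through the same vector reuse the SAME 16 registers. */
    static unsigned fixed_vector_wp = 256;
    static void call_fixed(void)   { wp = fixed_vector_wp; }

    /* Offset-style: each call allocates the next 16 words below the current
     * workspace, giving stack-frame-like registers with no copying. */
    static void call_offset(void)  { wp -= WORKSPACE_REGS; }
    static void return_offset(void){ wp += WORKSPACE_REGS; }

    int main(void)
    {
        printf("wp before: %u\n", wp);
        call_offset();                 /* new frame of registers at wp-16 */
        mem[wp + 0] = 123;             /* use "R0" of the new frame       */
        printf("wp in callee: %u, R0=%u\n", wp, mem[wp + 0]);
        return_offset();
        printf("wp after: %u\n", wp);
        (void)call_fixed;              /* shown only for contrast */
        return 0;
    }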
On Sun, 29 May 2016 10:35:19 +0100, Mike Perkins wrote:

> On 28/05/2016 17:41, Tim Wescott wrote:
>
> <snip>
>
>> I'm pretty sure I had not yet seen, nor independently conceived, the
>> RISC-ish push & pop of multiple registers in one instruction.
>
> Just an observation, but RISC instruction sets, and I'm largely basing
> my assumption on ARM, generally require a few 'fast' instructions to do
> anything useful.
>
> If you want a single push or pop of multiple registers, then you
> would probably need a CISC CPU.
From the ARM architecture v7m reference manual, POP instruction:

"Pop Multiple Registers loads a subset, or possibly all, of the
general-purpose registers R0-R12 and the PC or the LR from the stack"

In Thumb, the instruction is 7 bits (1011110) followed by a nine-bit
bitfield specifying which registers to pop.  PUSH is similar.

-- 
Tim Wescott
Control systems, embedded software and circuit design

I'm looking for work!  See my website if you're interested
http://www.wescottdesign.com
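[For the curious, a short C sketch of that encoding (not taken from the ARM manual; the helper function is made up for illustration): bits [15:9] identify 16-bit Thumb PUSH/POP, bit 8 adds LR (push) or PC (pop), and bits [7:0] are a bitmap of R0-R7.]

    #include <stdio.h>
    #include <stdint.h>

    static void print_push_pop(uint16_t insn)
    {
        uint16_t top7 = insn >> 9;               /* opcode field, bits [15:9]  */
        int is_push = (top7 == 0x5A);            /* 1011010x xxxxxxxx          */
        int is_pop  = (top7 == 0x5E);            /* 1011110x xxxxxxxx          */
        if (!is_push && !is_pop) { printf("not a 16-bit PUSH/POP\n"); return; }

        printf(is_push ? "PUSH {" : "POP {");
        const char *sep = "";
        for (int r = 0; r < 8; r++)              /* low registers from the bitmap */
            if (insn & (1u << r)) { printf("%sR%d", sep, r); sep = ", "; }
        if (insn & 0x100)                        /* bit 8: LR on push, PC on pop  */
            printf("%s%s", sep, is_push ? "LR" : "PC");
        printf("}\n");
    }

    int main(void)
    {
        print_push_pop(0xB5F0);   /* PUSH {R4-R7, LR} */
        print_push_pop(0xBDF0);   /* POP  {R4-R7, PC} */
        return 0;
    }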
On Saturday, May 28, 2016 at 7:37:10 PM UTC-4, Rick C. Hodgin wrote:
> On Friday, May 27, 2016 at 11:10:26 PM UTC-4, rickman wrote:
> > This does not directly address your stated issues, but there is a
> > workshop Saturday.  Notable is that it will use the same starter kit you
> > have.  I believe you can participate via the Internet.  It might be
> > interesting to you since it is about CPU design.  Here is a post I made
> > about this in another group.
> >
> > Dr. Ting will be leading a workshop on using a Lattice FPGA to implement
> > an emulation of the 8080 instruction set which will run Forth.
> >
> > http://www.meetup.com/SV-FIG/events/229926249/
> >
> > I believe you need to be a member of Meetup to see this page.  I'm not
> > sure but you may need to be a member of the SVFIG meetup group as well.
> > There is no charge to join either.
>
> Thank you for posting this information, Rick C.  I've watched some of
> the content that's available on YouTube from the event.  It's very
> interesting.
I went to Lattice's website and also bought a Brevia2 development kit.
I was able to download their Diamond software and get a license.dat file,
and I found that someone from the Meetup posted Ting's project files
online:

https://github.com/DRuffer/ep8080

I have been able to get the project loaded, but I haven't gotten to the
part where it synthesizes yet.  Still going through the videos:

Morning session, a lot of ISA and architecture review:
https://www.youtube.com/watch?v=rhgCrnF036Y

Afternoon session, development, design, and synthesis:
https://www.youtube.com/watch?v=vLzEFU2GvYc

DRuffer was able to get the Forth code to run as well, and he includes
his working JEDEC file:

https://github.com/DRuffer/ep8080/tree/master/ep80

Best regards,
Rick C. Hodgin
On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
> I went to Lattice's website and also bought a Brevia2 development kit.
> I was able to download their Diamond software and get a license.dat file,
> and I found that someone from the Meetup posted Ting's project files
> online:
>
> https://github.com/DRuffer/ep8080
>
> <snip>
I may be missing an obvious link, but if anybody knows where I can get
the PPT files used in these presentations, please post a link:

ep8080 architecture morning sessions:
Feb.27.2016: https://www.youtube.com/watch?v=-DYKuBmSGaE
Mar.26.2016: https://www.youtube.com/watch?v=XO0VqKhsPQE
Apr.23.2016: https://www.youtube.com/watch?v=s9cnnPiQtn8

Thank you in advance.

Best regards,
Rick C. Hodgin
On Monday, May 30, 2016 at 12:03:28 PM UTC-4, Rick C. Hodgin wrote:
> On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
>
> <snip>
>
> I may be missing an obvious link, but if anybody knows where I can get
> the PPT files used in these presentations, please post a link:
>
> <snip>
Also, if anyone has a block diagram or logical component layout of some
kind, one which shows the internal components and how they are all hooked
up through this ep8080 design, please post that info as well.

Best regards,
Rick C. Hodgin
On 5/30/2016 11:41 AM, Rick C. Hodgin wrote:
> On Monday, May 30, 2016 at 12:03:28 PM UTC-4, Rick C. Hodgin wrote:
>> On Monday, May 30, 2016 at 11:54:11 AM UTC-4, Rick C. Hodgin wrote:
>>>
>>> https://github.com/DRuffer/ep8080/tree/master/ep80
>>
>> I may be missing an obvious link, but if anybody knows where I can get
>> the PPT files used in these presentations, please post a link:
>>
>> ep8080 architecture morning sessions:
>> Feb.27.2016: https://www.youtube.com/watch?v=-DYKuBmSGaE
>> Mar.26.2016: https://www.youtube.com/watch?v=XO0VqKhsPQE
>> Apr.23.2016: https://www.youtube.com/watch?v=s9cnnPiQtn8
>
> Also, if anyone has a block diagram or logical component layout of some
> kind, one which shows the internal components and how they are all hooked
> up through this ep8080 design, please post that info as well.
>
> Best regards,
> Rick C. Hodgin
>
I would also be interested in those items.  There are several nice-looking
soft CPUs available for use with Forth; the common thread among them is a
lack of documentation.

-- 
Cecil - k5nwa
On 5/30/2016 1:02 PM, Cecil Bayona wrote:
> On 5/30/2016 11:41 AM, Rick C. Hodgin wrote:
>
> <snip>
>
>> Also, if anyone has a block diagram or logical component layout of some
>> kind, one which shows the internal components and how they are all hooked
>> up through this ep8080 design, please post that info as well.
>
> I would also be interested in those items.  There are several nice-looking
> soft CPUs available for use with Forth; the common thread among them is a
> lack of documentation.
The best way to learn about the structure of the ep8080 would be to draw a
block diagram from the VHDL code.  I looked at the code when I debugged the
problem I found and it is not so complex.

There are separate registers for the user accessible registers as well as
the internal registers like the PSW.  There is a process for the control
signals enabling the registers and controlling the various other functions
in the CPU such as multiplexers and carry bits, etc.  There are the
multiplexers and the other data path logic.

To draw the block diagram, I would follow the data path from the registers
backwards to the sources.  I believe you will find there is a small
multiplexer on the input of each register and two larger muxes controlled
by the source and destination fields of the instruction opcode.  I can't
say much about the rest; I didn't dig in to understand it all.

Once you have mapped out the data path, you can trace the control flow
through the control logic to understand how the opcode is interpreted.

-- 

Rick C
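[Not the actual ep8080 VHDL, but as a rough behavioral picture of the "two larger muxes controlled by the source and destination fields" idea, here is a C sketch of how an 8080 MOV opcode's ddd/sss fields select the destination and source registers.  The register array, names, and structure are simplified for illustration only.]

    #include <stdio.h>
    #include <stdint.h>

    static uint8_t regs[8];   /* B, C, D, E, H, L, (M placeholder), A */
    static const char *names[8] = {"B", "C", "D", "E", "H", "L", "M", "A"};

    /* Execute MOV r,r for opcodes 01dddsss (ignoring the memory operand M). */
    static void exec_mov(uint8_t opcode)
    {
        if ((opcode & 0xC0) != 0x40) return;     /* not a MOV                     */
        unsigned dst = (opcode >> 3) & 7;        /* ddd field -> destination mux  */
        unsigned src = opcode & 7;               /* sss field -> source mux       */
        regs[dst] = regs[src];                   /* the selected-to-selected move */
        printf("MOV %s,%s\n", names[dst], names[src]);
    }

    int main(void)
    {
        regs[7] = 0x42;          /* A = 0x42            */
        exec_mov(0x47);          /* 01 000 111 = MOV B,A */
        printf("B = 0x%02X\n", regs[0]);
        return 0;
    }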