
6502 FPGA core

Started by Frank Buss May 26, 2007
On May 27, 11:36 am, Frank Buss <f...@frank-buss.de> wrote:

> I think readability is very good (ok, maybe because I know Forth and Lisp)
> and power-usage should be good, too, because fewer LEs are used. My current
> Forth FPGA implementation needs 319 LEs (about 5% of the small Cyclone
> EP1C6Q240C8). But I expect 10 times slower than e.g. the T65, so the all in
> all cycles per power would be not so good.
> Frank Buss, f...@frank-buss.de
> http://www.frank-buss.de, http://www.it4-systems.de
Don't worry too much about speed. You will be amazed how easy it is to optimize uCode as soon as the processor really works. And nobody says that you can have only one execution unit in the system. (I once had something like this with 2.5 execution units, controlling a processor with a 36-bit data width.)

But: excellent project!
Brian Drummond wrote:

> On Mon, 28 May 2007 07:34:24 +1200, Jim Granville
> <no.spam@designtools.maps.co.nz> wrote:
>
>>Frank Buss wrote:
>><snip>
>
>>If this runs slower, one of my pet ideas for FPGA cores, is to design
>>them to run from SerialFLASH memory. Top end ones (winbond) run at
>>150MBd of link speed, so can feed nearly 20MB/s of streaming code.
>>Ideally, the core has a short-skip opcode, as the jump in such memory
>>has a higher cost.
>
> Or a "four address instruction" like the Pilot Ace, with SerialFlash in
> place of a tube full of mercury?

You've lost me?
-jg
On May 28, 12:40 pm, Jim Granville <no.s...@designtools.maps.co.nz>
wrote:
> Brian Drummond wrote:
> > On Mon, 28 May 2007 07:34:24 +1200, Jim Granville
> > <no.s...@designtools.maps.co.nz> wrote:
>
> >>Frank Buss wrote:
> >><snip>
>
> >>If this runs slower, one of my pet ideas for FPGA cores, is to design
> >>them to run from SerialFLASH memory. Top end ones (winbond) run at
> >>150MBd of link speed, so can feed nearly 20MB/s of streaming code.
> >>Ideally, the core has a short-skip opcode, as the jump in such memory
> >>has a higher cost.
>
> > Or a "four address instruction" like the Pilot Ace, with SerialFlash in
> > place of a tube full of mercury?
>
> You've lost me ?
> -jg

Me too, but this looks relevant
http://research.microsoft.com/~GBell/Computer_Structures__Readings_and_Examples/00000213.htm
Tommy Thorn wrote:
> On May 28, 12:40 pm, Jim Granville <no.s...@designtools.maps.co.nz>
> wrote:
>
>>Brian Drummond wrote:
>>
>>>On Mon, 28 May 2007 07:34:24 +1200, Jim Granville
>>><no.s...@designtools.maps.co.nz> wrote:
>>
>>>>Frank Buss wrote:
>>>><snip>
>>
>>>>If this runs slower, one of my pet ideas for FPGA cores, is to design
>>>>them to run from SerialFLASH memory. Top end ones (winbond) run at
>>>>150MBd of link speed, so can feed nearly 20MB/s of streaming code.
>>>>Ideally, the core has a short-skip opcode, as the jump in such memory
>>>>has a higher cost.
>>
>>>Or a "four address instruction" like the Pilot Ace, with SerialFlash in
>>>place of a tube full of mercury?
>>
>>You've lost me ?
>>-jg
>
> Me too, but this looks relevant
>
> http://research.microsoft.com/~GBell/Computer_Structures__Readings_and_Examples/00000213.htm
Wow, that's quite impressive. A 1MHz clock, back in 1951!

I had not thought of Serial Data, only Serial code access, as those speeds are getting tolerable, and the pin/pcb savings are massive.

Most FPGAs have some SRAM, and uC projects commonly need less DATA than Code, but it raises a good point: Serial data _could_ also be used, and the Ramtron FRAM devices would be good candidates - up to 64K bytes of Data, in 20MHz SPI. So, you'd set that up on separate pins.

-jg
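
As a rough check on the figures quoted in this post: a single-bit serial link at 150 MBd delivers 150/8 = 18.75 MB/s, which matches the "nearly 20MB/s" claim. The sketch below also estimates why a jump costs more than in parallel memory. It assumes (this is not stated in the thread) a plain SPI-style read that needs 8 command bits plus 24 address bits before data flows again, and the function names are made up for illustration.

```python
def stream_rate_mb_per_s(link_mbaud):
    """Sustained code bandwidth in MB/s for a one-bit-wide serial link."""
    return link_mbaud / 8.0


def jump_penalty_bytes(command_bits=8, address_bits=24, dummy_bits=0):
    """Byte slots lost while a new read command and address are shifted out.

    The bit counts are assumptions about a typical SPI flash read sequence.
    """
    return (command_bits + address_bits + dummy_bits) / 8.0


def effective_rate_mb_per_s(link_mbaud, run_length_bytes):
    """Average fetch rate when a jump occurs every run_length_bytes of code."""
    penalty = jump_penalty_bytes()
    useful_fraction = run_length_bytes / (run_length_bytes + penalty)
    return stream_rate_mb_per_s(link_mbaud) * useful_fraction


if __name__ == "__main__":
    print(stream_rate_mb_per_s(150))          # raw streaming rate
    print(jump_penalty_bytes())               # bytes of dead time per jump
    print(effective_rate_mb_per_s(150, 12))   # a jump every 12 bytes
```

Under these assumptions, skipping forward by fewer bytes than the jump penalty is cheaper done by streaming past the skipped bytes than by re-addressing the flash, which is one way to read the case for a short-skip opcode.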
On Tue, 29 May 2007 07:40:21 +1200, Jim Granville
<no.spam@designtools.maps.co.nz> wrote:

>Brian Drummond wrote:
>
>> On Mon, 28 May 2007 07:34:24 +1200, Jim Granville
>> <no.spam@designtools.maps.co.nz> wrote:
>>
>>>Frank Buss wrote:
>>><snip>
>>
>>>If this runs slower, one of my pet ideas for FPGA cores, is to design
>>>them to run from SerialFLASH memory. Top end ones (winbond) run at
>>>150MBd of link speed, so can feed nearly 20MB/s of streaming code.
>>>Ideally, the core has a short-skip opcode, as the jump in such memory
>>>has a higher cost.
>>
>> Or a "four address instruction" like the Pilot Ace, with SerialFlash in
>> place of a tube full of mercury?
>
>You've lost me ?
>-jg
In some designs of that era, three-address instructions were common: source1, source2 and dest, very like the register addresses in a RISC.

The innovation here was a fourth address, for the next instruction, coded so that it would appear out of the delay line (or drum memory) just when it was needed. This was important because the next sequential location in program memory would already have flashed past, and you'd have to wait for the memory's cycle time (or a whole drum revolution) before it came round again.

Apparently it was a headache to hand-code for maximum performance, or "offered great scope for programmer ingenuity" :-) but worthwhile for heavily used code. (I believe it had the first floating point library, coded this way.)

But it could still be useful for streaming instructions from serial memory.

- Brian
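
The effect of the "fourth address" can be illustrated with a toy model of a circulating memory. This is only a sketch under invented assumptions: a line of 32 words, one word time to read an instruction, a fixed execution time per instruction, and the names `wait_for` and `run_time` are made up here. It is not the Pilot ACE's real timing.

```python
def wait_for(address, ready_time, line_length):
    """Idle word times until `address` next emerges from the delay line."""
    return (address - ready_time) % line_length


def run_time(program, start, line_length):
    """Total word times for a chain of instructions.

    `program` maps address -> (exec_time, next_address); a next_address of
    None halts. The first instruction is assumed to be emerging at time
    `start`.
    """
    time = start
    address = start
    while address is not None:
        exec_time, next_address = program[address]
        time += 1 + exec_time  # one word time to read, then execute
        if next_address is not None:
            time += wait_for(next_address, time, line_length)
        address = next_address
    return time - start


# Four instructions, each taking 2 word times to execute.
sequential = {0: (2, 1), 1: (2, 2), 2: (2, 3), 3: (2, None)}
spaced = {0: (2, 3), 3: (2, 6), 6: (2, 9), 9: (2, None)}

if __name__ == "__main__":
    print(run_time(sequential, 0, 32))  # next word has already gone past
    print(run_time(spaced, 0, 32))      # next word arrives just in time
```

With these assumed numbers the sequential layout takes 102 word times and the spaced layout 12, because in the sequential case each successor has just flashed past and costs almost a full circulation.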
On Tue, 29 May 2007 10:48:51 +1200, Jim Granville
<no.spam@designtools.maps.co.nz> wrote:

>Tommy Thorn wrote:
>> On May 28, 12:40 pm, Jim Granville <no.s...@designtools.maps.co.nz>
>> wrote:
>>
>>>Brian Drummond wrote:
>>>>Or a "four address instruction" like the Pilot Ace, with SerialFlash in
>>>>place of a tube full of mercury?
>>>
>>>You've lost me ?
>>>-jg
>>
>> Me too, but this looks relevant
>>
>> http://research.microsoft.com/~GBell/Computer_Structures__Readings_and_Examples/00000213.htm
>
>Wow, that's quite impressive. A 1MHz clock, back in 1951!
"it is not thought wise to design for higher speeds than this as yet"
http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.html
(from 1945)

May 1950 according to
http://www.npl.co.uk/publications/metromnia/issue8/
which has some details.

Apparently both code and data, but the "fourth address" was specifically to optimise code location.

Surprisingly small, according to
http://www.scienceandsociety.co.uk/results.asp?image=10303412

- Brian
(wondering how many tubes you can fit in a CLB)
Frank Buss wrote:
> I've implemented a first version of a 6502 core. It has a very simple
> architecture: First the command is read and then for every command a list
> of microcodes are executed, controlled by a state machine. To avoid the
> redundant VHDL typing, the VHDL code is generated with a Lisp program:
>
> http://www.frank-buss.de/vhdl/cpu.lisp
>
> This is the output:
>
> http://www.frank-buss.de/vhdl/t_rex_test.vhdl
>
> I've tested some instructions, like LDA, and looks like it works, but I'm
> sure there are many bugs and not all features are implemented (e.g. BCD
> mode or interrupt handling). It uses 2,960 LEs with Quartus 7.1, which is
> too much compared to the 797 LEs of the T65 project. Any ideas how to
> improve it? My idea was, that the synthesizer would be able to merge the
> addressing mode implementations for the commands, but maybe this has to be
> refactored by hand.
>
> My goal is to beat the T65 project in LE usage. Speed and 100%
> compatibility with the original 6502 (e.g. the strange S0 and V-flag
> feature or the original hardware reset vectors) is not important for me,
> but code compiled with http://www.cc65.org/ must work.
>
> Most FPGAs have some kbyte memory (>5 kByte, even for inexpensive FPGAs,
> freely configurable as ROM and RAM), so maybe a good idea would be to store
> some microcode in memory? What instruction set is useful to implement the
> 6502 instruction set? Maybe a Forth-like microcode?
>
> Any ideas how to improve the Lisp code? I like my idea of using a lambda
> function in addressing-commands, because this looks more clean than a
> macro, which I've tried first, but I don't like the explicit call of
> emit-lines. How can I refactor it to a more DSL like approach?
>
Somewhere around here I have a (very old) reference manual for the 6502 - one of my all-time favourite processors - that actually listed the instruction decode by bit positions.

I'll have to dig it out and amuse myself by writing some code to actually do the decode using straight combinational logic ;)

Cheers
PeteS
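
For readers unfamiliar with the bit-position view of 6502 opcodes: many of them follow the pattern aaabbbcc, where cc selects a group, aaa the operation and bbb the addressing mode. The sketch below covers only the cc = 01 group and is written in Python purely as an illustration; the table and function names are made up here, and it is not taken from the manual mentioned above or from the Lisp generator.

```python
# Group cc == 01 of the 6502 opcode map, viewed as the bit pattern aaabbbcc.
GROUP1_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP1_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]


def decode_group1(opcode):
    """Return (mnemonic, addressing mode) for cc == 01 opcodes, else None."""
    aaa = (opcode >> 5) & 0b111  # operation
    bbb = (opcode >> 2) & 0b111  # addressing mode
    cc = opcode & 0b11           # instruction group
    if cc != 0b01:
        return None
    if aaa == 4 and bbb == 2:
        # 0x89 would be "STA #imm", which is not a documented instruction.
        return None
    return GROUP1_OPS[aaa], GROUP1_MODES[bbb]


if __name__ == "__main__":
    for op in (0xA9, 0x8D, 0x71, 0x89):
        print(hex(op), decode_group1(op))
```

Because the operation and the addressing mode come from separate bit fields, a core built this way can implement each addressing mode once and share it across all eight operations, which may be one route to the merging of addressing-mode logic asked about in the quoted post.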
Brian Drummond wrote:

> "it is not thought wise to design for higher speeds than this as yet"
> http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.html
> (from 1945)
That's on page 3. Another funny sentence, describing the architecture:

"Erasible memory units of fairly large capacity, to be known as dynamic storage (DS). Probable consisting of between 50 and 500 mercury tanks with a capacity of about 1000 digits each."

"digit" here means binary digit, so this will be about 0.04 per mill of my current PC main memory.

The interesting thing is the instruction set, but it is very difficult to extract it from the document, because it is a mix of proposals and detailed descriptions of which registers to use for which arithmetic operations. Is there any documentation of the actually running system? If possible, as a modern, purely functional description, without describing the problems and architecture of mercury delay lines :-)

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
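
The "0.04 per mill" figure can be checked with a few lines of arithmetic. This sketch assumes the upper end of the quoted range (500 tanks of 1000 bits each); the PC memory size is not given in the post, so the code only works out what size the figure implies.

```python
def store_bytes(tanks, bits_per_tank):
    """Capacity of the proposed dynamic storage in bytes."""
    return tanks * bits_per_tank / 8


def implied_pc_memory_bytes(store, per_mille):
    """PC memory size for which `store` is `per_mille` thousandths of it."""
    return store * 1000 / per_mille


if __name__ == "__main__":
    ds = store_bytes(500, 1000)
    print(ds)                                 # bytes in 500 tanks
    print(implied_pc_memory_bytes(ds, 0.04))  # implied PC memory in bytes
```

At 500 tanks the store holds 62,500 bytes, and 0.04 per mille of main memory then corresponds to roughly 1.5 GB of RAM.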