Well, I finally got Altera Quartus 10.1 working on Jolicloud Linux using the dash->bash hack. The compile speed on this netbook is quite good compared to the old Windows box. I've fiddled with the instruction set, the sound filter and the video resolution, and added some modulo addressing on the R and S stack registers.

It now weighs in at 75% of a 1270 4-LUT device. No Altera-specific megafunctions are used; it's pure VHDL (I will add UFM SPI at some point, though). With no constraints it gives an Fmax of 85 MHz in the C5 speed grade. All arithmetic is based on the MInus instruction. All conditional branching is based on stack return address manipulation.

http://code.google.com/p/nibz/downloads/detail?name=nibzX7.vhd&can=2&q=
which is a re-upload, now as a compiling version (no VHDL errors).

Apart from later bug fixes, it's a wrap. Free, BSD license.

Cheers Jacko
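As a rough illustration of modulo addressing on the R and S stack registers (presumably the return and data stacks, in Forth terms), here is a minimal Python model of a stack of 16-bit words (the width Jacko gives later in the thread) whose pointer auto-increments on push and auto-decrements on pop, but wraps inside a fixed window instead of running off into other memory. The class name, the window base and size, and the pre-increment/post-decrement order are assumptions for illustration; none of this is taken from nibzX7.vhd.

```python
MASK16 = 0xFFFF  # 16-bit data words


class ModuloStack:
    """Stack in a power-of-two window of memory. push pre-increments the
    pointer and pop post-decrements it, both modulo the window size, so the
    pointer can never leave [base, base + size)."""

    def __init__(self, memory, base, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.mem = memory
        self.base = base
        self.mask = size - 1
        self.offset = 0  # pointer offset inside the window

    def push(self, value):
        # auto-increment (wrapping), then store the 16-bit value
        self.offset = (self.offset + 1) & self.mask
        self.mem[self.base + self.offset] = value & MASK16

    def pop(self):
        # read the top entry, then auto-decrement (wrapping)
        value = self.mem[self.base + self.offset]
        self.offset = (self.offset - 1) & self.mask
        return value


# Usage: an 8-entry stack at a hypothetical address 0x1000
mem = [0] * 0x10000
s = ModuloStack(mem, 0x1000, 8)
for v in range(10):  # two more pushes than the window holds
    s.push(v)
print([s.pop() for _ in range(8)])  # [9, 8, 7, 6, 5, 4, 3, 2]
```

Pushing more than the window holds silently overwrites the oldest entries. That is the usual trade-off of a circular stack: no overflow trap, but also no address logic beyond a mask.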
NibzX7 processor
Started by ●April 16, 2011
Reply by ●April 16, 2011
On Apr 16, 12:39 pm, jacko <jackokr...@gmail.com> wrote:
<snip>
> All arithmetic is based on the MInus instruction. All conditional
> branching is based on stack return address manipulation.
<snip>

Jacko,

What was your goal in designing this CPU? What were you attempting to optimize? Only having a minus instruction for arithmetic seems like it might use a dozen or so fewer LUTs, but at what cost? I can only assume that means an addition is done by first subtracting one addend from 0 and then subtracting it from the other addend. So every add requires two instructions. I believe your instruction set is pretty minimal from what I've seen. How many bits wide are the stacks? Just under 1000 LUTs is not bad for a 16 bit processor and is really good for a 32 bit machine.

Have you seen the ZPU? It is a stack based machine designed to be coded in C! What will they think of next...

Rick
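Rick's guess about addition can be checked with a small Python model of 16-bit arithmetic, in which `minus` stands in for the MInus instruction and `plus` is built from two subtractions. The function names and the borrow convention (borrow set when the subtrahend is larger as an unsigned value) are assumptions for illustration. For contrast, `minus_via_plus` shows the reverse construction Jacko mentions later in the thread, which needs an XOR (ones' complement) and a +1.

```python
MASK16 = 0xFFFF  # 16-bit word


def minus(a, b):
    """Model of the one arithmetic primitive: (a - b) mod 2**16, plus a
    borrow flag that is 1 when b > a as unsigned numbers (assumed convention)."""
    return (a - b) & MASK16, int(b > a)


def plus(a, b):
    """Addition from two subtractions: 0 - b gives -b (two's complement),
    and a - (-b) gives a + b, both modulo 2**16."""
    neg_b, _ = minus(0, b)
    total, _ = minus(a, neg_b)
    return total


def minus_via_plus(a, b):
    """The reverse direction: a - b == a + (b XOR 0xFFFF) + 1 (mod 2**16),
    which needs an XOR to form the ones' complement."""
    return (a + (b ^ MASK16) + 1) & MASK16


print(plus(3, 4), plus(0xFFFF, 1), minus(1, 2))  # 7 0 (65535, 1)
```

So an add costs two MInus operations, as Rick says, while the reverse direction would need an extra XOR path in the ALU.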
Reply by ●April 17, 2011
On Saturday, 16 April 2011 21:18:18 UTC+1, rickman wrote:
<snip>
> What was your goal in designing this CPU?

To make a very small system (it includes video and sound output too). After that, the priorities were a high MIPS/MB rating, a high MIPS rating, reasonably high code density using a dynamic compression system, and a foundation for building larger SMP systems.

> What were you attempting to
> optimize?

Mainly area, but the speed optimization technique was used for the last compile, as the design still fits.

> Only having a minus instruction for arithmetic seems like it
> might use a dozen or so fewer LUTs, but at what cost? I can only
> assume that means an addition is done by first subtracting one addend
> from 0 and then subtracting it from the other addend. So every add
> requires two instructions. I believe your instruction set is pretty
> minimal from what I've seen.

Yes. The instruction set is optimized for threaded code, so + would likely be a subroutine.

> How many bits wide are the stacks? Just
> under 1000 LUTs is not bad for a 16 bit processor and is really good
> for a 32 bit machine.

The stacks (2 of them) are 16 bits wide, with auto increment and decrement.

> Have you seen the ZPU? It is a stack based machine designed to be
> coded in C! What will they think of next...

I had a look, and the very small version is limited in the number of instructions it offers. Designed for C? Almost as funny a claim as designed for Haskell...

Cheers Jacko
Reply by ●April 17, 2011
On Apr 17, 5:18 am, jacko <jackokr...@gmail.com> wrote:
> On Saturday, 16 April 2011 21:18:18 UTC+1, rickman wrote:
> <snip>
> > Have you seen the ZPU? It is a stack based machine designed to be
> > coded in C! What will they think of next...
>
> I had a look, and the very small version is limited in the number of
> instructions it offers. Designed for C? Almost as funny a claim as
> designed for Haskell...

Not sure I follow. What do you mean the instructions are limited? They use emulation to implement some instructions, depending on the core used. This is very much the Forth concept of building words.

The C claim is not funny, it's real. They are using gcc, I believe, and people have used the ZPU in real apps.
I wasn't impressed because it is not as fast as my design, but it is even smaller, and the faster versions are not a lot bigger. So I have to give them their due. They met their goal of making the smallest possible (32 bit!) processor supported by a C compiler, with variations designed for higher speeds. I don't think there is ANY other soft CPU under several thousand LUTs that has a C compiler.

Rick
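To make the emulation idea concrete, here is a toy Python stack machine in which only a few opcodes are "hardware" and the rest are emulated by running a fixed sequence of other opcodes, much like a Forth colon definition. The opcode names and the dictionary-based dispatch are hypothetical; this does not model the ZPU's real instruction set or its trap mechanism.

```python
MASK32 = 0xFFFFFFFF  # the ZPU is a 32-bit machine


def hw_sub(stack):
    """( a b -- a-b ) implemented 'in hardware'."""
    b = stack.pop()
    a = stack.pop()
    stack.append((a - b) & MASK32)


def hw_swap(stack):
    """( a b -- b a )"""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def hw_zero(stack):
    """( -- 0 )"""
    stack.append(0)


HARDWARE = {"sub": hw_sub, "swap": hw_swap, "zero": hw_zero}

# Emulated opcodes: no logic of their own, just sequences of other opcodes,
# which may themselves be emulated (words built from words).
EMULATED = {
    "neg": ["zero", "swap", "sub"],  # ( b -- 0-b )
    "add": ["neg", "sub"],           # ( a b -- a-(0-b) ) = a+b
}


def execute(op, stack):
    if op in HARDWARE:
        HARDWARE[op](stack)
    elif op in EMULATED:
        # the "trap": run the software routine that stands in for the opcode
        for step in EMULATED[op]:
            execute(step, stack)
    else:
        raise ValueError(f"unknown opcode {op!r}")


s = [5, 7]
execute("add", s)
print(s)  # [12]
```

From the program's point of view `add` exists either way; what changes is how many cycles it takes, which is the crux of the disagreement over whether emulated instructions count.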
Reply by ●April 17, 2011
On Sunday, 17 April 2011 23:00:57 UTC+1, rickman wrote:
<snip>
> Not sure I follow. What do you mean the instructions are limited?
> They use emulation to implement some instructions depending on the
> core used. This is very much the Forth concept of building words.

Yes, the emulation is there to reduce core size, but 'having' instructions without implementing them in hardware is not having them at all; that is marketing speak.

> The C claim is not funny, It's real. They are using gcc I believe and
> people have used the ZPU in real apps. I wasn't impressed because it
> is not as fast as my design, but it is even smaller and faster
> versions are not a lot bigger. So I have to give them their due.
> They met their goal of making the smallest possible (32 bit!)
> processor supported by a C compiler with variations designed for
> higher speeds. I don't think there is ANY other soft CPU under
> several thousand LUTs that has a C compiler.

Supporting C, that's good, but "designed for C" is more marketing speak. Considering C was designed to work on processors, I'd expect at least a stack frame link instruction similar to the 68k's, with word-stride multiplication for pointer arithmetic... but fair dues, it's not too bad; it just suffers from hype.

Cheers Jacko
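For readers who haven't used the 68k, here is a simplified Python model of what its LINK/UNLK pair does to the stack pointer (A7) and a frame pointer (conventionally A6): LINK saves the old frame pointer, points it at the new frame and reserves space for locals in one instruction, and UNLK undoes all of that. The class is hypothetical, and memory is modelled as a dict of 32-bit long words rather than big-endian bytes.

```python
MASK32 = 0xFFFFFFFF


class Frame68k:
    """Just enough 68000 state to show LINK A6,#disp and UNLK A6.
    Memory is a dict of 32-bit long words keyed by address (a simplification)."""

    def __init__(self, sp):
        self.mem = {}
        self.sp = sp  # A7
        self.a6 = 0   # frame pointer

    def link(self, disp):
        # push the caller's frame pointer
        self.sp = (self.sp - 4) & MASK32
        self.mem[self.sp] = self.a6
        # the new frame pointer points at the saved copy
        self.a6 = self.sp
        # reserve locals: disp is normally negative
        self.sp = (self.sp + disp) & MASK32

    def unlk(self):
        # drop the locals, then pop the caller's frame pointer
        self.sp = self.a6
        self.a6 = self.mem[self.sp]
        self.sp = (self.sp + 4) & MASK32


cpu = Frame68k(sp=0x1000)
cpu.a6 = 0xABCD
cpu.link(-16)                    # locals live at -16(A6) .. -1(A6)
print(hex(cpu.a6), hex(cpu.sp))  # 0xffc 0xfec
cpu.unlk()
print(hex(cpu.a6), hex(cpu.sp))  # 0xabcd 0x1000
```

A C compiler can emit one LINK at function entry and one UNLK before the return, which is the kind of C-oriented support Jacko has in mind.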
Reply by ●April 18, 2011
On 18 Apr., 00:00, rickman <gnu...@gmail.com> wrote:
> I don't think there is ANY other soft CPU under
> several thousand LUTs that has a C compiler.

Please do not forget my ERIC5: about 300 LUTs, about ATMEL AVR performance, with a C compiler.

Regards,

Thomas

www.entner-electronics.com
Reply by ●April 18, 2011
On 18 Apr., 12:08, Thomas Entner <thomas.entne...@gmail.com> wrote:
<snip>
> Please do not forget my ERIC5: About 300 LUTs, about ATMEL AVR
> performance, with C-compiler.

P.S.: And it even includes an add instruction ;-) Not to mention the multiplier...
Reply by ●April 18, 2011
It looks OK, Thomas, though I haven't seen the ISA. The main reason I dropped the add instruction (originally there was no minus) was that minus is more primitive, in that constructing minus from plus requires XOR. In the context of threaded-code compilation, the MI instruction can be used just once.

The main features are a 3-in-1 compression mode, so that 3 instructions may be placed in 16 bits, for high code density. No opcode is needed to prefix a subroutine jump. There are 5 registers and a borrow flag. Pre/post -/+ is applied to all indirect memory accesses. Hardware loading of RAM from an SPI EEPROM at boot time is included, via a hardware SPI interface. A simple interrupt method can be used. Code size is 48K * 16 bit when using a 16 bit generic word size and the 3-in-1 compression. Data size is up to 64K * 16 bit, as addressable memory is 128KB using a 16 bit generic word. Video DMA is included for a sub-VGA resolution of 256*256 in 8 colours. A 16 bit delta-sigma DAC is included. 2 * 8 bit ports (one in, one out) are included. With no cache, a 0.2 MIPS/MB rating when processing from RAM is standard (including operand access). BSD license.

For a further explanation of my preference for MInus, the Z80's DJNZ explains it best: counting down to a borrow is an excellent looping mechanism. Saving a few cells is necessary considering the size of the UFM-SPI megafunction and the 1270 LE MAX II kit I am targeting. It's all very logical.

All things considered, the 16 bit memory model with auto +/- saves a lot of code in a stack based design. Think of all those extra cycles adding or subtracting 1 which are hidden in Nibz, and the extra complexity of performing an add is paltry. The subroutine branch saving alone is highly significant, considering that factorization into small subroutines is where code density comes from.

Cheers Jacko
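The 3-in-1 mode can be pictured with a small Python encoder/decoder. The layout below is an assumption made up for illustration, not the actual NibzX7 encoding (which would have to be read from nibzX7.vhd): bit 15 set means three 5-bit opcodes are packed into bits 14..0, and bit 15 clear means the whole word is a subroutine call to a 15-bit address, which is one way a call can need no opcode prefix.

```python
# Hypothetical layout, for illustration only:
#   bit 15 = 1 -> three 5-bit opcodes in bits 14..10, 9..5, 4..0
#   bit 15 = 0 -> the word is a subroutine call to a 15-bit address
PACKED_FLAG = 0x8000
SLOT_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1  # 0x1F
ADDR_MASK = 0x7FFF


def pack3(op0, op1, op2):
    """Pack three 5-bit opcodes into one 16-bit word; op0 runs first."""
    for op in (op0, op1, op2):
        if not 0 <= op <= SLOT_MASK:
            raise ValueError(f"opcode {op} does not fit in {SLOT_BITS} bits")
    return PACKED_FLAG | (op0 << 10) | (op1 << 5) | op2


def decode(word):
    """Return ("ops", [op0, op1, op2]) or ("call", address)."""
    if word & PACKED_FLAG:
        ops = [(word >> 10) & SLOT_MASK, (word >> 5) & SLOT_MASK, word & SLOT_MASK]
        return "ops", ops
    return "call", word & ADDR_MASK


print(hex(pack3(1, 2, 3)))   # 0x8443
print(decode(pack3(1, 2, 3)))  # ('ops', [1, 2, 3])
print(decode(0x1234))        # ('call', 4660)
```

A flat 15-bit call address only reaches 32K words, fewer than the 48K words of code Jacko quotes, so the real scheme must differ in its details; the sketch only shows why packing and prefix-free calls both help code density.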
Reply by ●April 19, 2011
On Apr 18, 6:08 am, Thomas Entner <thomas.entne...@gmail.com> wrote:
> On 18 Apr., 00:00, rickman <gnu...@gmail.com> wrote:
> > I don't think there is ANY other soft CPU under
> > several thousand LUTs that has a C compiler.
>
> Please do not forget my ERIC5: About 300 LUTs, about ATMEL AVR
> performance, with C-compiler.

I should have said "32 bit" processor. That was what they wanted: one 32 bit processor architecture, one instruction set and many possible speed ranges. That is not my goal. I prefer to provide more customized CPUs, optimized for the application, which almost always requires more speed, at least in bursts.

Rick
Reply by ●April 19, 2011
Jacko,

> Supporting C, that's good, but designed for C is more marketing speak.
> Considering C was designed to work on processors, I'd expect a stack
> frame link instruction similar to the 68k at least... with word stride
> multiplication for pointer arithmetic... but fair dues, it's not too bad, but
> suffers from hype.

I don't agree that it is just marketing speak: the instructions were selected to encode C programs as compactly as possible while still having a tiny implementation. The CRISP (sold as the AT&T Hobbit) was a much better C processor, but an FPGA implementation of that would be several times larger than the ZPU. The VAX was also a really great target for C, but couldn't perform as well as RISCs (neither can the ZPU).

--
Jecel





