FPGARelated.com
Forums

Re: Zero operand CPUs

Started by Unknown March 20, 2009
In article <1478a4ab-62d3-4e9d-b1a3-769f847260a2@j8g2000yql.googlegroups.com>, 
Jacko <jackokring@gmail.com> wrote: 

> So if you had possibly 4 instructions to do stack init pointers and
> save both as well, what would you use?
------------ Previously Jacko wrote:
> You will find the subroutine RI FI RO SO BA does the same as
> the forth word lit.
>
> == subroutine to do move #$222 to address $665 ==
> $lit ,
> $666 ,   <-- ? typo $665 <> $666 or is this for the auto-decr ?
> $lit ,
> $222 ,
> SI FA FO BA
=========
Does $lit , $666 , $lit , $222 mean:
"push $666, push $222" ?

Then SI FA FO BA ==
(S)->A ; (S)->Q ; A->(Q) ; (R)->P == ?

If it's a stack-machine, then which register
is the TOS-pointer ?

Let's try to work backwards:-
(R)->P should move #$222 to address $665?6
==> P == address $665 [or $666]
and
R pointed to mem-containing #$222

So, how did $lit , $222 [ push $222] get
$222 into mem pointed to by R ?
-------------
Perhaps this is all obvious for someone with a
VHDL design background, but this forth-group has
just degenerated into clowning, with this thread.

Can anybody contribute any knowledge/interpretation
to how this nibz works ?
-------------
someone wrote:
> > How many bits are in an opcode, 4 or 5?
> > I would say it has to be five
> > or there is no way for the machine to distinguish between
> > an opcode and an address. In other words, there
> > *has* to be a CALL instruction,
> > even if it is just a one bit opcode with the rest being the address.
Jacko wrote:
> if(instructionRegister<16) doOpcode(instructionRegister) else
>    doSubroutine(instructionRegister);
IMO this corresponds to the wiki:
> For this example we will take the most complex to understand instruction BO.
> Details
>
> BO ((S) AND A) mul 2 -> A
>
> This translates as BOth instruction: Load the indirect contents of
> memory indexed by S and 'and' it with A.
> Then shift this left one bit through the CarryRollBit and save this
> result into A.
> Also increase S by 1 to do the post increment indexing.
So, '16 basic instructions' [need 4 bits] and the BO instruction,
would get the BasicInstr/Subroutine 1-bit-flag.
This would imply a 5bit wide word, which is obviously not
the case ?
-------------
who can understand/explain this:--
> There are no conditional instructions, SU of the carry to (R)
> is the major branching method.
...
> * SU (S)+A->A - SUm (CarryRollBit added too)
---
Since of the 16 instructions, the only ALU types: +, xor, and;
use 'A' as one would expect an accumulator to be used
[as a source & destination for the binary op.], 'A' *is* the
accumulator !! Similarly 'S' is seen to be the TOS-pointer.
---
wiki says:
> All opcodes above 15 are subroutine call addresses.
OK, so for an 8 bit wide word, you get 256-15 possible subroutines ?
And the subroutines also use the basic 16 instructions, including
possibly nested-subroutine/s ?
No, because the first-level of subroutines has already been allocated
the 256-16 subroutine-pointers ?
Ok, you can only have 256-16 *different* subroutines, but they can be
nested - limited only by RAM to hold a stacked-returns ?

== Chris Glur.
On 20 Mar, 21:54, AliB...@gmail.com wrote:
> In article <1478a4ab-62d3-4e9d-b1a3-769f84726...@j8g2000yql.googlegroups.com>,
> Jacko <jackokr...@gmail.com> wrote:
> > So if you had possibly 4 instructions to do stack init pointers and
> > save both as well, what would you use?
>
> ------------Previously Jacko wrote:
> > You will find the subroutine RI FI RO SO BA does the same as
> > the forth word lit.
>
> > == subroutine to do move #$222 to address $665 ==
> > $lit ,
> > $666 ,   <-- ? typo $665 <> $666 or is this for the auto-decr ?

Yes, it sure is.
> > $lit ,
> > $222 ,
> > SI FA FO BA
>
> =========
> Does $lit , $666 , $lit , $222 mean:
> "push $666, push $222" ?

Yes, when lit is defined as the subroutine equivalent to the forth word lit.

> Then SI FA FO BA ==
> (S)->A ; (S)->Q ; A->(Q) ; (R)->P == ?

I would think so!! TOS->A, TOS(2)->Q, STORE TOS -> TOS(2) ADDRESS, GET RETURN TO PROGRAM COUNTER

> If it's a stack-machine, then which register
> is the TOS-pointer ?

S, but a top-of-stack optimization using A may be possible, for speed, at the cost of lower space efficiency.

> Let's try to work backwards:-
> (R)->P should move #$222 to address $665?6
> ==> P == address $665 [or $666]
> and
> R pointed to mem-containing #$222
>
> So, how did $lit , $222 [ push $222] get
> $222 into mem pointed to by R ?
> -------------
RI FI SO RO BA, where SO RO commutes to RO SO as a duplicate expression for the same function.

(R)->Q, (Q)->A, A->(S), Q->(R), (R)->P

Get the return address, fetch indirectly through it the word that follows the call (post-incrementing the address), save that word on the stack, put the incremented return address back on the return stack, and load that return address (now +1) into the program counter, so the return lands on the address following the literal value.

> Perhaps this is all obvious for someone with a
> VHDL design background, but this forth-group has
> just degenerated into clowning, with this thread.
;-)
> Can anybody contribute any knowledge/interpretation
> to how this nibz works ?
> -------------
>
> someone wrote:
> > > How many bits are in an opcode, 4 or 5?
> > > I would say it has to be five
> > > or there is no way for the machine to distinguish between
> > > an opcode and an address. In other words, there
> > > *has* to be a CALL instruction,
> > > even if it is just a one bit opcode with the rest being the address.
> Jacko wrote:
> > if(instructionRegister<16) doOpcode(instructionRegister) else
> >    doSubroutine(instructionRegister);
>
> IMO this corresponds to the wiki:
>
> > For this example we will take the most complex to understand instruction BO.
> > Details
> >
> > BO ((S) AND A) mul 2 -> A
> >
> > This translates as BOth instruction: Load the indirect contents of
> > memory indexed by S and 'and' it with A.
> > Then shift this left one bit through the CarryRollBit and save this
> > result into A.
> > Also increase S by 1 to do the post increment indexing.
>
> So, '16 basic instructions' [need 4 bits] and the BO instruction,
> would get the BasicInstr/Subroutine 1-bit-flag.
No, a basic instruction is a number under 16, any number over 15 is an address. This does have a disadvantage of not being able to call a subroutine below address 16 but this is not a major fault, as boot code would be here, and it is possible to place a subroutine call instruction within these addresses.
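The decode rule described here can be sketched in a few lines of Python. This is only an illustration of the "under 16 is a primitive, anything else is a call address" rule as stated in the thread; the names `OPCODE_LIMIT`, `decode` and `callable_addresses` are made up for the example and do not come from the nibz sources.

```python
OPCODE_LIMIT = 16  # values 0..15 are primitives, as stated in the post


def decode(word):
    """Classify one fetched instruction word.

    Returns ("primitive", opcode) for words below 16 and
    ("call", address) for everything else. No flag bit is examined;
    the distinction is a plain magnitude comparison on the whole word.
    """
    if word < 0:
        raise ValueError("instruction words are unsigned")
    if word < OPCODE_LIMIT:
        return ("primitive", word)
    return ("call", word)


def callable_addresses(word_bits):
    """Number of distinct call targets for a given word width.

    Addresses 0..15 cannot be called directly because those values
    decode as primitives.
    """
    return 2 ** word_bits - OPCODE_LIMIT


print(decode(3))               # a primitive
print(decode(0x0101))          # a subroutine call
print(callable_addresses(8))   # call targets with an 8 bit word
print(callable_addresses(16))  # call targets with a 16 bit word
```

For an 8 bit word this gives 240 call targets (addresses 16 to 255), which matches the "256-16" figure in Chris's question.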
> This would imply a 5bit wide word, which is obviously not
> the case ?
The implication of the extra bit 'needed' is not a true account of functioning.
> -------------
> who can understand/explain this:--
> > There are no conditional instructions, SU of the carry to (R)
> > is the major branching method.
> ...
> > * SU (S)+A->A - SUm (CarryRollBit added too)
(R)->Q,Q->(S),(S)->A,A+(S)->A,A->(S),(S)->Q,Q->(R),(R)->P
> ---
> Since of the 16 instructions, the only ALU types: +, xor, and;
> use 'A' as one would expect an accumulator to be used
> [as a source & destination for the binary op.], 'A' *is* the
> accumulator !! Similarly 'S' is seen to be the TOS-pointer.
> ---
> wiki says:
> > All opcodes above 15 are subroutine call addresses.
>
> OK, so for an 8 bit wide word, you get 256-15 possible subroutines ?
> And the subroutines also use the basic 16 instructions, including
> possibly nested-subroutine/s ?
> No, because the first-level of subroutines has already been allocated
> the 256-16 subroutine-pointers ?
> Ok, you can only have 256-16 *different* subroutines, but they can be
> nested - limited only by RAM to hold a stacked-returns ?
>
> == Chris Glur.
Hope this helps. Cheers jacko
On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > This would imply a 5bit wide word, which is obviously not
> > the case ?
>
> The implication of the extra bit 'needed' is not a true account of
> functioning.
Ah, but it is. In your specific implementation, you have not only a fifth bit, but also a sixth, seventh all the way up to 16th, no? You have a 16 bit instruction word and only 17 opcodes; 0 through 15 are the ones you list, and 16 through 65535 is the LIT or CALL instruction (I'm not sure which).
On 21 Mar, 04:30, rickman <gnu...@gmail.com> wrote:
> On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > > This would imply a 5bit wide word, which is obviously not
> > > the case ?
>
> > The implication of the extra bit 'needed' is not a true account of
> > functioning.
That would be like saying you need the extra )s on the front of numbers when you do arithmetic on paper.
> Ah, but it is. In your specific implementation, you have not only a
> fifth bit, but also a sixth, seventh all the way up to 16th, no? You
> have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
> the ones you list, and 16 through 65535 is the LIT or CALL instruction
> (I'm not sure which).
(Depends on the subroutine start address) all subroutines are calls,
so they are all calls, just one is LIT.

Yes, you will find primitives use codes 0-15 and colon definitions use
0-65535. If you are crazy enough to have a massive primitive set, or
to implement such a set in full width memory, then you would be right.
On the 12 bit version you could use 16 bit memory, and have the high 4
as the primitive part of the address space.

As stated on the website (somewhere) this processor is not designed
for running monolith inlined code, and pay in space and cache slowdown
such things will, say yoda.

So in the example I gave for the store, it's likely the last line of
simple instructions would be a subroutine named store or +1!

You will find a large amount of primitive code can be optimized into a
small logic area, especially if the address space over which these
subroutines are spread is sparse to allow combinational alignment of
product terms and boolean logic reduction.

To just generalize this code as something to slot into the threading
is missing the point that this is an occasional feature, not a best
practice.

cheers
jacko

"speak unto my mobile I will, sometime it may be a programming tool."
On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote:
> On 21 Mar, 04:30, rickman <gnu...@gmail.com> wrote:
>
> > On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > > > This would imply a 5bit wide word, which is obviously not
> > > > the case ?
>
> > > The implication of the extra bit 'needed' is not a true account of
> > > functioning.
>
> That would be like saying you need the extra )s on the front of
> numbers when you do arithmetic on paper.
Extra what??? Actually, you are quoting your own comment.
> > Ah, but it is. In your specific implementation, you have not only a
> > fifth bit, but also a sixth, seventh all the way up to 16th, no? You
> > have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
> > the ones you list, and 16 through 65535 is the LIT or CALL instruction
> > (I'm not sure which).
>
> (Depends on the subroutine start address) all subroutines are calls,
> so they are all calls, just one is LIT.

I have no idea what you are talking about. How does this instruction
set specify literals?

Your obfuscation is getting to be annoying. You never explain what
you mean, you speak in crypto language and you seem intent on never
really explaining the principles of your design. Even your assembly
language is some new symbolism that just serves to isolate what you
are doing and thinking rather than to be at all useful for
communication.

None of the rest of this is at all useful. You are presuming that I
am making some sort of statement or that I am looking at your
processor from a very different point of view. I am doing neither. I
am trying to understand your processor from the point of view of a
small, embeddable CPU for use in an FPGA and in particular, to be
programmable in Forth. That is the target of my CPU. I am hoping to
learn something about these processors that I don't know or that I
haven't thought to try. What I am learning about this design is that
it seems to have been designed without regard to a lot of knowledge
available, not that I will ever know for sure because it will never
really be explained.

Have you read Koopman's book on stack CPUs? He covers a lot of ground
with that.
> Yes, you will find primitives use codes 0-15 and colon definitions use
> 0-65535. If you are crazy enough to have a massive primitive set, or
> to implement such a set in full width memory, then you would be right.
> On the 12 bit version you could use 16 bit memory, and have the high 4
> as the primitive part of the address space.
>
> As stated on the website (somewhere) this processor is not designed
> for running monolith inlined code, and pay in space and cache slowdown
> such things will, say yoda.
>
> So in the example I gave for the store, it's likely the last line of
> simple instructions would be a subroutine named store or +1!
>
> You will find a large amount of primitive code can be optimized into a
> small logic area, especially if the address space over which these
> subroutines are spread is sparse to allow combinational alignment of
> product terms and boolean logic reduction.
>
> To just generalize this code as something to slot into the threading
> is missing the point that this is an occasional feature, not a best
> practice.
Rick
On 21 Mar, 14:13, rickman <gnu...@gmail.com> wrote:
> On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote:
>
> > On 21 Mar, 04:30, rickman <gnu...@gmail.com> wrote:
>
> > > On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > > > > This would imply a 5bit wide word, which is obviously not
> > > > > the case ?
>
> > > > The implication of the extra bit 'needed' is not a true account of
> > > > functioning.
>
> > That would be like saying you need the extra )s on the front of
> > numbers when you do arithmetic on paper.
>
> Extra what??? Actually, you are quoting your own comment.

Zeros in the most significant digits. (shift ')')
> > > Ah, but it is. In your specific implementation, you have not only a
> > > fifth bit, but also a sixth, seventh all the way up to 16th, no? You
> > > have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
> > > the ones you list, and 16 through 65535 is the LIT or CALL instruction
> > > (I'm not sure which).
>
> > (Depends on the subroutine start address) all subroutines are calls,
> > so they are all calls, just one is LIT.
>
> I have no idea what you are talking about. How does this instruction
> set specify literals?
The instruction set does not specify literals, BUT it does specify
enough variety of instruction to write a subroutine to load a literal.
On calling this subroutine, the return address is on the R stack, and
is loaded, used as an index to memory to fetch the literal with post
increment. It is then saved back to the R stack to allow continuation of
the instruction stream on exit from the subroutine. The fetched literal in
the accumulator is stacked onto the stack before return.

: LIT
  RI ( get return address )
  FI ( get literal at return address post increase also )
  RO ( save new return address = addr+1 )
  SO ( stack the fetched literal )
  BA ( exit from subroutine )
; ( well a code def may be better )

: LITERAL ['] LIT , , ;
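The LIT walk-through above can be traced with a toy model in Python. This is a sketch under stated assumptions, not the nibz implementation: the register transfers for RI, FI, RO, SO and BA follow the notation used in this thread, but the stack directions (pushes pre-decrement, pops post-increment), the behaviour of `call`, the memory size and every address used are assumptions made for the example.

```python
class Machine:
    """Toy model of the registers named in the thread: P, Q, A, R, S."""

    def __init__(self, size=256):
        self.mem = [0] * size
        self.p = 0           # program counter
        self.q = 0           # scratch address register
        self.a = 0           # accumulator
        self.r = size        # return stack pointer (assumed to grow downward)
        self.s = size - 32   # data stack pointer (assumed to grow downward)

    def call(self, target):
        # Assumed call behaviour: push the address of the next word, then jump.
        self.r -= 1
        self.mem[self.r] = self.p
        self.p = target

    def RI(self):  # (R)->Q : pop the return address into Q
        self.q = self.mem[self.r]
        self.r += 1

    def FI(self):  # (Q)->A : fetch through Q, then post-increment Q
        self.a = self.mem[self.q]
        self.q += 1

    def RO(self):  # Q->(R) : push the updated return address
        self.r -= 1
        self.mem[self.r] = self.q

    def SO(self):  # A->(S) : push the fetched literal on the data stack
        self.s -= 1
        self.mem[self.s] = self.a

    def BA(self):  # (R)->P : return
        self.p = self.mem[self.r]
        self.r += 1

    def lit(self):
        """Body of the LIT subroutine: RI FI RO SO BA."""
        for step in (self.RI, self.FI, self.RO, self.SO, self.BA):
            step()


def run_lit_example():
    LIT_ADDR = 0x40          # hypothetical address of the LIT subroutine
    m = Machine()
    m.mem[0x20] = LIT_ADDR   # call word in the threaded code
    m.mem[0x21] = 0x222      # inline literal that follows the call
    m.p = 0x21               # P already points past the fetched call word
    m.call(LIT_ADDR)
    m.lit()
    return m.p, m.mem[m.s], m.r


p, top, r = run_lit_example()
print(hex(p), hex(top), r)
```

In this model the routine returns to the word after the inline literal, leaves the literal on top of the data stack, and leaves the return stack pointer where it started.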
> Your obfuscation is getting to be annoying. You never explain what
> you mean, you speak in crypto language and you seem intent on never
> really explaining the principles of your design. Even your assembly
> language is some new symbolism that just serves to isolate what you
> are doing and thinking rather than to be at all useful for
> communication.
I am open to anyone developing a 'longwind' instruction mnemonic set, there is no ONE CORRECT way to symbolize function.
> None of the rest of this is at all useful. You are presuming that I
> am making some sort of statement or that I am looking at your
> processor from a very different point of view. I am doing neither. I
> am trying to understand your processor from the point of view of a
> small, embeddable CPU for use in an FPGA and in particular, to be
> programmable in Forth. That is the target of my CPU. I am hoping to
> learn something about these processors that I don't know or that I
> haven't thought to try. What I am learning about this design is that
> it seems to have been designed without regard to a lot of knowledge
> available, not that I will ever know for sure because it will never
> really be explained.
Without regard? So what would be the point of exploring the avenue of
regard to convention? I understand it gets followed verbatim
thousands of times on university degree programs, and it hasn't
produced many major improvements in design.

> Have you read Koopman's book on stack CPUs? He covers a lot of ground
> with that.
I think I did download it once, and have a long base of reading
stretching from '82.

So you would be suggesting "literal common, need literal fast", whereas
I would suggest "literal in opcode create bulk", not in the
instruction decode-execute sense, but in the instruction representation
sense. Literals by their nature are not nibble constrained, whereas
code/end-code definitions can be.

Also, say I have the subroutine for LIT at address $0101: there is
nothing inherent in the design to prevent me adding an IR=$0101
comparator with (P)->S routing in a single fetch-execute cycle. Such a
design is still code compatible with nibz. Let's call this a hard-wired
subroutine option for purpose technique. It is not a default as
it will make your code definitions bigger (not nibble wide without
significant extra logic).

Usual comment number 1: "I can't write primitives without literals!" ->
"Such a thing is not primitive."

Usual comment number 2: "But nibbles waste memory in cells!" -> "The
zero cell area can be factored to leave an area to place nibz in, or
some other nibble oriented compaction protocol can be used."

Usual comment number 3: "Why don't you do everything at once!" ->
"Because extra arms occupy more volume, and require more motor cortex,
hence more volume, hence slower reaction time, hence no such thing is
possible."

cheers
jacko
On Mar 21, 1:43 pm, Jacko <jackokr...@gmail.com> wrote:
> On 21 Mar, 14:13, rickman <gnu...@gmail.com> wrote:
>
> > On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote:
>
> > > That would be like saying you need the extra )s on the front of
> > > numbers when you do arithmetic on paper.
>
> > Extra what??? Actually, you are quoting your own comment.
>
> Zero's in the Most Significant Digits. (shift ')')

This is why people have no idea what you are talking about. How does ')'
imply a zero???

Back to the point, the issue is not the notation that the opcodes from
0 to 15 have to have zeros in front, the issue is that your opcode is
N bits wide, not 4! The point is that there are only 17 opcodes in
your machine and each one uses a full word for storage. The logic in
this CPU may be small, but the program storage is nothing like
compact.
> > > (Depends on the subroutine start address) all subroutines are calls,
> > > so they are all calls, just one is LIT.
>
> > I have no idea what you are talking about. How does this instruction
> > set specify literals?
>
> The instruction set does not specify literals, BUT it does specify
> enough variety of instruction to write a subroutine to load a literal.
> On calling this subroutine, the return address is on the R stack, and
> is loaded, used as an index to memory to fetch the literal with post
> increment. It is then saved back to the R stack to allow continuation of
> the instruction stream on exit from the subroutine. The fetched literal in
> the accumulator is stacked onto the stack before return.
>
> : LIT RI ( get return address ) FI ( get literal at return address
> post increase also ) RO ( save new return address = addr+1 ) SO
> ( stack the fetched literal ) BA ( exit from subroutine ) ; ( well a
> code def may be better )
>
> : LITERAL ['] LIT , , ;

I would say this is also very, very inefficient. Try reading
Koopman's book on stack computers. His data indicates that LIT is the
second most frequent Forth operation (on average) in terms of
occurrence in the code and sixth most frequent in terms of execution.
Having to embed a literal in the code and then call a subroutine to
get it put on the stack may be novel and interesting, but very
inefficient.

Funny, often the goal of literal instructions is to optimize small
magnitude literals since they are more frequent than larger
magnitudes. This is especially true for relative addressing due to
the locality in time and space for memory accesses. Your scheme of
using those low magnitude literals for opcodes shoots that in the
foot.
> > Your obfuscation is getting to be annoying. You never explain what
> > you mean, you speak in crypto language and you seem intent on never
> > really explaining the principles of your design. Even your assembly
> > language is some new symbolism that just serves to isolate what you
> > are doing and thinking rather than to be at all useful for
> > communication.
>
> I am open to anyone developing a 'longwind' instruction mnemonic set,
> there is no ONE CORRECT way to symbolize function.

No, there is no one way to do something correctly. But there are an
unbounded number of ways to do it poorly. The question is are you
documenting it so others will understand it or just for your own
amusement? If the former I think the response you are getting is
telling you it isn't working. If the latter, only you can judge.
> > None of the rest of this is at all useful. You are presuming that I
> > am making some sort of statement or that I am looking at your
> > processor from a very different point of view. I am doing neither. I
> > am trying to understand your processor from the point of view of a
> > small, embeddable CPU for use in an FPGA and in particular, to be
> > programmable in Forth. That is the target of my CPU. I am hoping to
> > learn something about these processors that I don't know or that I
> > haven't thought to try. What I am learning about this design is that
> > it seems to have been designed without regard to a lot of knowledge
> > available, not that I will ever know for sure because it will never
> > really be explained.
>
> Without regard? So what would be the point of exploring the avenue of
> regard to convention? I understand it gets followed verbatim
> thousands of times on university degree programs, and it hasn't
> produced many major improvements in design.

I don't think you understand what I am saying. I am not suggesting
that you need to repeat the designs of others. I am suggesting that
you might learn something from their failures (or just suboptimal
successes). I have found a number of design decisions that make your
machine inherently limited and inefficient. If you care about that,
you will read a few references to learn what others have found before
you. Then you can improve on their work rather than go off blindly
and make all your own mistakes.

Of course this is assuming that making a useful design is your goal.
I don't know this is your goal. You may well be doing this to amuse
yourself only. The lack of any real communication regarding your
design tends to indicate the latter.
> > Have you read Koopman's book on stack CPUs? He covers a lot of ground
> > with that.
>
> I think I did download it once, and have a long base of reading
> stretching from '82.
>
> So you would be suggesting "literal common, need literal fast", whereas
> I would suggest "literal in opcode create bulk" not in the
> instruction decode execute sense, but in the instruction representation
> sense. Literals by their nature are not nibble constrained, whereas
> code/end-code definitions can be.

I have no idea what you mean by any of this. What does "nibble
constrained" mean? As to the trade off between bulk and "speed", how
large is your code if you have to use some, what, four, five, six
words to insert a literal in your code (-1, for example) versus a
command that inserts a literal using perhaps two words? Koopman's
book says literals are some 10% of the occurrences of Forth words. That
would mean your usage of Literal requires the code to be some 20 to
30% larger! That is what I mean...
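The arithmetic behind that estimate can be written out as a short Python sketch. The 10% literal share is the figure quoted from Koopman in the post; the words-per-literal values are inputs taken from the post's "four, five, six" versus "two" comparison, not measurements of nibz, and the function names are invented for the example. The model also assumes every non-literal word costs exactly one word.

```python
def relative_code_size(literal_fraction, words_per_literal, words_per_other=1.0):
    """Average storage per source-level Forth word, in memory words."""
    return (literal_fraction * words_per_literal
            + (1.0 - literal_fraction) * words_per_other)


def growth(literal_fraction, words_a, words_b):
    """Fractional code growth when a literal costs words_a instead of words_b."""
    size_a = relative_code_size(literal_fraction, words_a)
    size_b = relative_code_size(literal_fraction, words_b)
    return size_a / size_b - 1.0


# 10% literals, comparing 4, 5 and 6 words per literal against 2 words.
for words in (4, 5, 6):
    print(words, round(growth(0.10, words, 2) * 100, 1), "% larger")
```

With these inputs, four or five words per literal gives growth of roughly 18% and 27%, which is about the "20 to 30%" range quoted. The result depends entirely on the per-literal word count assumed; if each literal site costs two words (a call word plus the value, with the LIT body shared), the same formula gives no growth relative to a two-word literal instruction.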
> Also say I have the subroutine for LIT at address $0101, there is
> nothing inherent in the design to prevent me adding an IR=$0101
> comparator with (P)->S routing in a single fetch execute cycle. Such a
> design is still code compatible with nibz. Let's call this a hard
> wired subroutine option for purpose technique. It is not a default as
> it will make your code definitions bigger (not nibble wide without
> significant extra logic).

Ok, this sounds a bit like the ZPU. They have reserved a number of
opcodes for "emulate" instructions which can be a subroutine or done
with logic. Of course that will work. But it means your address space
gets chopped up, which is something to be avoided in a CPU addressing
limited internal FPGA memory. It also is still not as efficient as
other schemes, using two words when often only one will do, or better,
less than one. Is it efficient to use 32 bits to insert a -1 in your
code?
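The hard-wired subroutine idea can be sketched as an extra comparison in the decode step. This is a hypothetical illustration in Python: `$0101` is the address used in Jacko's example, while `execute_kind` and the `hardwired` set are names invented here, and nothing about how real logic would implement the comparator is implied.

```python
LIT_ADDR = 0x0101  # address of the LIT subroutine in Jacko's example


def execute_kind(word, hardwired=frozenset()):
    """Decide how a fetched word is executed.

    Words below 16 are primitives. A word whose value matches a
    hard-wired address is executed directly by dedicated logic.
    Every other word is an ordinary subroutine call.
    """
    if word < 16:
        return "primitive"
    if word in hardwired:
        return "hardwired"
    return "call"


# The same threaded code word runs on both variants of the machine.
print(execute_kind(LIT_ADDR))              # plain machine: ordinary call
print(execute_kind(LIT_ADDR, {LIT_ADDR}))  # machine with the comparator
```

The point of the sketch is the compatibility claim: the stored program is unchanged, and only the decoder differs between the two machines.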
> Usual comment number 1: "I can't write primitives without literals!" ->
> "Such a thing is not primitive."
>
> Usual comment Number 2: "But nibbles waste memory in cells!" -> "The
> zero cell area can be factored to leave an area to place nibz in, or
> some other nibble oriented compaction protocol can be used."
Again, I don't know what you are talking about.
> Usual comment number 3: "Why don't you do everything at once!" ->
> "Because extra arms occupy more volume, and require more motor cortex,
> hence more volume, hence slower reaction time, hence no such thing is
> possible."

This on the other hand is perfectly clear... not! Do you really care
if you make anyone understand what you are talking about?

I'm going to start calling you "Cryptoman"! Your one weakness is
"Cryptonite", a substance that makes you communicate with perfect
lucidity and brings about your ultimate destruction!

Rick
On 21 Mar, 22:28, rickman <gnu...@gmail.com> wrote:
> On Mar 21, 1:43 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > On 21 Mar, 14:13, rickman <gnu...@gmail.com> wrote:
>
> > > On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote:
>
> > > > That would be like saying you need the extra )s on the front of
> > > > numbers when you do arithmetic on paper.
>
> > > Extra what??? Actually, you are quoting your own comment.
>
> > Zero's in the Most Significant Digits. (shift ')')
>
> This why people have no idea what you are talking about. How does ')'
> imply a zero???

It doesn't imply a zero, but inferring a zero as being on the same key is well ...
> Back to the point, the issue is not the notation that the opcodes from
> 0 to 15 have to have zeros in front, the issue is that your opcode is
> N bits wide, not 4! The point is that there are only 17 opcodes in
> your machine and each one uses a full word for storage. The logic in
> this CPU may be small, but the program storage is nothing like
> compact.

The subroutine threading is very compact, although not token threading
level compact. I would assume an FPGA needing this compaction would
not use token threading, but would use an indirection table for all
subroutine addresses and literal constants. To imply the advanced
branch as one opcode misses the point entirely. One semantic yes, but
definitely different codes. As you point out, if you choose not to
optimize the section of memory containing the primitive 16 opcode
subroutines then yes, you will have some space occupied by zeros. In a
typical small forth system, the primitive code requirements are small,
and so I think your dislike is misrepresentative of the compact size a
system may be programmed in.
> > > > (Depends on the subroutine start address) all subroutines are calls,
> > > > so they are all calls, just one is LIT.
>
> > > I have no idea what you are talking about. How does this instruction
> > > set specify literals?
>
> > The instruction set does not specify literals, BUT it does specify
> > enough variety of instruction to write a subroutine to load a literal.
> > On calling this subroutine, the return address is on the R stack, and
> > is loaded, used as an index to memory to fetch the literal with post
> > increment. It is then saved back to the R stack to allow continuation of
> > the instruction stream on exit from the subroutine. The fetched literal in
> > the accumulator is stacked onto the stack before return.
>
> > : LIT RI ( get return address ) FI ( get literal at return address
> > post increase also ) RO ( save new return address = addr+1 ) SO
> > ( stack the fetched literal ) BA ( exit from subroutine ) ; ( well a
> > code def may be better )
>
> > : LITERAL ['] LIT , , ;
>
> I would say this is also very, very inefficient. Try reading
> Koopman's book on stack computers. His data indicates that LIT is the
> second most frequent Forth operation (on average) in terms of
> occurrence in the code and sixth most frequent in terms of execution.
> Having to embed a literal in the code and then call a subroutine to
> get it put on the stack may be novel and interesting, but very
> inefficient.
And having a (P)->(S) double memory access opcode as a primitive opcode within the basic supported set leads to quite a mismatch with a single-memory-access-per-instruction architecture. A decision was made at design time to limit memory accesses per instruction, for the simple reason of size of design and scalability of multiple dispatch algorithms. I do not consider load literal to TOS in any way primitive in this sense. Hardware efficiency for a particular task can be implemented by subroutine hardwiring.

> Funny, often the goal of literal instructions is to optimize small
> magnitude literals since they are more frequent than larger
> magnitudes. This is especially true for relative addressing due to
> the locality in time and space for memory accesses. Your scheme of
> using those low magnitude literals for opcodes shoots that in the
> foot.

Relative addressing? This definitely does not scale as well as stack or direct addressing. Like I said, the instruction set is designed for scaling option. Forcing certain instructions into hardware as must-haves destroys scalability options.
> > > Your obfuscation is getting to be annoying. You never explain what
> > > you mean, you speak in crypto language and you seem intent on never
> > > really explaining the principles of your design. Even your assembly
> > > language is some new symbolism that just serves to isolate what you
> > > are doing and thinking rather than to be at all useful for
> > > communication.
>
> > I am open to anyone developing a 'longwind' instruction mnemonic set,
> > there is no ONE CORRECT way to symbolize function.
>
> No, there is no one way to do something correctly. But there are an
> unbounded number of ways to do it poorly. The question is are you
> documenting it so others will understand it or just for your own
> amusement? If the former I think the response you are getting is
> telling you it isn't working. If the latter, only you can judge.

Well maybe the syllabic mnemonic set was for my own amusement, or though ideas, and as such I did it that way. If people feel/require/need it to be another way, then they are able to do it that way themselves. I am not an opcode translation service. Certain things get done free. If you want a free addition to the project, by all means make a request. If you want it doing right, right now, by some 'my way' standard, do it yourself.
> > > None of the rest of this is at all useful. You are presuming that I > > > am making some sort of statement or that I am looking at your > > > processor from a very different point of view. I am doing neither. I > > > am trying to understand your processor from the point of view of a > > > small, embeddable CPU for use in an FPGA and in particular, to be > > > programmable in Forth. That is the target of my CPU. I am hoping to > > > learn something about these processors that I don't know or that I > > > haven't thought to try. What I am learning about this design is that > > > it seems to have been designed without regard to a lot of knowledge > > > available, not that I will ever know for sure because it will never > > > really be explained. > > > Without regard? So what would be the point of exploring the avenue of > > regard to convention? I understand it gets followed verbatim > > thousands of times on university degree programs, and it hasn't > > produced many major improvements in design. > > I don't think you understand what I am saying. I am not suggesting > that you need to repeat the designs of others. I am suggesting that > you might learn something from their failures (or just suboptimal > successes). I have found a number of design decisions that make your > machine inherently limited and inefficient. If you care about that, > you will read a few references to learn what others have found before > you. Then you can improve on their work rather than to go off blindly > and make all your own mistakes.
And a new way of thinking of literals, and restrictions on such, and the results when applied to scaled multi-processor/super-scalar have been tried by who? Just because all mainstream designs have literal instructions for 'performance reasons' does not in any way imply 'performance' has been a closed research field.
> Of course this is assuming that making a useful design is your goal. > I don't know this is your goal. You may well be doing this to amuse > yourself only. The lack of any real communication regarding your > design tends to indicate the latter.
Compared to many 8 bit designs this is both useful and effective.
> > > Have you read Koopman's book on stack CPUs? He covers a lot of ground > > > with that. > > > I think I did download it once, and have a long base of reading > > stretching from '82. > > > So you would be suggesting "literal common, need literal fast", whereas > > I would suggest "literal in opcode create bulk" not in the > > instruction decode execute sense, but in the instruction representation > > sense. Literals by their nature are not nibble constrained, whereas > > code/end-code definitions can be. > > I have no idea what you mean by any of this. What does "nibble > constrained" mean? As to the trade off between bulk and "speed", how > large is your code if you have to use some, what, four, five, six > words to insert a literal in your code -1, for example, versus a > command that inserts a literal using perhaps two words? Koopman's > book says Literals are some 10% of the occurrence of Forth words. That > would mean your usage of Literal requires the code to be some 20 to > 30% larger!
Subroutine only written once. 2 cells per literal. -! obviously can be 1 cell if it also becomes threaded. In fact any literal used more than 3 times can be more effectively shrunk to 1 cell per literal (a subroutine). Constrained => made to fit within limits or bounds (nibble constrained = 0 to 15).
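The "more than 3 times" break-even can be checked with a little arithmetic. This is only a sketch under assumptions not stated in the thread: an inline literal costs 2 cells per use (the call to LIT plus the value), while a per-constant subroutine costs 3 cells once (call to LIT, the value, and a BA to return) plus 1 cell per use. The 3-cell body is a guess at how such a subroutine would be laid out.

```python
# Assumed cell costs; none of these figures beyond "2 cells per literal"
# and "1 cell per literal" come from the thread itself.
INLINE_CELLS_PER_USE = 2   # call LIT + literal value
SUB_BODY_CELLS = 3         # assumed body: call LIT, value, BA (return)
SUB_CELLS_PER_USE = 1      # a single call to the constant's subroutine


def inline_cost(uses):
    """Total cells when the literal is written inline at every use."""
    return INLINE_CELLS_PER_USE * uses


def subroutine_cost(uses):
    """Total cells when the literal lives in its own subroutine."""
    return SUB_BODY_CELLS + SUB_CELLS_PER_USE * uses


def break_even():
    """Smallest number of uses at which the subroutine form is strictly smaller."""
    uses = 1
    while subroutine_cost(uses) >= inline_cost(uses):
        uses += 1
    return uses
```

Under these assumptions the two forms tie at 3 uses (6 cells each) and the subroutine wins from the 4th use onward, which matches the claim above.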
> That is what I mean... > > > Also say I have the subroutine for LIT at address $0101, there is > > nothing inherent in the design to prevent me adding an IR=$0101 > > comparator with (P)->S routing in a single fetch execute cycle. Such a > > design is still code compatible with nibz. Let's call this a hard > > wired subroutine option for purpose technique. It is not a default as > > it will make your code definitions bigger (not nibble wide without > > significant extra logic). > > Ok, this sounds a bit like the ZPU. They have reserved a number of > opcodes for "emulate" instructions which can be a subroutine or done > with logic. Of course that will work. But it means your address space > gets chopped up which is something to be avoided in a CPU addressing > limited internal FPGA memory. It also is still not as efficient as > other schemes using two words when often only one will do, or better > less than one. Is it efficient to use 32 bits to insert a -1 in your > code?
ZPU emulate some essential instructions in hardware, yes kind of like, but all essential instructions are in hardware. Extra instructions have no fixed opcode, they have virtual subroutine addresses. So rather than soft instructions, why not have hard subroutines? From a microcode point of view, such things are closely related.
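For readers trying to picture the "hard subroutine" idea, here is a minimal sketch of the dispatch decision only. It assumes the rule quoted earlier in the thread (values below 16 are opcodes, everything else is a subroutine address) and the example LIT address $0101 from the post; the function name and the string labels are made up for illustration and are not part of nibz.

```python
LIT_ADDRESS = 0x0101  # example address for LIT taken from the post


def classify(instruction_register, hardwired=frozenset()):
    """Say how one fetched cell would be executed.

    hardwired is the (hypothetical) set of subroutine addresses that a
    particular implementation recognises with a comparator and executes
    directly instead of calling.
    """
    if instruction_register < 16:
        return "opcode"            # one of the 16 primitive opcodes
    if instruction_register in hardwired:
        return "hardwired"         # same cell, executed by dedicated logic
    return "call"                  # ordinary threaded subroutine call


# The same program cell is valid on both implementations:
plain = classify(LIT_ADDRESS)
fast = classify(LIT_ADDRESS, hardwired={LIT_ADDRESS})
```

The point of the sketch is that the program image does not change: a cell holding $0101 is a call on a plain implementation and a single-cycle operation on one with the comparator.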
> > Usual comment number 1: "I can't write primitives without literals!" - > > > > "Such a thing is not primitive." > > > Usual comment Number 2: "But nibbles waste memory in cells!" -> "The > > zero cell area can be factored to leave an area to place nibz in, or > > some other nibble oriented compaction protocol can be used." > > Again, I don't know what you are talking about. > > > Usual comment number 3: "Why don't you do everything at once!" -> > > "Because extra arms occupy more volume, and require more motor cortex, > > hence more volume, hence slower reaction time, hence no such thing is > > possible." > > This on the other hand is perfectly clear... not! > > Do you really care if you make anyone understand what you are talking > about?
Making people understand? If people understand they do, if people do not understand they may eventually, if there is such an important need to indoctrinate people, making may be an unsavoury procedure. Really care? As opposed to virtually care? As in expressing a duty of care? Please elaborate using non circular arguments ...
> I'm going to start calling you "Cryptoman"! Your one weakness is > "Cryptonite", a substance that makes you communicate with perfect > lucidity and brings about your ultimate destruction!
This would imply I am self destructive, by me not communicating my need to have the Cryptonite removed, as I would see my destruction, as I would see everything of my instamachinations for the concept of constrained infinite lucidity to be elaborated within me and flush out my band limited mouth to provide enlightenment to everyone within ear shot. Your statement is inconsistent bleep bleep, BA stack underflow ......
Jacko wrote:
> On 21 Mar, 22:28, rickman <gnu...@gmail.com> wrote:
...
>> Do you really care if you make anyone understand what you are talking >> about? > > Making people understand? If people understand they do, if people do > not understand they may eventually, if there is such an important need > to indoctrinate people, making may be an unsavoury procedure. > > Really care? As opposed to virtually care? As in expressing a duty of > care? Please elaborate using non circular arguments ...
In other words, no. Advice to Rick: give up. Cheers, Elizabeth -- ================================================== Elizabeth D. Rather (US & Canada) 800-55-FORTH FORTH Inc. +1 310.999.6784 5959 West Century Blvd. Suite 700 Los Angeles, CA 90045 http://www.forth.com "Forth-based products and Services for real-time applications since 1973." ==================================================
On Mar 22, 11:13 am, Jacko <jackokr...@gmail.com> wrote:
> On 21 Mar, 22:28, rickman <gnu...@gmail.com> wrote: > > > On Mar 21, 1:43 pm, Jacko <jackokr...@gmail.com> wrote: > > > > On 21 Mar, 14:13, rickman <gnu...@gmail.com> wrote: > > > > > On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote: > > > > > > That would be like saying you need the extra )s on the front of > > > > > numbers when you do arithmetic on paper. > > > > > Extra what??? Actually, you are quoting your own comment. > > > > Zero's in the Most Significant Digits. (shift ')') > > > This why people have no idea what you are talking about. How does ')' > > imply a zero??? > > It doesn't imply a zero but inferring a zero as being on the same key > is well ...
This is the sort of obfuscation that you seem to revel in. Why infer or imply a zero when a zero could have been typed???
> > Back to the point, the issue is not the notation that the opcodes from > > 0 to 15 have to have zeros in front, the issue is that your opcode is > > N bits wide, not 4! The point is that there are only 17 opcodes in > > your machine and each one uses a full word for storage. The logic in > > this CPU may be small, but the program storage is nothing like > > compact. > > The subroutine threading is very compact, although not token threading > level compact. I would assume an FPGA needing this compaction, would > not use token threading, but would use an indirection table for all > subroutine addresses and literal constants.
Yes, subroutine threading is compact. But your use of a word-wide opcode for *every* instruction is not compact.
> To imply the advanced branch as one opcode misses the point entirely. > One semantic yes but definitely different codes. > > As you point out if you choose not to optimize the section of memory > containing the primitive 16 opcode subroutines then yes you will have > some space occupied by zeros. In a typical small forth system, the > primitive code requirements are small, and so I think your dislike is > misrepresentative of the compact size a system may be programmed in.
If you think I am concerned with 16 words of memory lost to the opcodes, then you are confused about what I have said. There are two ways your instruction set is less than optimal. The encoding uses a full word for every instruction. Many MISC machines use opcodes of five bits. My machine uses opcodes of 8 or 9 bits depending on the implementation. Using 16 or 32 bits is very wasteful for the code using primitives. Even in higher level code a significant percentage of the codes is still primitives. The other inefficiency is the poor integration of literals into the instruction set. Needing to call a subroutine to load a literal is not an efficient use of memory or processor speed. Adding an optimization for direct implementation of the literal subroutine is still not an efficient use of memory, requiring two words for each literal. My main point is that you seem to be making your design decisions without the benefit of the work that has gone on before you. I am sure your design has advantages, although I doubt anyone here will ever know because of your poor attempts to communicate.
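To put rough numbers on the encoding argument, the sketch below counts how many memory words a run of primitive instructions occupies at different opcode widths. The packing rule (as many whole opcodes as fit in one word, no opcode split across words) is an assumption made for illustration; neither machine's real packing is described in this thread.

```python
def words_needed(primitive_count, word_bits, opcode_bits):
    """Memory words needed to hold primitive_count opcodes.

    Assumes whole opcodes are packed into each word and never straddle
    a word boundary (an illustrative rule, not a documented one).
    """
    per_word = word_bits // opcode_bits
    if per_word < 1:
        raise ValueError("opcode does not fit in one word")
    # Ceiling division without importing math.
    return -(-primitive_count // per_word)


# 100 primitives in a 16-bit-wide memory:
full_word = words_needed(100, 16, 16)   # one opcode per word
eight_bit = words_needed(100, 16, 8)    # two opcodes per word
five_bit = words_needed(100, 16, 5)     # three opcodes per word
```

With these assumptions the same 100 primitives take 100, 50 and 34 words respectively, which is the size of the gap being argued about.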
> > > : LIT RI ( get return address ) FI ( get literal at return address > > > post increase also ) RO ( save new return address = addr+1 ) SO > > > ( stack the fetched literal ) BA ( exit from subroutine ) ; ( well a > > > code def may be better ) > > > > : LITERAL ['] LIT , , ; > > > I would say this is also very, very inefficient. Try reading > > Koopman's book on stack computers. His data indicates that LIT is the > > second most frequent Forth operation (on average) in terms of > > occurrence in the code and sixth most frequent in terms of execution. > > Having to embed a literal in the code and then call a subroutine to > > get it put on the stack may be novel and interesting, but very > > inefficient. > > And having a (P)->(S) double memory access opcode as a primitive > opcode within the basic supported set leads to quite a mismatch to a > single memory access per instruction architecture. A decision was > made at design time to limit memory accesses per instruction for the > simple reason of size of design, and scalability of multiple dispatch > algorithms. I do not consider load literal to TOS in any way primitive > in this sense. Hardware efficiency for a particular task can be > implemented by subroutine hardwiring.
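Going only by the comments in the LIT definition quoted above, the five steps can be walked through in a small simulation. This is a sketch, not a model of the real nibz datapath: the Python lists standing in for the return and data stacks, the local variable names, and the choice to have RI pop the return address (with RO pushing it back) are all assumptions.

```python
def run_lit(memory, return_stack, data_stack):
    """Walk through LIT as commented in the post: RI FI RO SO BA.

    memory is a list of cells; the two stacks are plain Python lists.
    Returns the address at which the caller resumes.
    """
    addr = return_stack.pop()     # RI: get return address (assumed to pop)
    value = memory[addr]          # FI: fetch the literal stored at that address...
    addr += 1                     # ...with post-increment
    return_stack.append(addr)     # RO: save new return address = addr + 1
    data_stack.append(value)      # SO: stack the fetched literal
    return return_stack.pop()     # BA: exit, resuming after the literal cell


# Caller code: cell 0 calls LIT, cell 1 holds the literal, cell 2 is the next word.
memory = [0x0101, 0x222, 0x0005]
return_stack = [1]                # the call at cell 0 saved return address 1
data_stack = []
resume_at = run_lit(memory, return_stack, data_stack)
```

After the call the literal $222 is on the data stack and execution resumes at cell 2, so each literal costs two cells in the caller plus five opcode steps at run time.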
I can't say anything about what is efficient in your machine. I do know that loading a literal is frequent in most CPU architectures and needs to be optimized over many other things. If you are designing for a large, complex CPU, then it will not be close to optimal for small machines. Is your stack in memory and not hardware? Since no one but yourself understands your instruction set, I can't tell what is happening with your code.
> > Funny, often the goal of literal instructions is to optimize small > > magnitude literals since they are more frequent than larger > > magnitudes. This is especially true for relative addressing due to > > the locality in time and space for memory accesses. Your scheme of > > using those low magnitude literals for opcodes shoots that in the > > foot. > > Relative addressing? This definitely does not scale as well as stack or > direct addressing. Like I said, the instruction set is designed for > scaling options. Forcing certain instructions into hardware as > must-haves destroys scalability options.
Why would relative addressing not scale? It is just a simple index off the PC. But since you have indicated that your goal is to have a scalable instruction set, I can understand why this machine will not be at all optimal for FPGA use where program memory is tight.
> > > I am open to anyone developing a 'longwind' instruction mnemonic set, > > > there is no ONE CORRECT way to symbolize function. > > > No, there is no one way to do something correctly. But there are an > > unbounded number of ways to do it poorly. The question is are you > > documenting it so others will understand it or just for your own > > amusement? If the former I think the response you are getting is > > telling you it isn't working. If the latter, only you can judge. > > Well, maybe the syllabic mnemonic set was for my own amusement, or > though ideas, and as such I did it that way. If people feel/require/need > it to be another way, then they are able to do it that way > themselves. I am not an opcode translation service. Certain things get > done free. If you want a free addition to the project, by all means > make a request. If you want it doing right, right now, by some 'my > way' standard, do it yourself.
No one gives a rat's rear what you *call* your instructions. No one understands what they do because you have not *documented* the instructions in a coherent way. I have no real interest in your project since I don't see any value in it. You seem to be trying to communicate to others here about your ideas and designs, but are failing to do so. That is the reason for my statements. I don't really care that much about understanding your project. I'm just pointing out that you seem to be failing in your goal.
> > > > None of the rest of this is at all useful. You are presuming that I > > > > am making some sort of statement or that I am looking at your > > > > processor from a very different point of view. I am doing neither. I > > > > am trying to understand your processor from the point of view of a > > > > small, embeddable CPU for use in an FPGA and in particular, to be > > > > programmable in Forth. That is the target of my CPU. I am hoping to > > > > learn something about these processors that I don't know or that I > > > > haven't thought to try. What I am learning about this design is that > > > > it seems to have been designed without regard to a lot of knowledge > > > > available, not that I will ever know for sure because it will never > > > > really be explained. > > > > Without regard? So what would be the point of exploring the avenue of > > > regard to convention? I understand it gets followed verbatim > > > thousands of times on university degree programs, and it hasn't > > > produced many major improvements in design. > > > I don't think you understand what I am saying. I am not suggesting > > that you need to repeat the designs of others. I am suggesting that > > you might learn something from their failures (or just suboptimal > > successes). I have found a number of design decisions that make your > > machine inherently limited and inefficient. If you care about that, > > you will read a few references to learn what others have found before > > you. Then you can improve on their work rather than to go off blindly > > and make all your own mistakes. > > And a new way of thinking of literals, and restrictions on such, and > the results when applied to scaled multi-processor/super-scalar have > been tried by who? Just because all mainstream designs have literal > instructions for 'performance reasons' does not in any way imply > 'performance' has been a closed research field.
> > > Of course this is assuming that making a useful design is your goal. > > I don't know this is your goal. You may well be doing this to amuse > > yourself only. The lack of any real communication regarding your > > design tends to indicate the latter. > > Compared to many 8 bit designs this is both useful and effective. > > > > > > > Have you read Koopman's book on stack CPUs? He covers a lot of ground > > > > with that. > > > > I think I did download it once, and have a long base of reading > > > stretching from '82. > > > > So you would be suggesting "literal common, need literal fast", whereas > > > I would suggest "literal in opcode create bulk" not in the > > > instruction decode execute sense, but in the instruction representation > > > sense. Literals by their nature are not nibble constrained, whereas > > > code/end-code definitions can be. > > > I have no idea what you mean by any of this. What does "nibble > > constrained" mean? As to the trade off between bulk and "speed", how > > large is your code if you have to use some, what, four, five, six > > words to insert a literal in your code -1, for example, versus a > > command that inserts a literal using perhaps two words? Koopman's > > book says Literals are some 10% of the occurrence of Forth words. That > > would mean your usage of Literal requires the code to be some 20 to > > 30% larger! > > Subroutine only written once. 2 cells per literal. -! obviously can be > 1 cell if it also becomes threaded. In fact any literal used more than > 3 times can be more effectively shrunk to 1 cell per literal (a > subroutine).
Since the literal instruction is used so often, and most literals are small values, the literal instruction can be smaller than a word on the average. In some MISC machines the literal instruction uses whatever is left of the current word. In my machine the literal (also used to specify addresses for calls and jumps) is the most optimized instruction with one bit plus the data field in the remaining bits. In the 9 bit instruction format, +127 -128 range takes a single byte and a 16 bit literal only takes two 9 bit bytes. Jumps and calls are further optimized by including a 5 bit field for the lsbs of the address calculation. There is nothing wrong with the way you are doing things. I thought you were optimizing for a small design for FPGA use. But I see now you have other priorities.
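A rough picture of the 9-bit literal format described above: one flag bit marks the unit as a literal and the remaining 8 bits carry data, so a value in the -128..+127 range fits in one unit and a signed 16-bit value in two. The sketch below fills in details the post does not give and should be read as assumptions: the flag being the top bit, the first unit being sign-extended, and each following unit shifting in 8 more bits.

```python
LIT_FLAG = 0x100  # assumed: top bit of the 9-bit unit marks a literal


def encode_literal(value):
    """Encode a signed integer as the fewest 9-bit literal units."""
    units = 1
    # Grow until the value fits in a signed field of 8 * units bits.
    while not -(1 << (8 * units - 1)) <= value < (1 << (8 * units - 1)):
        units += 1
    raw = value & ((1 << (8 * units)) - 1)   # two's complement bit pattern
    # Most significant byte first, each tagged with the literal flag.
    return [LIT_FLAG | ((raw >> (8 * i)) & 0xFF) for i in reversed(range(units))]


def decode_literal(units):
    """Rebuild the value: sign-extend the first unit, then shift in the rest."""
    first = units[0] & 0xFF
    value = first - 0x100 if first & 0x80 else first
    for unit in units[1:]:
        value = (value << 8) | (unit & 0xFF)
    return value
```

Under these assumptions -1 costs 9 bits instead of two full cells, and `decode_literal(encode_literal(n))` returns `n` for any integer `n`.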
> constrained => made to fit within limits or bounds. (nibble > constrained = 0 to 15). > > > That is what I mean... > > > > Also say I have the subroutine for LIT at address $0101, there is > > > nothing inherent in the design to prevent me adding an IR=$0101 > > > comparator with (P)->S routing in a single fetch execute cycle. Such a > > > design is still code compatible with nibz. Let's call this a hard > > > wired subroutine option for purpose technique. It is not a default as > > > it will make your code definitions bigger (not nibble wide without > > > significant extra logic). > > > Ok, this sounds a bit like the ZPU. They have reserved a number of > > opcodes for "emulate" instructions which can be a subroutine or done > > with logic. Of course that will work. But it means your address space > > gets chopped up which is something to be avoided in a CPU addressing > > limited internal FPGA memory. It also is still not as efficient as > > other schemes using two > > > ZPU emulate some essential instructions in hardware, yes kind of like, > but all essential instructions are in hardware. Extra instructions > have no fixed opcode, they have virtual subroutine addresses. So > rather than soft instructions, why not have hard subroutines? From a > microcode point of view, such things are closely related.
Yes, I see your point (for once). I wonder how useful this really is compared to more conventional
> > > Usual comment number 1: "I can't write primitives without literals!" - > > > > > "Such a thing is not primitive." > > > > Usual comment Number 2: "But nibbles waste memory in cells!" -> "The > > > zero cell area can be factored to leave an area to place nibz in, or > > > some other nibble oriented compaction protocol can be used." > > > Again, I don't know what you are talking about. > > > > Usual comment number 3: "Why don't you do everything at once!" -> > > > "Because extra arms occupy more volume, and require more motor cortex, > > > hence more volume, hence slower reaction time, hence no such thing is > > > possible." > > > This on the other hand is perfectly clear... not! > > > Do you really care if you make anyone understand what you are talking > > about? > > Making people understand? If people understand they do, if people do > not understand they may eventually, if there is such an important need > to indoctrinate people, making may be an unsavoury procedure. > > Really care? As opposed to virtually care? As in expressing a duty of > care? Please elaborate using non circular arguments ...
Yeah, I guess I just can't write clearly...
> > I'm going to start calling you "Cryptoman"! Your one weakness is > > "Cryptonite", a substance that makes you communicate with perfect > > lucidity and brings about your ultimate destruction! > > This would imply I am self destructive, by me not communicating my > need to have the Cryptonite removed, as I would see my destruction, as > I would see everything of my instamachinations for the concept of > constrained infinite lucidity to be elaborated within me and flush out > my band limited mouth to provide enlightenment to everyone within ear > shot. Your statement is inconsistent bleep bleep, BA stack > underflow ......
Exactly!!! It worked just as I planned... BUWWWWWHHAAAHHAAA!!! Rick