FPGARelated.com
Forums

Nios II Going Live...

Started by Kenneth Land May 19, 2004
Goran Bilski wrote:
> It seems that Altera has created a MicroBlaze as well.
> They have finally realized that a FPGA based soft processor should have
> - 32 bit ISA
> - 32 registers
> - 3 operand instruction format
> - JTAG based HW debugging
> - HW divider
>
> The weird register window mechanism from NIOS (is it called NIOS1 now?)
> didn't work well in embedded processing markets.
>
> Göran Bilski
Actually, it works quite well if used correctly. It isn't used correctly in the implementations I've seen (from Altera and from an OS vendor).

I modified the OS to change the register spill strategy: rather than spilling the entire register set, we only spill one register frame. Restores are done normally. This results in a "run time optimization" of the top of the register window for programs. This works very well in practice because after initialization and task startup, a task's register window is at the top of the register file. For a 256 register file that means you get 14 function calls before a register spill occurs.

I'm a little sad that we'll lose the register windows in Nios2. Performance, etc. will make up for it. ;-)

-Rich
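The difference between the two spill strategies described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the OS code from the post: the function name `simulate`, the trace encoding, and the "one frame restored per underflow" rule are all hypothetical, and the only number taken from the post is the 14 calls available before the first spill.

```python
def simulate(trace, free_frames, spill_all):
    """Return (frames_stored, frames_restored) for a call/return trace.

    trace       -- iterable of +1 (function call) and -1 (return)
    free_frames -- calls that fit in the register file before the first
                   spill (14 for a 256 register file, per the post)
    spill_all   -- True: write out every resident frame on overflow
                   False: write out only the oldest frame on overflow
    """
    used = 0                      # frames currently held in the register file
    stored = restored = 0
    for step in trace:
        if step > 0:              # call
            if used == free_frames:
                if spill_all:
                    stored += used    # flush the whole window set
                    used = 1          # only the new callee frame is resident
                else:
                    stored += 1       # oldest frame goes to memory, rest stay
            else:
                used += 1
        else:                     # return
            if used == 0:
                restored += 1     # underflow: reload the caller's frame
            else:
                used -= 1
    return stored, restored


# A task sitting right at the overflow boundary: 14 calls deep, then five
# short call/return pairs (hypothetical workload).
boundary = [1] * 14 + [1, -1] * 5
print(simulate(boundary, 14, spill_all=False))  # (1, 0)
print(simulate(boundary, 14, spill_all=True))   # (14, 0)
```

In this model the single-frame strategy pays for one frame when the task bumps against the top of the register file, while the spill-everything strategy pays for all 14.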
On Fri, 21 May 2004 00:09:34 +0100, "Tim"
<tim@rockylogic.com.nooospam.com> wrote:

>Austin Lesea wrote:
>
>> lowest interrupt latency of any soft processor core (and
>> even better than most hard processors)
>
>that must be red rag to a bull for john jackson and the other
>transputer folk.
Tee hee. Interrupt latency is a joke number. I wrote a piece about twelve years ago for one of the embedded-system comics, pointing out how insignificant is the processor's own interrupt latency - there are many things that are orders of magnitude more important to interrupt performance. Here as in many other things, the transputer was on the right track. Sadly, limitations of design culture and available technology doomed it to commercial failure.

Just for the record, here's Bromley's First and Second Law of commercial failure in a technological product:

First Law:
Probability of commercial failure is increased if the product meets any of the following criteria:
1) It employs concepts and techniques that will become popular more than a decade later.
2) Its design is based on technically, logically or mathematically sound principles.
3) Its creators are British.

Second Law:
The probability of commercial failure is unity if two or more of the above criteria are met.
>and why are there so many transputer people in fpgaland?
Perhaps because they know a good thing when they see one?

Getting more and more cynical as time rolls by...
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223          mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573          Web: http://www.doulos.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
jseely@altera.com (Joel A. Seely) wrote in message news:<9bded7a8.0405200947.28b2d90c@posting.google.com>...
>When you start using
> the processor in applications that have an RTOS, it's a different
> story. Each time you have to do a context switch, unless the RTOS is
> really clever, you have to save out the whole set of registers
> associated with the task that is getting swapped out and read in the
> set of registers for the task that is getting swapped back in.
Yes, but without the windows those would have been swapped out to the stack already anyway, so you lose nothing.

Also note how much you gain: for example, for a bifurcating recursion even a single level of register windows saves 50% of the register spills, regardless of how deep the recursion is. Two levels save 75%. And so on...

For non-recursive scenarios the numbers are even better. (5 levels save almost all spills.)

BTW: This whole discussion is OT and belongs in comp.arch.

Kolja Sulimma
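One way to check the 50% and 75% figures above is to walk a full binary call tree and count spills. The sketch below is an assumption-laden model, not anything from the post: `count_spills`, the one-frame-per-overflow rule, and the choice of a depth-10 tree are all hypothetical. "Levels" is read here as the number of spare windows beyond the frame of the running function, so zero spare windows is the plain spill-on-every-call baseline.

```python
def count_spills(depth, spare_windows):
    """Walk a full binary recursion of the given depth.

    Returns (calls, spills). A call spills one frame whenever all
    windows are occupied; a return into a frame that is no longer
    resident reloads it (reloads are not counted here).
    """
    capacity = spare_windows + 1   # resident frames, including the current one
    resident = 1                   # the root frame
    calls = 0
    spills = 0

    def visit(level):
        nonlocal resident, calls, spills
        if level == depth:
            return                 # leaf: makes no further calls
        for _ in range(2):         # bifurcating recursion: two child calls
            calls += 1
            if resident == capacity:
                spills += 1        # oldest frame is written to the stack
            else:
                resident += 1
            visit(level + 1)
            resident -= 1          # callee returns
            if resident == 0:
                resident = 1       # caller's frame had been spilled: reload it

    visit(0)
    return calls, spills


for spare in (0, 1, 2):
    calls, spills = count_spills(10, spare)
    print(spare, calls, spills, round(1 - spills / calls, 4))
```

For a depth-10 tree this model gives 2046 calls, with 2046, 1022 and 510 spills for zero, one and two spare windows, which is a saving of very nearly 50% and 75% and matches the pattern 1 - 2^-k for k window levels.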
In article <h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>,
Jonathan Bromley  <jonathan.bromley@doulos.com> wrote:

>Tee hee. Interrupt latency is a joke number. I wrote a
>piece about twelve years ago for one of the embedded-system
>comics, pointing out how insignificant is the processor's
>own interrupt latency - there are many things that are
>orders of magnitude more important to interrupt performance.
>Here as in many other things, the transputer was on the
>right track. Sadly, limitations of design culture and
>available technology doomed it to commercial failure.
I remember doing a bit of due diligence for a relative who was looking at a job at a company which was making similar claims (they were using a shadow-register setup). I basically did an Amdahl's law workup and gave the advice of "this is why it is bogus", and the observation that, since the company HAD funding, it might be good for a year but nothing beyond that.
>>and why are there so many transputer people in fpgaland?
>
>Perhaps because they know a good thing when they see one?
More importantly, if we ever "solve" the tool problem for general purpose computation on FPGAs, we solve it for Transputers.
--
Nicholas C. Weaver nweaver@cs.berkeley.edu
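The kind of Amdahl's law workup mentioned in this post can be sketched in a few lines. The formula is the standard one; the function name `overall_speedup` and the example numbers (the processor's own latency taken as 10% of total interrupt response time, made 10x faster) are hypothetical and are not figures from the thread.

```python
def overall_speedup(fraction_improved, local_speedup):
    """Amdahl's law: total speedup when only part of the work gets faster.

    fraction_improved -- share of the original time spent in the part that
                         is improved (0.0 to 1.0)
    local_speedup     -- how many times faster that part becomes
    """
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical example: core interrupt latency is 10% of the total response
# time and a shadow-register scheme makes that part 10x faster.
print(round(overall_speedup(0.10, 10.0), 3))  # 1.099
```

Under those assumed numbers the whole interrupt response improves by only about 10%, which is the shape of argument that makes a headline latency figure look much less impressive.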
Rick,

You are correct.  I just lashed out.  I apologize (to the newsgroup).

Now that we are the "gorilla" I need to be 5X more humble.  We win with 
listening to customers and always placing them first.

I can't say I won't over-react again, but I can say I will try to improve.

Austin

-snip-
> I am just curious Austin, do you think this message helped either you or
> Xilinx?
Austin Lesea <austin@xilinx.com> wrote in message news:<c8jeee$cfc3@cliff.xsj.xilinx.com>...
> Jesse,
>
> Processors, plural.
>
> I'm still right.
>
> Austin
My sincere apologies. I would drop this, but as it's a public forum, I want the reading public to know the truth. Some further elaboration:

Multiple embedded processorS on an FPGA (plural) have been technologically feasible, supported, and implemented by customers -- with Nios -- since its inception (I'm sure the same could be said of other offerings prior to that date, too), and we continue to support that. That has been extended in the most recent release of our product. As an example, the user can debug many (we have tested up to 8) processorS (plural) simultaneously via a single JTAG connection and a nice IDE environment.

That's the real beauty of an FPGA, as we all know... you have logic you can put to any use, including the same use several times over to do interesting things. And if for some reason a "soft" processor does not equal a "hard" one, well, I suppose that is a matter of debate. They both take compiled C code and do useful tasks, so I think they're both processorS.

Regards,

Jesse Kempa
Altera Corp.
jkempa at altera dot com
You're really sad ?  Take a look at the terribly broken setjmp/longjmp 
implementation for Nios I.  Register windows work ok if you
never switch stacks (say for threads or to have a separate exception 
stack).   A correct implementation of context switching requires that
you spill all the register windows on the task being switched out and
restore to the previous depth the windows on the task being switched in.
setjmp/longjmp together should behave as a context switch.

If your interrupt processing model is -- all processing related to an 
interrupt happens in the interrupt service routine you might be happy 
with register windows (unless you are unfortunate enough to have the 
exception occur when the windows are full).  On the other hand, if your
model is to do only the things that must be done in the service routine, 
then enable a thread to do the rest, then you probably aren't too happy.

I'm quite pleased that they dumped this feature and took the lean approach.

Geoffrey



Richard Pennington wrote:
> Goran Bilski wrote:
>
>> It seems that Altera has created a MicroBlaze as well.
>> They have finally realized that a FPGA based soft processor should have
>> - 32 bit ISA
>> - 32 registers
>> - 3 operand instruction format
>> - JTAG based HW debugging
>> - HW divider
>>
>> The weird register window mechanism from NIOS (is it called NIOS1
>> now?) didn't work well in embedded processing markets.
>>
>> Göran Bilski
>
> Actually, it works quite well if used correctly. It isn't used correctly
> in the implementations I've seen (from Altera and from an OS vendor).
> I modified the OS to change the register spill strategy: Rather than
> spilling the entire register set, we only spill one register frame.
> Restores are done normally. This results in a "run time optimization" of
> the top of the register window for programs. This works very well in
> practice because after initialization and task startup, a task's
> register window is at the top of the register file. For a 256 register
> file that means you get 14 function calls before a register spill occurs.
>
> I'm a little sad that we'll lose the register windows in Nios2.
> Performance, etc. will make up for it. ;-)
>
> -Rich
Geoffrey Brown <geobrown@cs.indiana.edu> writes:
> You're really sad ? Take a look at the terribly broken setjmp/longjmp
> implementation for Nios I. Register windows work ok if you
> never switch stacks (say for threads or to have a separate exception
> stack). A correct implementation of context switching requires that
> you spill all the register windows on the task being switched out and
> restore to the previous depth the windows on the task being switched in.
> setjmp/longjmp together should behave as a context switch.
The officially defined semantics of setjmp and longjmp do not require that they be usable for switching stacks; they are only defined to unwind a stack.

I ran into exactly this problem when I ported the Telebit Netblazer operating system to the AMD 29000 back in 1991. The 29000 typically uses register windows, although it can also use the entire set of 128 local registers as "normal" non-windowed registers. I had to rewrite the setjmp and longjmp implementation exactly as you describe.

However, I wouldn't claim that this is because the setjmp/longjmp implementation was broken. It was behaving exactly as specified. Rather, the problem is with using setjmp/longjmp for something other than unwinding the stack.

I think a case could be made that the next revision of the C standard should have new library functions for context switching.
"Tim" <tim@rockylogic.com.nooospam.com> wrote in message news:<c8jdui$9rg$1$8300dec7@news.demon.co.uk>...
> Austin Lesea wrote:
>
> > lowest interrupt latency of any soft processor core (and
> > even better than most hard processors)
Oh, I am asked to say something :)

OK, I have no idea whose interrupt latency is shortest. Probably the CPU that has the fastest clock rate, or the one that's specially designed for interrupt response handling. I suspect that the several ASIC MT CPUs that have recently come along for the wireless set could well have the best interrupt response, especially one that runs 8 threads at 250MHz (or was it 400MHz), because the threads run all the time, every 8th cycle. And these CPUs don't have context to swap since they have N contexts in RAM.

Technically Transputers don't have interrupts; that's too low a level of looking at them. But they do service events with an incredibly quick response for a variety of reasons, though that was at 25MHz and 15 years ago.

Now the R3 CPU, also being a multithreaded (MT) CPU (and also now running baby code in the C model, BTW), could designate 1 of its 16 threads to poll some HW and take the event home. That would mean about 20-50 cycles of computation might pass before Pn noticed it had to do some work. If Pn can find a way to stay active in the IX engine without branching (which causes a round-robin process swap), then it could notice an event in <4 cycles. I don't think I will add support for an always-stay-active process.

Now when the process that does service an interrupt gets its turn, it will have no registers to swap, but it may have to take some cache misses while its working set is reloaded; that's transparent to MT.

If it pans out at 250MHz in V2Pro it may or may not have the fastest interrupt response. It will, however, have the most throughput of any FPGA CPU, bordering on 1.3clock Freq from the sim traces. It loves branches and transfers and swapping; it's the nature of the MT beastie.
> that must be red rag to a bull for john jackson and the other
> transputer folk.
>
> and why are there so many transputer people in fpgaland?
Well, I don't remember anyone else here who identifies themselves as such; most are probably busy elsewhere. And where is Alan C!

Well, the answer to that is real simple. Anything FPGAs do today, especially DSP and comms and whatever, was once done by Transputers. Look at Nallatech and a whole load of UK/European companies that were once Transputer TRAM module houses. Those that survived are all FPGA guys today and in the top tier of high-performance engineering. What's a good engineer to do when something runs out of gas? Look for the next obvious replacement.

Also, the FPGA and the Transputer more or less came out at the same time, 84++. The Transputer peaked a long time ago; the FPGA really started peaking only a few years ago, and wasn't really much use till 4K or later (sorry).

That also brings me to the other point. Occam runs on both. Not C. Of course Occam had to resurrect itself in C syntax (HandelC) to be more attractive to the average EE and to be synthesizable for FPGA. BTW I am not a fan of HandelC; I just mention that its roots go back to Occam.

I will leave it there

regards

johnjakson_usa_com
Jonathan Bromley <jonathan.bromley@doulos.com> wrote in message news:<h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>...
> On Fri, 21 May 2004 00:09:34 +0100, "Tim"
> <tim@rockylogic.com.nooospam.com> wrote:
>
> >Austin Lesea wrote:
> >
> >> lowest interrupt latency of any soft processor core (and
> >> even better than most hard processors)
> >
> >that must be red rag to a bull for john jackson and the other
> >transputer folk.
>
> Tee hee. Interrupt latency is a joke number. I wrote a
> piece about twelve years ago for one of the embedded-system
> comics, pointing out how insignificant is the processor's
> own interrupt latency - there are many things that are
> orders of magnitude more important to interrupt performance.
> Here as in many other things, the transputer was on the
> right track. Sadly, limitations of design culture and
> available technology doomed it to commercial failure.
>
> Just for the record, here's Bromley's First and Second Law
> of commercial failure in a technological product:
>
> First Law:
> Probability of commercial failure is increased if the
> product meets any of the following criteria:
> 1) It employs concepts and techniques that will become
> popular more than a decade later.
> 2) Its design is based on technically, logically or
> mathematically sound principles.
> 3) Its creators are British.
>
Perhaps I am doomed to fail on all 3 counts. Anyway I may be a US citizen before this thing gets polished and can deny the last rule as everything important has to seem to be invented or reinvented in the US- (sadly). Since my math isn't so great maybe I can deny the 2nd rule too:). And 20yrs have passed since I left and the Transputer shipped so I can beat that one too perhaps.
> Second Law:
> The probability of commercial failure is unity if two
> or more of the above criteria are met.
>
> >and why are there so many transputer people in fpgaland?
>
> Perhaps because they know a good thing when they see one?
>
yep
> Getting more and more cynical as time rolls by...
> --
> Jonathan Bromley, Consultant
>
> DOULOS - Developing Design Know-how
> VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services
>
> Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
> Tel: +44 (0)1425 471223 mail:jonathan.bromley@doulos.com
> Fax: +44 (0)1425 471573 Web: http://www.doulos.com
>
> The contents of this message may contain personal views which
> are not the views of Doulos Ltd., unless specifically stated.