comp.arch.fpga | Nios II Going Live...| page 4

Reply by Richard Pennington ●May 20, 2004

Goran Bilski wrote:
> It seems that Altera has created a MicroBlaze as well.
> They have finally realized that a FPGA based soft processor should have
> - 32 bit ISA
> - 32 registers
> - 3 operand instruction format
> - JTAG based HW debugging
> - HW divider
> 
> The weird register window mechanism from NIOS (is it called NIOS1 now?) 
> didn't work well in embedded processing markets.
> 
> Göran Bilski

Actually, it works quite well if used correctly. It isn't used correctly 
in the implementations I've seen (from Altera and from an OS vendor).
I modified the OS to change the register spill strategy: Rather than 
spilling the entire register set, we only spill one register frame. 
Restores are done normally. This results in a "run time optimization" of
the top of the register window for programs. This works very well in 
practice because after initialization and task startup, a task's 
register window is at the top of the register file. For a 256 register 
file that means you get 14 function calls before a register spill occurs.
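
A toy model of the two spill policies described above may make the difference concrete. This is a sketch only, not the actual OS code: `simulate`, its parameters and the call/return trace are hypothetical, and the capacity of 15 resident frames is an assumption chosen so that 14 nested calls fit before the first overflow.

```python
def simulate(trace, capacity, spill_all):
    """Count frames written to and read back from memory for a call/return trace.

    capacity  -- number of register frames the register file can hold (assumed)
    spill_all -- True: an overflow spills every resident frame;
                 False: an overflow spills only the oldest frame
    """
    resident = 1      # frames currently held in the register file
    in_memory = 0     # frames of this task that have been spilled
    spilled = 0
    restored = 0
    for event in trace:
        if event == "call":
            if resident == capacity:              # window overflow
                n = resident if spill_all else 1
                resident -= n
                in_memory += n
                spilled += n
            resident += 1                         # frame for the callee
        elif event == "ret":
            resident -= 1
            if resident == 0 and in_memory > 0:   # window underflow
                in_memory -= 1                    # restore one frame, as normal
                resident = 1
                restored += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return spilled, restored


# 14 nested calls fill the file, then the task bounces across the boundary.
trace = ["call"] * 14 + ["call", "ret"] * 100
print("spill one frame:", simulate(trace, 15, spill_all=False))
print("spill everything:", simulate(trace, 15, spill_all=True))
```

On this trace the one-frame policy writes a single frame and restores none, while the spill-everything policy writes 15 frames and restores one. Real workloads will differ; the model ignores interrupts and task switches.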

I'm a little sad that we'll lose the register windows in Nios2. 
Performance, etc. will make up for it. ;-)

-Rich

Reply by Jonathan Bromley ●May 21, 2004

On Fri, 21 May 2004 00:09:34 +0100, "Tim"
<tim@rockylogic.com.nooospam.com> wrote:

>Austin Lesea wrote:
>
>>    lowest interrupt latency of any soft processor core (and
>> even better than most hard processors)
>
>that must be red rag to a bull for john jackson and the other
>transputer folk.

Tee hee.  Interrupt latency is a joke number.  I wrote a
piece about twelve years ago for one of the embedded-system 
comics, pointing out how insignificant is the processor's 
own interrupt latency - there are many things that are 
orders of magnitude more important to interrupt performance.
Here as in many other things, the transputer was on the
right track.  Sadly, limitations of design culture and
available technology doomed it to commercial failure.

Just for the record, here's Bromley's First and Second Law 
of commercial failure in a technological product:

First Law:
  Probability of commercial failure is increased if the
  product meets any of the following criteria:
  1) It employs concepts and techniques that will become
     popular more than a decade later.
  2) Its design is based on technically, logically or
     mathematically sound principles.
  3) Its creators are British.

Second Law:
  The probability of commercial failure is unity if two
  or more of the above criteria are met.

>and why are there so many transputer people in fpgaland?

Perhaps because they know a good thing when they see one?

Getting more and more cynical as time rolls by...
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223          mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573                Web: http://www.doulos.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Reply by Kolja Sulimma ●May 21, 2004

jseely@altera.com (Joel A. Seely) wrote in message news:<9bded7a8.0405200947.28b2d90c@posting.google.com>...
>When you start using
> the processor in applications that have an RTOS, it's a different
> story.  Each time you have to do a context switch, unless the RTOS is
> really clever, you have to save out the whole set of registers
> associated with the task that is getting swapped out and read in the
> set of registers for the task that is getting swapped back in.

Yes, but without the windows those would have been swapped out to the
stack already anyway, so you lose nothing.

Also note how much you gain: For example for a bifurcating recursion
even a single level of register windows saves 50% of the register
spills, regardless of how deep the recursion is. Two levels save 75%.
And so on...
For non-recursive scenarios the numbers are even better. (5 levels
save almost all spills)
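
One way to check the 50% and 75% figures is to count spills over a full binary call tree. The sketch below is an idealized model, not a measurement: it assumes one frame per call, an overflow that spills only the oldest frame, and a baseline without windows in which every call saves one frame to the stack. `count_spills`, `depth` and `capacity` are names made up for this illustration.

```python
def count_spills(depth, capacity):
    """Frames spilled while walking a full binary call tree depth-first.

    depth    -- number of call levels below the root
    capacity -- frames the register file holds; 1 models no spare windows
    """
    state = {"resident": 1, "in_memory": 0, "spilled": 0}

    def visit(levels_left):
        if levels_left == 0:
            return
        for _ in range(2):                        # two recursive calls per node
            if state["resident"] == capacity:     # overflow: spill oldest frame
                state["in_memory"] += 1
                state["spilled"] += 1
            else:
                state["resident"] += 1
            visit(levels_left - 1)
            state["resident"] -= 1                # return from the child
            if state["resident"] == 0 and state["in_memory"] > 0:
                state["in_memory"] -= 1           # underflow: restore the caller
                state["resident"] = 1

    visit(depth)
    return state["spilled"]


baseline = count_spills(10, 1)
for extra_windows in (1, 2):
    saved = 1 - count_spills(10, extra_windows + 1) / baseline
    print(f"{extra_windows} extra window level(s): {saved:.1%} of spills avoided")
```

Under these assumptions a tree of depth 10 spills 2046 frames with no spare window, 1022 with one and 510 with two, which is roughly the 50% and 75% saving quoted above and does not depend much on the depth.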

BTW: This whole discussion is OT and belongs in comp.arch.

Kolja Sulimma

Reply by Nicholas C. Weaver ●May 21, 2004

In article <h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>,
Jonathan Bromley  <jonathan.bromley@doulos.com> wrote:

>Tee hee.  Interrupt latency is a joke number.  I wrote a
>piece about twelve years ago for one of the embedded-system 
>comics, pointing out how insignificant is the processor's 
>own interrupt latency - there are many things that are 
>orders of magnitude more important to interrupt performance.
>Here as in many other things, the transputer was on the
>right track.  Sadly, limitations of design culture and
>available technology doomed it to commercial failure.

I remember doing a bit of due diligence for a relative who was
looking at a job at a company which was making similar claims (they
were using a shadow-register setup).  

I basically did an Amdahl's law workup and gave the advice of "this is
why it is bogus", and the observation that, since the company HAD
funding, it might be good for a year but nothing beyond that.
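
For readers who have not seen such a workup, the general shape is sketched below. The numbers are hypothetical and are not the figures from the original analysis: they assume the core's own interrupt entry accounts for 5% of the time from event to completed service, and that a new core makes that part 10x faster. `amdahl_speedup` is a name chosen for this example.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the total time becomes `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical: 5% of the response time is core latency, improved 10x.
print(f"{amdahl_speedup(0.05, 10):.3f}x overall")
```

With these assumed inputs the whole interrupt response improves by under 5%, which is the sense in which a headline latency figure can matter very little.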

>>and why are there so many transputer people in fpgaland?
>
>Perhaps because they know a good thing when they see one?

More importantly, if we ever "solve" the tool problem for general
purpose computation on FPGAs, we solve it for Transputers.

-- 
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu

Reply by Austin Lesea ●May 21, 2004

Rick,

You are correct.  I just lashed out.  I apologize (to the newsgroup).

Now that we are the "gorilla" I need to be 5X more humble.  We win with 
listening to customers and always placing them first.

I can't say I won't over-react again, but I can say I will try to improve.

Austin

-snip-
> 
> 
> I am just curious Austin, do you think this message helped either you or
> Xilinx?  
>

Reply by Jesse Kempa ●May 21, 2004

Austin Lesea <austin@xilinx.com> wrote in message news:<c8jeee$cfc3@cliff.xsj.xilinx.com>...
> Jesse,
> 
> Processors, plural.
> 
> I'm still right.
> 
> Austin
> 

My sincere apologies. I would drop this, but as it's a public forum
I want the reading public to know the truth. Some further elaboration:

Multiple embedded processorS on an FPGA (plural) have been
technologically feasible, supported, and implemented by customers --
with Nios -- since its inception (I'm sure the same could be said of
other offerings prior to that date, too), and we continue to support
that. That has been extended in the most recent release of our
product. As an example, the user can debug many (we have tested up to
8) processorS (plural) simultaneously via a single JTAG connection and
a nice IDE environment.

That's the real beauty of an FPGA, as we all know... you have logic
you can put to any use, including the same use several times over to
do interesting things.

And if for some reason a "soft" processor does not equal a "hard" one,
well, I suppose that is a matter of debate. They both take compiled C
code and do useful tasks, so I think they're both processorS.

Regards,

Jesse Kempa
Altera Corp.
jkempa at altera dot com

Reply by Geoffrey Brown ●May 21, 2004

You're really sad ?  Take a look at the terribly broken setjmp/longjmp 
implementation for Nios I.  Register windows work ok if you
never switch stacks (say for threads or to have a separate exception 
stack).   A correct implementation of context switching requires that
you spill all the register windows on the task being switched out and
restore to the previous depth the windows on the task being switched in.
setjmp/longjmp together should behave as a context switch.

If your interrupt processing model is -- all processing related to an 
interrupt happens in the interrupt service routine you might be happy 
with register windows (unless you are unfortunate enough to have the 
exception occur when the windows are full).  On the other hand, if your
model is to do only the things that must be done in the service routine, 
then enable a thread to do the rest, then you probably aren't too happy.

I'm quite pleased that they dumped this feature and took the lean approach.

Geoffrey

Richard Pennington wrote:
> Goran Bilski wrote:
> 
>> It seems that Altera has created a MicroBlaze as well.
>> They have finally realized that a FPGA based soft processor should have
>> - 32 bit ISA
>> - 32 registers
>> - 3 operand instruction format
>> - JTAG based HW debugging
>> - HW divider
>>
>> The weird register window mechanism from NIOS (is it called NIOS1 
>> now?) didn't work well in embedded processing markets.
>>
>> Göran Bilski
> 
> 
> Actually, it works quite well if used correctly. It isn't used correctly 
> in the implementations I've seen (from Altera and from an OS vendor).
> I modified the OS to change the register spill strategy: Rather than 
> spilling the entire register set, we only spill one register frame. 
> Restores are done normally. This results in a "run time optimization" of
> the top of the register window for programs. This works very well in 
> practice because after initialization and task startup, a task's 
> register window is at the top of the register file. For a 256 register 
> file that means you get 14 function calls before a register spill occurs.
> 
> I'm a little sad that we'll lose the register windows in Nios2. 
> Performance, etc. will make up for it. ;-)
> 
> -Rich
>

Reply by ●May 21, 2004

Geoffrey Brown <geobrown@cs.indiana.edu> writes:
> You're really sad ?  Take a look at the terribly broken setjmp/longjmp
> implementation for Nios I.  Register windows work ok if you
> never switch stacks (say for threads or to have a separate exception
> stack).   A correct implementation of context switching requires that
> you spill all the register windows on the task being switched out and
> restore to the previous depth the windows on the task being switched in.
> setjmp/longjmp together should behave as a context switch.

The officially defined semantics of setjmp and longjmp do not require
that they be usable for switching stacks; they only are defined to
unwind a stack.

I ran into exactly this problem when I ported the Telebit Netblazer
operating system to the AMD 29000 back in 1991.  The 29000 typically
uses register windows, although it can also use the entire set of 128
local registers as "normal" non-windowed registers.  I had to rewrite
the setjmp and longjmp implementation exactly as you describe.

However, I wouldn't claim that this is because the setjmp/longjmp
implementation was broken.  It was behaving exactly as specified.
Rather, the problem is with using setjmp/longjmp for something
other than unwinding the stack.

I think a case could be made that the next revision of the C standard
should have new library functions for context switching.

Reply by john jakson ●May 21, 2004

"Tim" <tim@rockylogic.com.nooospam.com> wrote in message news:<c8jdui$9rg$1$8300dec7@news.demon.co.uk>...
> Austin Lesea wrote:
> 
> >    lowest interrupt latency of any soft processor core (and
> > even better than most hard processors)
> 

Oh, I am asked to say something:)

OK, I have no idea whose interrupt latency is shortest. Probably the cpu
that has the fastest clock rate or the one that's specially designed
for int response handling.

I suspect that the several ASIC MT cpus that have recently come along
for the wireless set could well have the best int response esp 1 that
runs 8 threads at 250MHz (or was it 400MHz) because the threads run
all the time every 8th cycle. And these cpus don't have context to
swap since they have N contexts in ram.

Technically Transputers don't have interrupts, that's too low a level
of looking at them, but they do service events with an incredibly
quick response for a variety of reasons but that was at 25MHz and
15yrs ago.

Now the R3 cpu, also being a multithreaded (MT) cpu (and also now
running baby code, BTW, in a C model), could designate 1 of its 16 threads
to poll some HW and take the event home. That would mean about 20-50
cycles of computation might pass before Pn noticed it had to do some
work. If Pn can find a way to stay active in the IX engine without
branching (which causes a round-robin process swap), then it could
notice an event in <4 cycles. I don't think I will add support for
an always-stay-active process. Now when the process that does service an
interrupt does get its turn, it will have no registers to swap, but it
may have to take some cache misses while its working set is reloaded, but
that's transparent to MT. If it pans out at 250MHz in V2Pro it may or
may not have the fastest int response. It will however have the most
throughput of any FPGA cpu, bordering on 1.3clock Freq from the sim
traces. It loves branches and transfers and swapping; it's the nature
of the MT beastie.

> that must be red rag to a bull for john jackson and the other
> transputer folk.
> 
> and why are there so many transputer people in fpgaland?

Well I don't remember anyone else here that identifies themself as
such, most are probably busy elsewhere. And where is Alan C!

Well the answer to that is real simple. Anything FPGAs do today esp
DSP and coms and whatever was once done by Transputers. Look at
Nallatech and a whole load of UK/European companies that were once
Transputer TRAM module houses. Those that survived are all FPGA guys
today and in the top tier of high perf engineering. What's a good
engineer to do when something runs out of gas? Look for the next
obvious replacement.

Also the FPGA and the Transputer more or less came out at the same
time, 84++. The Transputer peaked a long time ago; the FPGA really
started peaking only a few years ago, and wasn't really much use till
4K or later (sorry).

That also brings me to the other point. Occam runs on both. Not C.
Of course Occam had to resurrect itself in C syntax (HandelC) to be
more attractive to the avg EE and to be synthesizable for FPGA. BTW I am
not a fan of HandelC, I just mention that its roots go back to Occam.

I will leave it there

regards

johnjakson_usa_com

Reply by john jakson ●May 21, 2004

Jonathan Bromley <jonathan.bromley@doulos.com> wrote in message news:<h8cra09qmgc17ulbvh1t4bk80dr1it2g39@4ax.com>...
> On Fri, 21 May 2004 00:09:34 +0100, "Tim"
> <tim@rockylogic.com.nooospam.com> wrote:
> 
> >Austin Lesea wrote:
> >
> >>    lowest interrupt latency of any soft processor core (and
> >> even better than most hard processors)
> >
> >that must be red rag to a bull for john jackson and the other
> >transputer folk.
> 
> Tee hee.  Interrupt latency is a joke number.  I wrote a
> piece about twelve years ago for one of the embedded-system 
> comics, pointing out how insignificant is the processor's 
> own interrupt latency - there are many things that are 
> orders of magnitude more important to interrupt performance.
> Here as in many other things, the transputer was on the
> right track.  Sadly, limitations of design culture and
> available technology doomed it to commercial failure.
> 
> Just for the record, here's Bromley's First and Second Law 
> of commercial failure in a technological product:
> 
> First Law:
>   Probability of commercial failure is increased if the
>   product meets any of the following criteria:
>   1) It employs concepts and techniques that will become
>      popular more than a decade later.
>   2) Its design is based on technically, logically or
>      mathematically sound principles.
>   3) Its creators are British.
> 

Perhaps I am doomed to fail on all 3 counts. 

Anyway I may be a US citizen before this thing gets polished and can
deny the last rule as everything important has to seem to be invented
or reinvented in the US- (sadly).

Since my math isn't so great maybe I can deny the 2nd rule too:).

And 20yrs have passed since I left and the Transputer shipped so I can
beat that one too perhaps.

> Second Law:
>   The probability of commercial failure is unity if two
>   or more of the above criteria are met.
> 
> >and why are there so many transputer people in fpgaland?
> 
> Perhaps because they know a good thing when they see one?
> 

yep

> Getting more and more cynical as time rolls by...
> -- 
> Jonathan Bromley, Consultant
> 
> DOULOS - Developing Design Know-how
> VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services
> 
> Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
> Tel: +44 (0)1425 471223          mail:jonathan.bromley@doulos.com
> Fax: +44 (0)1425 471573                Web: http://www.doulos.com
> 
> The contents of this message may contain personal views which 
> are not the views of Doulos Ltd., unless specifically stated.
