
Tiny CPUs for Slow Logic

Started by Unknown March 18, 2019
On Tuesday, March 19, 2019 at 1:13:38 AM UTC+1, gnuarm.del...@gmail.com wrote:
> Most of us have implemented small processors for logic operations that don't need to happen at high speed. Simple CPUs can be built into an FPGA using a very small footprint much like the ALU blocks. There are stack based processors that are very small, smaller than even a few kB of memory.
>
> If they were easily programmable in something other than C would anyone be interested? Or is a C compiler mandatory even for processors running very small programs?
>
> I am picturing this not terribly unlike the sequencer I used many years ago on an I/O board for an array processor which had its own assembler. It was very simple and easy to use, but very much not a high level language. This would have a language that was high level, just not C; rather something extensible and simple to use and potentially interactive.
>
> Rick C.
picoblaze is such a small cpu and I would like to program it in something other than its assembler language.

--
svenn
On 19/03/19 14:29, Theo Markettos wrote:
> Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
>> Understand XMOS's xCORE processors and xC language, see how
>> they complement and support each other. I found the net result
>> stunningly easy to get working first time, without having to
>> continually read obscure errata!
>
> I can see the merits of the XMOS approach. But I'm unclear how this relates
> to the OP's proposal, which (I think) is having tiny CPUs as hard
> logic blocks on an FPGA, like DSP blocks.
A reasonable question.

A major problem with lots of communicating sequential processors (such as the OP suggests) is how to /think/ about orchestrating them so they compute and communicate to produce a useful result.

Once you have such a conceptual framework, thereafter you can develop tools to help.

Oddly enough that occurred to CAR (Tony) Hoare back in the 70s, and he produced the CSP (communicating sequential processes) calculus.

In the 80s that was embodied in hardware and software, the transputers and occam respectively. The modern variant is the xCORE processors and xC.

They provide a concrete demonstration of one set of tools and techniques that allow a cloud of processors to do useful work. That's something the GA144 conspicuously failed to achieve.

The OP appears to have a vague concept of something running through his head, but appears unwilling to understand what has been tried, what has failed, and where the /conceptual/ and practical problems lie.

Overall the OP is a bit like the UK Parliament at the moment. Both know what they don't want, but can't articulate/decide what they do want.

The UK Parliament is an unmitigated dysfunctional mess.
> I completely understand the problem of running out of hardware threads, so
> a means of 'just add another one' is handy. But the issue is how to combine
> such things with other synthesised logic.
I don't think it is difficult to combine those, any more or less than it is difficult to combine current traditional hardware and software.
> The XMOS approach is fine when the hardware is uniform and the software sits
> on top, but when the hardware is synthesised and the 'CPUs' sit as pieces in
> a fabric containing random logic (as I think the OP is suggesting) it
> becomes a lot harder to reason about what the system is doing and what the
> software running on such heterogeneous cores should look like. Only the
> FPGA tools have a full view of what the system looks like, and it seems
> stretching them to have them also generate software to run on these cores.
Through long experience, I'm wary of any single tool that claims to do everything from top to bottom. They always work well for things that fit their constraints, but badly otherwise.

N.B. that includes a single programming style from top to bottom of a software application. I've used top-level FSMs expressed in GC'ed OOP languages that had procedural runtimes. Why? Because the application domain was inherently FSM based, the GC'ed OOP tools were the best way to create distributed high availability systems, and the procedural language was the best way to create the runtime.

I have comparable examples involving hardware all the way from low-noise analogue electronics upwards.

Moral: choose the right conceptual framework for each part of the problem.
> We are not talking about a multi- or many- core chip here, with the CPUs as
> the primary element of compute, but the CPUs scattered around as 'state
> machine elements' just ups the complexity and makes it harder to understand
> compared with the same thing synthesised out of flip-flops.
It is up to the OP to give us a clue as to example problems and solutions, and why his concepts are significantly better than existing techniques.
> I would be interested to know what applications might use heterogeneous
> many-cores and what performance is achievable.
Yup.

The "granularity" of the computation and communication will be a key to understanding what the OP is thinking.
On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
> On 19/03/19 14:29, Theo Markettos wrote:
> [snip]
>
> A major problem with lots of communicating sequential
> processors (such as the OP suggests) is how to /think/
> about orchestrating them so they compute and communicate
> to produce a useful result.
>
> Once you have such a conceptual framework, thereafter you
> can develop tools to help.
>
> Oddly enough that occurred to CAR (Tony) Hoare back in
> the 70s, and he produced the CSP (communicating sequential
> processes) calculus.
Which had surprisingly small influence on how the majority (not a majority in the sense of 70%, but in the sense of 99.7%) of the industry solves its problems.
> In the 80s that was embodied in hardware and software, the
> transputers and occam respectively. The modern variant is
> the xCORE processors and xC.
The same as above.
> They provide a concrete demonstration of one set of tools
> and techniques that allow a cloud of processors to do
> useful work.
>
> That's something the GA144 conspicuously failed to achieve.
> [snip]
>
> Overall the OP is a bit like the UK Parliament at the moment.
> Both know what they don't want, but can't articulate/decide
> what they do want.
>
> The UK Parliament is an unmitigated dysfunctional mess.
Do you prefer dysfunctional mesh ;)
> [snip]
>
> The "granularity" of the computation and communication will
> be a key to understanding what the OP is thinking.
I don't know what Rick had in mind. I personally would go for one "hard-CPU" block per 4000-5000 6-input logic elements (i.e. Altera ALMs or Xilinx CLBs). Each block could be configured either as one 64-bit core or a pair of 32-bit cores. The block would contain hard instruction decoders/ALUs/shifters and hard register files. It could optionally borrow adjacent DSP blocks for multipliers. Adjacent embedded memory blocks could be used for data memory. Code memory should be a bit more flexible, giving the designer a choice between embedded memory blocks or distributed memory (Xilinx) / MLABs (Altera).
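For a sense of scale (rough arithmetic, not a vendor figure): at one block per 4000-5000 logic elements, a mid-size device with ~100,000 of them would carry on the order of 100000 / 4500, roughly 22 hard-CPU blocks, i.e. up to ~44 32-bit cores.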
On 19/03/19 17:35, already5chosen@yahoo.com wrote:
> On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
>> [snip]
>>
>> Oddly enough that occurred to CAR (Tony) Hoare back in the 70s, and he
>> produced the CSP (communicating sequential processes) calculus.
>
> Which had surprisingly small influence on how the majority (not a majority in
> the sense of 70%, but in the sense of 99.7%) of the industry solves its
> problems.
That's principally because Moore's "law" enabled people to avoid confronting the issues. Now that Moore's "law" has run out of steam, the future becomes more interesting.

Note that TI included some of the concepts in its DSP processors. Golang has included some of the concepts. Many libraries included some of the concepts.
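For concreteness, "the concepts" boil down to blocking channels on which otherwise independent threads rendezvous. A one-slot channel can be sketched in portable C with pthreads (an illustrative sketch of the idea only, not TI's, Go's, or any library's actual implementation):

  #include <pthread.h>
  #include <stdio.h>

  /* A one-slot blocking channel, close in spirit to an occam/xC channel
     (those are unbuffered and fully synchronous; the single slot here
     is a simplification). */
  typedef struct {
      pthread_mutex_t m;
      pthread_cond_t  cv;
      int             full;   /* 1 while a value is waiting to be taken */
      int             value;
  } channel;

  static channel ch = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };

  static void chan_send(channel *c, int v)
  {
      pthread_mutex_lock(&c->m);
      while (c->full)                    /* wait until the previous value is taken */
          pthread_cond_wait(&c->cv, &c->m);
      c->value = v;
      c->full = 1;
      pthread_cond_broadcast(&c->cv);    /* wake the receiver */
      pthread_mutex_unlock(&c->m);
  }

  static int chan_recv(channel *c)
  {
      pthread_mutex_lock(&c->m);
      while (!c->full)                   /* block until a value arrives */
          pthread_cond_wait(&c->cv, &c->m);
      int v = c->value;
      c->full = 0;
      pthread_cond_broadcast(&c->cv);    /* wake a waiting sender */
      pthread_mutex_unlock(&c->m);
      return v;
  }

  static void *producer(void *arg)
  {
      (void)arg;
      for (int i = 1; i <= 3; i++)
          chan_send(&ch, i);
      return NULL;
  }

  int main(void)
  {
      pthread_t t;
      pthread_create(&t, NULL, producer, NULL);
      for (int i = 0; i < 3; i++)
          printf("got %d\n", chan_recv(&ch));
      pthread_join(t, NULL);
      return 0;
  }

The point is that each side blocks until the other arrives, which is exactly the discipline CSP imposes and most of the industry never adopted.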
>> In the 80s that was embodied in hardware and software, the transputers and
>> occam respectively. The modern variant is the xCORE processors and xC.
>
> The same as above.
>
>> [snip]
>>
>> The UK Parliament is an unmitigated dysfunctional mess.
>
> Do you prefer dysfunctional mesh ;)
:) I'll settle for anything that /works/ predictably :(
>> [snip]
>>
>> The "granularity" of the computation and communication will be a key to
>> understanding what the OP is thinking.
>
> I don't know what Rick had in mind. I personally would go for one "hard-CPU"
> block per 4000-5000 6-input logic elements (i.e. Altera ALMs or Xilinx CLBs).
> Each block could be configured either as one 64-bit core or a pair of 32-bit
> cores. The block would contain hard instruction decoders/ALUs/shifters and
> hard register files. It could optionally borrow adjacent DSP blocks for
> multipliers. Adjacent embedded memory blocks could be used for data memory.
> Code memory should be a bit more flexible, giving the designer a choice
> between embedded memory blocks or distributed memory (Xilinx) / MLABs (Altera).
It would be interesting to find an application level description (i.e. language constructs) that

- could be automatically mapped onto those primitives by a toolset
- was useful for more than a niche subset of applications
- was significantly better than existing tools

I wouldn't hold my breath :)
On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos wrote:
> [snip]
>
> The XMOS approach is fine when the hardware is uniform and the software sits
> on top, but when the hardware is synthesised and the 'CPUs' sit as pieces in
> a fabric containing random logic (as I think the OP is suggesting) it
> becomes a lot harder to reason about what the system is doing and what the
> software running on such heterogeneous cores should look like. Only the
> FPGA tools have a full view of what the system looks like, and it seems
> stretching them to have them also generate software to run on these cores.
When people talk about things like "software running on such heterogeneous cores" it makes me think they don't really understand how this could be used. If you treat these small cores like logic elements, you don't have such lofty descriptions of "system software" since the software isn't created out of some global software package. Each core is designed to do a specific job just like any other piece of hardware, and it has discrete inputs and outputs just like any other piece of hardware. If the hardware clock is not too fast, the software can synchronize with and literally function like hardware, but implementing more complex logic than the same area of FPGA fabric might.

There is no need to think about how the CPUs would communicate unless there is a specific need for them to do so. The F18A uses a handshaked parallel port in their design. They seem to have done a pretty slick job of it and can actually hang the processor waiting for the acknowledgement, saving power and getting an instantaneous wake up following the handshake. This can be used with other CPUs or
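The hang-and-wake behaviour is easy to picture in code. Here is the receive side such a core might run, written in C against invented memory-mapped register addresses (illustration only; the F18A's actual port is driven by its own instructions, not a register map like this):

  #include <stdint.h>

  /* Invented memory-mapped handshake port; the addresses and bit
     layout are illustrative, not any real device's map. */
  #define PORT_DATA    (*(volatile uint32_t *)0x80000000u)
  #define PORT_STATUS  (*(volatile uint32_t *)0x80000004u)
  #define PORT_ACK     (*(volatile uint32_t *)0x80000008u)
  #define PORT_OUT     (*(volatile uint32_t *)0x8000000Cu)
  #define STATUS_VALID 0x1u      /* neighbour has presented new data */

  /* Block until the neighbour presents a word, take it, acknowledge.
     A hard core could gate its clock off in this wait instead of
     spinning, which is where the power saving would come from. */
  static uint32_t port_recv(void)
  {
      while (!(PORT_STATUS & STATUS_VALID))
          ;                      /* the wake-up *is* the handshake */
      uint32_t word = PORT_DATA;
      PORT_ACK = 1;              /* tell the sender the word was taken */
      return word;
  }

  /* The whole "app" for one core acting as a logic element:
     forever take two inputs and present their sum. */
  void adder_task(void)
  {
      for (;;) {
          uint32_t a = port_recv();
          uint32_t b = port_recv();
          PORT_OUT = a + b;
      }
  }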
> We are not talking about a multi- or many- core chip here, with the CPUs as
> the primary element of compute, but the CPUs scattered around as 'state
> machine elements' just ups the complexity and makes it harder to understand
> compared with the same thing synthesised out of flip-flops.
Not sure what is hard to think about. It's a CPU, a small CPU with limited memory, implementing small tasks, but it can do rather complex operations compared to a state machine, and it includes memory, arithmetic and logic as well as I/O, without a single line of HDL having to be written. Only the actual app needs to be written.
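To put a number on "small": a toy model of such a core fits in a page of C. The opcode set below is invented for illustration (it is not the F18A's or anyone else's ISA), but it shows how a handful of stack operations already cover sequencing, arithmetic and I/O that would take a fair amount of HDL as an explicit state machine:

  #include <stdint.h>
  #include <stdio.h>

  /* Toy stack-machine core: opcodes invented for illustration. */
  enum { OP_LIT, OP_DUP, OP_DEC, OP_OUT, OP_JNZ, OP_HALT };

  static void run(const uint8_t *code)
  {
      uint16_t stack[16];                 /* tiny data stack */
      int sp = -1;
      uint16_t pc = 0;

      for (;;) {
          switch (code[pc++]) {
          case OP_LIT:  stack[++sp] = code[pc++];        break;
          case OP_DUP:  stack[sp + 1] = stack[sp]; sp++; break;
          case OP_DEC:  stack[sp]--;                     break;
          case OP_OUT:                                   /* stands in for an output port */
              printf("%u\n", (unsigned)stack[sp--]);     break;
          case OP_JNZ: {                                 /* jump if top of stack non-zero */
              uint8_t target = code[pc++];
              if (stack[sp--]) pc = target;
              break;
          }
          case OP_HALT: return;
          }
      }
  }

  int main(void)
  {
      /* Count 3, 2, 1 and stop -- the sort of job a small FSM might do. */
      const uint8_t prog[] = {
          OP_LIT, 3,
          /* loop: */ OP_DUP, OP_OUT, OP_DEC, OP_DUP, OP_JNZ, 2,
          OP_HALT
      };
      run(prog);
      return 0;
  }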
> I would be interested to know what applications might use heterogeneous
> many-cores and what performance is achievable.
Yes, clearly not getting the concept. Asking about heterogeneous performance is totally antithetical to this idea.

Rick C.
On Tuesday, March 19, 2019 at 11:24:33 AM UTC-4, Svenn Are Bjerkem wrote:
> On Tuesday, March 19, 2019 at 1:13:38 AM UTC+1, gnuarm.del...@gmail.com wrote:
>> [snip]
>
> picoblaze is such a small cpu and I would like to program it in something
> other than its assembler language.
Yes, it is small. How large is the program you are interested in?

Rick C.
On 20/03/2019 03:30, gnuarm.deletethisbit@gmail.com wrote:
> On Tuesday, March 19, 2019 at 10:29:07 AM UTC-4, Theo Markettos wrote:
>> [snip]
>
> When people talk about things like "software running on such
> heterogeneous cores" it makes me think they don't really understand
> how this could be used. If you treat these small cores like logic
> elements, you don't have such lofty descriptions of "system software"
> since the software isn't created out of some global software package.
> Each core is designed to do a specific job just like any other piece
> of hardware, and it has discrete inputs and outputs just like any
> other piece of hardware. If the hardware clock is not too fast, the
> software can synchronize with and literally function like hardware,
> but implementing more complex logic than the same area of FPGA fabric
> might.
That is software. If you want to try to get cycle-precise control of the software and use that precision for direct hardware interfacing, you are almost certainly going to have a poor, inefficient and difficult design. It doesn't matter if you say "think of it like logic" - it is /not/ logic, it is software, and you don't use that for cycle-precise control. You use it when you need flexibility, calculations, and decisions.
> There is no need to think about how the CPUs would communicate unless
> there is a specific need for them to do so. The F18A uses a
> handshaked parallel port in their design. They seem to have done a
> pretty slick job of it and can actually hang the processor waiting
> for the acknowledgement, saving power and getting an instantaneous
> wake up following the handshake. This can be used with other CPUs or
Fair enough.
On 19.03.19 at 16:24, Svenn Are Bjerkem wrote:
> picoblaze is such a small cpu and I would like to program it in something
> other than its assembler language.
It would be possible to write a C compiler for it (with some restrictions, such as functions being non-reentrant). The architecture doesn't seem any worse than PIC. And there are / were pic14 and pic16 backends in SDCC.

Philipp
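To illustrate the non-reentrancy restriction mentioned above: on a target without a convenient stack, a compiler typically gives each function's parameters and locals fixed static addresses, roughly as if the first function below had been rewritten as the second (an illustrative sketch, not actual SDCC output):

  #include <stdint.h>

  /* What the programmer writes: */
  uint8_t scale(uint8_t x)
  {
      uint8_t tmp = x << 1;
      return tmp + 1;
  }

  /* Roughly what a non-reentrant compilation model turns it into:
     the parameter and local live at fixed static addresses. */
  static uint8_t scale_x;
  static uint8_t scale_tmp;

  uint8_t scale_static(void)
  {
      scale_tmp = scale_x << 1;
      return scale_tmp + 1;
  }

  /* Consequence: if scale_static() called itself, or an interrupt
     handler called it while it was already running, scale_x and
     scale_tmp would be overwritten -- hence "non-reentrant". */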
On Wednesday, March 20, 2019 at 4:32:07 AM UTC+2, gnuarm.del...@gmail.com wrote:
> On Tuesday, March 19, 2019 at 11:24:33 AM UTC-4, Svenn Are Bjerkem wrote:
>> [snip]
>>
>> picoblaze is such a small cpu and I would like to program it in something
>> other than its assembler language.
>
> Yes, it is small. How large is the program you are interested in?
>
> Rick C.
I don't know about Svenn Are Bjerkem, but I can tell you about myself. The last time I considered something like that and wrote enough of the program to take measurements, the program contained ~250 Nios2 instructions. I'd guess that on a minimalistic stack machine it would take 350-400 instructions. In the end, I didn't do it in software: coding the same functionality in HDL turned out not to be hard, which probably suggests that my case was smaller than average. At the other extreme, where I did end up using a "small" soft core, it was much more like "real" software: 2300 Nios2 instructions.
On Tuesday, March 19, 2019 at 10:07:38 PM UTC+2, Tom Gardner wrote:
> On 19/03/19 17:35, already5chosen@yahoo.com wrote:
> > On Tuesday, March 19, 2019 at 6:19:36 PM UTC+2, Tom Gardner wrote:
> >>
> >> The UK Parliament is an unmitigated dysfunctional mess.
> >>
> >
> > Do you prefer dysfunctional mesh ;)
>
> :) I'll settle for anything that /works/ predictably :(
The UK political system is completely off-topic in comp.arch.fpga. However, I'd say that IMHO your parliament is right now facing an unusually difficult problem, but at the same time it's not really a "life or death" sort of problem. Having trouble and appearing indecisive in such a situation is normal. It does not mean that the system is broken.