comp.arch.fpga | Tiny CPUs for Slow Logic

Reply by David Brown ● March 21, 2019

On 21/03/2019 03:21, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:

> 
>> I want to know if that is going to happen with your ideas here.
>> Sure, you don't have a full business plan - but do you at least
>> have thoughts about the kind of usage where these mini cpus would
>> be a technologically superior choice compared to using state
>> machines in VHDL (possibly generated with external programs),
>> sequential logic generators (like C to HDL compilers, matlab tools,
>> etc.), normal soft processors, or normal hard processors?
> 
> The point wasn't that I don't have a business plan.  The point was
> that I haven't given this as much thought as would have been done if
> I were working on a business plan.  I'm kicking around an idea.  I'm
> not in a position to create FPGA with or without small CPUs.
> 
> 
>> Give me a /reason/ to all this - rather than just saying you can
>> make a simple stack-based cpu that's very small, so you could have
>> lots of them on a chip.
> 
> Why?  Why don't you give ME a reason?  Why don't you switch your
> point of view and figure out how this would be useful?  Neither of us
> have anything to gain or lose.
> 

I don't have any good ideas of what these might be used for.  And I 
can't see how it ends up as /my/ responsibility to figure out why /your/ 
idea might be a good idea.

You presented an idea - having several small, simple cpus on a chip. 
It's taken a long time, and a lot of side-tracks, to drag out of you 
what you are really thinking about.  (Perhaps you didn't have a clear 
idea in your mind with your first post, and it has solidified along the way - 
in which case, great, and I'm glad the thread has been successful there.)

I've been trying to help by trying to look at how these might be used, 
and how they compare to alternative existing solutions.  And I have been 
trying to get /you/ to come up with some ideas about when they might be 
useful.  All I'm getting is a lot of complaints, insults, condescension, 
patronisation.  You tell me I don't understand what these are for - yet 
you refuse to say what they are for (the nearest we have got in any post 
in this thread to evidence that there is any use-case is you telling me 
you have ideas but refuse to tell me as I am not an FPGA designer by 
profession).  You are forever telling me about the wonders of the F18A 
and the GA144, and how I can't understand your ideas because I don't 
understand that device - while simultaneously telling me that device is 
irrelevant to your proposal.  You are asking for opinions and thoughts 
about how people would program these devices, then tell me I am wrong 
and closed-minded when I give you answers.

Hopefully, you have got /some/ ideas and thoughts out of this thread. 
You can take a long, hard look at the idea in that light, and see if it 
really is something that could be useful - in today's world with today's 
tools and technology, or tomorrow's world with new tools and development 
systems.

But next time you want to start a thread asking for ideas and opinions, 
how about responding with phrases like "I hadn't thought of it that 
way", "I think FPGA designers IME would like this" - not "You are wrong, 
and clearly ignorant".

You are a smart guy, and you are great at answering other people's 
questions and helping them out - but boy, are you bad at asking for help 
yourself.

Reply by ● March 21, 2019

On Thursday, March 21, 2019 at 4:21:13 AM UTC+2, gnuarm.del...@gmail.com wrote:
> 
> So???  You are the one who keeps talking about software/hardware whatever.  I'm talking about the software being able to synchronize with the clock of the other hardware.  When that happens there are tight timing constraints in the same sense of the software sampling an ADC on a periodic basis and having to process the resulting data before the next sample is ready.  The only difference is something like the F18A running at a few GHz can do a lot in a 10 ns clock cycle. 
> 
> 

I certainly don't like the "few GHz" part.
Distributing a single multi-GHz clock over the full area of an FPGA is a non-starter from the power perspective alone, but even ignoring power, such distribution takes significant area, making the whole proposition unattractive. As I understand it, the whole point is that these thingies take little area, so they are not harmful even for those buyers of the device who don't utilize them at all or utilize them very little.
Alternatively, multi-GHz clocks can be generated by local specialized PLLs, but I am afraid that the PLLs would be several times bigger than the cores themselves, and they need good, non-noisy power supplies and grounds that are probably hard to get in the middle of the chip, etc. I really know too little about PLLs, but I think I know enough to conclude that it's not a much better idea than chip-wide clock distribution at multi-GHz.

My idea of small hard cores is completely different in that regard. IMHO, they should run either on the same clock as the surrounding FPGA fabric or on a clock delivered by a simple clock doubler. Even clock quadrupling does not seem like a good idea to my engineering intuition.
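A first-order way to see the power argument: the dynamic power of a clock network follows the standard CMOS relation below, with activity factor alpha, switched capacitance C, supply voltage V_DD and frequency f. The thread gives no actual values for any of these, so the frequencies below are illustrative only. With alpha, C and V_DD held fixed, power scales linearly with f, so every clock multiplier raises the clock tree's power by the same factor; in practice a faster, low-skew tree also tends to need stronger buffering, which only makes the comparison worse.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{dyn}}(2.5\,\mathrm{GHz})}{P_{\mathrm{dyn}}(250\,\mathrm{MHz})}
= \frac{2.5\times 10^{9}}{2.5\times 10^{8}} = 10
```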

Reply by Tom Gardner ● March 21, 2019

On 21/03/19 02:21, gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
>> 
>> I agree that software should not in itself create a problem.  Trying to 
>> think of them as "logic" /would/ create problems.  Think of them as 
>> software, and program them as software.  I expect you'd think of them as 
>> entirely independent units with independent programs, rather than as a 
>> multi-cpu or heterogeneous system.
> 
> Ok, please tell me what those problems would be.  I have no idea what you
> mean by what you say.  You are likely reading a lot into this that I am not
> intending.

I have no difficulty understanding what he is saying.

Several people have difficulty understanding what you
are proposing.

You are proposing vague ideas, so the onus is on you
to make your ideas clear.


>>> As to the connection, I really don't get your point.  They either connect
>>> directly to the hardware because that's how they are designed, or they
>>> don't... because that's how they are designed.  I don't know what you are
>>> saying about that.
>>> 
>> 
>> "Synchronise directly with hardware" might be a better phrase.
> 
> I don't know why and likely I'm not going to care.  I think you need to
> learn more of how the F18A works.

No, we really don't have to learn more about one specific
processor - especially if it is just to help you.

If, OTOH, you succinctly summarise its key points and
how that achieves benefits, then we might be interested.


>>> Enough!  The CPUs run software.  Now, what is YOUR point?
>>> 
>> 
>> My point was that these are not logic, they are not logic elements (even if
>> they could be physically small and cheap and scattered around a chip like
>> logic elements).  Thinking about them as "sequential logic elements" is not
>> helpful.  Think of them as small processors running simple and limited
>> /software/.  Unless you can find a way to automatically generate code for
>> them, then they will be programmed using a /software/ programming language,
>> not a logic or hardware programming language.  If you are happy to accept
>> that now, then great - we can move on.
> 
> You have it backwards.  Please show me what you think the problems are.  I
> don't care if they run software or have a Maxwell demon tossing bits about as
> long as it does what I need.  You seem to get hung up on terminology so
> easily.

You need to explain your points better.

There's the old adage that "you only realise how little
you know about a subject when you try to teach it to
other people".


> That is your construct because you know nothing of how the F18A works.  As
> I've mentioned before, you would do well to read some of the app notes on
> this device.  It really does have some good ideas to offer.

Give us the elevator pitch, so we can estimate whether
it would be a beneficial use of our remaining life.



> The point wasn't that I don't have a business plan.  The point was that I
> haven't given this as much thought as would have been done if I were working
> on a business plan.  I'm kicking around an idea.  I'm not in a position to
> create FPGA with or without small CPUs.
> 
> 
>> Give me a /reason/ to all this - rather than just saying you can make a 
>> simple stack-based cpu that's very small, so you could have lots of them on
>> a chip.
> 
> Why?  Why don't you give ME a reason?  Why don't you switch your point of
> view and figure out how this would be useful?  Neither of us have anything to
> gain or lose.

Why? Because you are trying to propagate your ideas.
The onus is on you to convince us, not the other way
around.

Reply by Tom Gardner ● March 21, 2019

On 20/03/19 16:32, already5chosen@yahoo.com wrote:
> On Wednesday, March 20, 2019 at 5:51:21 PM UTC+2, Tom Gardner wrote:
>> On 20/03/19 14:51, already5chosen@yahoo.com wrote:
>>> On Wednesday, March 20, 2019 at 4:31:27 PM UTC+2, Tom Gardner wrote:
>>>> On 20/03/19 14:11, already5chosen@yahoo.com wrote:
>>>>> On Wednesday, March 20, 2019 at 3:37:17 PM UTC+2, Tom Gardner wrote:
>>>>>> 
>>>>>> But more difficult that creating such a toolset is defining an
>>>>>> application level description that a toolset can munge.
>>>>>> 
>>>>>> So, define (initially by example, later more formally) inputs to
>>>>>> the toolset and outputs from it. Then we can judge whether the
>>>>>> concepts are more than handwaving wishes.
>>>>>> 
>>>>> 
>>>>> I don't understand what you are asking for.
>>>> 
>>>> Go back and read the parts of my post that you chose to snip.
>>>> 
>>>> Give a handwaving indication of the concepts that avoid the conceptual
>>>> problems that I mentioned.
>>> 
>>> Frankly, it starts to sound like you never used soft CPU cores in your
>>> designs. So, for somebody like myself, who has used them routinely for
>>> different tasks since 2006, you are really not easy to understand.
>> 
>> Professionally, since 1978 I've done everything from low noise analogue
>> electronics, many hardware-software systems using all sorts of
>> technologies, networking at all levels of the protocol stack, "up" to high
>> availability distributed soft real-time systems.
>> 
>> And almost all of that has been on the bleeding edge.
>> 
>> So, yes, I do have more than a passing acquaintance with the
>> characteristics of many hardware and software technologies, and where
>> partitions between them can, should and should not be drawn.
>> 
> 
> Is that a sort of admission that you have indeed never designed with soft cores?

No, it is not.


>>> Concept? Concepts are good for new things, not for something that is a
>>> variation of something old and routine and obviously working.
>> 
>> Whatever is being proposed, is it old or new?
>> 
>> If old then the OP needs enlightenment and concrete examples can easily be
>> noted.
>> 
>> If new, then provide the concepts.
>> 
> 
> It is a new variation of an old concept. A cross between PPCs in ancient
> VirtexPro and soft cores virtually everywhere in more modern times. Probably
> best characterized by what it is not like: it is not like Xilinx Zynq or
> Altera Cyclone5-HPS.
> 
> The "new" part comes more from the new economics of sub-20nm processes than
> from abstractions that you try to drag into it. NRE is more and more
> expensive, gates are cheaper and cheaper (well, the cost of gates started to
> stagnate in the last couple of years, but that does not matter. What matters
> is that at something like TSMC 12nm gates are already quite cheap). So,
> adding multiple small CPU cores that could be used as a replacement for the
> multiple soft CPU cores that people already use today now starts to make
> sense. Maybe it's not a really good proposition, but at these silicon
> geometries it can't be written off as an obviously stupid proposition.

The starting points are fine, but so what?

There's little point building something if it
isn't useful in practice.

For examples of that, see Intel's 432 and 860
processors, and there are other examples.

Reply by Theo ● March 21, 2019

gnuarm.deletethisbit@gmail.com wrote:
> On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
> > Your bottom-up approach means it's difficult to see the big picture of
> > what's going on.  That means it's hard to understand the whole system, and
> > to program from a whole-system perspective.
> 
> I never mentioned a bottom up or a top down approach to design.  Nothing
> about using these small CPUs is about the design "direction".  I am pretty
> sure that you have to define the circuit they will work in before you can
> start designing the code.

Your approach is 'I have this low-level thing (a tiny CPU), what can I use
it for?'.  That's bottom up.  A top down view would be 'my problem is X,
what's the best way to solve it?'.  The advantage of the latter view is you
can explore some of the architectural space before targeting a solution
that's appropriate to the problem (with metrics to measure it), aiming to
find the global maximum.  In a bottom-up approach you need to sell to users
that your idea will help their problem, but until you build a system they
don't know that it will even be a local maximum.

> > What's the logic equation of a processor?  
> 
> Obviously it is like a combination of LUTs with FFs and able to implement
> any logic you wish including math.  BTW, in many devices the elements are
> not at all so simple.  Xilinx LUTs can be used as shift registers.  There
> is additional logic within the logic blocks that allows math with carry
> chains, combining LUTs to form larger LUTs, breaking LUTs into smaller
> LUTs, and let's not forget about routing, which may not be used much
> anymore, not sure.

You can still reason about blocks as combinations of basic functions.  A
block that is LUT+FF can still be analysed in separate parts.
A processor is a 'black box' as far as the tools go.  That means any
software is opaque to analysis of correctness.  The tools therefore can't
know that the circuit they produced matches the input HDL.

Simulation does not give you equivalence checking of the form of LVS (layout
versus schematic) or compiler correctness testing, it only tests a
particular set of (usually hand-defined) test cases.  The coverage is much
lower than with equivalence checking tools.
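To illustrate the gap between simulation and equivalence checking, here is a minimal Python sketch. The function names and the 8-bit saturating-add example are hypothetical, not from the thread: one function stands in for the intended logic, another for code running on a tiny CPU, and a deliberate corner-case bug passes a few hand-picked tests but is caught by checking every input. Brute-force enumeration like this only works for tiny input widths; real equivalence checkers use formal (SAT/BDD-based) methods instead.

```python
from itertools import product

def reference_sat_add(a: int, b: int) -> int:
    """Intended behaviour: 8-bit unsigned add that saturates at 255."""
    return min(a + b, 255)

def cpu_sat_add(a: int, b: int) -> int:
    """Hypothetical 'software' version with a deliberate off-by-one:
    it tests for overflow with > 256 instead of > 255, so a sum of
    exactly 256 wraps to 0 instead of saturating."""
    s = a + b
    if s > 256:
        return 255
    return s & 0xFF

def run_tests(cases):
    """Simulation-style check: only the listed cases are exercised."""
    return all(cpu_sat_add(a, b) == reference_sat_add(a, b) for a, b in cases)

def exhaustive_mismatches():
    """Brute-force equivalence check over the full 8-bit x 8-bit input space."""
    return [(a, b) for a, b in product(range(256), repeat=2)
            if cpu_sat_add(a, b) != reference_sat_add(a, b)]

hand_picked = [(0, 0), (1, 2), (100, 100), (200, 100), (255, 255)]
print("hand-picked tests pass:", run_tests(hand_picked))  # True
bad = exhaustive_mismatches()
print("mismatches found:", len(bad), "first:", bad[0])     # 255 first: (1, 255)
```

Every hand-picked case passes, yet 255 input pairs (all those summing to exactly 256) give the wrong answer.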

> Why does it need to be inferred?  If you want to write an HDL tool to turn
> HDL into processor code, have at it.  But then there are other methods. 
> Someone mentioned his MO is to use other tools for designing his
> algorithms and letting that tool generate the software for a processor or
> the HDL for an FPGA.  That would seem easy enough to integrate.

That's roughly what OpenCL and friends can do.  But those are top-down
architecturally (starting with a chip block diagram), rather than starting
with tiny building blocks as you're suggesting.

> Huh?  You can't simulate code on a processor???

Verification is more than simulation, as described above.

> > If we scale the processors up a bit, I could see the merits in say a
> > bank of, say, 32 Cortex M0s that could be interconnected as part of the
> > FPGA fabric and programmed in software for dedicated tasks (for
> > instance, read the I2C EEPROM on the DRAM DIMM and configure the DRAM
> > controller at boot).
> 
> I don't follow your logic.  What is different about the ARM processor from
> the stack processor other than that it is larger and slower and requires a
> royalty on each one?  Are you talking about writing the code in C vs. 
> what ever is used for the stack processor?

If you have an existing codebase (supplied by the vendor of your external
chip, for example), it'll likely be in C.  It won't be in
special-stack-assembler, and your architecture seems to be designed to not
be amenable to compilers.

> The point of the many hard cores is the saving of resources.  Soft cores
> would be the most wasteful way to implement logic.  If the application is
> large enough they can implement things in software that aren't as
> practical in HDL, but that would be a different class of logic from the
> tiny CPUs I'm talking about.

'Wastefulness' is one parameter.  But you can also consider that every
unused hard-core is also wasteful in terms of silicon area.  Can you show
that the hard-cores would be used enough of the time to outweigh the space
they waste on other people's designs?

> You lost me with the gear shift.  The mention of instruction rate is about
> the CPU being fast enough to keep up with FPGA logic.  The issue with
> "heterogeneous performance" is the "heterogeneous" part, lumping the many
> CPUs together to create some sort of number cruncher.  That's not what
> this is about.  Like in the GA144, I fully expect most CPUs to be sitting
> around most of the time idling, waiting for data.  This is a good thing
> actually.  These CPUs could consume significant current if they run at GHz
> all the time.  I believe in the GA144 at that slower rate each processor
> can use around 2.5 mA.  Not sure if a smaller process would use more or
> less power when running flat out.  It's been too many years since I worked
> with those sorts of numbers.

OK, so once we drop any idea of MIPS, we're talking about something simpler
than a Cortex M0.  You should be able to make a design that clocks at a few
hundred MHz on an FPGA process.  You could choose to run it synchronously
with your FPGA logic, or on an internal clock and synchronise inputs and
outputs.  You probably wouldn't tile these, but you could deploy them as a
'hardware thread' in places you need a complicated state machine.

> > In essence, your proposal has a disconnect between the situations existing
> > FPGA blocks are used (implemented automatically by P&R tools) and the
> > situations software is currently used (human-driven software and
> > architectural design).  It's unclear how you claim to bridge this gap.
> 
> I certainly don't see how P&R tools would be a problem.  They accommodate
> multipliers, DSP blocks, memory blocks and many, many special bits of
> assorted components inside the FPGAs which vary from vendor to vendor. 
> Clock generators and distribution are pretty unique to each manufacturer. 
> Lattice has all sorts of modules to offer like I2C and embedded Flash. 
> Then there are entire CPUs embedded in FPGAs.  Why would supporting them
> be so different from what I am talking about?

If this is a module that the tools have no visibility over, i.e. just a blob
with inputs and outputs, then they can implement that.  In that instance
there is a manageability problem - beyond a handful of processes, writing
heterogeneous distributed software is hard.  Unless each processor is doing
a very small, well-defined, task, I think the chances of bugs are high.

If instead you want interaction with the toolchain in terms of
generating/checking the software running on such cores, that's also
problematic.

I hadn't seen Picoblaze before, but that seems a strong fit with what you're
suggesting.  So a question: why isn't it more successful?  And why isn't
Xilinx putting hard Picoblazes into their FPGAs, which they could do
tomorrow if they felt the need?

Theo

Reply by Tom Gardner ● March 21, 2019

On 21/03/19 10:49, Theo wrote:
> gnuarm.deletethisbit@gmail.com wrote:
>> On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
>>> Your bottom-up approach means it's difficult to see the big picture of
>>> what's going on.  That means it's hard to understand the whole system, and
>>> to program from a whole-system perspective.
>>
>> I never mentioned a bottom up or a top down approach to design.  Nothing
>> about using these small CPUs is about the design "direction".  I am pretty
>> sure that you have to define the circuit they will work in before you can
>> start designing the code.
> 
> Your approach is 'I have this low-level thing (a tiny CPU), what can I use
> it for?'.  That's bottom up.  A top down view would be 'my problem is X,
> what's the best way to solve it?'.  

The OP's attitude and responses have puzzled me. However, they
make more sense if that is indeed his design strategy - and I
suspect it is, based on comments he has made in other parts
of this thread.

That attitude surprises me, since all my /designs/ have been
based on "what do I need to achieve" plus "what can individual
technologies achieve" plus "which combination of technologies
is best at achieving my objectives". I.e. top down with a
knowledge of the bottom pieces.

Of course I /implement/ my designs in a more bottom up way.

(I agree with the rest of your statements)

Reply by ● March 21, 2019

On Thursday, March 21, 2019 at 5:22:09 AM UTC-4, already...@yahoo.com wrote:
> On Thursday, March 21, 2019 at 4:21:13 AM UTC+2, gnuarm.del...@gmail.com wrote:
> > 
> > So???  You are the one who keeps talking about software/hardware whatever.  I'm talking about the software being able to synchronize with the clock of the other hardware.  When that happens there are tight timing constraints in the same sense of the software sampling an ADC on a periodic basis and having to process the resulting data before the next sample is ready.  The only difference is something like the F18A running at a few GHz can do a lot in a 10 ns clock cycle. 
> > 
> > 
> 
> I certainly don't like the "few GHz" part.
> Distributing a single multi-GHz clock over the full area of an FPGA is a non-starter from the power perspective alone, but even ignoring power, such distribution takes significant area, making the whole proposition unattractive. As I understand it, the whole point is that these thingies take little area, so they are not harmful even for those buyers of the device who don't utilize them at all or utilize them very little.

There is no multi-GHz clock distribution.  These CPUs can be self-timed.  The F18A is.  Think of asynchronous logic.  It's not literally asynchronous, but similar, with internal delays setting the speed so all the internal logic works correctly.  The only clock would be whatever clock the rest of the logic is using. 

Think of these CPUs running from the clock generated by a ring oscillator in each CPU.  There would be a minimum CPU speed over PVT (Process, Voltage, Temperature).  That's all you need to make this work. 

> Alternatively, multi-GHz clocks can be generated by local specialized PLLs, but I am afraid that the PLLs would be several times bigger than the cores themselves, and they need good, non-noisy power supplies and grounds that are probably hard to get in the middle of the chip, etc. I really know too little about PLLs, but I think I know enough to conclude that it's not a much better idea than chip-wide clock distribution at multi-GHz.

That's the advantage of synchronizing at the interface rather than trying to run in lock step.  CPUs free run at some fast speed.  While waiting for a clock transition they sit idle, not clocking, using very little power.  On the same clock edge the rest of the chip is using, the CPU starts running: data previously generated is output (like a FF), data on the inputs is read and processed, and the result is held while the CPU pends on the next clock edge, going back into a sleep state.  
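The sequence described above can be sketched as a small behavioural model in Python. It assumes the core's program always finishes before the next edge (the timing side is a separate check), and the class and method names are hypothetical. Seen from the fabric, the core then behaves like a register fed by combinational logic: each edge outputs the previous result and captures a new one.

```python
from typing import Callable

class SelfTimedCore:
    """Illustrative model of a tiny core that looks like a register stage to
    the surrounding fabric. It models only the ordering of events on each
    shared fabric clock edge, not the internal timing."""

    def __init__(self, program: Callable[[int], int], reset_value: int = 0):
        self.program = program    # stands in for the code in the core's memory
        self.held = reset_value   # result waiting to be driven at the next edge

    def clock_edge(self, sample: int) -> int:
        # 1. Wake on the edge and drive the result computed last cycle (like Q of a FF).
        out = self.held
        # 2. Read the inputs and run the program to completion on the fast internal clock.
        self.held = self.program(sample)
        # 3. Hold the new result; in hardware the core would now sleep until the next edge.
        return out

core = SelfTimedCore(lambda x: 3 * x + 1)
outputs = [core.clock_edge(x) for x in [1, 2, 3, 4]]
print(outputs)  # [0, 4, 7, 10]: one cycle of latency, like a pipeline register
```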

You can read how the F18A does it at an atomic level in the clock management.  The wake up is *very* fast.  

> My idea of small hard cores is completely different in that regard. IMHO, they should run either on the same clock as the surrounding FPGA fabric or on a clock delivered by a simple clock doubler. Even clock quadrupling does not seem like a good idea to my engineering intuition.

This would make the CPU ridiculously slow and not a good trade off for fabric logic.  

CPUs can be size efficient when they do a lot of sequential calculations.  This essentially takes advantage of the enormous multiplexer in the memory to allow it to replace a larger amount of logic.  But if the needs are faster than a slow processor can handle, the processor needs to run at a much higher clock speed.  This allows an even higher space efficiency, since now the logic in the CPU is executing more instructions in a single system clock.  

So let a small CPU run at very high rates and synchronize at the system clock rate by handshaking, just like a LUT/FF logic block, without worrying about the fact that it is running a lot of instructions.  It just needs to run enough to get the job done.  The timing is like the logic in a data path between FFs.  It has to run fast enough to reach the next FF before the next clock edge.  It won't matter if it is faster.  So the CPU only needs a minimum spec on the internal clock speed. 
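The timing requirement above reduces to simple arithmetic: the longest code path must fit in one fabric clock period at the core's slowest guaranteed speed over PVT. A minimal sketch in Python; the function names, the 500 million instructions/s slow-corner rate and the one-instruction synchronization overhead are hypothetical figures chosen for illustration, not F18A or GA144 specifications.

```python
def instruction_budget(fabric_clock_hz: int, min_core_ips: int,
                       sync_overhead_instr: int = 0) -> int:
    """Worst-case instructions available between two fabric clock edges.

    min_core_ips is the core's guaranteed minimum instruction rate at the
    slow PVT corner. Integer division floors the result, since a partly
    finished instruction is no use. sync_overhead_instr is an assumed
    allowance for waking up and driving/sampling the interface.
    """
    return min_core_ips // fabric_clock_hz - sync_overhead_instr

def meets_timing(worst_case_path_instr: int, fabric_clock_hz: int,
                 min_core_ips: int, sync_overhead_instr: int = 0) -> bool:
    """Analogue of a setup check on a register-to-register path: the longest
    code path must finish before the next edge."""
    return worst_case_path_instr <= instruction_budget(
        fabric_clock_hz, min_core_ips, sync_overhead_instr)

# Hypothetical numbers: 100 MHz fabric clock (10 ns period), core guaranteed
# at least 500 million instructions/s at the slow corner, 1 instruction overhead.
print(instruction_budget(100_000_000, 500_000_000, 1))  # 4
print(meets_timing(4, 100_000_000, 500_000_000, 1))     # True
print(meets_timing(6, 100_000_000, 500_000_000, 1))     # False
```

As with an ordinary setup check, a core that runs faster at a typical corner simply reaches its wait state earlier and sleeps longer; only the slow-corner figure matters.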

Rick C.

Reply by ● March 21, 2019

On Thursday, March 21, 2019 at 3:37:14 AM UTC-4, David Brown wrote:
> On 21/03/2019 03:21, gnuarm.deletethisbit@gmail.com wrote:
> > On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
> 
> > 
> >> I want to know if that is going to happen with your ideas here.
> >> Sure, you don't have a full business plan - but do you at least
> >> have thoughts about the kind of usage where these mini cpus would
> >> be a technologically superior choice compared to using state
> >> machines in VHDL (possibly generated with external programs),
> >> sequential logic generators (like C to HDL compilers, matlab tools,
> >> etc.), normal soft processors, or normal hard processors?
> > 
> > The point wasn't that I don't have a business plan.  The point was
> > that I haven't given this as much thought as would have been done if
> > I were working on a business plan.  I'm kicking around an idea.  I'm
> > not in a position to create FPGA with or without small CPUs.
> > 
> > 
> >> Give me a /reason/ to all this - rather than just saying you can
> >> make a simple stack-based cpu that's very small, so you could have
> >> lots of them on a chip.
> > 
> > Why?  Why don't you give ME a reason?  Why don't you switch your
> > point of view and figure out how this would be useful?  Neither of us
> > have anything to gain or lose.
> > 
> 
> I don't have any good ideas of what these might be used for.  And I 
> can't see how it ends up as /my/ responsibility to figure out why /your/ 
> idea might be a good idea.
> 
> You presented an idea - having several small, simple cpus on a chip. 
> It's taken a long time, and a lot of side-tracks, to drag out of you 
> what you are really thinking about.  (Perhaps you didn't have a clear 
> idea in your mind with your first post, and it has solidified underway - 
> in which case, great, and I'm glad the thread has been successful there.)
> 
> I've been trying to help by trying to look at how these might be used, 
> and how they compare to alternative existing solutions.  And I have been 
> trying to get /you/ to come up with some ideas about when they might be 
> useful.  All I'm getting is a lot of complaints, insults, condescension, 
> patronisation.  You tell me I don't understand what these are for - yet 
> you refuse to say what they are for (the nearest we have got in any post 
> in this thread to evidence that there is any use-case, is you telling me 
> you have ideas but refuse to tell me as I am not an FPGA designer by 
> profession).  You are forever telling me about the wonders of the F18A 
> and the GA144, and how I can't understand your ideas because I don't 
> understand that device - while simultaneously telling me that device is 
> irrelevant to your proposal.  You are asking for opinions and thoughts 
> about how people would program these devices, then tell me I am wrong 
> and closed-minded when I give you answers.
> 
> Hopefully, you have got /some/ ideas and thoughts out of this thread. 
> You can take a long, hard look at the idea in that light, and see if it 
> really is something that could be useful - in today's world with today's 
> tools and technology, or tomorrow's world with new tools and development 
> systems.
> 
> But next time you want to start a thread asking for ideas and opinions, 
> how about responding with phrases like "I hadn't thought of it that 
> way", "I think FPGA designers IME would like this" - not "You are wrong, 
> and clearly ignorant".
> 
> You are a smart guy, and you are great at answering other people's 
> questions and helping them out - but boy, are you bad at asking for help 
> yourself.

I think if you go back and read, I said it all before.  But because there is a lot of new thinking involved, it was very hard to get you to understand what was being said rather than continuing to look at it the way you have been looking at it for the last few decades. 

Rick C.

Reply by ● March 21, 2019

On Thursday, March 21, 2019 at 5:40:30 AM UTC-4, Tom Gardner wrote:
> On 21/03/19 02:21, gnuarm.deletethisbit@gmail.com wrote:
> > On Wednesday, March 20, 2019 at 5:38:16 PM UTC-4, David Brown wrote:
> >> 
> >> I agree that software should not in itself create a problem.  Trying to 
> >> think of them as "logic" /would/ create problems.  Think of them as 
> >> software, and program them as software.  I expect you'd think of them as 
> >> entirely independent units with independent programs, rather than as a 
> >> multi-cpu or heterogeneous system.
> > 
> > Ok, please tell me what those problems would be.  I have no idea what you
> > mean by what you say.  You are likely reading a lot into this that I am not
> > intending.
> 
> I have no difficulty understanding what he is saying.
> 
> Several people have difficulty understanding what you
> are proposing.
> 
> You are proposing vague ideas, so the onus is on you
> to make your ideas clear.

There is no onus.  This is not a business proposal.  If you want to discuss it, do so.  If not, don't.  

If you can't tell me what your concerns are, I can't address them.  If no one can tell me what problems are being talked about by "Trying to think of them as "logic" /would/ create problems.", I can't possibly address those concerns. 


> >>> As to the connection, I really don't get your point.  They either connect
> >>> directly to the hardware because that's how they are designed, or they
> >>> don't... because that's how they are designed.  I don't know what you are
> >>> saying about that.
> >>> 
> >> 
> >> "Synchronise directly with hardware" might be a better phrase.
> > 
> > I don't know why and likely I'm not going to care.  I think you need to
> > learn more of how the F18A works.
> 
> No, we really don't have to learn more about one specific
> processor - especially if it is just to help you.
> 
> If, OTOH, you succinctly summarise its key points and
> how that achieves benefits, then we might be interested.

I don't see a question.  Are you trying to teach me how to post in newsgroups?  lol 

Ask a question if you have one.  Explain something I've said that is wrong.  But if you don't have anything better to say, I can't help you. 


> >>> Enough!  The CPUs run software.  Now, what is YOUR point?
> >>> 
> >> 
> >> My point was that these are not logic, they are not logic elements (even if
> >> they could be physically small and cheap and scattered around a chip like
> >> logic elements).  Thinking about them as "sequential logic elements" is not
> >> helpful.  Think of them as small processors running simple and limited
> >> /software/.  Unless you can find a way to automatically generate code for
> >> them, then they will be programmed using a /software/ programming language,
> >> not a logic or hardware programming language.  If you are happy to accept
> >> that now, then great - we can move on.
> > 
> > You have it backwards.  Please show me what you think the problems are.  I
> > don't care if they run software or have a Maxwell demon tossing bits about as
> > long as it does what I need.  You seem to get hung up on terminology so
> > easily.
> 
> You need to explain your points better.
> 
> There's the old adage that "you only realise how little
> you know about a subject when you try to teach it to
> other people".

Which points?  I'm starting to think you are not here for the hunting.  


> > That is your construct because you know nothing of how the F18A works.  As
> > I've mentioned before, you would do well to read some of the app notes on
> > this device.  It really does have some good ideas to offer.
> 
> Give us the elevator pitch, so we can estimate whether
> it would be a beneficial use of our remaining life.

If you don't have any idea what I'm talking about at this point, an elevator pitch won't help.  


> > The point wasn't that I don't have a business plan.  The point was that I
> > haven't given this as much thought as would have been done if I were working
> > on a business plan.  I'm kicking around an idea.  I'm not in a position to
> > create FPGAs with or without small CPUs.
> > 
> > 
> >> Give me a /reason/ to all this - rather than just saying you can make a 
> >> simple stack-based cpu that's very small, so you could have lots of them on
> >> a chip.
> > 
> > Why?  Why don't you give ME a reason?  Why don't you switch your point of
> > view and figure out how this would be useful?  Neither of us have anything to
> > gain or lose.
> 
> Why? Because you are trying to propagate your ideas.
> The onus is on you to convince us, not the other way
> around.

No, I'm trying to discuss an idea.  If you don't wish to discuss the idea, then that's fine.  

Rick C.

Reply by ●March 21, 2019

On Thursday, March 21, 2019 at 6:49:11 AM UTC-4, Theo wrote:
> gnuarm.deletethisbit@gmail.com wrote:
> > On Wednesday, March 20, 2019 at 6:53:07 AM UTC-4, Theo wrote:
> > > Your bottom-up approach means it's difficult to see the big picture of
> > > what's going on.  That means it's hard to understand the whole system, and
> > > to program from a whole-system perspective.
> > 
> > I never mentioned a bottom up or a top down approach to design.  Nothing
> > about using these small CPUs is about the design "direction".  I am pretty
> > sure that you have to define the circuit they will work in before you can
> > start designing the code.
> 
> Your approach is 'I have this low-level thing (a tiny CPU), what can I use
> it for?'.  That's bottom up.  A top down view would be 'my problem is X,
> what's the best way to solve it?'.  The advantage of the latter view is you
> can explore some of the architectural space before targeting a solution
> that's appropriate to the problem (with metrics to measure it), aiming to
> find the global maximum.  In a bottom-up approach you need to sell to users
> that your idea will help their problem, but until you build a system they
> don't know that it will even be a local maximum.

I'm not designing anything so I can't be designing bottom up.  I'm not selling anything, so I don't have users.  

I'm discussing an idea.  I'm kicking a can.  I'm running a flag up the flag pole. 

If you aren't interested in discussing this, then that's ok.  But there's no point at all in having a meta-discussion. 


> > > What's the logic equation of a processor?  
> > 
> > Obviously it is like a combination of LUTs with FFs and able to implement
> > any logic you wish including math.  BTW, in many devices the elements are
> > not at all so simple.  Xilinx LUTs can be used as shift registers.  There
> > is additional logic within the logic blocks that allows math with carry
> > chains, combining LUTs to form larger LUTs, breaking LUTs into smaller
> > LUTs, and let's not forget about routing, which may not be used much anymore,
> > not sure.
> 
> You can still reason about blocks as combinations of basic functions.  A
> block that is LUT+FF can still be analysed in separate parts.
> A processor is a 'black box' as far as the tools go.  That means any
> software is opaque to analysis of correctness.  The tools therefore can't
> know that the circuit they produced matches the input HDL.

"Correctness" in what sense?  I've never worked with tools that could analyze my HDL to tell me if it was logically correct.  I really have no idea what you are talking about here.  I also don't see the point of your pointing out the LUT can be separate from the FF in a LUT/FF combination.  You can model the CPU as a large LUT with FFs.  It can do the same job.  The FF can be removed.  The logic can be removed.  Whatever analysis that can be done on the LUT/FF can be applied to the CPU.  

If you want to verify the "correctness" of parts of a design by inspection, I would expect that to be done on the HDL anyway, not on the generated logic... unless you thought the tools were suspect. 


> Simulation does not give you equivalence checking of the form of LVS (layout
> versus schematic) or compiler correctness testing, it only tests a
> particular set of (usually hand-defined) test cases.  There's much less
> coverage than equivalence checking tools.

So those techniques can't be applied to software?  
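In one narrow sense, the answer is yes. When a small core's program implements a combinational function of only a few input bits, you can run the program on every input combination and pack its outputs into a truth table, just like a LUT's INIT value. That table can then be compared against the reference logic. The sketch below assumes a made-up 1-bit stack instruction set (`in`, `and`, `or`, `xor`, `not`). It is not the F18A's ISA or that of any real core. Exhaustive enumeration only scales to small input widths and says nothing about timing or sequential state, so it falls well short of the LVS-style equivalence checking Theo describes.

```python
# Hypothetical 1-bit stack ISA, for illustration only (not the F18A or any real core).
def run(program, inputs):
    """Execute a straight-line program on a list of input bits; return the top of stack."""
    stack = []
    for op, *arg in program:
        if op == "in":
            stack.append(inputs[arg[0]])       # push input bit number arg[0]
        elif op == "not":
            stack.append(1 - stack.pop())
        elif op in ("and", "or", "xor"):
            b, a = stack.pop(), stack.pop()
            stack.append({"and": a & b, "or": a | b, "xor": a ^ b}[op])
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack[-1]

def truth_table(fn, n_inputs):
    """Pack fn's output for every input combination into an integer, like a LUT INIT value."""
    mask = 0
    for i in range(2 ** n_inputs):
        bits = [(i >> k) & 1 for k in range(n_inputs)]   # bit k of i drives input k
        mask |= fn(bits) << i
    return mask

# Program under test: a 3-input majority function.
MAJORITY = [("in", 0), ("in", 1), ("and",), ("in", 0), ("in", 2), ("and",), ("or",),
            ("in", 1), ("in", 2), ("and",), ("or",)]

def reference(bits):
    """The same function written as ordinary logic."""
    a, b, c = bits
    return (a & b) | (a & c) | (b & c)

cpu_lut = truth_table(lambda bits: run(MAJORITY, bits), 3)
ref_lut = truth_table(reference, 3)
print(hex(cpu_lut), cpu_lut == ref_lut)   # 0xe8 True
```

Seen this way, the program for this small case is simply another way of writing down an 8-entry LUT.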


> > Why does it need to be inferred?  If you want to write an HDL tool to turn
> > HDL into processor code, have at it.  But then there are other methods. 
> > Someone mentioned his MO is to use other tools for designing his
> > algorithms and letting that tool generate the software for a processor or
> > the HDL for an FPGA.  That would seem easy enough to integrate.
> 
> That's roughly what OpenCL and friends can do.  But those are top-down
> architecturally (starting with a chip block diagram), rather than starting
> with tiny building blocks as you're suggesting.
> 
> > Huh?  You can't simulate code on a processor???
> 
> Verification is greater than simulation, as described above.
> 
> > > If we scale the processors up a bit, I could see the merits in say a
> > > bank of, say, 32 Cortex M0s that could be interconnected as part of the
> > > FPGA fabric and programmed in software for dedicated tasks (for
> > > instance, read the I2C EEPROM on the DRAM DIMM and configure the DRAM
> > > controller at boot).
> > 
> > I don't follow your logic.  What is different about the ARM processor from
> > the stack processor other than that it is larger and slower and requires a
> > royalty on each one?  Are you talking about writing the code in C vs. 
> > whatever is used for the stack processor?
> 
> If you have an existing codebase (supplied by the vendor of your external
> chip, for example), it'll likely be in C.  It won't be in
> special-stack-assembler, and your architecture seems to be designed to not
> be amenable to compilers.

You can write any compiler you want.  I don't know what libraries you would be using to replace FPGA logic with software.  Are we talking about print statements?  

How do you port C libraries to logic in an FPGA now?  Do it the same way. 


> > The point of the many hard cores is the saving of resources.  Soft cores
> > would be the most wasteful way to implement logic.  If the application is
> > large enough they can implement things in software that aren't as
> > practical in HDL, but that would be a different class of logic from the
> > tiny CPUs I'm talking about.
> 
> 'Wastefulness' is one parameter.  But you can also consider that every
> unused hard-core is also wasteful in terms of silicon area.  Can you show
> that the hard-cores would be used enough of the time to outweigh the space
> they waste on other people's designs?

That assumes some number of CPUs on the FPGA.  We don't have those numbers.  We also don't have any real data on how large a logic block is in an FPGA, at least I don't.

I think you are making silly points when we are discussing a concept.  Of course we won't have the sort of data you are talking about. 


> > You lost me with the gear shift.  The mention of instruction rate is about
> > the CPU being fast enough to keep up with FPGA logic.  The issue with
> > "heterogeneous performance" is the "heterogeneous" part, lumping the many
> > CPUs together to create some sort of number cruncher.  That's not what
> > this is about.  Like in the GA144, I fully expect most CPUs to be sitting
> > around most of the time idling, waiting for data.  This is a good thing
> > actually.  These CPUs could consume significant current if they run at GHz
> > all the time.  I believe in the GA144 at that slower rate each processor
> > can use around 2.5 mA.  Not sure if a smaller process would use more or
> > less power when running flat out.  It's been too many years since I worked
> > with those sorts of numbers.
> 
> OK, so once we drop any idea of MIPS, we're talking about something simpler
> than a Cortex M0.  You should be able to make a design that clocks at a few
> hundred MHz on an FPGA process.  

I don't think a few hundred MIPS is fast enough to actually be useful.  GIPS is required.  
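The reasoning behind wanting GIPS-class rates comes down to simple division: the number of instructions a core can spend on each data item equals its instruction rate divided by the data rate. The sketch below is idealized. It assumes one instruction per clock and no stalls, and the example rates are illustrative, not figures for any real device.

```python
def instructions_per_sample(instr_rate_hz: int, data_rate_hz: int) -> int:
    """Whole instructions available per data item, assuming one instruction
    per clock and no stalls (an idealized assumption)."""
    return instr_rate_hz // data_rate_hz

MHZ = 1_000_000

# Illustrative rates only; none of these figures come from a real device.
for cpu_mips in (300, 1000):
    for data_mhz in (10, 50, 100):
        n = instructions_per_sample(cpu_mips * MHZ, data_mhz * MHZ)
        print(f"CPU {cpu_mips:4d} MIPS, data {data_mhz:3d} MHz -> {n:3d} instructions/sample")
```

At 300 MIPS and a 100 MHz data rate, only 3 instructions are available per sample. At 1 GIPS, the same data rate leaves 10.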


> You could choose to run it synchronously
> with your FPGA logic, or on an internal clock and synchronise inputs and
> outputs.  You probably wouldn't tile these, but you could deploy them as a
> 'hardware thread' in places you need a complicated state machine.

A state machine is one application.  But I don't see them being limited in the logic they could replace, other than logic that is too small for this to be efficient. 
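To make the state-machine case concrete, the next-state/output function that an HDL `case` statement would describe can be written as a step function. A small core calls it once per input, inside a loop that waits for data. The sketch below is minimal and assumes a hypothetical "101" serial pattern detector with overlapping matches. The state names and `step`/`run_fsm` are illustrative, not taken from any vendor tool.

```python
# Hypothetical states for a "101" serial pattern detector (overlapping matches).
IDLE, GOT_1, GOT_10 = range(3)

def step(state, bit):
    """One clock of a Mealy machine: return (next_state, detect_output)."""
    if state == IDLE:
        return (GOT_1, 0) if bit else (IDLE, 0)
    if state == GOT_1:                     # last bit seen was 1
        return (GOT_1, 0) if bit else (GOT_10, 0)
    # state == GOT_10: last two bits were 1, 0
    return (GOT_1, 1) if bit else (IDLE, 0)

def run_fsm(bits):
    """The core's main loop: wait for an input, step, write the output."""
    state, out = IDLE, []
    for b in bits:
        state, y = step(state, b)
        out.append(y)
    return out

print(run_fsm([1, 0, 1, 0, 1]))   # [0, 0, 1, 0, 1]
```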

Xilinx makes a big deal of building shift registers from a LUT.  I've seen designs where many stages of shift register were needed.  This CPU could replace a large number of those shift-register LUTs with data clocked at some hundreds of MHz. 
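One plausible way for a small core to stand in for a long LUT-based shift register is to use a circular buffer in its local RAM. On each data clock, the core reads the oldest entry, writes the new one in its place, and advances a pointer. The sketch below works under that assumption and includes a plain shift-register model for comparison. `SoftDelayLine` and `reference_shift_register` are hypothetical names. Each sample costs a read, a write and a pointer update, so the core's instruction rate bounds the achievable data rate.

```python
from collections import deque

class SoftDelayLine:
    """N-stage delay line as a circular buffer, one way a small core
    might emulate a chain of shift-register stages in local RAM."""
    def __init__(self, stages):
        self.buf = [0] * stages
        self.idx = 0

    def clock(self, x):
        y = self.buf[self.idx]                    # value written `stages` clocks ago
        self.buf[self.idx] = x                    # overwrite it with the new sample
        self.idx = (self.idx + 1) % len(self.buf) # advance and wrap the pointer
        return y

def reference_shift_register(samples, stages):
    """Plain shift-register model: every stage moves one place per clock."""
    regs = deque([0] * stages, maxlen=stages)
    out = []
    for x in samples:
        out.append(regs[0])   # oldest stage drives the output
        regs.append(x)        # shift in; maxlen drops the oldest stage
    return out

samples = list(range(1, 11))
line = SoftDelayLine(3)
soft = [line.clock(x) for x in samples]
print(soft)   # [0, 0, 0, 1, 2, 3, 4, 5, 6, 7]
```

The circular buffer avoids moving every stage on every clock, so the per-sample cost stays constant however many stages the delay line has.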


> > > In essence, your proposal has a disconnect between the situations existing
> > > FPGA blocks are used (implemented automatically by P&R tools) and the
> > > situations software is currently used (human-driven software and
> > > architectural design).  It's unclear how you claim to bridge this gap.
> > 
> > I certainly don't see how P&R tools would be a problem.  They accommodate
> > multipliers, DSP blocks, memory block and many, many special bits of
> > assorted components inside the FPGAs which vary from vendor to vendor. 
> > Clock generators and distribution is pretty unique to each manufacturer. 
> > Lattice has all sorts of modules to offer like I2C and embedded Flash. 
> > Then there are entire CPUs embedded in FPGAs.  Why would supporting them
> > be so different from what I am talking about?
> 
> If this is a module that the tools have no visibility over, ie just a blob
> with inputs and outputs, then they can implement that.  

Why no visibility? 


> In that instance
> there is a manageability problem - beyond a handful of processes, writing
> heterogeneous distributed software is hard.  Unless each processor is doing
> a very small, well-defined, task, I think the chances of bugs are high.

You need to explain to me what is hard about *this*.  Giving it a label and then saying anything with that label is hard doesn't mean much.  I don't think the label fits. 


> If instead you want interaction with the toolchain in terms of
> generating/checking the software running on such cores, that's also
> problematic.

I don't follow.  In the design it's logic.  You keep trying to think of it the way you think of all software.  It's logic.  Inputs and outputs.  You only need to dig into the code after you find there is something wrong with the mapping of inputs to outputs like any other logic module.  Presumably the code would have been simulated with appropriate inputs and outputs. 


> I hadn't seen Picoblaze before, but that seems a strong fit with what you're
> suggesting.  So a question: why isn't it more successful?  And why isn't
> Xilinx putting hard Picoblazes into their FPGAs, which they could do
> tomorrow if they felt the need?

More successful than what?  The Volkswagen Beetle?  

I can't explain much of what Xilinx does except they respond to their largest customers who pay thousands of dollars for a single FPGA chip.  They say what goes into Xilinx FPGAs and the rest of us are tag-alongs.  Literally. 

Rick C.
