
Implementing Multi-Processor Systems in FPGAs

Started by Paul Hartke February 24, 2005
I finally got around to watching the "Implementing Multi-Processor Systems
in FPGAs" TechOnLine Webcast:
http://seminar2.techonline.com/s/altera_feb1005
http://seminar2.techonline.com/~additionalresources/altera_feb1005/altera_Feb10_slides_edited.pdf

Based on this seminar and my own imagination, I can envision quite a lot of
potential usage models.  However, I was wondering to what level folks are
_actually_ using multiple soft-core processors in an FPGA for their
commercial, academic, and/or personal projects right now.

What is the overall architecture--how many processors are used?  How do the
processors coordinate their activities?  How is data processing distributed
across them?  Is the code/data stored in on-chip memory or externally?

Thanks.
Paul
I have one of the Stratix II development kits with the EP2S60. Just out
of curiosity, I wanted to see how many processors I could fit on a
device like that. I was able to add sixteen of the fast processors. I
gave each a little on-chip RAM and ROM to bootstrap with, and I
implemented an SDRAM controller. That used up approximately 78% of the
device. The discouraging part of my little experiment was that SOPC
Builder became almost impossibly slow. SOPC Builder is a Java
application, and it spent a lot of time looking for conflicts.

When I have the time, my next little experiment is to create a working
dual or quad processor. My long-term goal is to create a single-chip,
multiprocessor 3D graphics system with an API similar to OpenGL.

Without having researched or done it yet, I would have to say that the
system controller and memory arrangement are implementation dependent
and up to the designer.

Derek

While the very largest FPGAs look like they could hold perhaps one CPU
per BlockRAM (perhaps even up to 100 or more), I think a mid-size part
will be a better fit: a price closer to the minimum possible per
BlockRAM in volume, more BlockRAMs per 1K LUTs, more total system I/Os,
and more room to throw off heat.

The other question is what type of CPU architecture to use. The answer
seems obvious to me: one that supports concurrency right in its
architecture, rather than foisting it on top of something with no idea
what a process is. So far only the Transputer has shown how easy it is
to put together hundreds of CPUs and how to program them. Of course, in
an FPGA it would have to be a modern register-style load/store RISC.

So what are you doing with your 160TP array, and how would the
performance compare with the same app running on a single 2-3 GHz x86?
And did you check out the other NG?

regards

johnjakson at usa dot com
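For readers who have not used occam, the idea of concurrency built into the programming model can be sketched with Go, whose unbuffered channels behave much like occam channels: sender and receiver must both be ready before a value moves. This is only an illustration, not Transputer code; the process names and the sum-of-squares workload are made up for the example.

```go
package main

import "fmt"

// generate sends the integers 1..n on out, then closes it.
func generate(n int, out chan<- int) {
	for i := 1; i <= n; i++ {
		out <- i // blocks until the next stage is ready to receive
	}
	close(out)
}

// square reads values from in, squares them, and forwards them on out.
func square(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * v
	}
	close(out)
}

// SumOfSquares wires the two processes into a pipeline and sums the
// values that come out of the last stage.
func SumOfSquares(n int) int {
	a := make(chan int) // unbuffered: sender and receiver rendezvous
	b := make(chan int)
	go generate(n, a)
	go square(a, b)
	total := 0
	for v := range b {
		total += v
	}
	return total
}

func main() {
	fmt.Println(SumOfSquares(4))
}
```

Each stage knows only about its own channels, so in principle the same code runs unchanged whether the stages share one processor or sit on separate ones joined by links.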
For me to answer your question, let me tell you a little bit about
myself. In the fall of 1987 I entered college at RIT, where I was
exposed to a lot of new computer hardware. Growing up, I had only been
exposed to computers designed for data processing. I bought a Commodore
Amiga to do my school work on, and it turned out to be an excellent
choice because it allowed me to work with files from both IBM PC and
Apple Macintosh environments. Remember that at this time IBMs were
still primarily CGA (4 colors: cyan, white, magenta and black) and
Macintoshes were black and white. The Commodore Amiga had a
quasi-12-bit color mode called HAM. Among the first freeware
applications I discovered for recreation were raytracers. The Commodore
Amiga was a 16/32-bit MC68000 at about 14 MHz (IBMs were 16, 20 and
25 MHz, and the Mac was 8 MHz).

In some of my free time between classes I went to the library to
research different ways to accelerate raytracing. The first and most
obvious way was to buy an accelerator or co-processor card with a
faster processor and a floating-point co-processor. I think it was in
Byte magazine that I saw an article on Transputers, and I had read
articles on transputer products being developed for the Amiga. I saved
my money while I waited for the products to be completed, but
eventually the projects were canceled. Late one winter, with the money
I had saved, I bought a CSA Education Kit. I could compile and run
transputer applications on an IBM bridge card, then copy the results to
the Amiga file system and view them from the Workbench desktop.

I also made a habit of visiting Rochester's surplus shops, and through
dumb luck I found a factory tray of eight T800s. The guy who ran the
shop didn't know what they were; seeing that they were gold, he told me
he would have to charge me a premium for them: $10. Using a Vector
prototyping board, I connected the eight processors to the CSA card. I
just wired them up so that they could reset properly. I didn't have
the money to buy any memory, so I just used the on-chip RAM. I could
implement a very small raytracer, and when I outgrew the memory of one
processor I would pair them up. Eventually I had a tightly coupled
processor made up of 8 transputers arranged in a cube topology.

I think it was about a year later, at a ham radio flea market, that I
found my next upgrade. A guy and his son had brought a real truckload
of junk. I remember him having bar code scanners, data entry pads, and
parts of an old telephone system. One of the things I found was a black
PC expansion case. The front was ripped off; on the back I could see
rows of 37-pin connectors, and through the vents I could see the tops
of gold chips. I asked him how much it was. He told me it was marked,
came over, and found the price for me. He charged me $20 for it. The
friend I was with asked me what I had bought, and I told him, "I'm not
sure, but I'll show you." We took it back to the car, where I removed
the top. Inside were five CSA four-transputer boards, a crossbar board,
and an INMOS B008 with the graphics TRAM; whoever had owned it had
tucked the cable for the graphics TRAM inside.

My transputer setup moved from the Amiga to a dedicated Everex Step
386/33 MHz. My raytracer evolved into a hypercube, and I was able to
let the main rendering routine recurse more or to add more features.
As time went on, the topology evolved into a sophisticated pipeline. A
few years after graduating from college I started buying transputers
through eBay. My system is now split between an industrial PC, the old
black PC expansion case, and a VME cabinet. The last time I spent any
time on it, I was having problems with the worm program that maps the
network. I couldn't determine whether the network had gotten so big
that the worm was timing out before it had finished discovering the
network, or whether there was a hardware failure. I do follow the other
newsgroup (comp.sys.transputer). I haven't compared it to a modern PC;
currently I have a PIII 500 MHz laptop and a dual 733 MHz desktop. But
it would require a rewrite to take advantage of the PC threading
architecture.
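As a side note on the cube and hypercube wiring mentioned above: if the nodes of a d-dimensional hypercube are numbered 0 to 2^d - 1, two nodes are linked exactly when their numbers differ in one bit. An 8-node cube is the 3-dimensional case, so each node needs three links, which fits within the four links a T800 provides. The sketch below is a generic illustration of that numbering rule, not the wiring actually used in the system described; the function name is made up.

```go
package main

import "fmt"

// Neighbors returns the nodes directly linked to node in a hypercube
// of the given dimension. Node numbers run from 0 to 2^dim - 1, and two
// nodes are linked when their numbers differ in exactly one bit.
func Neighbors(node, dim int) []int {
	out := make([]int, 0, dim)
	for bit := 0; bit < dim; bit++ {
		out = append(out, node^(1<<bit)) // flip one bit per link
	}
	return out
}

func main() {
	// List the wiring for an 8-node (3-dimensional) cube.
	for node := 0; node < 8; node++ {
		fmt.Println(node, Neighbors(node, 3))
	}
}
```

For example, node 5 (binary 101) connects to nodes 4, 7 and 1.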

I bought the Nios II Development Kit because I liked the development
tools and I can see the potential for doing the same kind of things
that I have done with transputers. I bought the kit and a Lancelot
video adaptor. I plan on developing a 3D graphics core for it with an
API similar to OpenGL, with the intention of making it into a
commercial product. With the Stratix II development board, I see the
SDRAM as the biggest bottleneck. I have sketched out an elaborate
buffering system that should alleviate this. I would also like to be
able to configure the resolution and color depth from software. When I
roll it out as a core, the wizard would give the engineer the option of
making these programmable with default values or hard-coding the
settings.
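To put a rough number on the SDRAM concern, the display refresh alone needs width x height x bytes per pixel x refresh rate bytes per second from the frame buffer. The sketch below just does that arithmetic. The 640x480, 16-bit, 60 Hz mode is an assumed example, not a figure from this post, and the function name is made up.

```go
package main

import "fmt"

// ScanoutBytesPerSecond returns the memory read rate needed just to
// refresh the display. It ignores blanking intervals and all rendering
// traffic, so it is only a lower bound on frame buffer load.
func ScanoutBytesPerSecond(width, height, bitsPerPixel, refreshHz int) int {
	bytesPerPixel := (bitsPerPixel + 7) / 8 // round up to whole bytes
	return width * height * bytesPerPixel * refreshHz
}

func main() {
	// Assumed example mode: 640x480, 16 bits per pixel, 60 Hz.
	fmt.Println(ScanoutBytesPerSecond(640, 480, 16, 60))
}
```

The example mode works out to 36,864,000 bytes per second before any drawing happens, and a 1024x768, 32-bit, 60 Hz mode would need about five times that. This is why making resolution and color depth configurable also changes how much buffering the memory system needs.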

I have been poking around for the last couple of days and have found a
couple of posts about engineers implementing multi-processor systems. I
would say half of them sounded like student projects. If anybody has
implemented multi-processor systems, I would like to hear about their
experiences and any afterthoughts. Since a lot of this is still new to
me and I'm still at the steep part of the learning curve, I would
appreciate it if anybody has any projects that they can share with me.

Derek

Weird story. It's been a long time since I went to junk stores; I used
to buy plasma display tubes and TTL & CMOS RAMs 20 years ago, but after
getting into VLSI (at Inmos, on the Transputer) I never actually built
anything outside the chip. But FPGAs allow an old VLSI guy without his
own fab to do something only a company with a fab could do 5-10 years
ago.

I contemplated trying to turn MicroBlaze and perhaps Nios into
Transputer replacements by adding extra hardware, but came away
thinking it would be better to start over with the sail set in the
right direction from day 1. The benchmarks posted in the "NiosII Vs
MicroBlaze thread" for Leon, MicroBlaze & OpenCores 1200 would seem to
justify my point, but I am not finished yet.

Good luck with your MPP endeavours too!

regards

johnjakson at usa dot com

The Transputer Will be back (T2 movie)

Speaking of transputers, does enough documentation exist to accurately reproduce them in an FPGA?
Yes, sort of; see the comp.sys.transputer NG.

FPGA thread status (Rams post): at the last WoTUG conference, Tanaka et
al. reported on a 24 MHz, nearly complete T425 clone, a cycle-similar
design, though with no timer and of course no FPU. It's their first
step toward understanding a new direction for building TP-style
designs. I decided to skip this step; occam-capable CPUs don't need to
look like the old stack design, and shouldn't for performance reasons.

http://www.wotug.org/cpa2004/papers/361-tanaka.pdf

Interesting read anyway. A few months ago on another TP thread, another
student said he would do the same thing, but reverse engineering takes
a lot of resources, which Tanaka had at his university.

regards

johnjakson at usa dot com

Thanks, I'll have to keep an eye out there. I have an old Buchsbaum
book that discusses 'current' technology CPUs (current when I bought
the thing), and it also discusses the Transputer T800, but it never did
seem to have enough detail to recreate it. Back then I was going to do
it in an 8051 (yes, I know about the speed issues), since FPGAs really
didn't exist yet.
Seems like you like Transputers as well.
For my master's thesis I built transputer boards, and I liked the
hardware. Occam was a bit weird but functional for the purpose.

If you liked the transputer links, you will also like the MicroBlaze FSL
connections. They will allow you to build the same kind of systems but
with higher bandwidth. The FSLs are 32 bits wide, compared to 4 bits on
the transputer link.

Göran Bilski
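As a rough software analogy for the link style described above, a one-way FIFO carrying 32-bit words can be modeled with a buffered Go channel: the writer stalls only when the FIFO is full and the reader stalls only when it is empty. This is a model for illustration only. It is not the MicroBlaze FSL programming interface, and the depth of 16 and the function name are assumptions.

```go
package main

import "fmt"

// fifoDepth is an assumed FIFO depth chosen for illustration.
const fifoDepth = 16

// Transfer pushes words into a buffered channel that stands in for a
// one-way FIFO link and returns the sum seen by the receiving side.
func Transfer(words []uint32) uint64 {
	link := make(chan uint32, fifoDepth)
	go func() {
		for _, w := range words {
			link <- w // blocks only when the FIFO is full
		}
		close(link)
	}()
	var sum uint64
	for w := range link { // blocks only when the FIFO is empty
		sum += uint64(w)
	}
	return sum
}

func main() {
	fmt.Println(Transfer([]uint32{1, 2, 3, 0xFFFFFFFF}))
}
```

Because the FIFO decouples the two sides, the sender can run ahead by up to the FIFO depth, whereas a transputer-style channel makes sender and receiver wait for each other on every transfer.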

I've been involved in a tool for software development for just these
sorts of architectures.  It turns out that the hardware design of
Transputer-style connected multiprocessors is relatively simple, but
the software development can be a challenge.  The good news is that
with all on-chip communication, you can exploit parallelism that
board-level multiprocessors can't.  Have a look at
http://www.cmpware.com/ for more info.  (This is aimed at ASIC folks,
but FPGAs work, too.  In fact, Nios I, Nios II and MicroBlaze models
are already available.)

-- Stev
-- 3/3/0