JOP as SOPC component

Started by Martin Schoeberl August 11, 2006
Over the last few days I played around with the Quartus SOPC Builder [1].
Although I'm more of a batch/make guy, I'm impressed by this easy-to-use
tool. To chip away a little at the dominance of the Nios II in the SOPC
world, I wrapped JOP [2] into an Avalon component ;-)

So if you are interested in giving a Java processor a try in the SOPC
environment...

There are two ready-to-use Quartus projects available: a generic
version for a 256Kx16 SRAM and a version for the Altera DE2 board [3].

If you have the DE2 board, follow these steps to build JOP:

    1. Check out the current sources from opencores

        cvs -d :pserver:anonymous@cvs.opencores.org:/cvsroot/anonymous -z9 co -P jop

    2. Open the Quartus project .../quartus/altde2/jop.qpf
    3. Start Tools - SOPC Builder...
    4. Press Generate
    5. Press Exit and close Quartus
    6. Edit the following lines in the Makefile:
        BLASTER_TYPE=USB-Blaster
        QPROJ=altde2
    7. Run the usual make

    ... and it should run the Java/JOP Hello World ;-)

A detailed design flow document (without the SOPC builder part) is
available at [4].

What's cool for me: I get the SDRAM controller for free and
have a JOP system with 8 MB! Now it's time to write some 'big'
embedded Java applications ;-)

Of course, there is a drawback: the performance of the Avalon system
is lower than that of a 'native' connection (or in my case a connection
via SimpCon [5]) of the main memory to the CPU. I can provide some
numbers if there is interest...

BTW: The Cyclone II FPGA cannot really be clocked faster than the
Cyclone (just a few %). I had hoped to get some speed-up for free from
a new-generation FPGA :-(

So, happy JOP SOPC building,

Martin

[1] http://www.altera.com/products/software/products/dsp/dsp-builder.html
[2] http://www.jopdesign.com/
[3] http://www.altera.com/education/univ/materials/boards/unv-de2-board.html
[4] http://www.jopdesign.com/doc/build.pdf
[5] http://www.opencores.org/projects.cgi/web/simpcon/overview 


Martin Schoeberl wrote:
> The last days I played around with the Quartus SOPC builder [1].
> Although I'm more a batch/make guy, I'm impressed by the easy to use
> tool. In order to scratch a little bit on the dominance of the NIOS II
> in the SOPC world I wrapped JOP [2] into an Avalon component ;-)

Kudos, that is excellent. Any lessons/gotchas about turning JOP into an
SOPC component, should someone else fancy a similar undertaking?

> However, of course there is some drawback. The performance of the
> Avalon system is lower than a 'native' connection (or in my case
> via SimpCon [5]) of the main memory to the CPU. I can provide some
> numbers if there is interest...

Care to elaborate? I'd expect going over Avalon could add latency, but
if you can exploit multiple outstanding transactions (aka "posted
reads") and/or burst transfers, the bandwidth should be the same as
"native".

> BTW: The Cyclone II FPGA cannot be clocked really faster than the
> Cyclone (just a few %). I hoped to get some speed-up for free due
> to a new generation FPGA :-(

I was surprised too when I saw that. I gather the only way the Cyclone
II can gain you speed over Cyclone I is when you can use the embedded
multipliers. Makes me wonder about the upcoming Cyclone III.

Tommy

>> The last days I played around with the Quartus SOPC builder [1].
>> Although I'm more a batch/make guy, I'm impressed by the easy to use
>> tool. In order to scratch a little bit on the dominance of the NIOS II
>> in the SOPC world I wrapped JOP [2] into an Avalon component ;-)
>
> Kudos, that is excellent. Any lessons/gotchas about turning JOP into an
> SOPC component should someone else fancy a similar undertaking?

The Avalon bus is very flexible, so writing a slave or master (SOPC
component) is not that hard. The magic is in the Avalon switch fabric
generated by the builder. However, an example would have helped (Altera
listening?). I didn't find anything on Altera's website or with Google.
A very simple slave can now be found at [1].

One thing to watch out for: if you (like me) prefer to keep VHDL files
out of the Quartus directory, you can easily end up with three copies of
your design files, and it can get confusing which one to edit. When you
edit your VHDL file in the component directory (the source for the SOPC
builder), don't forget to rebuild your system. The build process copies
the file to your Quartus project directory. If you want to start over
with a clean project, the only files needed are .qpf, .qsf and .ptf.

The master is also easy: just address, read and write data, read/write,
and you have to react to waitrequest. See the SimpCon/Avalon bridge at
[2] as an example. The Avalon interconnect fabric handles all bus
multiplexing, bus resizing, and control signal translation.

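To give a feel for how little a slave needs, here is a minimal sketch.
It is not the avalon_test_slave referenced at [1]: the entity name
simple_avalon_slave and the single 32-bit register are assumptions made
for illustration, and the sketch presumes the component is declared to
SOPC Builder with zero wait states and no read latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-register Avalon slave (illustration only).
-- Assumes zero wait states and no read latency.
entity simple_avalon_slave is
  port (
    clk        : in  std_logic;
    reset      : in  std_logic;
    chipselect : in  std_logic;
    write      : in  std_logic;
    writedata  : in  std_logic_vector(31 downto 0);
    readdata   : out std_logic_vector(31 downto 0)
  );
end entity simple_avalon_slave;

architecture rtl of simple_avalon_slave is
  signal reg : std_logic_vector(31 downto 0);
begin

  -- Write: capture writedata on the clock edge where we are selected.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        reg <= (others => '0');
      elsif chipselect = '1' and write = '1' then
        reg <= writedata;
      end if;
    end if;
  end process;

  -- Read: with no wait states the stored value is driven continuously,
  -- so the fabric can sample it in the same cycle it asserts read.
  readdata <= reg;

end architecture rtl;
```

Address decoding and multiplexing towards the masters are left to the
generated switch fabric, which is why the slave itself stays this small.
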
>> However, of course there is some drawback. The performance of the
>> Avalon system is lower than a 'native' connection (or in my case
>> via SimpCon [5]) of the main memory to the CPU. I can provide some
>> numbers if there is interest...
>
> Care to elaborate? I'd expect going over Avalon could add latency, but
> if you can exploit multiple outstanding transactions (aka "posted
> reads") and/or burst transfers, the bandwidth should be the same as
> "native".

Yes, the latency is the issue for JOP. JOP does not trigger several
read or write transactions. However, it can trigger one transaction and
then continue to execute microcode. When the (read) result is needed,
the JOP pipeline is stalled until the result is available.

What helps is knowing in advance (one or two cycles) when the result
will be available. That's the trick with the SimpCon interface: there is
no single ack or waitrequest signal, but a counter that says how many
cycles it will take to provide the result. That way I can restart the
pipeline earlier.

Another point is that, in my opinion, the wrong party has to hold data
for more than one cycle. This is true for several buses (e.g. also
Wishbone). On these buses the master has to hold address and write data
until the slave is ready. This is a result of backplane-bus thinking. In
an SoC the slave can easily register those signals when it needs them
longer, and the master can continue.

On the other hand, as JOP continues to execute and it is not so clear
when the result is read, the slave should hold the data once it is
available. That is easy to implement, but Wishbone and Avalon specify
just a single cycle of valid data.

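The counter idea can be sketched roughly as follows. This illustrates
the principle only and is not the actual SimpCon definition: the entity
name, the port names (rd, rd_data, rdy_cnt, mem_data), the 2-bit counter
width and the fixed READ_CYCLES latency are all assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical slave that announces its remaining latency (sketch only).
entity rdy_cnt_slave is
  generic (
    READ_CYCLES : natural range 1 to 3 := 2  -- assumed fixed read latency
  );
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;
    rd       : in  std_logic;                      -- single-cycle request
    mem_data : in  std_logic_vector(31 downto 0);  -- stand-in for the memory
    rd_data  : out std_logic_vector(31 downto 0);  -- held until the next read
    rdy_cnt  : out unsigned(1 downto 0)            -- cycles until data is valid
  );
end entity rdy_cnt_slave;

architecture rtl of rdy_cnt_slave is
  signal cnt : unsigned(1 downto 0);
begin

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        cnt     <= (others => '0');
        rd_data <= (others => '0');
      elsif rd = '1' then
        -- The master may drop address and rd after this one cycle.
        cnt <= to_unsigned(READ_CYCLES, 2);
      elsif cnt /= 0 then
        cnt <= cnt - 1;
        if cnt = 1 then
          -- Capture the result on the edge where the count reaches zero;
          -- it then stays valid until the next read completes.
          rd_data <= mem_data;
        end if;
      end if;
    end if;
  end process;

  rdy_cnt <= cnt;

end architecture rtl;
```

A master that sees rdy_cnt equal to 1 knows the data will be valid in
the next cycle and can restart its pipeline one cycle earlier than it
could with a plain ack signal. The held rd_data also covers the second
point above: the master can pick up the result whenever it gets to it.
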
>> BTW: The Cyclone II FPGA cannot be clocked really faster than the
>> Cyclone (just a few %). I hoped to get some speed-up for free due
>> to a new generation FPGA :-(
>
> I was surprised too when I saw that. I gather the only way the Cyclone
> II can gain you speed over Cyclone I is when you can use the embedded
> multipliers. Makes me wonder about the upcoming Cyclone III.

Is there any other data available on that? I did not find many comments
in this group on experiences with Cyclone I and II. It looks like the
CII was optimized more for cost than for speed. Yes, waiting for the
III ;-)

Martin

[1] http://www.opencores.org/cvsweb.cgi/~checkout~/jop/sopc/components/avalon_test_slave/hdl/avalon_test_slave.vhd
[2] http://www.opencores.org/cvsweb.cgi/~checkout~/jop/vhdl/scio/sc2avalon.vhd

"Martin Schoeberl" <mschoebe@mail.tuwien.ac.at> wrote in message
news:44ddb2d4$0$8024$3b214f66@tunews.univie.ac.at...
>>> The last days I played around with the Quartus SOPC builder [1].
>>> Although I'm more a batch/make guy, I'm impressed by the easy to use
>>> tool. In order to scratch a little bit on the dominance of the NIOS II
>>> in the SOPC world I wrapped JOP [2] into an Avalon component ;-)
>>
>> Kudos, that is excellent. Any lessons/gotchas about turning JOP into an
>> SOPC component should someone else fancy a similar undertaking?
>
> The Avalon bus is very flexible. Therefore, writing a slave or
> master (SOPC component) is not that hard. The magic is in the Avalon
> switch fabric generated by the builder. However, an example would
> have helped (Altera listening?). I didn't find anything on Altera's
> website or with Google. Now a very simple slave can be found at [1].
>
> One thing to take care: When you (like me) like to avoid VHDL files
> in the Quartus directory you can easily end up with three copies of
> your design files. Can get confusing which one to edit. When you
> edit your VHDL file in the component directory (the source for the
> SOPC builder) don't forget to rebuild your system. The build process
> copies it to your Quartus project directory.

Hi Martin,

Most of the SOPC magic happens in the Perl package "Europe", AFAIK.
Don't expect a lot of information about the internals of the package.

As a very simple example of Avalon master-slave type peripherals, there
is one free Avalon IP core for SD-card support. The core can be found
on some Russian forum, and later it was also added to the user IP
section of the Microtronix forums.

The Avalon master is really as simple as the slave.

Antti

Hi Antti,

> most of the SOPC magic happens in the perl package "Europe" AFAIK.
> dont expect a lot of information about the internals of the package.

That's fine for me. When the connection magic happens and I don't have
to care, it's fine. OK, one exception: perhaps I would like to know
more details on the latency. The switch fabric is 'plain' VHDL or
Verilog. However, generated code is very hard to read.

> as very simple example for avalon master-slave type of peripherals there
> is one free avalon IP core for SD-card support the core can be found
> at some russian forum and later it was also added to the user ip
> section of the microtronix forums.

Any link handy for this example?

> the avalon master is really as simple as the slave.

Almost: you have to hold address, data and read/write active as long as
waitrequest is pending. I don't like this, see above. In my case, for
example, the address from JOP (= top of stack) is valid only for a
single cycle. To avoid one more cycle of latency, I present the TOS in
the first cycle and register it. For additional wait cycles a MUX
switches from the TOS to the address register.

I know this is a slight violation of the Avalon specification. There
can be some glitches when the MUX switches. For synchronous on-chip
peripherals this is absolutely no issue. However, these signals are
also used for off-chip asynchronous peripherals (SRAM). Still, I assume
that these possible switching glitches are not really seen on the
output pins (or at the SRAM input).

Martin

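The register-plus-MUX arrangement described above might look roughly
like this. It is a sketch under assumptions, not the code of the
SimpCon/Avalon bridge: the entity name addr_hold, the start strobe and
the pending flag are hypothetical, only reads are shown, and no new
request is assumed to arrive while one is still pending.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical address hold logic for a master whose address (TOS)
-- is valid for a single cycle only.
entity addr_hold is
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;
    start       : in  std_logic;  -- one-cycle request; tos is valid only now
    tos         : in  std_logic_vector(31 downto 0);
    waitrequest : in  std_logic;
    address     : out std_logic_vector(31 downto 0);
    read        : out std_logic
  );
end entity addr_hold;

architecture rtl of addr_hold is
  signal addr_reg : std_logic_vector(31 downto 0);
  signal pending  : std_logic;  -- request was not accepted in its first cycle
begin

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pending  <= '0';
        addr_reg <= (others => '0');
      elsif start = '1' then
        addr_reg <= tos;          -- keep a copy for possible wait cycles
        pending  <= waitrequest;  -- stay active only if the slave stalled us
      elsif waitrequest = '0' then
        pending <= '0';           -- transfer accepted
      end if;
    end if;
  end process;

  -- First cycle: pass the TOS straight through, saving a cycle of latency.
  -- Wait cycles: switch to the registered copy. This combinational MUX is
  -- where the glitches mentioned above can come from.
  address <= tos when start = '1' else addr_reg;
  read    <= start or pending;

end architecture rtl;
```

The alternative without the MUX is to drive address from addr_reg only,
which keeps the output glitch-free at the cost of the extra cycle.
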
"Martin Schoeberl" <mschoebe@mail.tuwien.ac.at> wrote in message
news:44ddc530$0$11352$3b214f66@tunews.univie.ac.at...
> Hi Antti,
>
>> most of the SOPC magic happens in the perl package "Europe" AFAIK.
>> dont expect a lot of information about the internals of the package.
>
> That's fine for me. When the connection magic happens and I don't
> have to care it's fine. OK, one exception: Perhaps I would like
> to know more details on the latency. The switch fabric is 'plain'
> VHDL or Verilog. However, generated code is very hard to read.
>
>> as very simple example for avalon master-slave type of peripherals there
>> is one free avalon IP core for SD-card support the core can be found
>> at some russian forum and later it was also added to the user ip
>> section of the microtronix forums.
>
> Any link handy for this example?

http://forum.niosforum.com/forum/index.php?showtopic=4430

Antti

"Martin Schoeberl" <mschoebe@mail.tuwien.ac.at> wrote in message 
news:44ddb2d4$0$8024$3b214f66@tunews.univie.ac.at...
> The Avalon bus is very flexible. Therefore, writing a slave or
> master (SOPC component) is not that hard. The magic is in the Avalon
> switch fabric generated by the builder. However, an example would
> have helped (Altera listening?). I didn't find anything on Altera's
> website or with Google. Now a very simple slave can be found at [1].

As you get into making your own components you'll find a lack of
documentation about important things that go into the .PTF file. Altera
used to have a document on their website that was invaluable, called
the "PTF File Reference Manual" (or something like that). They've
chosen to pull it, so your only source for crucial information now is
your FAE (maybe) or someone who happens to have that file available.
I've complained to Altera, to no avail, that they need to put that
document back and maintain it, or at least make it available upon
request to component developers. Maybe others complaining as well will
help (hint).

> One thing to take care: When you (like me) like to avoid VHDL files
> in the Quartus directory you can easily end up with three copies of
> your design files. Can get confusing which one to edit. When you
> edit your VHDL file in the component directory (the source for the
> SOPC builder) don't forget to rebuild your system. The build process
> copies it to your Quartus project directory.

It's damn annoying of the tool to make those copies the way it does,
too. You have to be very careful about which file you edit as the
'source', or your edits will get overwritten because the file you
changed really isn't the source.

> The master is also easy: just address, read and write data,
> read/write and you have to react to waitrequest. See as example the
> SimpCon/Avalon bridge at [2]. The Avalon interconnect fabric handles
> all bus multiplexing, bus resizing, and control signal translation.

If you're going for a very high speed design and you have multiple
masters accessing a slave (i.e. multiple CPUs, or DMA controllers
accessing memory), the performance degrades rather quickly when using
SOPC Builder to perform the arbitration. You don't necessarily need a
large number of masters either; 4-5 killed it for me and necessitated a
redesign to work around how Avalon handled things.

> Another point is, in my opinion, the wrong role who has to hold data
> for more than one cycle. This is true for several busses (e.g. also
> Wishbone). For these busses the master has to hold address and write
> data till the slave is ready. This is a result from the backplane
> bus thinking. In an SoC the slave can easily register those signals
> when needed longer and the master can continue.

What you're describing is not an Avalon issue or a result of 'backplane
bus thinking', and is not a limitation of Avalon. If it exists in your
design then it's a limitation of the slave component design. The slave
generates the wait request output, which tells the master that it needs
to hold the address and data because the slave essentially doesn't have
any space left to hold them itself. If the slave component design has
provisions to register and hold the address and data then it can do so,
leave the wait request output deasserted, and the cycle completes.

If you think about it, this would simply be a one-deep FIFO for holding
the address/data/command. If you generalize a bit more you would see
that the FIFO wouldn't need to be restricted to being only one deep and
could be any depth. So as the master device performs reads and writes,
these commands would be written into the FIFO without asserting wait
request. But remember that any FIFO can fill up, at which point the
slave must assert wait request because it has no more room to store
anything, which means that the master device has to hold on to the
command for a bit.

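The one-deep case can be sketched as follows. This is an illustration
under assumptions rather than code from any of the designs discussed:
the entity name buffered_slave and the be_* back-end ports are
hypothetical, only writes are handled, and the write input is assumed
to be already qualified by the fabric (no separate chipselect).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical write-only slave with a one-deep command register.
entity buffered_slave is
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;
    -- Avalon slave side
    write       : in  std_logic;
    address     : in  std_logic_vector(7 downto 0);
    writedata   : in  std_logic_vector(31 downto 0);
    waitrequest : out std_logic;
    -- slow back end (names are assumptions)
    be_valid    : out std_logic;  -- a command is waiting
    be_addr     : out std_logic_vector(7 downto 0);
    be_data     : out std_logic_vector(31 downto 0);
    be_done     : in  std_logic   -- back end has consumed the command
  );
end entity buffered_slave;

architecture rtl of buffered_slave is
  signal full : std_logic;  -- '1' while the register holds a command
begin

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        full <= '0';
      elsif full = '0' then
        -- Room available: take the command in a single cycle.
        if write = '1' then
          be_addr <= address;
          be_data <= writedata;
          full    <= '1';
        end if;
      elsif be_done = '1' then
        -- Back end finished; the register is free again.
        full <= '0';
      end if;
    end if;
  end process;

  be_valid <= full;

  -- The master only has to hold its signals while the register is occupied.
  waitrequest <= full;

end architecture rtl;
```

Replacing the single register with a deeper FIFO changes only the
condition for waitrequest, from "register occupied" to "FIFO full".
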
> On the other hand,
> as JOP continues to execute and it is not so clear when the result
> is read, the slave should hold the data when available. That is easy
> to implement, but Wishbone and Avalon specify just a single cycle
> data valid.

What you would need then is a signal from the master back to the slave
saying that the master isn't ready to receive the data, which would
then cause the slave to hold on to the read data. But if you think
about it a bit more, the only reason the slave is providing read data
at all is that the master device requested it. If the master wasn't
ready to receive data, it should simply not assert the read command
output.

By the way, Avalon has a leg up on Wishbone in regard to a cleaner
logical approach to handling wait states and latency. Avalon treats the
address cycle as a single phase controllable by the slave's wait
request and separates that from the read data phase by allowing for
latency with the 'readdatavalid' output. With Wishbone you can
accomplish the same thing by extending the bus definition with 'tags',
but since not all components are required to support 'tags', when you
have a mismatch you're on your own for getting the interconnect right.
With Avalon, they designed it right, with a clear logical distinction
between address and data phases, so that any incompatibilities between
master and slave can still be handled automatically by an automated
tool (SOPC Builder).

KJ

"Martin Schoeberl" <mschoebe@mail.tuwien.ac.at> wrote in message 
news:44ddc530$0$11352$3b214f66@tunews.univie.ac.at...
> That's fine for me. When the connection magic happens and I don't
> have to care it's fine. OK, one exception: Perhaps I would like
> to know more details on the latency. The switch fabric is 'plain'
> VHDL or Verilog. However, generated code is very hard to read.

What? You don't have a display that can show 2000 columns on your
screen, as is nearly required to view the VHDL/Verilog that pops out of
SOPC?

Actually the best place I've found to look at and understand the wait
states and latency is simply the .PTF file, since that's where all the
information is. Although the .PTF file requires a bit of a learning
curve due to the lack of documentation on Altera's part, it's not that
hard, and once you get a feel for it, it is very easy to see if a slave
device requires wait states (and if it does, is it a fixed number or
controllable by the slave) and whether the slave device has any read
latency (and if it does, is it a fixed number or controllable by the
slave, and how many reads can be pending at one time). Looking at the
VHDL is much harder and it is not truly the source code anyway; the
'source' really is the .PTF file, since the VHDL gets generated from
it.

>> the avalon master is really as simple as the slave.
>
> Almost, you have to hold address, data and read/write active
> as long as waitrequest is pending. I don't like this, see above.

The master side is a bit more complicated than the slave side. There is
a very simple template, though, that one must almost always follow for
the master. When you try to deviate from it you're likely to get burned
(voice of experience, I've already had to fix others' code in this
area). The template is:

    process(Clock)
    begin
        if rising_edge(Clock) then
            if (Reset = '1') then
                Read  <= '0';
                Write <= '0';
            elsif (WaitRequest = '0') then
                -- Put your code here for whenever it is you want to read
                -- and/or write.
                -- When writing you would also set WriteData here.
                -- For example, if you're not ready to receive data
                -- whenever the slave says it is ready then you simply
                -- set Read <= '0' until you are ready.
            end if;
        end if;
    end process;

For sampling the data on a read, it depends on whether or not the
master implements the 'readdatavalid' input (i.e. is 'latency aware' in
Avalon terminology). If so, you sample the data when readdatavalid is
asserted; if not, you sample the data when the read output is asserted
and wait request is not.

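As an illustration, the template could be filled in like this for a
master that performs single reads and is not latency aware. This is a
sketch under assumptions, not KJ's or Martin's code: the entity name
template_master and the user-side ports (Start, StartAddr, Data, Done)
are hypothetical, and writes are left out for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-read master built around the template above.
entity template_master is
  port (
    Clock       : in  std_logic;
    Reset       : in  std_logic;
    -- user side (names are assumptions)
    Start       : in  std_logic;  -- hold high until Read goes high
    StartAddr   : in  std_logic_vector(31 downto 0);
    Data        : out std_logic_vector(31 downto 0);
    Done        : out std_logic;  -- one-cycle pulse when Data is updated
    -- Avalon master side
    Address     : out std_logic_vector(31 downto 0);
    Read        : out std_logic;
    ReadData    : in  std_logic_vector(31 downto 0);
    WaitRequest : in  std_logic
  );
end entity template_master;

architecture rtl of template_master is
  signal read_i : std_logic;  -- internal copy of the Read output
begin

  process (Clock)
  begin
    if rising_edge(Clock) then
      Done <= '0';  -- default; overridden below when a read completes
      if Reset = '1' then
        read_i <= '0';
      elsif WaitRequest = '0' then
        -- Address and Read only ever change in this branch, so they
        -- are held stable for as long as the slave asserts WaitRequest.
        if read_i = '1' then
          -- Read asserted and WaitRequest low: sample the data now.
          Data   <= ReadData;
          Done   <= '1';
          read_i <= '0';
        elsif Start = '1' then
          Address <= StartAddr;
          read_i  <= '1';
        end if;
      end if;
    end if;
  end process;

  Read <= read_i;

end architecture rtl;
```

Because the address is registered before it is driven, this costs one
cycle compared with passing a single-cycle address straight through,
which is exactly the trade-off discussed elsewhere in the thread.
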
> In my case e.g. the address from JOP (= top of stack) is valid
> only for a single cycle. To avoid one more cycle latency I present
> in the first cycle the TOS and register it. For additional wait
> cycles a MUX switches from TOS to the address register. I know this is a
> slight violation of the Avalon specification.
> There can be some glitches on the MUX switch.

You might try incorporating the above-mentioned template and avoid the
Avalon violation. What I've also found in debugging others' code that
doesn't adhere to the template is that there can be subtle errors that
take just the right combination of events to cause an actual system
error of some sort (i.e. not just the Avalon-generated assert in
simulation). If you use the template, you're guaranteed to be Avalon
compliant and not have this issue.

In my opinion, the Avalon bus, together with .PTF files to completely
define component I/O interfaces, is a huge improvement over Wishbone.
Although others disagree and don't like .PTF, they don't offer any
alternative other than comments or documentation for defining all those
interface things that one needs to know (i.e. wait states, latency, bus
size, etc.). Comments and documentation are nice, but they are not
synthesizable, whereas .PTF files are (i.e. SOPC Builder sucks them in
and spits out VHDL/Verilog).

PTF may not be a standard anywhere outside of Altera, but then is there
an open standard that defines a file format that can be used to
accomplish what .PTF does? I haven't run across it, and if there is
one, I wouldn't mind badgering the tool vendors to support it so that
I'm not locked into a vendor-specific implementation. Until then, I can
be much more productive using PTF than not.

> For synchronous on-chip
> peripherals this is absolute not issue. However, this signals
> are also used for off-chip asynchronous peripherals (SRAM).
> However, I assume that this possible switching glitches are
> not really seen on the output pins (or at the SRAM input).

Again, if you use the template, you won't have the glitching even if
the signals go off chip to a device.

KJ

>>> as very simple example for avalon master-slave type of peripherals there
>>> is one free avalon IP core for SD-card support the core can be found
>>> at some russian forum and later it was also added to the user ip
>>> section of the microtronix forums.
>>
>> Any link handy for this example?
>>
> http://forum.niosforum.com/forum/index.php?showtopic=4430

Nice, but not a real introductory example. It's a slave and a master.
Do you know what the master port is for in this SD controller? And it
looks like it's time for me to learn a little bit of Verilog - too many
Verilog examples around ;-)

Martin

Martin Schoeberl wrote:

> >>> as very simple example for avalon master-slave type of peripherals there
> >>> is one free avalon IP core for SD-card support the core can be found
> >>> at some russian forum and later it was also added to the user ip
> >>> section of the microtronix forums.
> >>
> >> Any link handy for this example?
> >>
> > http://forum.niosforum.com/forum/index.php?showtopic=4430
> >
> Nice, but not a real introductional example. It's a slave
> and a master. Do you know what the master port is for in
> this SD controller? And it looks like it's time for me to learn
> a little bit Verilog - too many Verilog examples around ;-)
>
> Martin

Hm, I guess I said master-slave in the first place. The slave interface
is used to set up the sector address and DMA address and to start the
transfer; then the master interface transfers data from the SD card to
memory on the Avalon bus. It looked like a simple example to me :)

Antti