Open Source GPGPU core

Started by Unknown February 11, 2015
I've been designing an open source Larrabee-esque GPGPU processor in SystemVerilog and I thought people might find it interesting. Full source code, documentation, tests, tools, etc. are available on github:

https://github.com/jbush001/NyuziProcessor

The processor supports a wide, predicated vector floating point pipeline with 16 lanes and multiple hardware threads to hide memory and computation latency. It also supports multiple cache coherent cores. I've created an LLVM backend for this, so C/C++ code can be compiled for it. It includes support for first class vector types using the GCC vector extensions, as well as intrinsics to expose specialized instructions.
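
For readers who haven't used the GCC vector extensions mentioned above, here is a rough sketch of what 16-lane vector code looks like in C. It is written for a host GCC or Clang, not the Nyuzi toolchain, and the typedef and function names are made up for illustration; they are not taken from the project's headers, and the project's own intrinsics are not shown. The "predicated" part is imitated with a per-lane comparison mask.

```c
#include <assert.h>

/* 16 lanes of 32-bit values, matching the 16-lane width described above.
   These typedef names are illustrative assumptions, not Nyuzi's own. */
typedef float vecf16 __attribute__((vector_size(16 * sizeof(float))));
typedef int veci16 __attribute__((vector_size(16 * sizeof(int))));

/* Ordinary operators work element-wise on vector types. */
static vecf16 multiply_add(vecf16 a, vecf16 b, vecf16 c)
{
    return a * b + c;
}

/* A vector comparison yields a per-lane mask (-1 where true, 0 where false).
   Using it to keep or clear lanes mimics a predicated operation. */
static vecf16 zero_negative_lanes(vecf16 v)
{
    vecf16 zero = {0};
    veci16 negative = v < zero;
    /* Casts between same-sized vector types reinterpret the bits. */
    return (vecf16)(~negative & (veci16)v);
}

/* Small usage example; lanes not listed in an initializer are zero. */
static void self_check(void)
{
    vecf16 a = {2.0f, -1.0f};
    vecf16 b = {3.0f, 4.0f};
    vecf16 c = {1.0f, 1.0f};

    vecf16 r = multiply_add(a, b, c);
    assert(r[0] == 7.0f);   /* 2 * 3 + 1 */
    assert(r[1] == -3.0f);  /* -1 * 4 + 1 */

    vecf16 z = zero_negative_lanes(r);
    assert(z[0] == 7.0f);   /* positive lane kept */
    assert(z[1] == 0.0f);   /* negative lane cleared */
}
```

Calling self_check() from main() exercises both functions. On a host without 512-bit vector hardware the compiler splits each operation into narrower pieces (and may warn about the vector calling convention), whereas a 16-lane machine can map each one to a single vector instruction.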

I've written a 3D engine (software/librender) that is optimized to take advantage of both the vector unit and multiple cores/hardware threads. Here's a video of the standard teapot (with ~2300 triangles) rendering on a single core on FPGA running at 50 MHz:

http://youtu.be/DsvZorBu4Uk

This image is the emulator rendering Dabrovic's Sponza atrium, ~66k triangles. This took around 200 million instructions to render across 8 virtual cores and 32 hardware threads (at 1024x768):

http://i.imgur.com/sHAsAU5.png

My main purpose in designing this was to be able to experiment with processor architecture using real, empirical data. The neat thing about having all the source to a cycle accurate hardware design is that it is infinitely instrumentable. I've kept notes about some of my findings here:

http://latchup.blogspot.com/

Anyway, comments and suggestions are appreciated, and I'm happy to take contributions if people are interested in hacking on it.

I tried building your toolchain on both 32-bit and 64-bit AMD Ubuntu 14.10 systems and get:

Linking CXX shared library ../../../lib/liblldb.so
Python script sym-linking LLDB Python API
Program error: Invalid parameters entered, -h for help. 
You entered:
['--buildConfig=',
'--srcRoot=/home/johne/Desktop/Nyuzi/NyuziToolchain/tools/lldb',
'--targetDir=/home/johne/Desktop/Nyuzi/NyuziToolchain/build/tools/lldb/source/../scripts',
'--cfgBldDir=/home/johne/Desktop/Nyuzi/NyuziToolchain/build/tools/lldb/source/../scripts',
'--prefix=/home/johne/Desktop/Nyuzi/NyuziToolchain/build',
'--cmakeBuildConfiguration=.', '-m'] (-1)
tools/lldb/source/CMakeFiles/liblldb.dir/build.make:282: recipe for target
'lib/liblldb.so.3.7.0' failed
make[2]: *** [lib/liblldb.so.3.7.0] Error 255
CMakeFiles/Makefile2:12189: recipe for target
'tools/lldb/source/CMakeFiles/liblldb.dir/all' failed
make[1]: *** [tools/lldb/source/CMakeFiles/liblldb.dir/all] Error 2
Makefile:133: recipe for target 'all' failed
make: *** [all] Error 2
johne@ouabache:~/Desktop/Nyuzi/NyuziToolchain/build$ 


John Eaton
	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com
On Thursday, February 12, 2015 at 7:04:38 PM UTC-8, jt_eaton wrote:
> I tried building your toolchain on both a 32 and 64 bit amd Ubuntu 14.10
> system and get:
>
> Linking CXX shared library ../../../lib/liblldb.so
> Python script sym-linking LLDB Python API
> Program error: Invalid parameters entered, -h for help.
> You entered:

It looks like LLDB was not building correctly when the build type wasn't set (I normally build with Debug). I pushed a change to the cmake files that should address this. Let me know if that fixes it.

Thanks
--Jeff

> It looks like LLDB was not building correctly when the build type wasn't set (I normally build with Debug). I pushed a change to the cmake files that should address this. Let me know if that fixes it.
>
> Thanks
> --Jeff

That fixed it. Ran all the tests and got the picture in the frame buffer.

Do any of the tests run verilator to create a vcd dump file?

John Eaton

> That fixed it. Ran all the tests and got the picture in the frame buffer.
>
> Do any of the tests run verilator to create a vcd dump file?
>
> John Eaton

OK, I found it. Are all of your tests using the same vcd dump file?

John Eaton

On Friday, February 13, 2015 at 6:37:53 PM UTC-8, jt_eaton wrote:
> That fixed it. Ran all the tests and got the picture in the frame buffer.

Great!

> Do any of the tests run verilator to create a vcd dump file?

Yep. All of the cosimulation tests run in Verilator. The compiler tests can be made to run in Verilator by defining USE_VERILATOR=1 in the shell environment. The render tests have a target 'verirun' that will run them in Verilator (there are READMEs in those directories with more details).

VCD dumps aren't produced by default, but can be enabled by modifying the makefile in the rtl/ directory, uncommenting the line:

VERILATOR_OPTIONS=--trace --trace-structs

and rebuilding. A file 'trace.vcd' will be written in the same directory. The output files get big fast for non-trivial tests. :)

On Wed, 11 Feb 2015 09:09:18 -0800, jeffbush001 wrote:

> I've been designing an open source Larrabee-esque GPGPU processor in
> SystemVerilog and I thought people might find it interesting. Full
> source code, documentation, tests, tools, etc. are available on github:
>
> https://github.com/jbush001/NyuziProcessor
>
> The processor supports a wide, predicated vector floating point pipeline
> with 16 lanes and multiple hardware threads to hide memory and
> computation latency. It also supports multiple cache coherent cores.
> I've created an LLVM backend for this, so C/C++ code can be compiled for
> it. It includes support for first class vector types using the GCC
> vector extensions, as well as intrinsics to expose specialized
> instructions.

How many gates does it take once synthesized? Are there any Altera-specific constructs in the code, or is it portable?

On Sunday, February 15, 2015 at 6:17:13 AM UTC-8, Aleksandar Kuktin wrote:

> How many gates does it take once synthesized? Are there any Altera-specific
> constructs in code or is it portable?

The default configuration with 1 core takes around 70k LEs on Altera. Almost all of the design is generic behavioral RTL without custom megafunctions. The exceptions are the SRAM and FIFO modules, which generally need to be tweaked for the specific target to infer properly.

On Sun, 15 Feb 2015 07:37:13 -0800, jeffbush001 wrote:

> On Sunday, February 15, 2015 at 6:17:13 AM UTC-8, Aleksandar Kuktin wrote:
>
>> How many gates does it take once synthesized? Are there any Altera-specific
>> constructs in code or is it portable?
>
> The default configuration with 1 core takes around 70k LEs on Altera.
> Almost all of the design is generic behavioral RTL without custom
> megafunctions. The exception are SRAM and FIFO modules, which generally
> need to be tweaked for the specific target to infer properly.

Okay, so this sounds fun. Gonna clone it and see what's inside. :)
Amazing project :)))