I've been designing an open source Larrabee-esque GPGPU processor in SystemVerilog and I thought people might find it interesting. Full source code, documentation, tests, tools, etc. are available on GitHub:

https://github.com/jbush001/NyuziProcessor

The processor supports a wide, predicated vector floating point pipeline with 16 lanes and multiple hardware threads to hide memory and computation latency. It also supports multiple cache coherent cores. I've created an LLVM backend for this, so C/C++ code can be compiled for it. It includes support for first class vector types using the GCC vector extensions, as well as intrinsics to expose specialized instructions.

I've written a 3D engine (software/librender) that is optimized to take advantage of both the vector unit and multiple cores/hardware threads. Here's a video of the standard teapot (with ~2300 triangles) rendering on a single core on FPGA running at 50 MHz:

http://youtu.be/DsvZorBu4Uk

This image is the emulator rendering Dabrovic's Sponza atrium, ~66k triangles. This took around 200 million instructions to render across 8 virtual cores and 32 hardware threads (at 1024x768):

http://i.imgur.com/sHAsAU5.png

My main purpose in designing this was to be able to experiment with processor architecture using real, empirical data. The neat thing about having all the source to a cycle accurate hardware design is that it is infinitely instrumentable. I've kept notes about some of my findings here:

http://latchup.blogspot.com/

Anyway, comments and suggestions are appreciated, and I'm happy to take contributions if people are interested in hacking on it.
Open Source GPGPU core
Started by ●February 11, 2015
Reply by ●February 12, 2015
I tried building your toolchain on both a 32 and 64 bit amd Ubuntu 14.10 system and get:

Linking CXX shared library ../../../lib/liblldb.so
Python script sym-linking LLDB Python API
Program error: Invalid parameters entered, -h for help.
You entered:
['--buildConfig=', '--srcRoot=/home/johne/Desktop/Nyuzi/NyuziToolchain/tools/lldb', '--targetDir=/home/johne/Desktop/Nyuzi/NyuziToolchain/build/tools/lldb/source/../scripts', '--cfgBldDir=/home/johne/Desktop/Nyuzi/NyuziToolchain/build/tools/lldb/source/../scripts', '--prefix=/home/johne/Desktop/Nyuzi/NyuziToolchain/build', '--cmakeBuildConfiguration=.', '-m'] (-1)
tools/lldb/source/CMakeFiles/liblldb.dir/build.make:282: recipe for target 'lib/liblldb.so.3.7.0' failed
make[2]: *** [lib/liblldb.so.3.7.0] Error 255
CMakeFiles/Makefile2:12189: recipe for target 'tools/lldb/source/CMakeFiles/liblldb.dir/all' failed
make[1]: *** [tools/lldb/source/CMakeFiles/liblldb.dir/all] Error 2
Makefile:133: recipe for target 'all' failed
make: *** [all] Error 2
johne@ouabache:~/Desktop/Nyuzi/NyuziToolchain/build$

John Eaton
---------------------------------------
Posted through http://www.FPGARelated.com
Reply by ●February 13, 2015
On Thursday, February 12, 2015 at 7:04:38 PM UTC-8, jt_eaton wrote:
> I tried building your toolchain on both a 32 and 64 bit amd Ubuntu 14.10
> system and get:
>
> Linking CXX shared library ../../../lib/liblldb.so
> Python script sym-linking LLDB Python API
> Program error: Invalid parameters entered, -h for help.
> You entered:

It looks like LLDB was not building correctly when the build type wasn't set (I normally build with Debug). I pushed a change to the cmake files that should address this. Let me know if that fixes it.

Thanks
--Jeff
Reply by ●February 13, 2015
> It looks like LLDB was not building correctly when the build type wasn't
> set (I normally build with Debug). I pushed a change to the cmake files
> that should address this. Let me know if that fixes it.
>
> Thanks
>
> --Jeff

That fixed it. Ran all the tests and got the picture in the frame buffer.

Do any of the tests run verilator to create a vcd dump file?

John Eaton
Reply by ●February 13, 2015
> That fixed it. Ran all the tests and got the picture in the frame buffer.
>
> Do any of the tests run verilator to create a vcd dump file?
>
> John Eaton

Ok, I found it. Are all of your tests using the same vcd dump file?

John Eaton
Reply by ●February 13, 2015
On Friday, February 13, 2015 at 6:37:53 PM UTC-8, jt_eaton wrote:
> That fixed it. Ran all the tests and got the picture in the frame buffer.

Great!

> Do any of the tests run verilator to create a vcd dump file?

Yep. All of the cosimulation tests run in Verilator. The compiler tests can be made to run in Verilator by defining USE_VERILATOR=1 in the shell environment. The render tests have a target 'verirun' that will run them in Verilator (there are READMEs in those directories with more details).

VCD dumps aren't produced by default, but can be enabled by modifying the makefile in the rtl/ directory, uncommenting the line:

VERILATOR_OPTIONS=--trace --trace-structs

and rebuilding. A file 'trace.vcd' will be written in the same directory. The output files get big fast for non-trivial tests. :)
Reply by ●February 15, 2015
On Wed, 11 Feb 2015 09:09:18 -0800, jeffbush001 wrote:
> I've been designing an open source Larrabee-esque GPGPU processor in
> SystemVerilog and I thought people might find it interesting. Full
> source code, documentation, tests, tools, etc. are available on github:
>
> https://github.com/jbush001/NyuziProcessor
>
> The processor supports a wide, predicated vector floating point pipeline
> with 16 lanes and multiple hardware threads to hide memory and
> computation latency. It also supports multiple cache coherent cores.
> I've created an LLVM backend for this, so C/C++ code can be compiled for
> it. It includes support for first class vector types using the GCC
> vector extensions, as well as a intrinsics to expose specialized
> instructions.

How many gates does it take once synthesized? Are there any Altera-specific constructs in the code, or is it portable?
Reply by ●February 15, 2015
On Sunday, February 15, 2015 at 6:17:13 AM UTC-8, Aleksandar Kuktin wrote:
> How many gates does it take once synthesized? Are there any Altera-
> specific constructs in code or is it portable?

The default configuration with 1 core takes around 70k LEs on Altera. Almost all of the design is generic behavioral RTL without custom megafunctions. The exceptions are the SRAM and FIFO modules, which generally need to be tweaked for the specific target to infer properly.
Reply by ●February 15, 2015
On Sun, 15 Feb 2015 07:37:13 -0800, jeffbush001 wrote:
> On Sunday, February 15, 2015 at 6:17:13 AM UTC-8, Aleksandar Kuktin
> wrote:
>
>> How many gates does it take once synthesized? Are there any Altera-
>> specific constructs in code or is it portable?
>
> The default configuration with 1 core takes around 70k LEs on Altera.
> Almost all of the design is generic behavioral RTL without custom
> megafunctions. The exception are SRAM and FIFO modules, which generally
> need to be tweaked for the specific target to infer properly.

Okay, so this sounds fun. Gonna clone it and see what's inside. :)
Reply by ●February 18, 2015






