Hi folks,

Here to talk about PipelineC.
https://github.com/JulianKemmerer/PipelineC/wiki

What is it?:
- C-like almost hardware description language
- A compiler that produces VHDL for specific devices/operating frequencies

I am looking for:
- anyone who wants to help me develop (Python, VHDL, C)
- suggestions on how to make PipelineC more useful/new features
- project ideas (heyo open source folks)

In the meantime, I am also here to share my most interesting example so far: Using PipelineC with an AWS F1 instance.
https://github.com/JulianKemmerer/PipelineC/wiki/AWS-F1-DMA-Example

I have made an AMI that you can use to play around with. However, it cannot be made public; I can only share it with specific AWS accounts, so please message me if interested.

I want to share with you why I think PipelineC is particularly powerful:

First, it can mostly replace VHDL/Verilog for describing low level, clock by clock, hardware control logic.

Consider the following generic VHDL:

    -- Combinatorial logic with a storage register
    signal the_reg : some_type_t;
    signal the_wire : some_type_t;
    process(input, the_reg) is -- inputs sync to clk
        variable input_variable : some_type_t;
        variable the_reg_variable : some_type_t;
    begin
        input_variable := input;
        the_reg_variable := the_reg;
        -- ... Do work with 'input_variable', 'the_reg_variable'
        -- and other variables, functions, etc and it kinda looks like C ...
        the_wire <= the_reg_variable;
    end process;
    the_reg <= the_wire when rising_edge(clk);
    output <= the_wire;

The equivalent PipelineC is

    some_type_t the_reg;
    some_type_t some_func_name(some_type_t input)
    {
        // ... Do work with 'input', 'the_reg'
        // ... and other variables, functions, etc...
        // Return==output
        return the_reg;
    }

Using that functionality I was able to write very RTL-esque serialize+deserialize logic for the AXI4 interface that the AWS F1 shell logic provides to 'customer logic' for DMA.
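To make the register pattern above concrete, here is a minimal sketch in plain C of how such a function behaves when compiled with gcc for simulation. It assumes that the global variable plays the role of 'the_reg' and that each call models one clock cycle. The names `accumulate` and `accumulate_demo`, and the choice of a running sum, are made up for this illustration and are not taken from the PipelineC repository.

```c
#include <assert.h>
#include <stdint.h>

// Global variable standing in for 'the_reg' (assumption: it keeps its
// value between calls, the way a register keeps its value between clocks).
uint32_t the_reg = 0;

// Hypothetical example function: each call models one clock cycle.
// The register is updated from the input and its new value is the output.
uint32_t accumulate(uint32_t input)
{
    the_reg = the_reg + input;
    // Return==output
    return the_reg;
}

// Hypothetical driver: feeds four inputs, one per simulated clock,
// and returns the output seen on the last clock.
uint32_t accumulate_demo(void)
{
    const uint32_t inputs[4] = {1, 2, 3, 4};
    uint32_t output = 0;

    the_reg = 0; // Start from a known register value
    for (int clk = 0; clk < 4; clk++) {
        output = accumulate(inputs[clk]);
    }

    // The output on the last clock is the register contents
    assert(output == the_reg);
    return output; // 1 + 2 + 3 + 4 = 10
}
```

Calling `accumulate_demo()` from a `main()` and printing the result is enough to try this on a host machine. The point is only that state lives in the global and advances once per call.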
The AXI4 is deserialized to a stream of 4096-byte input data chunks that can be processed by a 'work' function.

I find that most HLS tools have trouble giving the user this sort of low level control, probably under the assumption that it's too low level and not meant for software folks to be concerned with. Most hardware description languages are built for exactly this though.

Second, PipelineC can replace the most basic feature of other HLS tools: auto-pipelining functions.

This AWS example sums 1024 floating point values via an N clock cycle pipelined binary tree of 1023 floating point adders (soft logic, not hard cores yet).

Below is the PipelineC code:

    float work(float inputs[1024])
    {
        // All the nodes of the tree in arrays so can be written using loops
        // ~log2(N) levels, max of N values in parallel
        float nodes[11][1024]; // Unused elements optimize away

        // Assign inputs to level 0
        uint32_t i;
        for(i=0; i<1024; i=i+1)
        {
            nodes[0][i] = inputs[i];
        }

        // Do the computation starting at level 1
        uint32_t n_adds;
        n_adds = 1024/2;
        uint32_t level;
        for(level=1; level<11; level=level+1)
        {
            // Parallel sums at this level
            for(i=0; i<n_adds; i=i+1)
            {
                nodes[level][i] = nodes[level-1][i*2] + nodes[level-1][(i*2)+1];
            }

            // Each level decreases adders in next level by half
            n_adds = n_adds / 2;
        }

        // Return the last node in tree
        return nodes[10][0];
    }

(To be clear, I am NOT claiming that this is the best way to sum floats in hardware - it's just a basic example big enough to use most of the FPGA).

The PipelineC tool inserts pipeline registers as needed to meet timing on the particular device technology + operating frequency. I find that most HLS tools are pretty good at this (and will do a lot more than inferring pipelines too) but often require some ugly pragmas that - in a way - can make the code undesirably device specific. Hardware description languages can certainly describe the above hardware.
But the code will almost certainly describe a pipeline designed specifically for one device technology/operating frequency - making the code hard for others to reuse even if you are kind enough to share it.

The very capable Virtex Ultrascale+ AWS hardware allows the PipelineC tool to fit the work() function into a pipeline depth/latency of 15 clock cycles (might be able to squeeze into as few as 10 clocks). Running at 125 MHz (8 ns cycle time), it is thus capable of summing 1024 floating point values with a latency of 15 x 8 ns = 120 nanoseconds.

work() Pipeline:
- Frequency: 125 MHz, new inputs each cycle
- Latency: 15 clocks / 120 ns

    LUTs     Registers   CARRY8   CLB
    322144   137181      16307    62664

Here is the 'main' function / top level for the full hardware implementation:

    aws_fpga_dma_outputs_t aws_fpga_dma(aws_fpga_dma_inputs_t i)
    {
        // Pull messages out of incoming DMA write data
        dma_msg_s msg_in;
        msg_in = deserializer(i.pcis);

        // Convert incoming DMA message bytes to 'work' inputs
        work_inputs_t work_inputs;
        work_inputs = bytes_to_inputs(msg_in.data);

        // Do some work
        work_outputs_t work_outputs;
        work_outputs = work(work_inputs);

        // Convert 'work' outputs into outgoing DMA message bytes
        dma_msg_s msg_out;
        msg_out.data = outputs_to_bytes(work_outputs);
        msg_out.valid = msg_in.valid;

        // Put output message into outgoing DMA read data when requested
        aws_fpga_dma_outputs_t o;
        o.pcis = serializer(msg_out, i.pcis.arvalid);

        return o;
    }

On the software side, utilizing the FPGA hardware with user space file I/O calls looks like:

    // Do work() using the FPGA hardware
    work_outputs_t work_fpga(work_inputs_t inputs)
    {
        // Convert input into bytes
        dma_msg_t write_msg;
        write_msg = inputs_to_bytes(inputs);

        // Write those DMA bytes to the FPGA
        dma_write(write_msg);

        // Read DMA bytes back from the FPGA
        dma_msg_t read_msg;
        read_msg = dma_read();

        // Convert bytes to outputs and return
        work_outputs_t work_outputs;
        work_outputs = bytes_to_outputs(read_msg);
        return work_outputs;
    }
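Because the tree-sum work() function is plain C, it can also be run on the host as a reference. The sketch below is one guess at how such a check might look, not code from the PipelineC repository: `sum_sequential` and `golden_check` are hypothetical names, and the test inputs are deliberately chosen as small multiples of 0.25 so that every partial sum is exactly representable in a float. For arbitrary data the tree and a left-to-right loop add in different orders, so a real comparison would likely need a tolerance rather than `==`.

```c
#include <assert.h>
#include <stdint.h>

#define N_INPUTS 1024
#define N_LEVELS 11 // log2(1024) + 1

// Same binary tree structure as the work() function in the post,
// compiled as ordinary C.
float work(float inputs[N_INPUTS])
{
    float nodes[N_LEVELS][N_INPUTS];
    uint32_t i;
    uint32_t level;
    uint32_t n_adds = N_INPUTS / 2;

    // Assign inputs to level 0
    for (i = 0; i < N_INPUTS; i = i + 1) {
        nodes[0][i] = inputs[i];
    }

    // Each level adds neighbouring pairs from the level below
    for (level = 1; level < N_LEVELS; level = level + 1) {
        for (i = 0; i < n_adds; i = i + 1) {
            nodes[level][i] = nodes[level - 1][i * 2] + nodes[level - 1][(i * 2) + 1];
        }
        n_adds = n_adds / 2;
    }

    // Last node in the tree
    return nodes[N_LEVELS - 1][0];
}

// Hypothetical reference: plain left-to-right accumulation.
float sum_sequential(float inputs[N_INPUTS])
{
    float sum = 0.0f;
    for (uint32_t i = 0; i < N_INPUTS; i = i + 1) {
        sum = sum + inputs[i];
    }
    return sum;
}

// Hypothetical check: builds exactly representable test data,
// asserts that both summation orders agree, and returns the tree sum.
float golden_check(void)
{
    float inputs[N_INPUTS];
    for (uint32_t i = 0; i < N_INPUTS; i = i + 1) {
        inputs[i] = (float)(i % 16) * 0.25f; // 0.0, 0.25, ..., 3.75 repeating
    }

    float tree = work(inputs);
    float seq = sum_sequential(inputs);
    assert(tree == seq); // Exact only because of the chosen inputs
    return tree;         // 64 blocks * 30.0 per block = 1920.0
}
```

The same inputs could then be sent through the FPGA path and compared against the value returned here.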
So there you have it: Low level RTL-like control, working right beside highly pipelined logic. All in a familiar C look that could just be compiled with gcc for 'simulation'. For example, this AWS example uses the same work() function code both as the hardware description and as the 'golden C model' compiled with gcc to compare against.

In the sense that C abstracts away the hardware specifics of each CPU architecture + memory model, but only at a very minimal level, I want PipelineC to be the same for digital logic. The same PipelineC code should produce computationally equivalent hardware on any FPGA/ASIC device technology through smarts in the compiler. But C/PipelineC obviously doesn't do everything; there isn't a whole lot of higher level abstraction done for you. It's just the bedrock to build shareable libraries.

Some big features PipelineC lacks as of the moment:
- Flow control/combinatorial feed-backward signals through N clock pipelined logic
  - PipelineC can describe FIFOs, BRAMs (hard BRAM IP is the only IP supported right now) to work with data flows, but the equivalent of a bare combinatorial <= assignment operator feedback is missing
- Multiple clock domains / clock crossings (have some neat ideas about this).
  - This would likely be my next big...many month... task?
- The C parser I'm using doesn't let you return constant sized arrays, but PipelineC as a language really should. I think if I modified it (oh gosh help me?) and said 'use g++' to compile this 'C code that returns arrays', it could work out?

Got any ideas on what you'd want to do with PipelineC? Let me know, maybe we can make something cool together. Want support for an open source synthesis tool? I can give Yosys a try.

Thanks for your time folks
PipelineC - C-like almost hardware description language - AWS F1 Example
Started by ●March 21, 2020
Reply by ●March 22, 2020
On 22/03/20 01:15, Julian Kemmerer wrote:
> Hi folks,
> Here to talk about PipelineC.

With anything like this you have 30s to convince me to spend some of my remaining life looking at it rather than something else. Hence I want to see:
- what benefit would it give me, and how
- what won't it do for me (it isn't a panacea)
- what do I have to do to use it (scope of work)
- what don't I have to do if I use it (I'm lazy)
- how it fits into the well-documented toolchains that many people use (since it doesn't do everything)

If I see the negatives, I'm more likely to believe the claimed positives.
Reply by ●March 23, 2020
On Sunday, March 22, 2020 at 6:43:31 AM UTC-4, Tom Gardner wrote:
> On 22/03/20 01:15, Julian Kemmerer wrote:
> > Hi folks,
> > Here to talk about PipelineC.
>
> With anything like this you have 30s to convince me
> to spend some of my remaining life looking at it rather
> than something else. Hence I want to see:
> - what benefit would it give me, and how
> - what won't it do for me (it isn't a panacea)
> - what do I have to do to use it (scope of work)
> - what don't I have to do if I use it (I'm lazy)
> - how it fits into the well-documented toolchains
> that many people use (since it doesn't do everything)
>
> If I see the negatives, I'm more likely to believe
> the claimed positives.

I'll give it a quick go:

what benefit would it give me, and how:
Feels like RTL when doing clock by clock logic, and can auto-pipeline logic otherwise.

what won't it do for me (it isn't a panacea):
Not a full RTL replacement yet. Would love help to get it there.

what do I have to do to use it (scope of work):
Write C-looking code; the tool generates VHDL that can be dropped into any existing project. Mostly a matter of the time to run the tool in addition to already long builds.

what don't I have to do if I use it (I'm lazy):
Don't have to manually pipeline all your logic to specific devices / operating frequencies. Can share 'cross-platform' code.

how it fits into the well-documented toolchains:
Outputs VHDL. And C-looking code can be used with gcc for debug/modeling.

Thanks eh!