Little to no benefit from C based HLS
Last updated 07-Nov-2015
As I write this I am on a plane and my destination is EELive 2014, where I am going to give a talk titled "Hardware design: the grunge era". It is a shotgun introduction to three alternative hardware description languages (alt.hdl). The three languages briefly introduced in the talk are BSV, Chisel, and MyHDL. The goal of the talk is simply to raise awareness of the three alt.hdls and encourage others to learn an alt.hdl.
In the talk I have one slide explaining why I do not include a "C" based high-level synthesis (HLS) tool. The reasons I did not consider a "C" based HLS for the talk:
- Minimal micro-architecture control
- "C" is not that high-level
- Almost impossible to leverage existing code bases
- There is not an army of C programmers (this is typically a reason provided to justify C HLS)
The above is all that is said on the subject in the talk. In this post I am going to use a "C" HLS example to demonstrate that the "C" version is not more concise, readable, or digestible than a hardware implementation in MyHDL.
In the latest XCell (XCell issue 86) there is an article on using Vivado HLS (a C based HLS tool previously known as AutoESL) to create a median filter. At the core of the median filter is a median calculation, which is mainly a "sort". The first thing to point out is that the author, Daniela Bagni, makes an effort to explain that a different type of sort is being implemented: a sort network.
Regardless of the language, we need to make sure we are designing an appropriate algorithm for the target. In this case that is a sort network instead of an iterative sort like quick-sort. Figure 1 is a depiction of the algorithm. The sort requires N clock cycles, where N is the number of items to be sorted and N is fixed. The median calculation is pipelined: the first result takes N cycles to appear, but after that a new result is produced on every clock cycle.
(Figure 1: Sort network for a five element sort)
The following is the C code that calculates the sort and can be used in the Vivado HLS (see the article for the actual code with the pragmas required for HLS and additional definitions).
The following is a Python version. This Python version will only be used as reference (exploration and verification).
This Python version is different than the C version. If you are experienced with C (and not Python) you may not like this form. If you are familiar with Python or other high-level languages this might be a preferred form. In actuality it is not that different than the C version. The C version could be written with an additional function. I like the use of the function because it is explicit. We can look at the definition of our algorithm and the code and make sense of each with little effort - it is clear which part of the implementation is calculating the stage and the portion that is cascading the stages.
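A minimal sketch of a reference in this form, assuming the network in Figure 1 is an odd-even transposition network (N stages of neighbour compare-exchanges). The names `compare_stage`, `sort_network`, and `median` are illustrative and are not taken from the gist.

```python
def compare_stage(values, stage):
    """One stage of the network: compare-exchange neighbouring pairs.

    Even stages pair (0,1), (2,3), ...; odd stages pair (1,2), (3,4), ...
    This pairing is an assumption (odd-even transposition).
    """
    out = list(values)
    for i in range(stage % 2, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def sort_network(values):
    """Cascade N stages, where N is the number of items."""
    for stage in range(len(values)):
        values = compare_stage(values, stage)
    return values


def median(values):
    """Middle element of the sorted window (N is assumed to be odd)."""
    return sort_network(values)[len(values) // 2]
```

The stage calculation lives in `compare_stage` and the cascading lives in `sort_network`, which is the split described above.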
The hardware description is similar: there is a module to describe the stage comparisons and a separate piece to cascade the stages. In addition, the hardware description includes the hardware types.
The following is the MyHDL description of the compare stages.
In the hardware version we need to define which events cause the RTL process (which is a Python generator) to execute. This is a fully synchronous implementation. We have a sequential process on the positive edge of the clock.
To build the stages, a matrix of Signals is first created to connect the compare stage modules. Then a list of the compare modules is created and connected through the signal matrix, which gives the structure shown in Figure 2.
(Figure 2: Sort-network logical signal matrix)
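The register behaviour of that matrix can be imitated in plain Python. This is only a software model under my own assumptions, not the MyHDL description: it does not use MyHDL Signals, it assumes the same odd-even transposition stages as the reference, and `SortPipeline` and `clock` are hypothetical names.

```python
def compare_stage(values, stage):
    """Compare-exchange neighbouring pairs (assumed odd-even pairing)."""
    out = list(values)
    for i in range(stage % 2, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


class SortPipeline:
    """Software model of N registered stages joined by a signal matrix."""

    def __init__(self, n):
        self.n = n
        # Row 0 is the input; row k+1 holds the registered output of stage k.
        self.matrix = [[0] * n for _ in range(n + 1)]

    def clock(self, window):
        """Model one rising clock edge and return the last row."""
        self.matrix[0] = list(window)
        # Build all new rows from the old rows before replacing any of them,
        # the way registers all sample on the same clock edge.
        self.matrix = [self.matrix[0]] + [
            compare_stage(self.matrix[k], k) for k in range(self.n)
        ]
        return self.matrix[self.n]
```

With `n = 5`, the window passed to the first `clock` call comes out sorted on the fifth call, and each later call returns the next sorted window, so the model shows both the N cycle latency and the one result per clock throughput.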
To me this makes sense. We have an explicit definition of the algorithm, the sort network in Figure 1, and it is clear from that definition that we need to communicate between stages. The C version reuses the buffers, which breaks the connection to the original definition.
From my perspective the MyHDL is a more explicit implementation of our algorithm and just as concise as the C version. Learning the additional hardware constructs (e.g. defining the events) takes roughly the same effort as understanding the parallel-streaming description and pragmas in the C version. Sure, I am biased because I have years of experience writing HDL, but as the article points out you need to be aware of the target. The difficult part is defining the correct algorithm, and neither the HLS nor the HDL helps with that. But with the HDL I am working at a comfortable level - just the right level of abstraction, and my algorithm does not look foreign the way it does in the C version. There are just a couple of things you need to learn with the HDL (see the MyHDL manual), but there are also things that need to be learned with the C version.
As briefly mentioned, in this context the median calculation is used in an image processing median filter. The filter is used to remove noise. In the next post the rest of the filter will be implemented.
The complete median (mm_median) is available in a github repo with other examples from the EELive (Embedded Systems Conference) presentation. The code snippets and test are available in a gist.
- 07-Nov-2015: Updated the code snippets, the previous method was broken, using github gist to manage the code snippets.