Reply by Theo Markettos July 27, 2016
Kevin Neilson <kevin.neilson@xilinx.com> wrote:
> Even with your example of a matrix multiplication, there is still a lot to
> figure out. For one thing, you usually have to fold the multiplication
> because you don't have the resources to do the whole thing in parallel.
> The Matlab->gates tool that I used once is long gone.
>
> I just spent several days implementing a design that I modeled in two
> lines of Matlab code. For a tool to have converted those two lines into a
> good hardware implementation would've been difficult.
I agree, you can't expect the tool to just do it: there are many things that need to be tweaked. But those are architectural design choices, which are different from writing Verilog or VHDL. By all means expose the tradeoffs to the designer in a way they understand, just don't make them write HDL. Software folks are familiar with the idea of different data structures having different performance characteristics. But by and large a serial 'program' isn't a helpful way to expose tradeoffs to either the software or hardware engineer.

As I said, this ignores the elephant in the room that is communication. Your C code has the illusion that it lives in a flat, uniform memory space because it sits behind a cache, an MMU, a prefetcher and a standard library, which do a lot of work (in terms of time, area and power) to make it look that way. To get good performance, you need both a way to write the compute and a way to move the data to the right place at the right time. 'HLS' tools are usually poor at handling that: I don't know of an HLS 'C to gates' tool in which it would make sense to write a cache, for example.

Theo
Reply by Kevin Neilson July 27, 2016
The high-level design tools I've used weren't very abstracted.  To make something work well, I had to keep moving to lower levels of abstraction.  In a tool that was supposed to be high-level, I found myself instantiating DSP48s.  Not very abstract.
Reply by Kevin Neilson July 27, 2016
Even with your example of a matrix multiplication, there is still a lot to figure out.  For one thing, you usually have to fold the multiplication because you don't have the resources to do the whole thing in parallel.  The Matlab->gates tool that I used once is long gone.

I just spent several days implementing a design that I modeled in two lines of Matlab code.  For a tool to have converted those two lines into a good hardware implementation would've been difficult.
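
To make "folding" concrete, here is a minimal software model, not anything from a real tool. It assumes a hypothetical budget of `num_multipliers` shared multipliers and simply counts how many passes ("cycles") are needed when the full multiply does not fit in parallel. The function name and the flat scheduling order are illustrative assumptions; a real design would also have to schedule memory reads and accumulator access.

```python
def folded_matmul(a, b, num_multipliers):
    """Model an n x k by k x m multiply folded onto a fixed multiplier budget.

    Returns the product and the number of "cycles" used, where each cycle
    performs at most num_multipliers scalar multiplications.
    """
    n, k, m = len(a), len(b), len(b[0])
    # Every scalar product a[i][p] * b[p][j] that the full multiply requires.
    work = [(i, j, p) for i in range(n) for j in range(m) for p in range(k)]
    c = [[0] * m for _ in range(n)]
    cycles = 0
    for start in range(0, len(work), num_multipliers):
        # One cycle: each shared multiplier handles one scalar product.
        for i, j, p in work[start:start + num_multipliers]:
            c[i][j] += a[i][p] * b[p][j]
        cycles += 1
    return c, cycles


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
full, full_cycles = folded_matmul(a, b, 8)      # everything in parallel
folded, folded_cycles = folded_matmul(a, b, 2)  # folded onto 2 multipliers
```

The result is the same either way; only the cycle count changes (1 versus 4 for this 2x2 case). Choosing that folding factor is exactly the kind of architectural decision the two lines of Matlab don't express.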
Reply by rickman July 27, 2016
On 7/27/2016 1:33 PM, Jecel wrote:
> On Wednesday, July 27, 2016 at 11:41:11 AM UTC-3, rickman wrote:
>> On 7/26/2016 8:11 PM, Kevin Neilson wrote:
>>> I think Celoxica is defunct.
>>
>> So there it is!
>
> Celoxica was a very successful company that was bought out by
> a much larger competitor and immediately shut down. This left
> many customers who loved the product without any good options.
>
> And this is why I refuse to use non open source tools no matter
> what advantages they claim to have. You can't know if they will
> still be available tomorrow. The exception (for now) is the
> vendor tools for generating the FPGA bitfiles.
At least with open source tools you can't be disappointed by a total lack of support. My installation of Lattice Diamond errors out when I perform synthesis and I can't get any help from the vendor support... at all!

-- Rick C
Reply by Jecel July 27, 2016
On Wednesday, July 27, 2016 at 11:41:11 AM UTC-3, rickman wrote:
> On 7/26/2016 8:11 PM, Kevin Neilson wrote:
> > I think Celoxica is defunct.
>
> So there it is!
Celoxica was a very successful company that was bought out by a much larger competitor and immediately shut down. This left many customers who loved the product without any good options.

And this is why I refuse to use non open source tools no matter what advantages they claim to have. You can't know if they will still be available tomorrow. The exception (for now) is the vendor tools for generating the FPGA bitfiles.

-- Jecel
Reply by rickman July 27, 2016
On 7/26/2016 8:11 PM, Kevin Neilson wrote:
> I think Celoxica is defunct.
So there it is!

-- Rick C
Reply by Theo Markettos July 27, 2016
Mark Curry <gtwrek@sonic.net> wrote:
> They're quite capable of this. Problem is they DONT WANT to. They'd prefer
> to be moving their software coding to a higher level of abstraction (through
> advances in SW languages and techniques). Then leave all these
> "fiddly hardware details" to the hardware designers.
Indeed. It puzzles me why hardware designers would think that a pile of nested for loops constitutes a high-level abstraction.

If we're concentrating on compute to the exclusion of all else (as HLS seems to), the algorithm might be defined in terms of matrix operations, so surely it's that which should be the input to the HLS toolchain? In a matrix multiply, say, the parallelism is inherent and it is friendly to the programmer: they want to compute matA*matB, and all the other details can be left to the tool to figure out. At the very least it leaves plenty more scope for the tool to improve, rather than trying to unpick C loops that represent matA*matB.

Matlab isn't a great language for many reasons, but it does make it possible to write code with implicit parallelism pretty easily, without even thinking about it. That would be heading towards my definition of 'high level'.

(I'm not familiar with Simulink-to-gates flows, because I'm not a great fan of schematics. Perhaps the tools are better in this space.)

Theo
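
As a rough illustration of the difference, here are two ways to write the same multiply, in plain Python standing in for C and Matlab respectively. Both function names are made up for this sketch. The first fixes an iteration order and a shared accumulator that a tool would have to analyse away; the second only says that each output element is the dot product of a row and a column, with no ordering implied between outputs.

```python
def matmul_loops(a, b):
    """C-style form: explicit loop order and an explicit accumulator."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):  # serial accumulation order is baked in
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_declarative(a, b):
    """Closer to matA*matB: each output is an independent row-column dot product."""
    columns = list(zip(*b))  # transpose b so its columns are easy to pair with rows
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


mat_a = [[1, 2], [3, 4]]
mat_b = [[5, 6], [7, 8]]
result_loops = matmul_loops(mat_a, mat_b)
result_declarative = matmul_declarative(mat_a, mat_b)
```

Both produce the same matrix, but only the second form states the intent without also committing to a schedule.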
Reply by Kevin Neilson July 26, 2016
I think Celoxica is defunct.
Reply by Tom Gardner July 26, 2016
On 26/07/16 18:43, Kevin Neilson wrote:
> That's what they like to say. It sounds nice. But software engineers still can't program parallel processor arrays well, let alone FPGAs.
>
> These tools can all make a functional FPGA, but if it uses too many gates and has too many levels of logic, you're better off using software.
Be aware that the "high frequency trading" mob put trading algorithms (i.e. if X then buy shares Y) into FPGAs to shave off the odd millisecond of latency. Obviously development turnaround time for the algorithms is very important. That's a good use-case for software->gates.

They've also laid their own $600m trans-Atlantic fibre optic cable to avoid contention and latency, and have bought up all the microwave transmission towers between Chicago and New York because the speed of light in fibres is noticeably slower than that in air.
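
For a feel of the numbers, here is a back-of-envelope sketch. The route length (about 1,150 km, roughly the straight-line Chicago to New York distance) and the refractive indices (about 1.47 for glass fibre, about 1.0003 for air) are assumed typical values, not figures from any particular link, and real fibre routes are longer than the straight line.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)

ROUTE_KM = 1150      # assumption: approximate straight-line distance
FIBRE_INDEX = 1.47   # assumption: typical refractive index of optical fibre
AIR_INDEX = 1.0003   # assumption: typical refractive index of air


def one_way_latency_ms(distance_km, refractive_index):
    """Propagation delay in milliseconds over distance_km in a given medium."""
    speed_m_per_s = C_VACUUM_M_PER_S / refractive_index
    return distance_km * 1000 / speed_m_per_s * 1000


fibre_ms = one_way_latency_ms(ROUTE_KM, FIBRE_INDEX)
air_ms = one_way_latency_ms(ROUTE_KM, AIR_INDEX)
saving_ms = fibre_ms - air_ms
```

Under those assumptions the one-way delay comes out at about 5.6 ms in fibre against about 3.8 ms through air, so the microwave path saves somewhat under 2 ms each way before any routing detours are counted.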
Reply by rickman July 26, 2016
On 7/26/2016 1:30 PM, Kevin Neilson wrote:
> They did claim that software engineers with no hardware experience could be designing FPGAs after a very short training period. Which might have been true. But there's a difference between doing FPGAs and doing them well.
I suppose the proof of the pudding is in the eating. Who is using this tool for production in this way?

-- Rick C