# Cutting a Path Forward

## Introduction

As a newcomer to the community, I thought I would start off by introducing myself and giving a little information about what has drawn me to start working with FPGAs.

My day job is as a professional software developer:  figure out what people want; figure out how to make it happen (if possible); and then wrangle code, databases, networks, and servers into giving the correct responses or actions.

By night, however, I've been working on my own research in the field of robotics and autonomous navigation.  Basically, I've got an idea on how to tackle a tough problem for air, land, and submersible drones, and I've spent a good portion of the last 8 years figuring out how to make this a reality.

Being a software guy, I have of course implemented all of my work up to this point in software.  While rewarding and fun for its own sake (and it pays the bills), I quickly came to the realization that working purely in software is probably not the best answer, for several reasons.

The biggest reason is that I'm pretty convinced the end product should be simple, cheap, and effective.  Unfortunately, the trend in the software world is to just throw more CPU horsepower and RAM at a problem until it becomes "practical."  If you can't do it onboard the local system, you hook up an AWS or Azure account and have theoretically unlimited computing power at your disposal.  Sadly, that cuts out the "cheap" requirement (and really puts a ding in "simple"--tethering to a cloud system brings its own set of issues).

Alternatively, you can just add more local computing power on board, right?  "Moore's law is your friend" is a common saying amongst software types.  Practically speaking, most affordable drones are not going to be able to carry around a miniature Cray computer and still do anything fun or worthwhile.  (And I'm probably not going to live long enough for credit card-sized quantum computers to be listed on eBay!)

The next biggest reason for looking at FPGAs is timing and reliability.  Again, a purely software system can do a lot of interesting work, but the trade-off is that you often can't guarantee how long it will take.  Small, lightweight systems like the Raspberry Pi are certainly promising, but between OS scheduling, memory requirements, and standard software bugs and edge cases, it becomes very difficult to make something happen fast enough for a real-world application.  Of course, you can always beef up the computing system (RAM, CPU, specialized OS or RTOS), but now you're edging toward the same problem as before:  too many resources to haul around, too complex an implementation, and "cheap" is just a catch phrase as you try to wire yet another dedicated computer into the system.

The last reason, and probably the most personal, is that I just like to learn new things.  FPGAs have always fascinated me with their possibilities, ever since I first heard of them about 10 years ago.  Along with trying to make this concept work for its own sake, this is the perfect chance for me to learn something new.

## The Big Idea

If you're not familiar with the concept of Graph Cuts, Wikipedia has several articles that introduce the basic concepts, as well as a discussion on how they are applied to vision and segmentation problems.  For those who don't want to wade too far into the math or Google Scholar, here is a basic primer.

### Basic Graph Cuts

Graph cut algorithms work by taking a network model (pipes, wires, packets, probabilities, etc) and finding the maximum amount of "flow" that can move from a source node to a sink node.  At first glance, this might not seem very useful outside of actual networks, but it turns out that it's very useful in other realms, especially for image processing.

In the image below, for example, the question being solved is "What's in the road?"  Using graph cut algorithms, the pixels in the image are turned into a network: neighboring pixels are connected together (links called n-links), and "virtual" source and sink pixels are added.  These label nodes, as they are normally called, are also connected to each pixel in the image (links called t-links).  The strength of each t-link is based on how closely the pixel matches the definition of the label, and the strength of each n-link is based on how similar neighboring pixels are to each other--left side (a) of the image below.

When the process is finished, the products of the graph cut are shown in the right side (b).  The deer can be pulled out of the image, and the background is left behind.  In the middle is the green outline, known as the "cut path", and a numeric value for the total flow.

Typically, the cut path and the total flow are discarded.  The total flow is image dependent, and if you are trying to process high-speed video, holding onto that value doesn't pay any dividends.  The same goes for the cut path:  Since it's actually a slice between pixels, it's not very useful when you are pixel-oriented.

When you zoom into what's happening at the cut, however, things start to look a little more interesting.  The cut path separates label regions, and meanders between both labeled nodes and strongly-connected neighbors.  In a sense, cut paths are a lot like hyperplanes that intersect with the model (a 2D image, in this case), and they always go between the nodes of a graph structure.  They also have a few other interesting properties:

- Globally optimal:  For any given graph, there might be more than one cut with the optimal value X, but a graph cut algorithm will always settle on one of them.
- Parameter dependent:  If you take a fixed graph structure and play with the t-link and n-link values, the cut path will move accordingly.
- It's only math:  It doesn't matter if your values are in amps, ohms, gigabits, gallons per hour, or probabilities--as long as the modeling makes sense, then the path will, too.
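To make the construction above concrete, here is a minimal Python sketch of the same idea: a four-pixel, one-dimensional "image" modeled as a flow network with t-links and n-links, plus a small Edmonds-Karp max-flow routine to find the min cut.  All capacities are invented illustrative values, not taken from any real segmentation.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow.  cap maps (u, v) -> capacity.
    Returns (flow value, set of nodes on the source side of the min cut)."""
    res = dict(cap)                     # residual capacities
    adj = {}
    for u, v in cap:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        res.setdefault((v, u), 0.0)     # reverse edges start empty
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                       # no augmenting path left
        # Trace the path and push the bottleneck amount of flow along it.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        flow += push
    # Nodes still reachable from s form the source side of the min cut.
    side, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in side and res[(u, v)] > 1e-12:
                side.add(v)
                queue.append(v)
    return flow, side

# Four pixels in a row; brightness stands in for "looks like the object."
pixels = {0: 0.9, 1: 0.8, 2: 0.2, 3: 0.1}
cap = {}
for p, b in pixels.items():
    cap[('src', p)] = b                 # t-link to the "object" label
    cap[(p, 'sink')] = 1 - b            # t-link to the "background" label
for p, q in [(0, 1), (1, 2), (2, 3)]:
    w = 1 - abs(pixels[p] - pixels[q])  # similar neighbors bind tightly
    cap[(p, q)] = cap[(q, p)] = w       # n-links run both ways

flow, obj_side = max_flow(cap, 'src', 'sink')
print(sorted(n for n in obj_side if n != 'src'))   # → [0, 1]
```

The cut lands on the weak n-link between pixels 1 and 2, exactly where the "object" brightness falls off--the same behavior that pulls the deer out of the road image, just in miniature.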

(On a side note:  I don't have the reference in front of me, but I found that, during WWII, the Allies used graph cut algorithms to figure out where to bomb supply lines to most effectively cut off German supplies.  Just goes to show how widely this idea can be applied.)

I spent a lot of time working on this for medical image analysis in grad school.  One day, I said to myself, "Hmmm...  That looks an awful lot like an obstacle avoidance path.  Is there a model that would bear that out?"  And thus an 8-year personal project was born.

### Graph Cuts for Guidance

The standard formula for a generic graph cut is really just finding the labeling $\mathcal{L}$ that minimizes $E(\mathcal{L})$ in the formula below:

$$E(\mathcal{L}) = \sum_{p} D_{p}(\mathcal{L}_{p}) + \sum_{p,q} V_{p,q}(\mathcal{L}_{p},\mathcal{L}_{q})$$

In plain English, this boils down to "Find the minimum combination of t-links and n-links to cut, out of all possible combinations."  There are a lot of different approaches for finding the optimal combination, each with its own strengths and weaknesses.  But, as I mentioned before, this is a numbers problem, so as long as the result is correct, the implementation is not that important, mathematically.  (Time and memory constraints are another matter.)
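As a toy illustration of "out of all possible combinations," the sketch below brute-forces $E(\mathcal{L})$ over every labeling of three pixels.  The $D_p$ data costs and the neighbor penalty are invented for the example; real graph cut solvers reach the same answer via max-flow instead of this exponential enumeration, which is the whole point.

```python
from itertools import product

# Hypothetical data costs D_p(label) for three pixels; labels are
# 0 = background, 1 = object.  Values are made up for illustration.
D = [
    {0: 2.0, 1: 0.5},
    {0: 1.5, 1: 0.8},
    {0: 0.4, 1: 2.2},
]
LAMBDA = 1.0  # V_pq: flat penalty when neighboring labels disagree

def energy(labels):
    """E(L) = sum of data costs + sum of smoothness costs."""
    data = sum(D[p][lab] for p, lab in enumerate(labels))
    smooth = sum(LAMBDA for p in range(len(labels) - 1)
                 if labels[p] != labels[p + 1])
    return data + smooth

# Brute force: try all 2^3 labelings and keep the cheapest.
best = min(product((0, 1), repeat=3), key=energy)
print(best)   # → (1, 1, 0): first two pixels "object," last "background"
```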

To build out the model for a path-finding system, let's start with how a drone might see the world (illustrated below):

Using one or more sensors, our intrepid 2D-constrained drone looks out on the world and builds up a picture of what's in front of it.  Even if it has onboard image processing, the system itself doesn't know whether it's looking at a Mack truck or a down pillow--only that there is something at a relative azimuth and distance.  And, depending on the sensor, it might not be entirely sure that the "something" is actually a something.  Noisy signals, imperfect sensors and processing, and even limitations of the sensors themselves all come into play.  For example, a radar system will definitely see a tank sitting in front of it, but might miss a fluffy rabbit.  Conversely, an IR camera will probably pick up that a rabbit or an elephant is on a particular azimuth, but not be too sure whether it's a close rabbit or a distant elephant.  Sonar sensors will see a "wall" of hard material across their FOV, but it could be a section of a bigger wall, or just a narrow metal pole in the middle.

Fortunately, this all adds up to a very simple model that will work for our guidance system.  In essence, each sensor has a probability that something is where it says it is, and those probabilities can be converted into a negative log-likelihood (NLL) of successful transit.  NLL values are nice, because they are additive, and therefore multiple sensor returns can be loaded onto our network model in succession.
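A quick sketch of that NLL bookkeeping, using made-up sensor probabilities (the sensor names and values are purely illustrative):

```python
import math

# Hypothetical per-sensor estimates of P(safe transit) through one cell.
p_safe = {'radar': 0.95, 'ir_camera': 0.80, 'sonar': 0.90}

# Multiplying independent probabilities is the same as adding their
# negative log-likelihoods, so each new sensor return simply adds its
# NLL onto the cell's edge weight in the graph.
total_nll = sum(-math.log(p) for p in p_safe.values())

combined_p = math.exp(-total_nll)   # back to a plain probability
assert abs(combined_p - 0.95 * 0.80 * 0.90) < 1e-9
print(round(total_nll, 4), round(combined_p, 4))   # → 0.3798 0.684
```

Because the combination is just addition, each sensor return can be streamed onto the model as it arrives--no need to hold all the raw probabilities and multiply at the end.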

In essence, then, our cut path is the path least likely to fail (hit something) moving forward, based on one or more sensors.  We are building a risk-averse pathfinding solution based only on various sensor inputs.

## Other FPGA Implementations

The idea of using FPGAs to speed up graph cut analysis is definitely not my own.  In 2012, researchers at the University of Tsukuba were able to process a 482x321-pixel image in between 7 and roughly 30 milliseconds, depending upon the complexity of the image and the starting "seeds".  More recently, researchers at the Robotics Research Center, IIIT-Hyderabad, India did a comparative study of FPGA implementations, and were able to automatically process 512x288 images in just over 175 milliseconds.

Both of these examples were focused on the standard image segmentation problem, and their graph structures were a lot more data intensive than what I expect to process at this stage.  I'm hopeful that I can achieve 30 milliseconds without driving myself too insane.  That would allow real-time obstacle avoidance at any reasonable vehicle speed.

## Plan of Attack

For the old FPGA hands around here, I imagine that there is already a running list of things I'll have to learn before I can even hope to get a proof of concept up and running.  Starting from the top down, here's my own list (and I'm probably missing more than half the to-do/to-learn items):

1. Algorithm implementation
    1. Using shared memory to/from host
    2. FPGA implementation details:  finite state machines, appropriate math/logic implementations, squeezing it all onto the hardware I have (or moving up to bigger FPGAs)
    3. How do I know it worked?  Debugging, testing, and validation
2. Sensor processing:  If the graph cut is fast, why wait on software-based sensors?
    1. Digital signal processing (1D, 2D sensors--can't afford 3D yet, but who knows?)
    2. Probability math
    3. Coordinate transformation (parameters, values, etc.)
    4. Sensor fusion (multiple and mixed modes)
    5. More DSP (a topic for later)
3. Output visualization:  Might as well see what the system "sees"
    1. Image and video creation (image formats, HDMI output)
    2. Overlaying multiple data sets
4. Motor controls:  Now that you have a path, how are you going to use it?
    1. PWM actuators
    2. Control and feedback
5. Actual navigation:  Wandering aimlessly isn't much good--how about reaching a destination?
    1. Local and map-based environments
    2. INS, GPS, and other location data
    3. More math and optimization
6. ...and many more I'm probably skipping over

### Next Steps

Since the purpose of all of this is both to create something different and to learn something new, my next steps are to work through the list above, learn how to build each part, and then write up a little article on what I did, how I did (or did not) make it happen, and the design choices and limitations that came up.  Sometimes it'll be a happy lessons-learned, and I'm sure there will be other times when an idea didn't pan out.

In the end, I'm hopeful that I'll have gained a lot of knowledge, a good hobby, and left behind some practical examples for others who are also starting out.

Comment by April 6, 2018

Hi David, as a relative newcomer myself, I wish you all the best with your grand project.  I will follow it with interest.  My own pet project is to see what can be achieved with floating-point-centric FPGAs in closed-loop systems, where every ns counts.  In a few days I will get my first real results from a journey that has been sometimes frustrating but mostly rewarding.

I might add a bit of time to the schedule for a) fighting with the FPGA design tools, b) getting to grips with FPGA timing closure and c) wondering how MATLAB optimizes the number of add-on packages you need to a maximum (just kidding :-).

It looks like you have picked a topic that is valuable and highly relevant, and maybe some of the very latest FPGAs, like the Intel Cyclone 10 GX, could be appropriate in the cost/performance sweet spot?

All the best, Steve

Comment by April 7, 2018

Steve,

Thanks!  I'm not at the point of counting every ns (yet), but I'm trying to move forward with that in mind.

Since I'm at the complete newcomer level myself, I'm starting with a couple of cheap dev boards to practice on.  As some wiser hands pointed out, it's good to start that way, since you'll probably end up burning out a few by miswiring something at one point or another.

The Cyclone was mentioned as a good board to work on, and I'll definitely keep that on my list of next items to work with.

I've done a MATLAB/Octave implementation of the concept in question, which at least proves that the idea is workable.  OTOH, it's painfully slow, and I'm not a MATLAB expert at all.  If you or anyone else want to take a look at it, let me know and I'll post a link.  (I decided not to do it out of the gate because I didn't want to come across as a click-baiter).

When working in a new language or technology, it usually helps to have a definite goal when starting out.  Marrying up my pet project with learning FPGAs seemed like a good way to go.

If you have anything you can show here or elsewhere on your work, I'd love to take a look.  I'm sure I don't know enough to understand everything yet, but I like to see how other people solve a problem.

Comment by April 8, 2018

David,

Yes, I managed to fry an OptoMOS switch and its driver when I shorted them to a -5V rail.  Fortunately, I could get it repaired by the guy who assembled it for me.

Most of my current MATLAB activity is pretty basic and revolves around s-domain and z-domain transfer function synthesis/analysis, along with FFTs, IFFTs, and a bit of algebra here and there, so I have had no need to get into heavy-duty algorithm optimization... yet.

I have written a series of blogs (8 so far) on the DSPRelated site about using feedback control/DSP/FPGA/mixed-circuit electronics to make software/firmware-defined hardware characteristics.  The intention, much like your project description, is to follow a project from concept to the completion of a working prototype.  There will probably be another two parts, and then maybe the occasional application example, or maybe some completely different DSP/feedback-control topics.

One thing that I am interested in is the use of high-level languages to describe algorithms and turn them into FPGA code. Specific examples being OpenCL, Intel HLS and MATLAB DSP Builder. At present my algorithms are not very complex and can be hand crafted/optimized for speed & resources by using the Intel FPGA IP blocks. As the complexity increases I will be more inclined to see if I can move to a higher-level description and let a compiler worry about complexity, optimization and timing closure. It will be interesting to try that on some proven blocks and see how well they do.
