FPGARelated.com

The Art of Debugging

Mike December 11, 2015

Debugging electronics is similar to any technological process.  In theory we know how things are supposed to work; in reality they don't behave as expected.  The challenge of engineering boils down to making things work, and debugging is the fundamental task we use to go from lumps of sand to picosecond-accurate switching networks.  Debugging is an art that requires a lot of time to learn.  Like any skill, the more we work at it, the better we become.

The first question to ask when dealing with anything electronic that does not work is: "Is it plugged in?"  I don't know how many times I've looked at something and found the batteries were dead and that was NOT the first thing I checked.  With any electronics problem, always start with the power supply.  Is there power coming from the line?  No?  Is the circuit breaker tripped?  No.  IS IT PLUGGED IN????  GAAAA!  NO!  Why didn't I check that first?!?!?

This question gets expanded when diving into a complex circuit.  Many devices have multiple power levels, $\pm$12V, 5V, 3.3V, 1.8V, 1.2V and shrinking more every year.  When dealing with weird problems, checking each power supply level is the first step.  "Is it plugged in?" applies to every supply: if the output is not at the required voltage, that chunk of circuit won't run because, in effect, it is not plugged in.
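As a rough sketch of "Is it plugged in?" applied to every rail, the check amounts to comparing each measured voltage against its nominal value within a tolerance.  The struct, the function names, and the idea of a single fractional tolerance are assumptions made for illustration; real limits come from the datasheets of the parts on each rail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one supply rail (illustrative only). */
struct rail {
    const char *name;
    double nominal_v;   /* expected voltage, in volts */
    double measured_v;  /* what the meter actually reads, in volts */
};

/* True when the reading is within tol_frac (e.g. 0.05 for 5%) of nominal. */
bool rail_ok(const struct rail *r, double tol_frac)
{
    assert(r != NULL && tol_frac >= 0.0);
    double err = r->measured_v - r->nominal_v;
    if (err < 0.0) err = -err;
    double nominal_mag = r->nominal_v < 0.0 ? -r->nominal_v : r->nominal_v;
    return err <= nominal_mag * tol_frac;
}

/* Index of the first rail that is out of tolerance, or -1 if all pass. */
int first_bad_rail(const struct rail *rails, size_t n, double tol_frac)
{
    for (size_t i = 0; i < n; i++) {
        if (!rail_ok(&rails[i], tol_frac)) {
            return (int)i;
        }
    }
    return -1;
}
```

Filling an array of `struct rail` with meter readings and calling `first_bad_rail(rails, count, 0.05)` points at the first supply that is effectively unplugged.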

If all the voltages check, the second question to ask is: "Is it grounded?"  I have witnessed things work for a while, and then everything goes haywire.  A chip's ground was not connected, which is OK when it first starts up because the AC capacitively couples to ground.  But as it builds up charge its internal potentials rise to the supply rail and there is no complete circuit.  This is a more subtle problem since most boards are built with ground planes.

When it is clear all the supplies are connected, then we have to determine where things are going wrong.  Since I don't have access to JTAG tools I use the method of "binary search", which means I cut off half my problem and see if it still is a problem or not.  This is normally a software process, but it works just as well with hardware.  If one chip appears to be a problem, it might well be that what it is driving is the problem.  Sometimes it is not possible to lift a pin or cut a trace to accomplish this break.  If the pin in question is on a programmable device, the next option is to turn it from an output to an input and drive the line through a moderate resistance.  That way, if the pin is internally shorted to ground, you don't melt anything.
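The halving idea can be sketched in code.  This is a minimal illustration, not a description of any particular tool: it assumes a signal chain of `n` stages with a single fault, so that every stage before the fault probes good and everything from the fault onward probes bad.  The names `probe_fn`, `first_bad_stage`, and `simulated_probe` are hypothetical; on a real board the "probe" is you with a scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: nonzero if the signal at the output of `stage` looks right. */
typedef int (*probe_fn)(int stage, void *ctx);

/* Binary search along a chain of n stages (0..n-1).
 * Assumes one fault: stages before it probe good, the faulty stage and
 * everything after it probe bad.
 * Returns the index of the first bad stage, or n if every stage probes good. */
int first_bad_stage(int n, probe_fn probe_good, void *ctx)
{
    assert(n >= 0 && probe_good != NULL);
    int lo = 0;   /* every stage before lo is known good */
    int hi = n;   /* stage hi and beyond are known bad (when hi < n) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (probe_good(mid, ctx)) {
            lo = mid + 1;   /* fault is after mid */
        } else {
            hi = mid;       /* fault is at mid or earlier */
        }
    }
    return lo;
}

/* Simulated board for trying the search on a host:
 * ctx points to the index of the stage that is broken. */
int simulated_probe(int stage, void *ctx)
{
    return stage < *(const int *)ctx;
}
```

With 8 stages the search needs at most 3 probes instead of 8, which matters when each probe means cutting a trace.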

How does a pin get internally shorted to ground?  The protection circuit wasn't enough when someone walked across a rug and touched a button.  I've learned the hard way that before I touch anything, I touch ground. 

Hardware is the easy part of debugging.  Every piece of electronics these days has some kind of programmable device in it.  Debugging software is much, much harder.  To see why, let's look at some numbers.  Take a typical 16-bit microcontroller or DSP.  They can have 30 to 60 lines which can toggle.  So with 60 lines, the total number of possible states the pins can be in is $2^{60}$.  A big number certainly.  But these devices have anywhere from 32k to 1Meg of ROM.  Counting the 1Meg case as $2^{20}$ bits, the total number of possible states the ROM can have is $2^{2^{20}}$.  Granted, most of those states are illegal instructions and the processor won't do anything, but the point is that the number of states a microcontroller can be in is many orders of magnitude larger than the number of states its external wires can be in.
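To put rough sizes on those two numbers, here is the arithmetic worked out, assuming (as the exponent in the text implies) that the 1Meg ROM is counted as $2^{20} = 1048576$ bits:

$$2^{60} \approx 1.15 \times 10^{18}, \qquad 2^{2^{20}} = 2^{1048576} = 10^{1048576 \cdot \log_{10} 2} \approx 10^{315652.8}$$

The pin count is a 19-digit number.  The ROM count is a number with about 315,653 digits.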

Finding problems in software (or firmware if you prefer that term) can be done in several ways.  Having an emulator or in-circuit debugger helps a lot, and binary search is my main tool.  But sometimes you want to know how things are failing with subtle problems.  These include interrupts that don't do what you expect, or connections between main and some subroutine.  That is when I use hardware to debug software.

Typically I have a spare pin or two on a processor.  If not, I sacrifice a chunk of hardware to debug the zone of problem software.  The pins are set high when I enter a routine, and low when I exit.  This tells me how much time I spend in a routine, and how often I actually call it.  This is exceptionally useful in debugging interrupts. If the line never goes high, the interrupt never actually got called.  If it oscillates rapidly, I know I forgot to flip a bit to clear the interrupt generator. 
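A minimal sketch of the pin-toggle trick follows.  Everything named here is an assumption for illustration: `DEBUG_PORT` is an ordinary variable standing in for a memory-mapped GPIO register so the code runs on a host, bit 3 is an arbitrary choice of spare pin, and `timer_isr` and `irq_pending` are hypothetical.  On a real part you would use the vendor's register names and interrupt syntax.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO output register (assumption:
 * on real hardware this is a vendor-defined register address). */
volatile uint8_t DEBUG_PORT;

/* Assumption: the spare pin is bit 3 of that port. */
#define DEBUG_PIN_MASK ((uint8_t)(1u << 3))

static inline void debug_pin_high(void)
{
    DEBUG_PORT |= DEBUG_PIN_MASK;
}

static inline void debug_pin_low(void)
{
    DEBUG_PORT &= (uint8_t)~DEBUG_PIN_MASK;
}

/* Hypothetical interrupt flag; real parts have a status register bit. */
volatile uint8_t irq_pending = 1;

/* Hypothetical interrupt service routine instrumented with the debug pin. */
void timer_isr(void)
{
    debug_pin_high();   /* rising edge on the scope: routine entered */

    irq_pending = 0;    /* clear the interrupt source; if this is forgotten,
                           the routine re-enters and the pin oscillates */

    /* ... the routine's real work goes here ... */

    debug_pin_low();    /* falling edge: pulse width is time spent inside */
}
```

On a scope, the pulse width is the time spent in the routine and the pulse spacing is how often it is called.  Only the one bit is modified, so other pins on the same port are left alone.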

Using hardware to debug software and software to debug hardware introduces more bugs.  However, these can make us think differently about how we want to solve a problem, and can give rise to a better way to build the whole system.  By asking more questions about why something is misbehaving, we might find out it is doing exactly what was specified.  In other words, it's not a bug, it's a feature! Debugging a system is more than just fixing the hardware and ensuring the software works.  It is making a product useful to the end user.

The most important step in debugging is getting stuck and confused.  When you get to the point where you have checked all you can think of and things simply don't make sense because "it can't do that!" it's time to get up and walk away.  When you drive home, go to sleep, take a shower, your mind continues to work in the background.  You will suddenly find a moment when you ask "did I check that?"  While it may not have anything to do with the problem, it can get you to think about other connections to where the problem might be.

Debugging is an art because every situation is different.  While there are certain tricks you can use in every case, not all of them are going to be useful.  There is no "right order" - should you binary search or use hardware pin toggles first?  It depends, and after many times of doing both it will be obvious which to try first in a given situation.  If software goes into an infinite loop, printf's are not your friend, because the output never stops.  Killing the program on a counter, adjusting the limit to be 2 steps into the infinite loop, and then adding printf's will help you find why the loop continues.
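The counter trick can be sketched as a small loop guard.  This is one possible shape under stated assumptions, not a standard facility: `struct loop_guard`, `loop_guard_tripped`, and the deliberately buggy `buggy_countdown` are all hypothetical, and the sketch breaks out of the loop where a real debugging session might halt or exit instead.

```c
#include <stdio.h>

/* Hypothetical loop guard: counts passes through a loop. */
struct loop_guard {
    unsigned long count;   /* passes seen so far */
    unsigned long limit;   /* pass number at which to bail out */
};

/* Call once per pass; returns 1 when the limit has been reached. */
int loop_guard_tripped(struct loop_guard *g)
{
    g->count += 1;
    return g->count >= g->limit;
}

/* Deliberately buggy example: counting down by 2 with an `i != 0` test.
 * For an odd start, the unsigned i skips past zero, wraps around, and
 * the loop never ends on its own.  Returns the number of passes made. */
unsigned long buggy_countdown(unsigned start, unsigned long limit)
{
    struct loop_guard g = { 0, limit };
    for (unsigned i = start; i != 0; i -= 2) {
        if (loop_guard_tripped(&g)) {
            /* One print, at the moment things have already gone wrong. */
            printf("guard tripped on pass %lu, i = %u\n", g.count, i);
            break;
        }
        /* ... loop body ... */
    }
    return g.count;
}
```

Starting from 5, a correct loop would finish after 3 passes, so a limit of 5 stops it 2 passes into the runaway and the single print shows that `i` has wrapped to a huge value rather than reaching zero.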

Like everything, "experience is what you have after you need it."  Designing things to make debugging easy actually reduces the number of bugs in the first place.  But there is no getting around reality.  Systems will fail in strange ways because the whole is more than the sum of the parts.  Because it's both challenging and fun to solve real-world puzzles, debugging is the essence of engineering.


