Delayed printf for real-time logging

You often debug by adding a few printfs and looking at the logs. In some real-time/low-level contexts though, you don't have time for text formatting.

You don't want prints to affect timing too much, because then timing-related bugs you're chasing might disappear. And you certainly don't want the system to stop functioning altogether because prints cause it to miss real-time deadlines.

A common alternative to prints is more "raw" logging - an event buffer, where event is a union keeping the event type, and a few variables depending on the type. This is fast, but not nearly as readable (or writable) as free-form printf messages.

"Delayed printf" is a way to get it all - printf's convenience together with event buffers' speed. What we do is we take printf's arguments and save them into a buffer like this:

format_string_pointer
num_args
arg0
arg1
...
format_string_pointer
num_args
...

Then, outside of the program logging these messages, we can read the actual format strings from the executable file, and call printf to get formatted text. As long as all the format strings and %s arguments are constant, this works fine - and is very fast.

I wrote a working example at GitHub - reading and writing log files takes just 100+ LOC. In what follows, we'll walk through the implementation.

How do we read strings from the executable file? We could parse the file - it's a bunch of data sections, that is, base addresses and the bytes to be copied to those addresses by the program loader. For instance, on Linux, #include <elf.h> has the structs describing program sections in ELF binaries.

But there's a simpler way to lay our hands on the strings - just ask the debugger. In gdb, "print (char*)0x8045008" will print the string starting at that address (or garbage if a string doesn't start there.) We'll use something similar, except we'll script gdb in Python:

charp = gdb.lookup_type('char').pointer()
return gdb.Value(ptrbits).cast(charp).string()


Ugly, ain't it? Well, we could use the shorter

gdb.parse_and_eval('(char*)0x%x'%ptrbits).string()

...but parse_and_eval is dog-slow - slow enough to make the longer incantations well worth it.

Armed with read_string, we can take a raw log buffer - a list of words (32-bit in my examples) - and convert it to text:

def print_messages(words):
i = 0
while i<len(words):
n = words[i+1]
args = words[i+2:i+2+n]
print fmt % convert_args(fmt, args),
i += n+2

This simply reads a format string pointer, a number of arguments, and then that number of arguments until it runs out of words.

Why do we need convert_args? Because args is a bunch of integers, which is fine with %d or %x but not with %s or %f.

With %s, the integer is really a string pointer, and so we want to read_string. And with %f, we want to reinterpret the word bits as a floating point number - what C does with *(float*)&int_var.

def convert_args(fmt, args):
types = [s[0] for s in fmt.split('%')[1:]]
unpack = {'s':read_string, 'd':int, 'x':int, 'f':to_float}
return tuple([unpack[t](a) for t,a in zip(types,args)])

def to_float(floatbits):
return struct.unpack('f',struct.pack('I',floatbits))[0]

This is a tad naive - it doesn't support things like %.2f - but you get the idea.

And now we can add a user-defined gdb command called dprintf like so:

class dprintf(gdb.Command):
def __init__(self):
gdb.Command.__init__(self, 'dprintf', gdb.COMMAND_DATA)

def invoke(self, arg, from_tty):
def word(b): return struct.unpack('i',b)
words = [word(bytes[i:i+4]) for i in xrange(0, len(bytes), 4)]
print_messages(words)

dprintf()

We derive a class from gdb.Command so that __init__ registers a command named dprintf. When the command "dprintf mylog.raw" is entered, we open mylog.raw, read its words into a Python list, and call print_messages. We don't need to be in an interactive gdb session to decode log files - we can issue the command directly from the shell:

gdb -q prog -ex 'py execfile("dprintf.py")' -ex 'dprintf mylog.raw' -ex q

This can be made shorter by importing dprintf.py (the file with all the Python code above) somewhere in gdb's initialization scripts - then we won't need the execfile part.

Writing log files

To save printf arguments, we need a way to iterate over them. I figured C++11 variadic templates are faster than fiddling with C's va_list, which wants you to know the argument types to access them, which means you must read the format string, which I'd rather not. So here's the C++11 version:

typedef int32_t dcell;
struct dbuf { dcell *start, *curr, *finish; };
extern dbuf g_dbuf;

template<class... Rest>
void dprintf(const char* fmt, Rest... args) {
int n = sizeof...(args);
if(p + sz < g_dbuf.finish) {
*p++ = (dcell)fmt;
*p++ = n;
dwrite(p, args...);
}
}

I'm using a global buffer, g_dbuf - of course you could prefer passing a buffer as a dprintf's argument. But actually, a global buffer isn't that bad because dprintf updates the buffer atomically, with GNU's __sync_fetch_and_add. So it's OK to call dprintf from multiple threads and even from interrupt handlers (the latter you wouldn't do with a lock-protected buffer, because an interrupt meeting a buffer locked by a suspended thread is game over. But it's fine with a lock-free buffer - the point of lock-freedom being that someone busy with the buffer can never block someone else by getting suspended).

As to sizeof...(args) - in case you've never seen that, it's the number of a variadic template's arguments, and has nothing to do with their sizes in bytes.

dprintf uses dwrite to write the arguments into a buffer:

inline void dwrite(dcell* p) {}

template<class T, class... Rest>
inline void dwrite(dcell* p, T first, Rest... rest) {
*p++ = tocell(first);
dwrite(p, rest...);
}

This is actually a perfectly prosaic recursively self-instantiating variadic template - "the" way to iterate over C++ variadic template function arguments (is there any other way?)

In turn, tocell takes an argument of an arbitrary type and extracts the 32 bits. In my example, I overloaded it for ints, floats and char*, along the lines of:

inline dcell tocell(int x) { return x; }
inline dcell tocell(const char* p) { return (dcell)p; }
inline dcell tocell(float x) { union { int n; float f; } nf; nf.f = x; return nf.n; }

That's it; the only other thing we need is a dflush function writing the buffer out to a file, and we can use dprintf:

int i;
for(i=0; i<10; ++i) {
dprintf("i=%d i/2=%f %s\n", i, i/2., (i&1)?"odd":"even");
}
dflush();

dprintf("going to crash...\n");
volatile int* p=0;
dprintf("i=%d p=0x%x\n", i, (dcell)p);
p[i]=0;

What's with the "going to crash" part? Well, I think a really important part of logging is being able to look at not-yet-flushed messages in a debugger - either in a core dump or in a live debugging session. (Not being able to do this is an annoying thing about buffered printfs; of course you can work around this by sprintfing into a buffer of your own.)

So let's extend our dprintf gdb command to print messages directly from g_dbuf instead of reading from a file:

intpp = gdb.lookup_type('int').pointer().pointer()
def invoke(self, arg, from_tty):
if arg:
else: # no file passed
buf = gdb.parse_and_eval('&g_dbuf').cast(intpp)      start = buf[0]      curr = buf[1]
n = curr - start
words = [int(start[i]) for i in xrange(n)]
print_messages(words)

Here we do an ugly thing - we treat &g_dbuf as an int** instead of a dbuf*. Why do that - why not avoid any casts and simply access start and curr with buf['start'] and buf['curr']? Because that only works if we have DWARF debug info - that is, if we compiled with -g. But the uglier way above works even without -g. The debugger only needs to know where g_dbuf is, but nothing about its type. And where g_dbuf is debuggers know from the ELF symbol table - which is always there unless someone actively stripped the executable.

And now we can see the last messages in a core dump like so:

gdb -q progname core -ex dprintf -ex q

...and in a live debugging session, we'd simply type "dprintf". With our example, this should print:

going to crash...i=10 p=0x0

Conclusions

• You can debug with free-form text prints in the lowest-level contexts - interrupt handlers, systems with tiny memories, etc. etc.
• If you use C++, it's good to upgrade to C++11
• If you use gdb, it's good to have a repository of user-defined Python commands
• A debugger can be very useful even with release builds

If your strings are logged from a shared object, then the logged string pointers will not match the addresses stored in the .so file - there'll be an offset. It's OK if you have a debugged process or a core dump, because gdb knows where the .so was loaded. The problem is when you parse files logged by a dead process. One workaround is to log the runtime address of some global variable with a known name, and use that to compute the offset relative to the address of that name in the .so.

Previous post by Yossi Kreinin:
How FPGAs work, and why you'll buy one

[ - ]
Comment by October 25, 2013
How would you handle delayed printf of C strings that happen to be on the stack?
The memory will be thrashed by the time you print it out later.
[ - ]
Comment by October 25, 2013
I wouldn't :) As the article says, "As long as all the format strings and %s arguments are constant, this works fine" - I could have said "literals" but that's not strictly true, and "constants" isn't strictly true, either; well, we both know what I mean... In my production C code, I'm trying to make as many strings (and other pointers as well) point to static locations precisely because these addresses are the only ones you can always make sense of when things go awry.
[ - ]
Comment by August 5, 2015
Really interesting article, thanks!

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.