Georg's Log

Sat 16 July 2016

On sprintf() Fails

Posted by Georg Sauthoff in C   

The sprintf() and snprintf() C library functions are commonly used for formatting a string and writing the result into a buffer. But sometimes it is surprising in how many ways sprintf() can be misused.

The Ugly

Some enterprise C programs look like submissions to the useless-use-of-sprintf award. It appears as sprintf() is used as a hammer and suddenly everything looks like a nail to hit, e.g.:

char s[128];
const char *t = get_some_string();
sprintf(s, "%s", t);

Instead of:

char s[128];
const char *t = get_some_string();
strcpy(s, t);

The problem with sprintf() here: it is less efficient because the format string is parsed at runtime. But at least it works.

The variant snprintf() works like sprintf() but truncates the output if the result buffer is too small. Following real-world example increases the uselessness of the previous example:

char s[128];
const char *t = get_some_string();
snprintf(s, strlen(t), "%s", t);

It is like an emulation of sprintf() using snprintf().

If truncation is the objective, snprintf() at least should be used like that:

char s[128];
const char *t = get_some_string();
snprintf(s, sizeof(s), "%s", t);

But following idiomatic solution is more efficient because no format string needs to be parsed at runtime:

char s[128];
const char *t = get_some_string();
size_t l = strlen(t);
size_t n = sizeof(s)-1 < l ? sizeof(s)-1 : l;
memcpy(s, t, n);
s[n] = 0;

Or even on a GNU libc system:

char s[128];
const char *t = get_some_string();
*(char*)mempcpy(s, t, MIN(sizeof(s)-1, strlen(t)) = 0;

The MIN macro may come from e.g. <sys/param.h> - even if implemented in the obvious way - an optimizing compiler is able to generate code where strlen() is only called once.

One might argue that snprintf makes the code easier to grasp. I don't think so. And that the runtime overhead surely can't be that bad. Perhaps. But: if runtime is not that important, then one even can use a scripting language that provides very convenient string formatting facilities (e.g. Python) for that task. And if C is not a requirement than e.g. C++ is the better alternative to write high-level code that looks sane and is efficient.

Btw, often, silent truncation is also not the best solution - i.e. perhaps it even makes more sense to just return an error if the destination buffer is too small.

The Pitfalls

Like with other functions that accept C format strings, the usual pitfalls apply. There is no safety mode at runtime that checks, if the number of arguments and types indicated by the format string actually match the supplied arguments. For example:

sprintf(s, "Hello %s");

The C-string argument is missing, likely to segfault at runtime. A wrong type:

int i = 1
sprintf(s, "Hello %s", i);

Effectively, the int is cast to a char pointer.

Those format string issues can be more subtle, e.g.

sprintf(s, "sizeof(long) = %d", sizeof(long));

has the issue that the type of sizeof() is size_t which should be referenced with the %zu specification - such that the code stays portable.

Modern compilers like GCC and Clang helpfully warn about such format string related issues. But compilers like Solaris Studio (as of 12.3) still don't.

As always, if format strings are constructed from user input, one has to take measures against injected format strings.

The Bad

The real WTF is using sprintf()/snprintf() with overlapping buffers because this yields undefined behaviour. Classic example:

sprintf(s, "%s<tag>%s</tag>", s, t);

Here, the first argument completely overlaps with the destination buffer. Besides the effect of this statement being undefined, even if it works as intended, it is very inefficient. Not only is a format string parsed at runtime - already written prefixes are rewritten with each such call, leading to quadratic runtime (instead of a linear one).

That sprintf()/snprintf() on overlapping buffers is undefined behaviour isn't special to those functions. It is a common theme with C functions that copy between buffers. It is the reason d'etre for memmove().

The C standard is pretty clear about this, too:

If copying takes place between objects that overlap, the behavior is undefined.

(ISO C99, 7.19.6.6 (2))

Sure, not everybody has a copy of the C standard lying around. But the POSIX standard (or SUSv2) is freely available and uses a similar wording:

If copying takes place between objects that overlap as a result of a call to sprintf() or snprintf(), the results are undefined.

(fprintf(3p))

Of course, the POSIX man pages are not installed by default on every system, but even the Linux man page is helpful, as well:

C99 and POSIX.1-2001 specify that the results are undefined if a call to sprintf(), snprintf(), vsprintf(), or vsnprintf() would cause copying to take place between objects that overlap (e.g., if the target string array and one of the supplied input arguments refer to the same buffer). See NOTES.

(sprintf(3))

Where the notes even give a concrete example:

Some programs imprudently rely on code such as the following

sprintf(buf, "%s some further text", buf);

to append text to buf. However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling sprintf(), snprintf(), vsprintf(), and vsnprintf(). Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.

(sprintf(3), emphasis theirs)

Although there are man pages that don't document this behavior (e.g. Solaris 10, FreeBSD 10.3) - it should be pretty well-known.

Still, some enterprise C programmers seem to like this anti pattern a lot. Of course, such code has optimizations disabled (because they "can't be trusted and are often buggy" - blame the compiler, not the code ...) and contains other cargo cult measures. A variation, a combination of the bad with the ugly:

char *trim_left(char *s)
{
  while (*s == ' ')
    sprintf(s, "%s", s+1);
  return s;
}

Fun comment about a similar function I stumbled on: 'fast code because the loop is tight!11!'

(The obvious and efficient way to do this is to determine the prefix length first and then call memmove() once.)

Because this issue has come to my notice relatively often in work related contexts (and since GCC 5/Clang 3.9 don't warn about it), I think about writing a check for clang-tidy.