Friday, December 19, 2008

printf-acular

Ahh, printf. What would we do without printf, sprintf, etc.? It makes it easy to format integers, floats, strings, MAC addresses... Oh wait, there isn't a format specifier for MAC addresses, is there? Well, let's rectify this obvious omission.

glibc (and uClibc) implement a facility to add new format codes to the printf family of functions. It is even possible to override the builtin implementations of the C standard formats, if the Gentle Reader has an irrational hatred for some aspect of the output. The GNU documentation for how to do this is adequate, but a bit obscure. So let's dive into an example by adding a %M specifier to handle MAC addresses:

uint8_t mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

if (register_printf_function ('M',
        printf_output_M, printf_arginfo_M)) {
    ... error handling ...
}

printf("%M\n", mac);

Output:
00:11:22:33:44:55

Huzzah! Now we'll dig into the implementation.


 
printf_arginfo_M

The first routine to be implemented tells glibc the arguments required by the new specifier:

#include <stdio.h>
#include <printf.h>

static int
printf_arginfo_M(const struct printf_info *info,
                 size_t      n,
                 int        *argtypes)
{
    /* "%M" always takes one argument, a pointer to uint8_t[6]. */
    if (n > 0) {
        argtypes[0] = PA_POINTER;
    }

    return 1;
} /* printf_arginfo_M */

The fields of the printf_info structure are described in the GNU manual. For the arginfo() implementation the only important field is spec, the character code of the specifier for which the information is sought. This allows a single arginfo() routine to implement support for multiple new specifiers.

glibc also passes in an array of argtypes[] with n members, which are to be filled in with the types of the arguments expected. The available types are documented in the GNU manual. The arginfo() routine must populate the argtypes[] with the expected arguments, after checking to ensure there is room. It should return the number of arguments it expects. If you define a format specifier which takes a large number of arguments, there might not be sufficient room in argtypes[] to store them all. If the arginfo() routine returns a larger value than n, glibc will call it again with a larger argtypes[] array.

For %M there is only one argument, the MAC address to be formatted. The ability to write a format specifier which consumes more than one argument is potentially useful, but also confusing, as none of the standard specifiers do so.
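To make the n check and the spec field concrete, here is a minimal sketch of a single arginfo() routine serving two specifiers. The "%R" specifier and the name printf_arginfo_MR are made up for illustration (only %M appears in this article), and returning -1 for an unrecognized spec reflects my understanding that glibc then falls back to its built-in handling; treat that as an assumption.

```c
#include <stdio.h>
#include <printf.h>

/* Hypothetical: one arginfo routine shared by "%M" (one pointer argument)
 * and an imaginary "%R" which consumes two ints. Only "%M" comes from the
 * article; "%R" exists purely to illustrate the n check. */
static int
printf_arginfo_MR(const struct printf_info *info,
                  size_t      n,
                  int        *argtypes)
{
    switch (info->spec) {
    case 'M':
        if (n >= 1) {
            argtypes[0] = PA_POINTER;
        }
        return 1;

    case 'R':
        /* Fill in only as many slots as we were given room for... */
        if (n >= 1) {
            argtypes[0] = PA_INT;
        }
        if (n >= 2) {
            argtypes[1] = PA_INT;
        }
        /* ...but always report how many arguments we really need. */
        return 2;

    default:
        /* Not one of ours. Assumption: -1 means "not handled here". */
        return -1;
    }
}
```

If called for "%R" with room for only one entry, the routine fills in that entry and still returns 2, which is the cue for the caller to come back with a larger array.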


 
printf_output_M

Next we need to implement the output routine, to generate the formatted output for the new specifier. Output always goes to a FILE *, no matter whether called from printf(), sprintf(), vfprintf(), etc.

One is free to use the stdio.h routines like fprintf(), so long as you do not use your new specifier in the format string. If you do, fprintf will call back into your output routine, which will call fprintf() again, which will call your output routine again... and hijinks ensue.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    int len;

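    /* args[0] points at the stored argument, which is itself the pointer
     * passed to printf(), hence the double indirection. */
    mac = *(unsigned char **)(args[0]);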

    len = fprintf(stream, "%02x:%02x:%02x:%02x:%02x:%02x",
                  mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    return len;
} /* printf_output_M */

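We only take one argument for %M, so we only look at args[0]. There is no need to check args[0] itself for NULL: glibc parses the format and collects the arguments before calling your output routine, so the slot is always populated. The MAC pointer stored in that slot is a different matter, as it is whatever the caller passed, and this simple version trusts it to point at six valid bytes. We call fprintf() to do most of the work, and return the number of bytes output.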

printf("%30M\n", mac);

00:11:22:33:44:55

Voila! I hope you've enjoyed this article... oh wait, that wasn't very good was it? The "30" prepended to the M specifier was meant to right-justify the output within a 30 character region. The output did not reflect this. Oops.


 
printf_output_M, with modifiers

glibc parses out the modifiers prepended to the specifier, and passes them to the output routine via the printf_info structure. Many of the standard modifiers don't make sense for MAC addresses. For example, I have no idea what "%llM" would mean. A few modifiers do make sense, such as a field width and the '-' flag to left-justify.

We certainly could write a bunch of code to insert the proper number of spaces before or after the MAC address, but we'll use a slightly different approach. We call fprintf() to generate the output anyway, so we'll generate a format string for fprintf() to make it handle the width and justification.

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    char macstr[18]; /* 6 hex bytes, with colon separators and trailing NUL */
    char fmtstr[32];
    int len;

    mac = *(unsigned char **)(args[0]);
    snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
            mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

    /* We handle some conversion specifier options as appropriate:
     * '-' and field width */
    if (info->width > 0) {
        snprintf(fmtstr, sizeof(fmtstr), "%%%s%ds",
                (info->left ? "-" : ""), info->width);
    } else {
        snprintf(fmtstr, sizeof(fmtstr), "%%s");
    }

    len = fprintf(stream, fmtstr, macstr);

    return len;
} /* printf_output_M */

We generate the formatted MAC address as a string buffer on the stack. The code after the first snprintf() examines the modifiers in printf_info, and generates a format string in fmtstr[]. For example, if the format passed to printf was "%-30M", fmtstr[] will be "%-30s".

Now the output is much more pleasing:

printf("%30M\n", mac);

               00:11:22:33:44:55

 
printf_output_M, Now with More Cisco

Which looks more natural to the Gentle Reader?

00:11:22:33:44:55

0011.2233.4455

If you chose the first one, you likely come from a desktop background with operating systems such as Windows, MacOS, Linux, Solaris, or the various other Unixes which all use a colon-separated MAC address.

If you chose the second one, you likely come from a Cisco networking background. Cisco uses a MAC address format which takes up three fewer columns on the screen, important if one's user interface is a text CLI.

Could we accommodate this variation? Certainly it could be done using another conversion specifier, perhaps using "%C" for a Cisco-style MAC address, but this seems inelegant. Coming to our rescue is a little-used format modifier in printf: the 'alternate representation' hash mark. This modifier implements optional formatting; for example, when used with the hex specifier as "%#x" it will prepend 0x to the output.

So we'll implement the Cisco MAC address format using "%#M" in printf_output_M, like so:

    if (info->alt) {
        /* Cisco style formatting */
        snprintf(macstr, sizeof(macstr), "%02x%02x.%02x%02x.%02x%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    } else {
        /* Unix/Linux/Windows/MacOS style formatting */
        snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    }

If invoked with "%#M" we'll take the first code path and produce a Cisco-style MAC address. Otherwise, we generate a colon-separated MAC address.
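Putting the pieces together, a complete pair of routines might look like the sketch below. It folds the '#' handling into the width-aware output routine from the previous section. Two things are my additions and not part of the walkthrough: the register_mac_format() wrapper, and the NULL guard, which assumes a placeholder is preferable to a crash if a caller passes a NULL MAC pointer. Newer glibc releases also flag register_printf_function() as deprecated in favor of register_printf_specifier(), so expect a compiler warning there.

```c
#include <stdio.h>
#include <printf.h>

static int
printf_arginfo_M(const struct printf_info *info,
                 size_t      n,
                 int        *argtypes)
{
    (void)info; /* only one specifier is handled, so spec is not consulted */
    if (n > 0) {
        argtypes[0] = PA_POINTER;
    }
    return 1;
}

static int
printf_output_M(FILE *stream,
                const struct printf_info *info,
                const void *const *args)
{
    const unsigned char *mac;
    char macstr[18]; /* longest form is 17 characters plus trailing NUL */
    char fmtstr[32];

    /* args[0] points at the stored argument, i.e. at the caller's pointer. */
    mac = *(const unsigned char *const *)(args[0]);

    if (mac == NULL) {
        /* Assumption: a placeholder beats dereferencing NULL. */
        snprintf(macstr, sizeof(macstr), "(null)");
    } else if (info->alt) {
        /* "%#M": Cisco style formatting */
        snprintf(macstr, sizeof(macstr), "%02x%02x.%02x%02x.%02x%02x",
                 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    } else {
        /* "%M": colon-separated formatting */
        snprintf(macstr, sizeof(macstr), "%02x:%02x:%02x:%02x:%02x:%02x",
                 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    }

    /* Hand '-' and the field width on to fprintf() via a generated "%s". */
    if (info->width > 0) {
        snprintf(fmtstr, sizeof(fmtstr), "%%%s%ds",
                 (info->left ? "-" : ""), info->width);
    } else {
        snprintf(fmtstr, sizeof(fmtstr), "%%s");
    }

    return fprintf(stream, fmtstr, macstr);
}

/* Hypothetical convenience wrapper; returns 0 on success, -1 on failure. */
static int
register_mac_format(void)
{
    return register_printf_function('M', printf_output_M, printf_arginfo_M);
}
```

After a successful register_mac_format(), "%M", "%#M", "%30M" and "%-30M" should all behave as described above.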


 
Whither -Wall?

With the -Wformat (or -Wall) argument, gcc will sanity check the arguments passed to the printf family of routines. It can ensure that the number of arguments matches the number of specifiers in the format string, for example. Unfortunately, it does not know about custom additions:

gcc -Wall testM.c
testM.c:57: warning: unknown conversion type character 'M' in format
testM.c:57: warning: too many arguments for format

I know of no way to teach gcc about the new specifiers, unfortunately. If the Gentle Reader knows of a way, please describe it in the comments.
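One blunt workaround, offered as a sketch and not a fix: gcc only checks functions it knows to be printf-like, so routing the call through a variadic wrapper of your own that lacks __attribute__((format)) keeps it quiet. The helper name unchecked_snprintf is hypothetical. The cost is that gcc stops checking every specifier in those calls, standard ones included, which may be a worse trade than living with the warning.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: forwards to vsnprintf(). It deliberately carries no
 * __attribute__((format)), so gcc does not inspect the format string and
 * will not complain about specifiers it has never heard of. */
static int
unchecked_snprintf(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(buf, len, fmt, ap);
    va_end(ap);

    return rc;
}
```

This does not teach gcc anything about %M; it only moves the call out of the compiler's sight.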


 
GPL

I am not a lawyer, I do not play one on TV, and I do not want to play one on TV. So my opinions about GPL issues are really just quaint little notions of How The World Should Work.

An argument has been advanced in other contexts that if the only implementations of a particular API are licensed under the GPL, then any caller of that API should be considered a derivative work and be subjected to the GPL. As the register_printf_function() API only exists in glibc and uClibc, both of which are licensed under the LGPL, it could be argued that code calling register_printf_function() is therefore encumbered by the LGPL. Opinions may differ, but this is one possible interpretation.

Personally, I choose to treat custom printf formatters as LGPLd code.


 
Closing Thoughts

It is of course possible to implement routines to format a MAC address as a string, using "%s" in stdio.h routines for output. However, I think it is more elegant to teach stdio.h how to deal with common output requirements for your application.
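For comparison, the plain approach might look like this minimal sketch. The helper name mac_to_str and the MAC_STR_LEN constant are hypothetical; the caller supplies the buffer so the routine stays reentrant.

```c
#include <stdint.h>
#include <stdio.h>

#define MAC_STR_LEN 18 /* "xx:xx:xx:xx:xx:xx" plus trailing NUL */

/* Hypothetical helper: format mac[] into buf and return buf, so the call
 * can be dropped straight into a printf() argument list. Output is
 * truncated if len is smaller than MAC_STR_LEN. */
static const char *
mac_to_str(const uint8_t mac[6], char *buf, size_t len)
{
    snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return buf;
}
```

A caller would declare char buf[MAC_STR_LEN] and write printf("%30s\n", mac_to_str(mac, buf, sizeof(buf))). It works everywhere and gcc can check it, but every call site has to carry a buffer around.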

The C standard may claim additional lower case specifiers for standard formats in the future. It is best to rely on upper-case codes for custom formats.