Google Protocol Buffer Deserialization The Hard Way
By Mike Schiffman
This article is a low-level tutorial explaining why and how to write your own Google Protocol Buffer (protobuf) deserializer using the C programming language. Using the techniques discussed here, you can write your own code to supplant or replace the code generated via protobuf-c or Nanopb. This tutorial is specific to Farsight Security's nmsg package. If you write C code that deals with NMSGs, or directly with protobufs, this article is for you.
Before reading this article, it is recommended that you read the Farsight Security Blog series on NMSG with specific attention to Farsight's Network Message, Volume 3: Headers and Encoding.
Counting NMSG Payloads
Farsight Security has a simple use case in which we need to count the number of payloads in on-wire NMSG containers. The environment this needs to be done is our real-time SIE switch fabric, which, in aggregate, has extremely high bit rates.
Farsight solved this problem by rolling our own protobuf deserializer. We
think the concept and overall process to be useful to our peers and customers.
As such, we extracted the code from our production tool and wrapped it into a
standalone program named
nmsgpcnt-fsi. Also included for benchmarking are
two reference implementations,
nmsgpcnt-ptc which are
written using popular open source C-based protobuf implementations.
All three tools accomplish the same thing. Each parses on-disk NMSG protobufs and counts the number of containers and total number of payloads. They differ mainly in two areas, code complexity and speed of execution.
Why In The World Would You Do This?
The reason we wrote this low-level code when it would be much easier to use a
generated API is singular: speed of execution. As you'll see,
is quite a bit faster than the other two benchmark programs. Our protagonist,
nmsgpcnt-fsi should be faster – it is purpose-built to perform very specific
operations on protobufs. The protobuf code generators are designed to be
multipurpose and support a wide range of operations across many different data
types. As such, for many operations on protobufs, there are calls to
nmsgpcnt-fsi has none of these.
Because of this, the comparison between
nmsgpcnt-fsi and the other two
is definitely skewed in favor of
nmsgpcnt-npb and nmsgpcnt-ptc
The first reference program ,
nmsgpcnt-npb, is built using Nanopb, a C-based
protobuf implementation with a "small code size". Targeted for embedded systems
and 32-bit microcontrollers, Nanopb touts minimal requirements for RAM and
code space. While there are no
free() calls during
deserialization, some speed has been sacrificed for code size. This will be
clearly shown in our tests below.
The second reference program,
nmsgpcnt-ptc, is built using protobuf-c, a
feature-rich pure C protobuf implementation.
The source code complexity for both is relatively low.
Both programs are invoked identically, with a single file argument referring to
a binary NMSG file. For benchmarking, we'll use a large NMSG file containing
just over 2,000,000 payloads as captured using
nmsgtool on Farsight Security's Passive DNS feed on
SIE Channel 202.
$ nmsgpcnt-npb nmsgpcnt-npb parse an NMSG file and emit container/payload count usage: ./nmsgpcnt-npb file.nmsg
$ ./nmsgpcnt-npb 202-2000000.nmsg containers: 720 payloads: 2000481 $ ./nmsgpcnt-ptc 202-2000000.nmsg containers: 720 payloads: 2000481
nmsgpcnt-fsi is the Farsight Security in-house built protobuf deserializer.
It follows the same overall logic as the other two but the code is quite a bit
larger and more complex. This is because
nmsgpcnt-fsi has no external
dependencies and carries with it exactly and only the code it needs to decode
an NMSG protobuf and count each payload.
The workhorse in
nmsgpcnt is the
decode_pb() function. It is responsible
for deserializing and parsing the NMSG container and processing
fields. The function is about as fast as it can be. It iterates over an NMSG
container and populates a passed-by-referenced structure. It does no memory
allocation or deallocation, no memory copies, and has just four possible
nmsgpcnt-fsi is faster than its contemporaries, the source
code is more than twice the size and more tedious to read, write, and
maintain. Also, the very nature of dealing with code that operates at the
bit-level practically invites mistakes and requires a lot of testing. Caveat
Benchmarking NMSG Payload Counting
Included with the
nmsgpcnt distribution is a shell script
test.sh that executes and times all three
programs, displaying the results. You can see the crosswalk below:
$ ./test.sh 202-2000000.nmsg Running nmsgpcnt-npb (nanopb) against 202-2000000.nmsg containers: 720 payloads: 2000481 real 2.31 user 1.84 sys 0.47 Running nmsgpcnt-ptc (protoc-c) against 202-2000000.nmsg containers: 720 payloads: 2000481 real 1.47 user 0.97 sys 0.49 Running nmsgpcnt-fsi (Farsight Security) against 202-2000000.nmsg containers: 720 payloads: 2000481 real 0.59 user 0.15 sys 0.44
nmsgpcnt-fsi program is more than twice as fast as the protobuf-c
variant, and almost five times faster than the Nanopb version. In production,
this translates into faster processing of NMSG datagrams and lower packet loss.
The following is a discussion of the full
nmsgpcnt-fsi C source code
with in-line annotations.
The first section contains the source code license and the standard C header file include progression:
Define Protobuf Symbolic Constants
The following symbolic constants are defined for convenience and code clarity.
Define Varint Decoder
All of the magic performed below is done using C's bitwise operators. If you're not already, you'll want to get familiar with them and with how protobufs are encoded. You also might find interesting this discussion of varints and accompanying C++ code from which the following was inspired.
CHNM_DCDE_VARINT() macro decodes an unsigned varint. It has the following
- shifter: Used to shift decoded bits into their correct position.
- buf: The buffer containing the stream of protobuf bytes.
- val: An accumulator that will contain the decoded value.
- idx: The index into buf.
With varint encoding, the lower 7 bits of each byte are used to store the two's complement representation of the number in groups of 7 bits, least significant group first. The most significant bit is used as a sentinel value to let the decoder know if more bytes are coming.
The macro works by iterating over protobuf bytes in
buf, decoding each byte
and iteratively storing the result in
val. It strips the most significant
bits from each byte and accumulates them in
val. It does this until
buf[idx] & 0x80 evaluates to false, which means it has decoded the last
byte in the varint. To see why this is true, recall the following:
- Most significant bit set: More bytes to come.
- Most significant bit cleared: No more bytes to come.
CHNM_DCDE_VARINT() is declared as a macro in order to ensure the
code is placed inline and obviate the overhead a function call would require.
Define NMSG Protobuf Data type
The following opaque structure is used to hold data about an NMSG container.
For the use case in this article,
nmsgpcnt-fsi is only concerned about the
number of payloads, but it can be extended by adding additional members to
nmsg_pb_data_t and adding more decoding logic to
Define The Decoder
The function that does the decoding is defined here. The workflow is here straightforward.
- An outer loop walks the entire contents of the protobuf.
- The current byte is logically ANDed with the
PB_WIRETYPE_MASKwhich masks off the protobuf wire type bits. This value is used to branch from in the switch table.
- Depending on the wire type, the appropriate action is taken. Here,
nmsgpcnt-fsionly cares about wire type
PB_WT_LDwith a field number of
1. This heralds the start of an NMSG payload. The payload counter is incremented and the length of payload is decoded (which itself is a varint encoded value). The payload bytes are stepped over and control passes back to the top of the loop.
Sidebar: The switch table makes extending the decoder to understand other NMSG protobuf fields quite easy. In fact this code is actually part of a larger, more extensible framework used inside Farsight for programs that need to get at other parts of the NMSG protobuf beyond an ordinal payload count.
Main Entry Point
Here is the main entry point into the program. The only notable bit here
load_container() function (contained in
common.c). It is responsible
for the following:
- Open the NMSG file specified at the command line.
- Figure out how big it is.
- Allocate enough memory to hold its contents.
- Read contents into memory.
- Return a buffer containing the NMSG container.
A successful call to
load_container() needs to be balanced with a
subsequent corresponding call to
Loop over Containers
Next is the main event loop. An NMSG can have one or more containers (that each can have zero or more payloads). The program iterates over conntainers and serially processes each one.
Process NMSG Header
nmsgpcnt-fsi can decode a protobuf, it needs to first do some header
verification. If any of the following checks fail, the program breaks from the
event loop and exits.
- Confirm the NMSG magic number in the header.
- Confirm the NMSG protocol version (is
- Confirm the container does not contain compressed payloads.
- Confirm the container does not contain fragmented payloads.
- Get the length of the container.
The NMSG header is now processed and
nmsgpcnt-fsi is ready to decode the
Decode the Protobuf
Next the decoder is run. If no errors are encountered, the container and payload counters are incremented and control returns to the top of the event loop.
Wrap up and Shutdown
After all of the containers are exhausted the totals are emitted to
Next, the memory held by
buf is released and the program exits.
Mike Schiffman is a Packet Esotericist for Farsight Security, Inc.