Passive DNS and SIE File Formats

I. Introduction

When working with DNSDB passive DNS data files or the Security Information Exchange ("SIE"), you may run into three primary file formats:

  1. MTBL files (in practice, usually DNSTABLE format MTBL files)

    The immutable sorted string table files that power DNSDB. MTBL format files support compression and tend to be very space-efficient.

  2. NMSG files

    This is the file- and wire-format used for most Security Information Exchange data. It leverages Google Protocol Buffers, and supports different message types via a plugin system. Like MTBL-format files, NMSG-format files also support compression.

  3. JSON Lines format files

    A popular human- and machine-readable key-value format for sharing data. Each observation ends with a newline (unlike regular JSON, which looks like one huge "run-on" line). JSON Lines format files are very verbose relative to MTBL and NMSG format files.

Those three formats can be converted as shown in the following diagram:

File Format Conversion Diagram

Figure 1. Relationship Between DNSTABLE Format MTBL Files, NMSG Files, and JSON Lines Files

The above figure shows that:

  • dnstable_unconvert can be used to take a DNSTABLE format MTBL file and produce an NMSG format file

  • dnstable_convert can take (some) NMSG format files and produce DNSTABLE format MTBL files

  • dnstable_dump (with the -r and -j options) can dump a DNSTABLE format MTBL file in JSON Lines format

  • nmsgtool (with the -r and -J options) can dump NMSG files in JSON Lines format

  • nmsgtool (with -j and -w options) can create NMSG files from JSON Lines format input.

This article will NOT be considering the proprietary file formats supporting DNSDB Flexible Search, nor the process of ingesting raw DNS sensor traffic.

II. DNSTABLE Format MTBL files

We use DNSTABLE format MTBL files to store the main DNS data that powers DNSDB. As such, MTBL format files are very important.

Working with MTBL files requires the mtbl library. To retrieve and build a copy of the mtbl library:

$ git clone https://github.com/farsightsec/mtbl.git
$ cd mtbl
$ sh autogen.sh
$ ./configure
$ make
$ make check
$ sudo make install
$ cd

Note: On some Macs, where the snappy compression library has often been installed via the Homebrew package manager, configure may be unable to find libsnappy automatically.

If so, you may first need to adjust your library path, perhaps with:

$ export LDFLAGS="-L/opt/homebrew/opt/curl/lib -L/opt/homebrew/Cellar/snappy/1.1.9/lib"

In addition to installing the mtbl library itself, some mtbl utility programs will also be installed, typically into /usr/local/bin/

Command            Purpose
--------------------------------------------------------------------------------------
mtbl_dump          print key-value entries from an MTBL file
mtbl_info          display information about an MTBL file
mtbl_merge         merge MTBL data from multiple input files into a single output file
mtbl_verify        verify integrity of an MTBL file's data and index blocks

Each is described in a corresponding man page, which will typically be installed in a subdirectory of /usr/local/share/man/

Some mtbl-related commands may do more than their plain name may imply. For example, the mtbl_merge command is often used to combine multiple mtbl files, as you'd expect from its name and description above, but it can also be used to convert from the mtbl file's current compression scheme to a new one (supported options are none, snappy, zlib, lz4, lz4hc, and zstd). Using the mtbl_merge command requires that two environment variables be set first. On a typical Mac, those might look like:

$ export MTBL_MERGE_DSO="/usr/local/lib/libdnstable.0.dylib"
$ export MTBL_MERGE_FUNC_PREFIX="dnstable_merge"

Once those have been set, you can then run the mtbl_merge command.

For example, assume you have an mtbl minutely file, such as dns.20211101.1825.m.mtbl, and you'd like to convert that mtbl minutely file to use an alternative compression algorithm, such as Snappy. To do that we'd say:

$ mtbl_merge -c snappy dns.20211101.1825.m.mtbl dns.20211101.1825.m.snappy-mtbl

Some compression algorithms allow various compression levels. To specify a non-default level, use the -l (dash ell) option:

$ mtbl_merge -c zstd -l 5 dns.20211101.1825.m.mtbl dns.20211101.1825.m.zstd-5-mtbl
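
If you want to produce the same file at several compression levels for comparison, the command above can be wrapped in a small loop. The following is only a minimal sketch, not the script used for this article: the function name merge_levels and the MTBL_MERGE_CMD override variable are hypothetical, while the -c and -l options and the output naming pattern follow the examples above. It assumes the two MTBL_MERGE environment variables described earlier have already been set.

```shell
# Hypothetical helper: recompress one MTBL file at several levels.
# Usage: merge_levels INPUT.mtbl CODEC LEVEL [LEVEL ...]
# MTBL_MERGE_CMD is an assumed override (defaults to mtbl_merge), which is
# handy for a dry run with a command that just prints its arguments.
merge_levels() {
    input=$1
    codec=$2
    shift 2
    base=${input%.mtbl}            # strip the trailing .mtbl
    for level in "$@"; do
        "${MTBL_MERGE_CMD:-mtbl_merge}" -c "$codec" -l "$level" \
            "$input" "$base.$codec-$level-mtbl" || return 1
    done
}
```

For example, merge_levels dns.20211101.1825.m.mtbl zstd 1 3 5 would run mtbl_merge three times and write dns.20211101.1825.m.zstd-1-mtbl, dns.20211101.1825.m.zstd-3-mtbl, and dns.20211101.1825.m.zstd-5-mtbl.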

We suspect that many people may be curious to see how the various compression algorithms compare.

While this article is not primarily about mtbl compression, we wanted to at least provide a rough sense of how various compression options look for our sample minutely file. Here are some approximate results run on a Mac M1 laptop with 16GB of memory and no particular optimizations:

File                                    Size (bytes) Time (sec) Rate (ent/sec)
---------------------------------------------------------------------------------
dns.20211101.1825.m.mtbl (base file)     79,397,449

dns.20211101.1825.m.zlib-1-mtbl          82,409,127   2.90       1,198,449
dns.20211101.1825.m.zlib-2-mtbl          81,790,534   2.90       1,198,173
dns.20211101.1825.m.zlib-3-mtbl          81,357,511   2.98       1,165,439
dns.20211101.1825.m.zlib-9-mtbl          79,663,682   3.99         870,772

dns.20211101.1825.m.zstd-1-mtbl          85,364,913   2.02       1,724,310
dns.20211101.1825.m.zstd-2-mtbl          82,612,800   2.06       1,683,887
dns.20211101.1825.m.zstd-3-mtbl          81,201,060   2.19       1,584,405
dns.20211101.1825.m.zstd-4-mtbl          80,287,782   2.41       1,440,404
dns.20211101.1825.m.zstd-5-mtbl          79,110,457   2.71       1,281,033
dns.20211101.1825.m.zstd-9-mtbl          78,699,877   4.51         771,417
dns.20211101.1825.m.zstd-19-mtbl         76,488,503  26.02         133,583
dns.20211101.1825.m.zstd-22-mtbl         76,488,374  26.25         132,411

dns.20211101.1825.m.lz4hc-mtbl           97,268,208   2.80       1,243,362

dns.20211101.1825.m.snappy-mtbl         104,020,092   1.68       2,064,586

dns.20211101.1825.m.lz4-mtbl            107,123,435   1.67       2,080,784

dns.20211101.1825.m.none-mtbl           179,084,469   1.84       1,885,218

These values are just an illustration; performance on other mtbl files (or other system configurations) will vary. 
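
One way to read the table is to express each file size as a percentage of the uncompressed ("none") size. The following is a small sketch using awk; the function name pct_of is hypothetical, and the sizes are copied from the table above.

```shell
# Hypothetical helper: print SIZE as a percentage of BASE, to one decimal place.
# Usage: pct_of SIZE BASE   (plain integers, without thousands separators)
pct_of() {
    awk -v size="$1" -v base="$2" 'BEGIN { printf "%.1f\n", 100 * size / base }'
}

pct_of 81201060 179084469     # zstd level 3 versus uncompressed
pct_of 104020092 179084469    # snappy versus uncompressed
```

By this measure, the zstd level 3 file is about 45% of the uncompressed size, and the snappy file is about 58%.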

For context, Farsight had historically used zlib for MTBL file compression, but we're moving to zstd -3 since it seems to hit the "sweet spot" when considering the combination of:

  • Compressed file size
  • Decompression time, and
  • Compression time.

III. DNSTABLE Format MTBL Files

You might be tempted to try some of the other mtbl commands, such as using mtbl_dump to dump the contents of an mtbl file. At least in the case of DNSDB MTBL files (where the data is stored in DNS "wire format"), dnstable_dump is a far better option than mtbl_dump, since dnstable_dump knows how to properly handle DNS "wire format" data.

You'll need to install dnstable to be able to use dnstable_dump.

dnstable requires libmtbl (which we've just installed), plus yajl and libwdns. On a Mac, you can install yajl with brew:

$ brew install yajl

We'll install libwdns from source:

$ git clone https://github.com/farsightsec/wdns.git
$ cd wdns
$ sh autogen.sh
$ ./configure
$ make
$ make check
$ sudo make install
$ cd

You should now be ready to build dnstable:

$ git clone https://github.com/farsightsec/dnstable.git
$ cd dnstable
$ sh autogen.sh
$ ./configure
$ make
$ make check
$ sudo make install
$ cd

You can then try dumping records from our sample dnstable format mtbl file by saying:

$ dnstable_dump --rrset_full dns.20211101.1825.m.mtbl | more
;;  bailiwick: sn.ac.
;;      count: 1
;; first seen: 2021-11-01 18:24:02 -0000
;;  last seen: 2021-11-01 18:24:02 -0000
sn.ac. IN A 193.223.78.230

;;  bailiwick: ac.
;;      count: 1
;; first seen: 2021-11-01 18:23:59 -0000
;;  last seen: 2021-11-01 18:23:59 -0000
sn.ac. IN NS l1.ns.divido.org.
sn.ac. IN NS l2.ns.divido.org.
[etc]

The results shown above are in presentation format. If you'd rather have JSON Lines format output, add the -j (dash lowercase jay) option to the command:

$ dnstable_dump --rrset_full dns.20211101.1825.m.mtbl -j > temp.jsonl
$ more temp.jsonl
{"count":1,"time_first":1635791042,"time_last":1635791042,"rrname":"sn.ac.","rrtype":"A","bailiwick":"sn.ac.","rdata":["193.223.78.230"]}
{"count":1,"time_first":1635791039,"time_last":1635791039,"rrname":"sn.ac.","rrtype":"NS","bailiwick":"ac.","rdata":["l1.ns.divido.org.","l2.ns.divido.org."]}
{"count":1,"time_first":1635791042,"time_last":1635791042,"rrname":"sn.ac.","rrtype":"NS","bailiwick":"sn.ac.","rdata":["l1.ns.divido.org.","l2.ns.divido.org."]}
[etc]
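
Note that the JSON Lines output reports time_first and time_last as seconds since the Unix epoch, while the presentation format shows them as formatted UTC timestamps. If you need to convert one to the other by hand, something like the following sketch may help. The function name epoch_to_utc is hypothetical, and the fallback logic assumes that your date command accepts either the GNU syntax (-d @SECONDS) or the BSD/macOS syntax (-r SECONDS).

```shell
# Hypothetical helper: convert seconds since the Unix epoch to a UTC timestamp.
# Tries the GNU date syntax first, then falls back to the BSD/macOS syntax.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1635791042    # time_first of the first record shown above
```

For that first record, the result is 2021-11-01 18:24:02, matching the "first seen" value in the presentation format output.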

You can also use the dnstable_lookup command to search MTBL files for specific entries.

You can search either a single mtbl file, or a set of mtbl files. Set either:

  • The DNSTABLE_FNAME environment variable (to search just a single file) or

  • The DNSTABLE_SETFILE environment variable (to search a fileset).

Do not attempt to set both at the same time.

To look at just a single file, such as our sample minutely file, you'd say:

$ unset DNSTABLE_SETFILE      <-- shouldn't normally be already defined, but "just in case"
$ export DNSTABLE_FNAME="dns.20211101.1825.m.mtbl"
$ dnstable_lookup rrset www.google.com
[...]
;;  bailiwick: google.com.
;;      count: 78
;; first seen: 2021-11-01 02:24:13 -0000
;;  last seen: 2021-11-01 15:18:13 -0000
www.google.com. IN AAAA 2a00:1450:4010:c02::63
www.google.com. IN AAAA 2a00:1450:4010:c02::68
www.google.com. IN AAAA 2a00:1450:4010:c02::6a
www.google.com. IN AAAA 2a00:1450:4010:c02::93

;;; Dumped 2 entries.

If you want to look at data from a set of mtbl files, first put the names of those files into a text file. For example:

$ cat fileset.txt
dns.20211111.0000.m.mtbl
dns.20211111.0001.m.mtbl
dns.20211111.0002.m.mtbl
dns.20211111.0003.m.mtbl
dns.20211111.0004.m.mtbl
dns.20211111.0005.m.mtbl
dns.20211111.0006.m.mtbl
dns.20211111.0007.m.mtbl
dns.20211111.0008.m.mtbl
dns.20211111.0009.m.mtbl
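
Rather than typing such a list by hand, you could generate it. The following is a minimal sketch; the function name make_fileset is hypothetical, and it simply writes one existing filename per line, in the order given.

```shell
# Hypothetical helper: write one MTBL filename per line to a fileset file.
# Usage: make_fileset OUTPUT FILE [FILE ...]
make_fileset() {
    out=$1
    shift
    for f in "$@"; do
        # Skip names that don't exist (for example, an unmatched glob).
        [ -f "$f" ] || { echo "skipping missing file: $f" >&2; continue; }
        printf '%s\n' "$f"
    done > "$out"
}
```

For example, make_fileset fileset.txt dns.20211111.000?.m.mtbl would produce the ten-line file shown above, since the shell expands the glob in sorted order.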

Then try:

$ unset DNSTABLE_FNAME          <-- just in case that's still defined from our earlier run
$ export DNSTABLE_SETFILE="fileset.txt"
$ dnstable_lookup rrset www.google.com
[...]
;;  bailiwick: google.com.
;;      count: 683
;; first seen: 2021-11-10 05:07:17 -0000
;;  last seen: 2021-11-10 14:44:12 -0000
www.google.com. IN A 74.125.205.99
www.google.com. IN A 74.125.205.103
www.google.com. IN A 74.125.205.104
www.google.com. IN A 74.125.205.105
www.google.com. IN A 74.125.205.106
www.google.com. IN A 74.125.205.147
[...]

See man dnstable_lookup for more on dnstable_lookup options, or the classic article at https://www.farsightsecurity.com/blog/txt-record/realtime-dnsdb-20151028/ for a more detailed example.

IV. Converting MTBL Files to NMSG Files

nmsg format files are another type of file you may run into when working with DNSDB or the Security Information Exchange (SIE). nmsg files are described at https://www.farsightsecurity.com/blog/txt-record/intro-20150128/

Assuming you have an mtbl file, you can convert it to nmsg format using dnstable_unconvert. dnstable_unconvert is available as part of dnstable-convert, which is a separately installed package.

In addition to the libraries we've already installed, dnstable-convert requires libnmsg (see https://github.com/farsightsec/nmsg) and sie-nmsg, a plugin that's needed for libnmsg to understand SIE data (see https://github.com/farsightsec/sie-nmsg). Those libraries have dependencies of their own.

On the Mac, begin by installing the prerequisites needed for libnmsg and sie-nmsg with brew:

$ brew install libpcap
$ brew install protobuf
$ brew install protobuf-c
$ brew install zeromq
$ brew install zlib

We assume that you've already installed wdns and yajl as described in a previous section of this article. You should then be ready to build libnmsg:

$ git clone https://github.com/farsightsec/nmsg.git
$ cd nmsg
$ sh autogen.sh
$ ./configure
$ make
$ make check
$ sudo make install
$ cd

Next, install the sie-nmsg package, which is also required:

$ git clone https://github.com/farsightsec/sie-nmsg.git
$ cd sie-nmsg
$ sh autogen.sh
$ ./configure
$ make
$ make check
$ sudo make install
$ cd

And finally, we're now ready to build dnstable-convert:

$ git clone https://github.com/farsightsec/dnstable-convert.git
$ cd dnstable-convert
$ sh autogen.sh
$ ./configure
$ make
$ sudo make install
$ cd

Once you have the dnstable-convert package installed, you could run dnstable_unconvert by saying, for example:

$ dnstable_unconvert dns.20211101.1825.m.mtbl dns.20211101.1825.m.nmsg
Reading RRSets from dns.20211101.1825.m.mtbl into nmsg file dns.20211101.1825.m.nmsg
processed 969807 RRSets in 1.82 sec, 532604 rrsets/sec

To go the "other direction," you'd use dnstable_convert.

As used in DNSDB, DNS data is normally split into two parts:

  • Records with DNS RRtypes and
  • Records with DNSSEC RRtypes

The two types of records are normally saved in separate mtbl files. Because an nmsg file might have DNS RRtypes, DNSSEC RRtypes, or both, we need to supply output filenames for both the DNS and the DNSSEC mtbl files. If either file ends up not being needed, it will be automatically unlinked, as the last line of the output below shows:

$ dnstable_convert dns.20211101.1825.m.nmsg dns.20211101.1825.m.mtbl-demo \
dnssec.20211101.1825.m.mtbl-demo
dnstable_convert: reading input data
processed 969,807 messages, 5,542,135 DNS entries, 0 DNSSEC entries, 0 merged in 1.13 sec, 861,334 msg/sec, 4,922,246 ent/sec
dnstable_convert: writing tables
wrote 5 entries in 0.00 sec, 43,103 ent/sec [dnssec]
dnstable_convert: finished writing table [dnssec]
wrote 1,000,000 entries in 1.24 sec, 803,238 ent/sec [dns]
wrote 2,000,000 entries in 1.80 sec, 1,109,094 ent/sec [dns]
wrote 3,000,000 entries in 2.57 sec, 1,166,101 ent/sec [dns]
wrote 3,485,419 entries in 2.98 sec, 1,170,134 ent/sec [dns]
dnstable_convert: finished writing table [dns]
processed 969,807 messages, 5,542,135 DNS entries, 0 DNSSEC entries, 2,056,721 merged in 6.46 sec, 150,064 msg/sec, 857,573 ent/sec
no DNSSEC entries generated, unlinking dnssec.20211101.1825.m.mtbl-demo

V. Dumping NMSG Format Files in JSON Lines Format

The standard tool for accessing NMSG format files is nmsgtool, one of the commands you got when you built libnmsg in section IV.

Let's now try using nmsgtool to read the nmsg file we produced above:

$ nmsgtool -r dns.20211101.1825.m.nmsg
[45] [2021-11-10 01:28:21.314749000] [2:1 SIE dnsdedupe] [00000000] [] [] 
type: INSERTION
count: 0
time_first: 2021-11-01 18:24:02
time_last: 2021-11-01 18:24:02
bailiwick: sn.ac.
rrname: sn.ac.
rrclass: IN (1)
rrtype: A (1)
rdata: 193.223.78.230

[76] [2021-11-10 01:28:21.315025000] [2:1 SIE dnsdedupe] [00000000] [] [] 
type: INSERTION
count: 0
time_first: 2021-11-01 18:23:59
time_last: 2021-11-01 18:23:59
bailiwick: ac.
rrname: sn.ac.
rrclass: IN (1)
rrtype: NS (2)
rdata: l1.ns.divido.org.
rdata: l2.ns.divido.org.
[etc]

If we prefer JSON Lines format output, we can simply add the -J (dash capital J) option and a filename (sample output wrapped for display in this article):

$ nmsgtool -r dns.20211101.1825.m.nmsg -J dns.20211101.1825.m.jsonl
$ more dns.20211101.1825.m.jsonl
{"time":"2021-11-10 01:28:21.314749000","vname":"SIE","mname":"dnsdedupe",
"message":{"type":"INSERTION","count":0,"time_first":"2021-11-01 18:24:02",
"time_last":"2021-11-01 18:24:02","bailiwick":"sn.ac.","rrname":"sn.ac.",
"rrclass":"IN","rrtype":"A","rdata":["193.223.78.230"]}}
{"time":"2021-11-10 01:28:21.315025000","vname":"SIE","mname":"dnsdedupe",
"message":{"type":"INSERTION","count":0,"time_first":"2021-11-01 18:23:59",
"time_last":"2021-11-01 18:23:59","bailiwick":"ac.","rrname":"sn.ac.",
"rrclass":"IN","rrtype":"NS","rdata":["l1.ns.divido.org.","l2.ns.divido.org."]}}
[etc]

Just to "close the loop," if you've got a JSON Lines file and you want to create an nmsg file, nmsgtool can handle that conversion as well:

$ nmsgtool -j dns.20211101.1825.m.jsonl -w dns.20211101.1825.m.nmsg-2

VI. An Applied Example: Creating MTBL Files from SIE Channel 208

DNSDB data comes from a global network of sensors into the Security Information Exchange (SIE). At the SIE, observations flow through a waterfall process as shown in Figure 2:

SIE Waterfall Diagram

Figure 2. SIE Waterfall Diagram.

Normally, DNSDB is fed from Ch204 (after deduplication, bailiwick verification, and filtering), and contains all RRtypes.

However, let's assume we want to make DNSDB-like queries against the non-filtered Ch208 traffic, and just for an enumerated subset of RRtypes. We can use the tools we've just described to sketch out such an application. Actually deploying such a system would normally use different mechanisms and have many details that would need to be considered and addressed – this is just a notional/"by way of demonstration" example.

The first thing we need for this project is some data.

We'll begin by capturing a few minutes of data from Ch208 on a leased blade server at the SIE using nmsgtool.

We'll use the -t 60 -k '' options to nmsgtool to "kick out" a new output file once every sixty seconds:

$ nmsgtool -C ch208 -t 60 -k '' -w ch208

Those files will have names beginning with ch208 (since that's what we supplied with the dash w option), followed by a timestamp. For example:

$ ls -lat *.nmsg
[...] 406099601 Nov 12 00:45 ch208.20211112.0045.1636677900.001817025.nmsg
[...] 464624829 Nov 12 00:44 ch208.20211112.0044.1636677840.002312737.nmsg
[...] 434479578 Nov 12 00:43 ch208.20211112.0043.1636677780.001059399.nmsg

There may be many different resource record types ("RRtypes") in those files. To allow us to investigate what RRtypes are actually present, and to make it easy for us to filter those files, we'll begin by converting those files into JSON Lines format. Normally we'd convert those files using a little script, but since we only have three files, we'll simply say:

$ nmsgtool -r ch208.20211112.0043.1636677780.001059399.nmsg -J ch208.20211112.0043.1636677780.001059399.jsonl

$ nmsgtool -r ch208.20211112.0044.1636677840.002312737.nmsg -J ch208.20211112.0044.1636677840.002312737.jsonl

$ nmsgtool -r ch208.20211112.0045.1636677900.001817025.nmsg -J ch208.20211112.0045.1636677900.001817025.jsonl

$ wc -l *.jsonl
   2574388 ch208.20211112.0043.1636677780.001059399.jsonl
   2402763 ch208.20211112.0044.1636677840.002312737.jsonl
   2424825 ch208.20211112.0045.1636677900.001817025.jsonl
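
With more than a handful of files, the "little script" mentioned above might look something like the following sketch. The function name nmsg_to_jsonl and the NMSGTOOL override variable are hypothetical; the -r and -J options are the same ones used in the commands above.

```shell
# Hypothetical helper: convert each named NMSG file to JSON Lines.
# Usage: nmsg_to_jsonl FILE.nmsg [FILE.nmsg ...]
# NMSGTOOL is an assumed override (defaults to nmsgtool).
nmsg_to_jsonl() {
    for f in "$@"; do
        # foo.nmsg is written to foo.jsonl; stop at the first failure.
        "${NMSGTOOL:-nmsgtool}" -r "$f" -J "${f%.nmsg}.jsonl" || return 1
    done
}
```

For example, nmsg_to_jsonl ch208.*.nmsg would convert every captured file in the current directory.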

Now let's concatenate those JSON Lines files into a single combined file:

$ cat ch208.20211112.004*.jsonl > combined.jsonl

$ wc -l combined.jsonl
7401976 combined.jsonl

We can then check the RRtypes in our combined file by leveraging jq (see https://stedolan.github.io/jq/ ):

$ jq -R 'fromjson? | .message.rrtype' combined.jsonl | sort | uniq -c | sort -nr > rrtypes.txt

The -R option tells jq to read each input line as a raw string, and the 'fromjson? |' element then parses that string, silently skipping any line that isn't valid JSON. One line in our file was apparently invalid: without that "guard," we see "parse error: Invalid literal at line 4977152, column 20."

The .message.rrtype bit extracts just the RRtype field from the combined JSON Lines format records.

We then sort and count those records, and re-sort them in descending order by frequency:

$ more rrtypes.txt
2044933 "A"
1602287 "CNAME"
1187361 "RRSIG"
 800411 "AAAA"
 752396 "NS"
 307531 "PTR"
 288695 "SOA"
 120628 "NSEC3"
 105082 "TXT"
  57060 "NSEC"
  45479 "DS"
  41112 "MX"
  36636 "NULL"
   5771 "DNSKEY"
   4881 "<UNKNOWN>"
   1355 "SRV"
    130 "HINFO"
    117 "WKS"
     61 "RP"
     19 "SPF"
     12 "NAPTR"
      8 "TLSA"
      6 "CAA"
      2 "SSHFP"
      1 "NSEC3PARAM"
      1 "DNAME"

We can then sum up the RRtypes we saw – the count we obtain agrees (with the exception of the one unparseable record we previously mentioned):

$ cat rrtypes.txt | awk '{print $1}' | paste -sd+ | bc
7401975

We're now ready to filter by RRtype. Let's assume we only care about "A" records, "CNAME" records, and "AAAA" records (obviously we could specify whatever subset of records we might want here):

$ grep -E '"rrtype":("A"|"CNAME"|"AAAA")' combined.jsonl > combined2.jsonl

$ wc -l combined2.jsonl
4447632 combined2.jsonl      <-- significantly smaller file (just 60% of our original line count)
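
If you expect to change the list of RRtypes often, the pattern can be built from a list of arguments. The following is a minimal sketch; the function name filter_rrtypes is hypothetical. Like the command above, it is a plain text match, so it assumes the compact "rrtype":"A" form (no spaces around the colon) that nmsgtool produced in our sample output.

```shell
# Hypothetical helper: keep only JSON Lines records whose rrtype is listed.
# Usage: filter_rrtypes TYPE [TYPE ...] < input.jsonl > output.jsonl
filter_rrtypes() {
    [ "$#" -gt 0 ] || { echo "usage: filter_rrtypes TYPE..." >&2; return 2; }
    alternatives=$(printf '"%s"|' "$@")         # "A"|"CNAME"|"AAAA"|
    grep -E "\"rrtype\":(${alternatives%|})"    # drop the trailing |
}
```

For example, filter_rrtypes A CNAME AAAA < combined.jsonl > combined2.jsonl applies the same filter as the command above. Because each alternative includes its closing quote, "A" does not also match "AAAA".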

We'll now convert the filtered results back to nmsg format:

$ nmsgtool -j combined2.jsonl -w combined2.nmsg

And finally, we'll convert that nmsg file into a DNSTABLE format MTBL file for search purposes:

$ dnstable_convert combined2.nmsg dns.combined2.mtbl dnssec.combined2.mtbl
dnstable_convert: reading input data
processed 1,000,000 messages, 5,273,446 entries (0 DNSSEC, 0 merged) in 1.80 sec, 555,610 msg/sec, 2,929,982 ent/sec
processed 2,000,000 messages, 10,665,465 entries (0 DNSSEC, 0 merged) in 3.68 sec, 543,621 msg/sec, 2,898,987 ent/sec
processed 3,000,000 messages, 16,002,201 entries (0 DNSSEC, 0 merged) in 5.49 sec, 546,607 msg/sec, 2,915,639 ent/sec
processed 4,000,000 messages, 21,355,399 entries (0 DNSSEC, 0 merged) in 7.33 sec, 545,725 msg/sec, 2,913,547 ent/sec
processed 4,447,631 messages, 23,720,967 entries (0 DNSSEC, 0 merged) in 8.14 sec, 546,166 msg/sec, 2,912,918 ent/sec
dnstable_convert: writing tables
wrote 0 entries in 0.00 sec, 0 ent/sec [dnssec]
dnstable_convert: finished writing table [dnssec]
wrote 1,000,000 entries in 3.77 sec, 265,352 ent/sec [dns]
wrote 2,000,000 entries in 4.98 sec, 401,550 ent/sec [dns]
wrote 3,000,000 entries in 6.17 sec, 486,219 ent/sec [dns]
wrote 4,000,000 entries in 7.44 sec, 537,836 ent/sec [dns]
wrote 5,000,000 entries in 8.35 sec, 599,028 ent/sec [dns]
wrote 6,000,000 entries in 8.82 sec, 680,248 ent/sec [dns]
wrote 7,000,000 entries in 9.65 sec, 725,472 ent/sec [dns]
wrote 8,000,000 entries in 10.56 sec, 757,914 ent/sec [dns]
wrote 9,000,000 entries in 11.63 sec, 773,687 ent/sec [dns]
wrote 10,000,000 entries in 12.65 sec, 790,572 ent/sec [dns]
wrote 11,000,000 entries in 13.80 sec, 797,266 ent/sec [dns]
wrote 12,000,000 entries in 14.95 sec, 802,853 ent/sec [dns]
wrote 13,000,000 entries in 15.83 sec, 821,336 ent/sec [dns]
wrote 14,000,000 entries in 16.75 sec, 835,883 ent/sec [dns]
wrote 14,359,850 entries in 17.08 sec, 840,985 ent/sec [dns]
dnstable_convert: finished writing table [dns]
processed 4,447,631 messages, 23,720,967 entries (0 DNSSEC, 9,361,117 merged) in 53.87 sec, 82,569 msg/sec, 440,374 ent/sec
no DNSSEC entries generated, unlinking dnssec.combined2.mtbl

At this point we're ready to try doing a sample search. We've got just a single combined mtbl file, so we'll just say:

$ export DNSTABLE_FNAME="dns.combined2.mtbl"
$ dnstable_lookup rrset www.google.com
;;  bailiwick: google.com.
;;      count: 1
;; first seen: 2021-11-11 16:01:17 -0000
;;  last seen: 2021-11-11 20:41:23 -0000
www.google.com. IN A 142.250.186.164

;;  bailiwick: google.com.
;;      count: 7,651
;; first seen: 2021-11-11 08:44:46 -0000
;;  last seen: 2021-11-11 21:55:26 -0000
www.google.com. IN A 142.250.188.4
[...]

;;  bailiwick: google.com.
;;      count: 81
;; first seen: 2021-11-11 12:42:43 -0000
;;  last seen: 2021-11-11 23:04:43 -0000
www.google.com. IN AAAA 2a00:1450:4010:c0a::63
www.google.com. IN AAAA 2a00:1450:4010:c0a::67
www.google.com. IN AAAA 2a00:1450:4010:c0a::69
www.google.com. IN AAAA 2a00:1450:4010:c0a::6a

;;; Dumped 17 entries.

Some might wonder, "Why bother using dnstable_lookup given that you've got JSON Lines format data you could just search with grep instead?" There are many potential motivations for using dnstable_lookup, including:

  • Speed: Forward and reverse indexing of the data makes using dnstable_lookup much faster than just linearly searching the data.

  • Aggregation: dnstable_lookup will automatically aggregate results across multiple files in a fileset, a tremendous convenience.

  • Complex Queries: dnstable_lookup supports a wide range of queries, including things like CIDR queries and IP address range queries.

  • "Pretty Printed" Datetime Stamps: dnstable_lookup allows the user to get nicely-converted human-readable output for things like datetime stamps, which might otherwise appear in raw Un*x ticks (number of seconds that have elapsed since Jan 1, 1970).

The dnstable_convert command we demonstrated in this example for Ch208 traffic will NOT work for traffic from some other SIE channels. For example, if you tried to use that command with SIE Ch202, Ch206, or Ch207, you'd see:

  • Ch202: Assertion `vid == NMSG_VENDOR_SIE_ID' failed. (Needs to use SIE/dnsdedupe schema, but doesn't).

  • Ch206: Assertion `vid == NMSG_VENDOR_SIE_ID' failed. (Needs to use SIE/dnsdedupe schema, but doesn't).

  • Ch207: Assertion `dns->has_bailiwick' failed. (Bailiwick validation hasn't been done as of Ch207)

On the other hand:

  • Ch204: Ch204 is downstream of Ch208, and works fine (like the Ch208 example we showed).

VII. Conclusion

You've now had a "whirlwind tour" of some of the file formats used by DNSDB and at the Security Information Exchange. You've learned about the tools that are available to convert files between these formats, and even seen a little example of how you can construct a custom MTBL file that you can query. We hope you've found this introduction to DNSDB and SIE file formats to be helpful!

Acknowledgements

Thanks to Ben April, Dan Nunes, David Waitzman and Eric Ziegast for their helpful suggestions on a draft of this article.

Any remaining issues are solely the responsibility of the author.

Updates

  • 11/22/2021 Corrected dependency ordering in Section IV and added explanation of compression objectives plus other miscellaneous updates.