Visualizing the Hard-to-Visualize: DNS and SSL/TLS Cert Existence Over Time Using R Graphics
By Joe St Sauver
Understanding what's going on in DNS data (or in data from other sources) often involves time: to understand security data, we may need to know when data was collected, when events happened, and when changes took place. That's why DNSDB supports time fencing, and why we've discussed analyzing DNS traffic volumes over time in earlier blog articles: Finding Top FQDNs Per Day in DNSDB Export MTBL Files (Part 1); Volume-Over-Time Data From DNSDB Export MTBL Files, Part 2; Analyzing DNSDB Volume-Over-Time Time Series Data With R and ggplot2 Graphics, Part 3; and Crushing Monolithic Data Results ("Rock") Into "Gravel:" dnsdbq New -g Volume-Across-Time Option.
Today we're going to show you a couple of the approaches we've come up with for representing DNS and other security data graphically over time (complete with R/ggplot programs). Because we supply the R/ggplot programs, you'll at least have a starting point if you need to create similar graphs.
II. Showing The Interaction Between Bailiwicks and Nameservers
A couple of years ago, we did a blog post describing and explaining the bailiwick concept, but we think that graphics can help to explain how domain owners can use different nameservers to "call an audible" (change nameservers on the fly).
For example, consider internal-message[dot]app, a domain mentioned in a recent article by Brian Krebs. [Note that this domain has been "defanged" here and elsewhere in the article by replacing the normal period with [dot]. We've done this to avoid accidental visits to a domain that has been identified by Krebs as malicious.]
internal-message[dot]app has different name servers defined at the TLD (i.e., at "app") than are defined at the 2LD (i.e., at "internal-message[dot]app"). We can see this with:
$ dnsdbq -r internal-message.app/NS
[snip]
;; record times: 2019-02-13 08:28:01 .. 2019-09-24 03:12:08
;; count: 3313; bailiwick: app.
internal-message.app.  NS  ns1.internal-message.app.
internal-message.app.  NS  ns2.internal-message.app.

;; record times: 2019-02-13 10:10:26 .. 2019-04-15 18:05:11
;; count: 129; bailiwick: internal-message.app.
internal-message.app.  NS  ns1.clientshostname.com.
internal-message.app.  NS  ns2.clientshostname.com.
But how can we show this behavior visually? One option for more general time graphs in R might be Vistime, but we could never get Vistime to present exactly the sort of graphs we were after, even though it looked initially promising.
After becoming frustrated trying to get what we wanted from Vistime, we ended up using ggplot to produce the following graph (note that the "ending" date for the second line reflects the time when this chart was created for this article, rather than an end-of-life time for the domain itself):
We built this graph using ggplot, based on JSON-format output extracted from dnsdbq. [Note: If you want to recreate this graph but don't currently have ggplot installed, we described how to install it in this blog article.]
The data for that graph was generated with:
$ dnsdbq -r internal-message.app/NS -j > internal-message.app-3.jsonl
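Before plotting, it can help to reduce that JSON Lines file to one time interval per bailiwick. The following is a minimal Python sketch of that step, not the Appendix I code. It assumes each record carries the usual DNSDB fields (`bailiwick`, `rdata`, `time_first`, `time_last`), with a fallback to `zone_time_first`/`zone_time_last` that is likewise an assumption here. The `ticks` and `load_intervals` helpers are names made up for illustration, and the sample records are rebuilt from the text output shown earlier rather than read from the real file.

```python
import json
from datetime import datetime, timezone


def ticks(text):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string, read as UTC, to Unix seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def load_intervals(lines):
    """Return (bailiwick, nameservers, first_seen, last_seen) per record."""
    intervals = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        # Assumption: records without time_first/time_last use zone_time_*.
        first = rec.get("time_first", rec.get("zone_time_first"))
        last = rec.get("time_last", rec.get("zone_time_last"))
        rdata = rec["rdata"]
        if isinstance(rdata, str):
            rdata = [rdata]
        intervals.append((
            rec["bailiwick"],
            sorted(rdata),
            datetime.fromtimestamp(first, tz=timezone.utc),
            datetime.fromtimestamp(last, tz=timezone.utc),
        ))
    # Earliest first-seen time first, so plot rows come out in time order.
    return sorted(intervals, key=lambda item: item[2])


# Stand-in for the lines of internal-message.app-3.jsonl (assumed layout).
sample_lines = [
    json.dumps({
        "count": 3313, "bailiwick": "app.",
        "rrname": "internal-message.app.", "rrtype": "NS",
        "time_first": ticks("2019-02-13 08:28:01"),
        "time_last": ticks("2019-09-24 03:12:08"),
        "rdata": ["ns1.internal-message.app.", "ns2.internal-message.app."],
    }),
    json.dumps({
        "count": 129, "bailiwick": "internal-message.app.",
        "rrname": "internal-message.app.", "rrtype": "NS",
        "time_first": ticks("2019-02-13 10:10:26"),
        "time_last": ticks("2019-04-15 18:05:11"),
        "rdata": ["ns1.clientshostname.com.", "ns2.clientshostname.com."],
    }),
]

intervals = load_intervals(sample_lines)
for bailiwick, nameservers, first, last in intervals:
    print(f"{bailiwick:24} {first:%Y-%m-%d} .. {last:%Y-%m-%d}  {', '.join(nameservers)}")
```

To use the real file instead of the stand-in list, pass an open file handle to `load_intervals`. Each resulting tuple corresponds to one horizontal segment on a graph of this kind.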
We then ran the code (as shown in Appendix I) by saying:
The graphic created by that run is what's shown above.
III. Graphing Sample Non-DNS Data – SSL/TLS Certificate Dates from Certificate Transparency Logs
A second example of graphing security data over time can be seen around SSL/TLS certificates. Each certificate includes a starting ("not valid before") date and an ending ("not valid after") date. These dates can be found by checking certificate transparency sites, such as Censys.
For example, looking at some of the certificates issued for FQDNs under internal-message[dot]app, we see:
But how do those certs relate to each other? Are there really multiple brands being targeted at the same time via this domain? If we graph those dates, we can check. Again, ggplot is our friend. Running the code shown in Appendix II, we get a graph that looks like:
Obviously, yes, all of these campaign-related domains were actually up with valid SSL/TLS certs during the same period of time.
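The same question can be checked numerically: a set of certificates is simultaneously valid only if the latest "not valid before" date falls on or before the earliest "not valid after" date. The Python sketch below illustrates that test. The hostnames and dates in it are hypothetical placeholders, not the values from the certificate transparency logs, and `common_window` is a name invented for this example.

```python
from datetime import datetime, timezone


def day(text):
    """Parse a 'YYYY-MM-DD' string as midnight UTC."""
    return datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def common_window(windows):
    """Return the (start, end) span shared by every window, or None."""
    start = max(not_before for not_before, _ in windows)
    end = min(not_after for _, not_after in windows)
    return (start, end) if start <= end else None


# HYPOTHETICAL placeholder certificates: (not valid before, not valid after).
certs = {
    "brand-a.example": (day("2019-01-01"), day("2019-04-01")),
    "brand-b.example": (day("2019-02-01"), day("2019-05-01")),
    "brand-c.example": (day("2019-03-01"), day("2019-06-01")),
}

shared = common_window(list(certs.values()))
if shared is None:
    print("No period in which all certificates were valid at once")
else:
    print(f"All valid together from {shared[0]:%Y-%m-%d} to {shared[1]:%Y-%m-%d}")
```

With real data, the dictionary would be filled from the not-before and not-after columns of the certificate listing.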
IV. Combining DNSDB and Certificate Data On A Single Graph
So far we've graphed only passive DNS data or only certificate data. Sometimes, however, we may want to combine data from diverse sources. For example, we know that a domain must exist before a certificate can be created for it, but which "goes away" or "ceases to be seen" first, the cert or the domain? Let's create a little example to show this.
Our data file was constructed by combining data from DNSDB with data from certificate transparency logs. While our first example read JSON Lines output, and our second example read a CSV spreadsheet file, this example reads a flat file with times in Unix "ticks." When building that file, we converted "human times" to Unix tick time with commands similar to:
date -jf "%Y-%m-%d %H:%M:%S" "2018-01-02 09:36:54" +%s
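For readers who would rather not depend on a particular flavor of the `date` command, the same conversion can be sketched in Python. One caveat: this sketch assumes the timestamp is UTC, whereas `date -jf` without `-u` reads it in the local time zone, so the two results can differ by the local UTC offset. The `to_ticks` name is ours, chosen for illustration.

```python
import calendar
import time


def to_ticks(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Read text as a UTC timestamp and return Unix seconds."""
    return calendar.timegm(time.strptime(text, fmt))


print(to_ticks("2018-01-02 09:36:54"))  # 1514885814 when read as UTC
```

On systems with GNU `date`, `date -u -d "2018-01-02 09:36:54" +%s` should give the same UTC-based value.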
Note that the date -jf invocation shown above was tested on a Mac; other Unix-based operating systems may require different options to the date command to perform the same conversion. In any event, our simplified sample data file for this run looks like the following:
*.papajohns-secure-login[dot]com dnsdb_time_first_seen 1550494554
*.papajohns-secure-login[dot]com cert_not_valid_before 1550519598
*.papajohns-secure-login[dot]com cert_not_valid_after 1558291998
*.papajohns-secure-login[dot]com dnsdb_time_last_seen 1569434017
Note that to make it easy to see what's going on, our data consolidates all domain names for this delegation point into a single wildcard, and we show just a single certificate. Also note that the X axis is NOT to scale, unlike our previous examples. The code we used to process this data, originally based on the example shown here, can be seen in Appendix III.
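Independent of the plotting code, the four sample records can be read, ordered, and compared with a few lines of Python. This is a sketch only: it assumes one whitespace-separated record per line (name, label, Unix ticks), and `parse_events` is a name made up for this example.

```python
SAMPLE = """\
*.papajohns-secure-login[dot]com dnsdb_time_first_seen 1550494554
*.papajohns-secure-login[dot]com cert_not_valid_before 1550519598
*.papajohns-secure-login[dot]com cert_not_valid_after 1558291998
*.papajohns-secure-login[dot]com dnsdb_time_last_seen 1569434017
"""


def parse_events(text):
    """Return (ticks, label, name) tuples sorted by time."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, label, ticks = line.split()
        events.append((int(ticks), label, name))
    return sorted(events)


events = parse_events(SAMPLE)

# Days elapsed between each event and the one before it.
gaps_in_days = [
    (later[1], round((later[0] - earlier[0]) / 86400, 1))
    for earlier, later in zip(events, events[1:])
]

for label, days in gaps_in_days:
    print(f"{label:24} +{days} days")
```

For the sample data the gaps come out to roughly 0.3, 90.0, and 129.0 days: the certificate became valid about seven hours after the domain was first seen, and the domain was still being seen about four months after the certificate expired. Gaps that uneven may be one reason an evenly spaced, not-to-scale axis is easier to read here.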
The output from running the Appendix III code (sent to the screen rather than to a file this time) looks like:
You've now seen a few examples of how you can use various plotting routines to graph security-related data. Visualizing data over time can help make it clear how one data stream relates to another, whether we're talking about nameserver differences by bailiwick, overlapping SSL/TLS certificates, or a combination of data from different sources.
These graphs were developed by Farsight as part of a project with our partner Anomali. Farsight would like to thank our colleagues at Anomali, particularly Paul Sheck and Parthiban Rajendran, for their contributions.
Appendix I. sample-ggplot.R
Appendix II. sample-ggplot-2.R
Appendix III. timeline-test-3.py
Joe St Sauver, Ph.D., is a Distinguished Scientist with Farsight Security®, Inc.