Visualizing the Hard-to-Visualize: DNS and SSL/TLS Cert Existence Over Time Using R Graphics



I. Introduction

Understanding what's going on in DNS data (or in data from other potential sources) often involves time: to understand security data, we may need to understand when data was collected, when events happened, and when changes took place. That's why DNSDB supports time fencing, and why we've discussed analyzing DNS traffic volumes over time in previous blog articles: Finding Top FQDNs Per Day in DNSDB Export MTBL Files (Part 1); Volume-Over-Time Data From DNSDB Export MTBL Files, Part 2; Analyzing DNSDB Volume-Over-Time Time Series Data With R and ggplot2 Graphics, Part 3; and Crushing Monolithic Data Results ("Rock") Into "Gravel:" dnsdbq New -g Volume-Across-Time Option.

Today we're going to show you a couple of the approaches we've come up with for representing DNS and other security data graphically over time (complete with R/ggplot programs). Because we're supplying the programs, you'll at least have a starting point if you need to create similar graphs.

II. Showing The Interaction Between Bailiwicks and Nameservers

A couple of years ago, we did a blog post describing and explaining the bailiwick concept, but we think that graphics can help to explain how domain owners can use different nameservers to "call an audible" (change nameservers on the fly).

For example, consider internal-message[dot]app, a domain mentioned in a recent article by Brian Krebs. [Note that this domain has been "defanged" here and elsewhere in the article by replacing the normal period with [dot]. We've done this to avoid accidental visits to a domain that has been identified by Krebs as malicious.]

internal-message[dot]app has different name servers defined at the TLD (i.e., at "app") than are defined at the 2LD (i.e., at "internal-message[dot]app"). We can see this with dnsdbq:

$ dnsdbq -r
;; record times: 2019-02-13 08:28:01 .. 2019-09-24 03:12:08
;; count: 3313; bailiwick: app.  NS  NS

;; record times: 2019-02-13 10:10:26 .. 2019-04-15 18:05:11
;; count: 129; bailiwick:  NS  NS

But how can we show this behavior visually? One option for more general time graphs in R might be Vistime, but we could never get Vistime to present exactly the sort of graphs we were after, even though it looked initially promising.

After becoming frustrated trying to get what we wanted from Vistime, we ended up using ggplot to produce the following graph (note that the "ending" date for the second line reflects the time when this chart was created for this article, rather than an end-of-life time for the domain itself):

We built this graph using R with ggplot, based on JSON-format output extracted from dnsdbq. [Note: If you want to recreate this graph but don't currently have R and ggplot installed, we described how to install them in this blog article].

The data for that graph was generated with:

$ dnsdbq -r -j >

We then ran the code (as shown in Appendix I) by saying:

$ ./sample-ggplot.R

The graphic created by that run is what's shown above.  
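
Before plotting, it can help to sanity-check the JSON Lines data itself. The following is a minimal Python sketch, not the code used for the graph above. It assumes each line carries the time_first, time_last, bailiwick, and rdata fields that the R program in Appendix I reads. The records in it are hypothetical placeholders (made-up names and times), not real DNSDB results.

```python
import json
from datetime import datetime, timezone


def summarize_rrsets(jsonl_text):
    """Return one (bailiwick, first_seen, last_seen, rdata) tuple per JSON line."""
    rows = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        # Times are Unix epoch seconds; convert them to timezone-aware UTC values
        first = datetime.fromtimestamp(rec["time_first"], tz=timezone.utc)
        last = datetime.fromtimestamp(rec["time_last"], tz=timezone.utc)
        rows.append((rec["bailiwick"], first, last, rec["rdata"]))
    return rows


# Hypothetical records: names, counts, and times are placeholders
records = [
    {"count": 10, "time_first": 1550000000, "time_last": 1550864000,
     "rrname": "example.test.", "rrtype": "NS", "bailiwick": "test.",
     "rdata": ["ns1.registrar.test.", "ns2.registrar.test."]},
    {"count": 5, "time_first": 1550003600, "time_last": 1550176400,
     "rrname": "example.test.", "rrtype": "NS", "bailiwick": "example.test.",
     "rdata": ["ns1.other.test.", "ns2.other.test."]},
]
sample = "\n".join(json.dumps(r) for r in records)

for bailiwick, first, last, rdata in summarize_rrsets(sample):
    span_days = (last - first).days
    print(f"{bailiwick:15} {first:%Y-%m-%d} .. {last:%Y-%m-%d} "
          f"({span_days} days) {', '.join(rdata)}")
```

Each printed row corresponds to one horizontal segment in a graph like the one above: one line per bailiwick, running from first seen to last seen.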

III. Graphing Sample Non-DNS Data – SSL/TLS Certificate Dates from Certificate Transparency Logs

A second example of graphing security data over time can be seen around SSL/TLS certificates. Each certificate includes a starting ("not valid before") date and an ending ("not valid after") date. These dates can be found by checking certificate transparency sites, such as Censys.

For example, looking at some of the certificates issued for FQDNs under internal-message[dot]app, we see:

But how do those certs relate to each other? Are there really multiple brands being targeted at the same time via this domain? If we graph those dates, we can check. Again, R with ggplot is our friend. Running the code shown in Appendix II, we get a graph that looks like:

Obviously, yes, all of these campaign-related domains were actually up with valid SSL/TLS certs during the same period of time.
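
The same question can also be checked numerically. Below is a minimal Python sketch of an interval-overlap test: a set of certificates shares a common validity window when the latest "not valid before" date is no later than the earliest "not valid after" date. The certificate names and dates are hypothetical placeholders, not the values graphed above.

```python
from datetime import date


def common_window(certs):
    """Return (start, end) of the period when every cert is valid, or None.

    Each cert is a (name, not_valid_before, not_valid_after) tuple.
    """
    latest_start = max(c[1] for c in certs)
    earliest_end = min(c[2] for c in certs)
    if latest_start <= earliest_end:
        return latest_start, earliest_end
    return None


# Hypothetical certificates: names and dates are placeholders
certs = [
    ("brand-a.example.test", date(2019, 2, 13), date(2019, 5, 14)),
    ("brand-b.example.test", date(2019, 2, 20), date(2019, 5, 21)),
    ("brand-c.example.test", date(2019, 3, 1), date(2019, 5, 30)),
]

window = common_window(certs)
if window is None:
    print("no period when all certificates were valid together")
else:
    print(f"all certificates valid from {window[0]} to {window[1]}")
```

A graph remains the better tool for seeing partial overlaps at a glance; this check only answers whether there is a period when all of the certificates were valid at once.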

IV. Combining DNSDB and Certificate Data On A Single Graph

The examples above graphed just passive DNS data, or just certificate data. Sometimes, however, we may want to combine data from diverse sources. For example, we know that a domain must exist before a certificate can be created for it, but which "goes away" or "ceases to be seen" first, the cert or the domain? Let's create a little example to show this.

Our data file was constructed by combining data from DNSDB with data from certificate transparency logs. While our first example read JSON Lines output, and our second example read a CSV spreadsheet file, this example reads a flat file with times in Unix "ticks." When building that file, we converted "human times" to Unix tick time with commands similar to:

$ date -jf "%Y-%m-%d %H:%M:%S" "2018-01-02 09:36:54" +%s

Note that the date/time conversion shown above was tested on a Mac; other Unix-based operating systems may require different options to the date command in order to perform this same conversion (GNU date, for example, takes the input time via its -d option). In any event, our simplified sample data file for this run looks like the following:

*.papajohns-secure-login[dot]com dnsdb_time_first_seen 1550494554
*.papajohns-secure-login[dot]com cert_not_valid_before 1550519598
*.papajohns-secure-login[dot]com cert_not_valid_after  1558291998
*.papajohns-secure-login[dot]com dnsdb_time_last_seen  1569434017

Note that to make it easy to see what's going on, our data consolidates all domain names for this delegation point into a single wildcard, and we show just a single certificate. Also note that the X axis is NOT to scale, unlike our previous examples. The code we used to process this data is written in Python with matplotlib rather than R; it was originally based on the example shown here, and can be seen in Appendix III.
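
For readers who'd rather avoid platform-specific date options, here is a minimal Python sketch of the same bookkeeping: converting a "human time" to Unix ticks, reading the three-column flat file format shown above, and checking which event came first. The helper names are our own for illustration. Note that to_ticks() assumes the input time is in UTC, whereas the date command interprets its input in the local timezone unless told otherwise.

```python
from datetime import datetime, timezone

# The sample data file shown above, inlined so the sketch is self-contained
SAMPLE = """\
*.papajohns-secure-login[dot]com dnsdb_time_first_seen 1550494554
*.papajohns-secure-login[dot]com cert_not_valid_before 1550519598
*.papajohns-secure-login[dot]com cert_not_valid_after  1558291998
*.papajohns-secure-login[dot]com dnsdb_time_last_seen  1569434017
"""


def to_ticks(human_time):
    """Convert 'YYYY-MM-DD HH:MM:SS' to Unix ticks (assumption: input is UTC)."""
    parsed = datetime.strptime(human_time, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def parse_events(text):
    """Map each event label (second column) to its tick value (third column)."""
    events = {}
    for line in text.splitlines():
        words = line.split()
        if len(words) != 3:
            continue  # skip blank or malformed lines
        events[words[1]] = int(words[2])
    return events


events = parse_events(SAMPLE)

# List the events in time order, with ticks rendered as UTC timestamps
for label, ticks in sorted(events.items(), key=lambda kv: kv[1]):
    when = datetime.fromtimestamp(ticks, tz=timezone.utc)
    print(f"{when:%Y-%m-%d %H:%M:%S}Z  {label}")

# Did the certificate stop being valid before the domain was last seen?
cert_gone_first = events["cert_not_valid_after"] < events["dnsdb_time_last_seen"]
print("cert expired before last DNSDB sighting:", cert_gone_first)
```

For this sample data, the certificate's not-valid-after tick (1558291998) is smaller than the DNSDB last-seen tick (1569434017), so the final line reports True.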

Our output from this run (sent to the screen rather than to a file this time) looks like:

V. Conclusion

You've now seen a few examples of how you can use various plotting routines to graph security-related data. Visualizing data over time can help make it clear how one data stream relates to another, whether we're talking about nameserver differences by bailiwick, overlapping SSL/TLS certificates, or a combination of data from different sources.


These graphs were developed by Farsight as part of a project with our partner Anomali. Farsight would like to thank our colleagues at Anomali, particularly Paul Sheck and Parthiban Rajendran, for their contributions.

Appendix I. sample-ggplot.R
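$ cat sample-ggplot.R

# packages used below: jsonlite (stream_in), anytime (anytime), ggplot2 (plotting)
library(jsonlite)
library(anytime)
library(ggplot2)

# printed output goes to a file
sink("sample-plot.output")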

# PDF graphic output goes to a different file
pdf("sample-plot.pdf", width = 10, height = 4.5)

# read our JSONL output from dnsdbq
f = stream_in(file(description = "./", open = "r"))

# get our starting and ending dates set -- note that these are "magic" names
f$start <- anytime(f$time_first)
f$end <- anytime(f$time_last)

# tidy/sanitize the output
f$rdata <- gsub("c(", "", f$rdata, fixed = TRUE)
f$rdata <- gsub(")", "", f$rdata, fixed = TRUE)
f$rdata <- gsub('"', "", f$rdata, fixed = TRUE)

# defang the annotation to prevent accidental visits
f$rdata <- gsub(".app.", "[dot]app", f$rdata, fixed = TRUE)
f$rdata <- gsub(".com.", "[dot]com", f$rdata, fixed = TRUE)

# drop the trailing dot from the bailiwick, then defang it too
f$bailiwick <- gsub("app.", "app", f$bailiwick, fixed = TRUE)
f$bailiwick <- gsub(".app", "[dot]app", f$bailiwick, fixed = TRUE)

plot(chart <- ggplot(f, aes(start, bailiwick)) +
   labs(title = "\ninternal-message[dot]app shows differences in nameservers defined by bailiwick\n\nThe 2LD explicitly overrides its own nameservers while the domain is being misused (immediately\nafter creation); the nameservers known to the TLD remain unchanged throughout the domain's life.\n") +

   geom_point(f, mapping = aes(start, bailiwick)) +
   geom_segment(aes(xend = end, yend = bailiwick), size=1.0) +
   geom_point(f, mapping = aes(end, bailiwick)) +

   theme(legend.title = element_blank()) +
   theme(legend.position = "none") +

   scale_x_datetime(name = waiver(), breaks = waiver(),
   date_breaks = "1 month", date_labels = "%e %b %Y",
   date_minor_breaks = "1 day", timezone = "UTC", position = "bottom") +
   geom_label(aes(label = rdata), data = f, alpha = 1.0, hjust=-0.01,
   vjust=-0.5) +
   xlab("\n\nstart/end dates\n"))

Appendix II. sample-ggplot-2.R

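# packages used below: ggplot2 (plotting) and magrittr (the %>% pipe)
library(ggplot2)
library(magrittr)

# printed output goes to a file
sink("sample-plot-2.output")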

# PDF graphic output goes to a different file
pdf("sample-plot-2.pdf", width = 10, height = 7.5)

f = read.csv("", header=TRUE, sep=",")

# fix the dates (read in as strings with two-digit years)
f$start <- as.Date(f$not_valid_before, format="%m/%d/%Y %H:%M") %>%  format("20%y/%m/%d") %>% as.Date("%Y/%m/%d")

f$start <- as.POSIXct(f$start)

f$end <- as.Date(f$not_valid_after,  format="%m/%d/%Y %H:%M") %>% format("20%y/%m/%d") %>% as.Date("%Y/%m/%d")

f$end <- as.POSIXct(f$end)

# defang the annotation to prevent accidental visits
f$domain <- gsub(".app.", "[dot]app", f$domain, fixed = TRUE)

plot(chart <- ggplot(f, aes(start, domain)) +
   labs(title = "\ninternal-message[dot]app FQDNs\n\nSSL/TLS cert not_valid_before and not_valid_after dates\n") +

   geom_point(f, mapping = aes(start, domain)) +
   geom_segment(aes(xend = end, yend = domain)) +
   geom_point(f, mapping = aes(end, domain)) +

   theme(legend.title = element_blank()) +
   theme(legend.position = "none") +

   scale_x_datetime(date_breaks = "1 month", date_labels = "%e %b %Y",
   date_minor_breaks = "1 day", timezone = "UTC", position = "bottom") +
   xlab("\n\nstart/end dates\n") +

   # note that the discrete labels need to be manually ordered to get the
   # specific ordering shown in the graph
   scale_y_discrete(limits = c(
   "[dot]app #2",
   "[dot]app #1",
   "[dot]app #2",
   "[dot]app #1",

Appendix III.

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
from datetime import datetime

with open("", "r") as f:
    data = f.readlines()

domains = []
labels = []
dates = []
ip_dates = []

for line in data:
   words = line.split()



   mydatetime = datetime.utcfromtimestamp(int(words[2])).strftime('%Y-%m-%d %TZ')


# ----------------------------------------------------------------------------

# tweaked for the data to ensure labels don't overlap
levels = np.tile([1, -1, -1, 1], int(np.ceil(len(dates)/7)))[:len(dates)]

# Create figure and plot a stem plot with the date
fig, ax = plt.subplots(figsize=(11, 5.5))
ax.set(title="*.papajohns-secure-login[dot]com timeline\n")

markerline, stemline, baseline = ax.stem(dates, levels, linefmt="C3-", 
    basefmt="k-", use_line_collection=True)

plt.setp(markerline, mec="k", mfc="w", zorder=2.5)

# Shift the markers to the baseline by replacing the y-data by zeros.
markerline.set_ydata(np.zeros(len(dates)))

# annotate lines
vert = np.array(['top', 'bottom'])[(levels > 0).astype(int)]
for d, l, r, va in zip(dates, levels, ip_dates, vert):
    ax.annotate(r, xy=(d, l), xytext=(-3, np.sign(l)*3),
               textcoords="offset points", va=va, ha="center")

plt.setp(ax.get_xticklabels(), rotation=30, ha="center")

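# remove y axis and spines
ax.get_yaxis().set_visible(False)
for spine in ["left", "top", "right"]:
    ax.spines[spine].set_visible(False)

ax.margins(x=0.2, y=0.2)

# send the graph to the screen rather than to a file
plt.show()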

Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.