Using The SIE Batch API to Find Matching Names in Newly Observed Domains (NOD)



1. Introduction

In previous articles, we showed you how to use SIE Batch to pull data for select Security Information Exchange channels, both via an interactive point-and-click web page and via the SIE Batch API.

In this article, we're going to show how you can use sie_get_rb (described here) as a building block for a little bash script to find domains of interest in Channel 212, our Newly Observed Domains (NOD) channel. We call this little example sie_batch_match.

What Newly Observed Domains might you be interested in?

  • A brand owner might want to watch NOD for their brand names or trademarks to see if a third party is using them as part of an unauthorized "knock-off" site

  • IT or security professionals focused on phishing attacks might want to watch for the name of a bank they're protecting

  • Those fighting Covid-19 scams, misinformation, and profiteering might want to watch for (and review) domains with names such as "covid", "corona", "pandemic", "ventilator", "n95" or "sanitizer"

  • Political campaigns might want to watch for domains related to contests, candidates, or issues/topics they're supporting (or opposing).

Ultimately, what you want to watch for is really up to you.

2. The Little Bash Scripts

In a nutshell, our scripts will:

  • Pull a one-minute batch of data from the Newly Observed Domains Channel using sie_get_rb

  • Extract just the RRname field from those records using jq (the RRname is the "left hand side" of DNS resource records)

  • Use grep to search the names for domains that contain one of a number of substrings of interest

Other program design choices:

  • Because this is intended to be a relatively stable/persistent watcher tool, we'll simply use a text editor to put the patterns of interest into a file called ./strings-to-match.txt

  • We'll archive the files we download into a directory called ./processed-jsonl-files/. Eventually we'll need to do housekeeping on those files (compressing or deleting them, etc.), but since this is just a proof of concept, we won't go into those details today.

  • We'll send the matches we find to stdout (we can always redirect stdout to a file or whatever should we need to do so)

  • We'll also routinely confirm that the needed programs and files are all available when we run the script.
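To make the matching step concrete before looking at the full script, here is a minimal sketch that can be run without any SIE access. The patterns and the domain names below are made up for illustration (they are not real NOD output), and names.txt is a hypothetical stand-in for the RRnames that jq would extract.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical patterns file: one substring per line, no blank lines.
printf '%s\n' covid corona n95 > strings-to-match.txt

# Hypothetical names standing in for the RRnames extracted by jq.
printf '%s\n' \
  'covid-help.example.' \
  'CoronaTest.example.' \
  'weather.example.' \
  'cheap-n95-masks.example.' > names.txt

# Same matching step the script uses: case-insensitive, patterns read from a file.
matches=$(grep --ignore-case --file ./strings-to-match.txt names.txt)
echo "$matches"
```

One thing to watch when editing the patterns file by hand: an empty line is an empty pattern, and an empty pattern matches every name, so a stray blank line will make every domain show up as a match.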

The script itself is short:

$ cat sie_batch_match.bash

#!/bin/bash

# make sure we have sie_get_rb installed
command -v sie_get_rb >/dev/null 2>&1 || \
{ echo >&2 "We use sie_get_rb but it's not installed. Correct and rerun."; exit 1; }

# make sure we have jq installed
command -v jq >/dev/null 2>&1 || \
{ echo >&2 "We use jq but it's not installed. Install and rerun."; exit 1; }

# make sure we have a file of strings to match (and stop if we don't)
if [ ! -f ./strings-to-match.txt ]; then
   echo >&2 "Need ./strings-to-match.txt. Create that file, then rerun."
   exit 1
fi

# make sure we have the directory we need to process the jsonl-format files
if [ ! -d ./.process-sie-get-jsonl-files ]; then
  mkdir -p ./.process-sie-get-jsonl-files
fi

# make sure we have the directory we'll use to save the processed jsonl files
if [ ! -d ./processed-jsonl-files ]; then
  mkdir -p ./processed-jsonl-files
fi

# grab data for a minute for a jsonl channel (in this case ch212)
sie_get_rb 212 now 1

# move the resulting data to the file for processing
mv -f sie-*.jsonl ./.process-sie-get-jsonl-files

# find and display matches
jq -r '.message.rrname' ./.process-sie-get-jsonl-files/sie-*\.jsonl | \
grep --ignore-case --no-filename --color --file ./strings-to-match.txt 

mv -f ./.process-sie-get-jsonl-files/* ./processed-jsonl-files/.

Having created that script, you could then run it from cron once a minute, or you could invoke it interactively with a 2nd little runit bash script such as:

$ cat runit.bash
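#!/bin/bash

# print a UTC timestamp, run one matching pass, then wait a minute and repeat
while true
do
  echo -n "TIME STAMP: "
  date -u
  ./sie_batch_match.bash
  sleep 60
done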

Ensure both those files are executable:

$ chmod a+rx sie_batch_match.bash
$ chmod a+rx runit.bash

3. Sample Run

So as a test, we ran with keywords that looked like:

$ cat strings-to-match.txt 

We saw output that looked like the following (since these are new domains of unknown provenance, we've replaced one dot in each of these domain names with [dot] for display here):

$ ./runit.bash
TIME STAMP: Fri Mar 27 15:33:40 UTC 2020
TIME STAMP: Fri Mar 27 15:34:43 UTC 2020
TIME STAMP: Fri Mar 27 15:35:45 UTC 2020

Are those domains good? Are those domains bad? That's not something Farsight evaluates — after all, we might not all see a given domain the same way. We only tell you that these are objectively new domains we've just seen for the first time on one of our sensors.

After that, it's up to you (or the domain reputation vendor of your choice) to carefully dig into the "goodness" or "badness" of that name (should you desire to do so).

4. Other Enhancements/Changes?

a) Different Cadence?

Currently we're pulling a new batch of data every minute. In many cases, a more relaxed retrieval schedule might be fine (e.g., perhaps pull an hour's worth of data every 3600 seconds.)

To do so, you'd change the duration in the sie_get_rb call in sie_batch_match.bash and update the sleep duration in the runit.bash script (note that one of those is in minutes, and the other is in seconds).
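One way to keep the two durations from drifting apart is to derive the sleep interval from the batch length. The sketch below is only illustrative: the variable names BATCH_MINUTES and SLEEP_SECONDS are assumptions rather than part of the original scripts, and it prints the two commands instead of running them.

```shell
# Batch length in minutes (the unit sie_get_rb takes in the script above).
BATCH_MINUTES=60

# Sleep interval in seconds (the unit sleep takes), derived from the batch length.
SLEEP_SECONDS=$(( BATCH_MINUTES * 60 ))

# Show the two commands that would change; nothing is actually fetched here.
echo "sie_get_rb 212 now ${BATCH_MINUTES}"
echo "sleep ${SLEEP_SECONDS}"
```

With BATCH_MINUTES set to 60, the second line printed is "sleep 3600", matching the hourly example above.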

b) Different Matcher?

Currently we use grep to do a very straightforward match against the SIE Batch files we download, but obviously you could easily modify the script to use a different matcher of your choice. For example, you could replace grep with agrep ("approximate GREP for fast fuzzy string searching"), see

c) Reporting More Than Just The RRname?

Or you might want to report more than just the RRnames. Perhaps you also want to output the record type and Rdata for each record?

That's an easy thing to change in the script's jq command. Replace:

jq -r '.message.rrname' ./.process-sie-get-jsonl-files/sie-*\.jsonl

with:

jq -r '"\(.message.rrname) \(.message.rrtype) \(.message.rdata)"' ./.process-sie-get-jsonl-files/sie-*\.jsonl

and then when you run you'll see output that looks like (once again sanitized for display here, since this is a brand new domain):

TIME STAMP: Fri Mar 27 17:12:32 UTC 2020
coronatestkit[dot]es. NS ["docks19.rzone[dot]de.","shades01.rzone[dot]de."]
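If you'd like to experiment with that jq filter without pulling live data, the following sketch feeds it a single hand-written record. The record is hypothetical and heavily trimmed: it contains only the three fields the filter reads, and the names in it are placeholders. It assumes jq is installed.

```shell
# Hypothetical, heavily trimmed record: only the three fields used below.
sample='{"message":{"rrname":"newsite.example.","rrtype":"NS","rdata":["ns1.example.net.","ns2.example.net."]}}'

# Same jq filter as above, reading the sample from stdin instead of a file.
line=$(printf '%s\n' "$sample" |
  jq -r '"\(.message.rrname) \(.message.rrtype) \(.message.rdata)"')
echo "$line"
```

Because rdata is an array, jq renders it as a compact JSON array inside the string, which is why the sample output above shows the name servers in square brackets. Note also that grep now sees the whole line, so a pattern can match on the record type or the Rdata as well as on the RRname.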

d) Watching A Different Channel?

You might want to use sie_batch_match to watch an SIE Batch channel other than Channel 212.

You can obviously modify the sample script to do so, but if you do, please be alert to the fact that some of the channels available via SIE are in NMSG format (rather than JSON Lines format).

To convert an NMSG format file to JSON Lines format, you'd use nmsgtool like so:

$ nmsgtool -r sie-ch204-{2020-02-13@20:54:00}-{2020-02-13@20:55:00}.nmsg -J -

Q. "But I don't have nmsgtool…"

A. If you’re using Debian Linux, you can install nmsgtool as a package, see

Source code is also available for those who prefer to build from source, or for use on systems other than Debian Linux. See

Build instructions for nmsgtool for the Mac in particular can be found in Appendix I of our recent whitepaper

5. Conclusion

We hope you've found this an intriguing little example of how you might practically use the SIE Batch API.

To arrange access to SIE, please contact Farsight Security Sales at or give them a call at +1-650-489-7919. Be sure to mention that you want to try SIE Batch.

Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.