What's a UUID?

I. Introduction

When working with DNSDB or Security Information Exchange (SIE) DNS-related channels, you may occasionally see domain name labels with a very distinctive "dash-separated pattern" such as the following:

    8a1c7f6a-ac5e-4898-af1e-2654d0fa8e45.probe.performance.dropbox.com.
    c34b98c1-02d5-4020-a6e0-c89af9a9b56e.sync.upravel.com.
    0fe9da84-55ab-48fc-847e-4da8807419ee.mitdmp.whiteboxdigital.ru.
    bec89e42-fe87-44a0-b53d-55b8bc7b7a7a.notifications.api.brightspace.com.
    9565a982-6467-4d43-94ac-a5094ad877cc.us.u.fastly-insights.com.
    f5489548-9f97-4a48-b22b-2f03aec465aa.edge1.pingone.com.
    eaf4f4b1-65fa-5480-5013-05ab140f8498.z1.dca0.com.

That is, the leading label (the portion before the first dot) of each of those names follows the pattern:

  • Eight hexadecimal digits followed by a dash
  • Four hexadecimal digits followed by a dash (repeated two additional times)
  • Twelve hexadecimal digits

These labels are almost certainly "Universally Unique Identifiers," or "UUIDs".

There are RFC4122 and non-RFC4122 UUIDs. The four main types (RFC4122 calls them "versions") of RFC4122 UUIDs are:

  • Type 1 UUIDs (a value derived from the host's hardware address and the current time)

  • Type 3 UUIDs (an MD5 hash of a namespace identifier and a corresponding namestring, such as a domain name, URL, ISO OID, or X.500 DN)

  • Type 4 UUIDs (essentially a pseudo-random value), and

  • Type 5 UUIDs (a SHA-1 hash of a namespace identifier and a corresponding namestring, such as a domain name, URL, ISO OID, or X.500 DN).
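The type is recorded inside the UUID itself: the first hex digit of the third group holds the version number, and the first hex digit of the fourth group (8, 9, a, or b) marks the RFC4122 variant. As a small sketch using Python3's standard uuid module, here is a check of three of the example labels shown earlier:

```python
import uuid

# Leading labels copied from three of the example DNS names above
labels = [
    "8a1c7f6a-ac5e-4898-af1e-2654d0fa8e45",
    "c34b98c1-02d5-4020-a6e0-c89af9a9b56e",
    "eaf4f4b1-65fa-5480-5013-05ab140f8498",
]

for label in labels:
    u = uuid.UUID(label)
    # u.variant is decoded from the top bits of the fourth group;
    # u.version is None unless the variant is RFC 4122
    print(label, "|", u.variant, "|", u.version)
```

The first two labels come out as RFC 4122 version 4. The last one, whose fourth group starts with 5, falls in the NCS-reserved variant, so Python reports no version: it has the UUID shape, but it is not an RFC4122 UUID.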

II. RFC4122 Type 4 UUIDs

Let's experiment a little with some UUIDs using Python3's uuid library.

We'll begin by starting an interactive Python3 shell, and importing the Python uuid library:

$ python3
>>> import uuid

We can then try creating some RFC4122-format UUIDs. For example, if you ever need a unique random identifier (perhaps to use as a transaction identifier or as part of a temporary filename), RFC4122 Type 4 UUIDs may be just the ticket. With 122 random bits in each one, every Type 4 UUID you create will, for all practical purposes, be unique:

>>> uuid.uuid4()
UUID('2ab6bdda-fd0b-4d6c-b886-b7c388cca8c7')

>>> uuid.uuid4()
UUID('64847737-f0b4-401c-932b-452af79e3264')

>>> uuid.uuid4()
UUID('b06c22b6-2124-4702-a450-714f42898422')
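To see where those 122 random bits go, here is a minimal sketch that assembles a Type 4 UUID by hand from 16 random bytes, then forces the version and variant bits. The function name manual_uuid4 is our own hypothetical helper; in practice you would simply call uuid.uuid4().

```python
import os
import uuid

def manual_uuid4() -> uuid.UUID:
    """Sketch: build a Type 4 UUID from 16 random bytes."""
    b = bytearray(os.urandom(16))
    # Byte 6, high nibble: version number 4 (the first hex digit
    # of the third dash-separated group)
    b[6] = (b[6] & 0x0F) | 0x40
    # Byte 8, top two bits: RFC4122 variant "10", so the first hex digit
    # of the fourth group is always 8, 9, a, or b
    b[8] = (b[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(b))
```

Six of the 128 bits are fixed (four for the version, two for the variant), which is why every Type 4 UUID above has a 4 at the start of its third group and an 8, 9, a, or b at the start of its fourth.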

As we will later see in actual traffic, RFC4122 Type 4 UUIDs tend to be the most commonly seen.

III. RFC4122 Type 1 UUIDs

Let's now look at a type of UUID that's a little more complex, the RFC4122 Type 1 UUID. We'll get a Type 1 UUID and assign it to a variable called myuuid:

>>> myuuid=uuid.uuid1()
>>> myuuid
UUID('c8e85954-bdc8-11eb-a376-a0369f710741')

If we request a fresh Type 1 UUID, we can see that the last field of the Type 1 UUID (a0369f710741) doesn't change:

>>> myuuid=uuid.uuid1()
>>> myuuid
UUID('f8b26182-bdca-11eb-9022-a0369f710741')

If we compare the last field of that UUID to the system's hardware Ethernet ("MAC") address, retrieved using

$ ifconfig -a

or

$ ip a 

we can see the last part of the Type 1 UUID is set to the default interface's hardware address (albeit without the colon formatting that's normally part of a hardware Ethernet address):

link/ether a0:36:9f:71:07:41

If we wanted to, we could programmatically verify that the Type 1 UUID is an RFC4122 Type 1 UUID (and get the host's Ethernet MAC address) by saying:

>>> myuuid.variant
'specified in RFC 4122'
>>> myuuid.version
1
>>> hex(myuuid.fields[5])
'0xa0369f710741'
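The other fields of a Type 1 UUID carry information too. Per RFC 4122, the 60-bit timestamp counts 100-nanosecond intervals since 15 October 1582 (UTC), and Python's UUID object reassembles it as the .time attribute. The sketch below (type1_details is a hypothetical helper of ours, not part of the uuid library) turns a Type 1 UUID into a creation time and a colon-formatted node address:

```python
import datetime
import uuid
from typing import Tuple

# Type 1 timestamps count 100 ns intervals since 1582-10-15 00:00 UTC
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)

def type1_details(u: uuid.UUID) -> Tuple[datetime.datetime, str]:
    """Return (creation time, node address) for an RFC4122 Type 1 UUID."""
    if u.variant != uuid.RFC_4122 or u.version != 1:
        raise ValueError("not an RFC4122 Type 1 UUID")
    # u.time stitches time_low, time_mid, and time_hi back together;
    # divide by 10 to convert 100 ns ticks into microseconds
    created = GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)
    # u.node is the 48-bit node field (normally a MAC address)
    node_hex = f"{u.node:012x}"
    mac = ":".join(node_hex[i:i + 2] for i in range(0, 12, 2))
    return created, mac
```

Applied to the first myuuid value above, c8e85954-bdc8-11eb-a376-a0369f710741, this reports a creation time on 26 May 2021 (UTC) and the node a0:36:9f:71:07:41. In other words, a Type 1 UUID also reveals roughly when it was minted.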

The above confirms that the last part of a Type 1 UUID has the potential to act as a persistent host identifier (although beware of the impact of things like Ethernet adapter replacement).

IV. RFC4122 Type 3 and RFC4122 Type 5 UUIDs

Now let's consider the RFC4122 Type 3 (MD5) and Type 5 (SHA-1) UUIDs. These are hashed values produced from a namespace identifier and a corresponding namestring (such as a domain name, URL, ISO OID, or X.500 DN). The two hashing functions (MD5 vs SHA-1) yield different values, but the principle's the same.

Note that neither MD5 nor SHA-1 is considered particularly cryptographically robust against well-funded opponents these days, but both are pragmatically adequate for the purposes for which most UUIDs get used.

Let's create a Type 5 (SHA-1) UUID as an example. Part of making a Type 5 UUID is specifying a namespace (the namespace declares the type of namestring being processed). In the notation of the Python uuid library we're using, those namespaces are defined as:

uuid.NAMESPACE_DNS     Domain name
uuid.NAMESPACE_URL     Web URL
uuid.NAMESPACE_OID     ISO OID
uuid.NAMESPACE_X500    X.500 Distinguished Name (DN)

For example, to build a Type 5 UUID (SHA-1) for the domain name www.farsightsecurity.com, we'd say:

>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'www.farsightsecurity.com')
UUID('c8b275e4-2990-5eef-af17-a96026a19f71')

If we re-run that Type 5 UUID call for www.farsightsecurity.com, note that we get the same UUID result:

>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'www.farsightsecurity.com')
UUID('c8b275e4-2990-5eef-af17-a96026a19f71')
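That determinism follows from how name-based UUIDs are built: hash the 16 bytes of the namespace UUID followed by the name, keep the first 16 bytes of the digest, and overwrite the version and variant bits. The sketch below (name_based_uuid is a hypothetical helper, written only to expose the steps) does this by hand for both Type 3 (MD5) and Type 5 (SHA-1):

```python
import hashlib
import uuid

def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Sketch: build a Type 3 (MD5) or Type 5 (SHA-1) UUID by hand."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()    # 16 bytes
    elif version == 5:
        digest = hashlib.sha1(data).digest()   # 20 bytes; extra 4 are dropped
    else:
        raise ValueError("only versions 3 and 5 are name-based")
    b = bytearray(digest[:16])
    b[6] = (b[6] & 0x0F) | (version << 4)  # version nibble
    b[8] = (b[8] & 0x3F) | 0x80            # RFC4122 variant bits "10"
    return uuid.UUID(bytes=bytes(b))
```

For any namespace and name, the result should match uuid.uuid3() or uuid.uuid5() from the standard library. Nothing random enters the computation, which is why the same input always yields the same UUID.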

A natural question when thinking about a Type 5 UUID is, "Can I 'decode' a UUID in order to extract the domain name that was used to construct that UUID?" The answer is that no, you can't, at least not algorithmically.

However, data-driven approaches should also be considered. Remember that a Type 5 UUID is deterministic: it yields the same result every time it is run with the same input. Thus, conceptually, you could build a "dictionary" mapping billions of known namestrings to their corresponding Type 5 UUIDs, and then use that table of precomputed Type 5 UUIDs to look up the namestring that was "encoded."

For that reason, please be careful about assuming that Type 5 UUIDs are "strictly one way" or "absolutely non-reversible": that may not always be true, particularly if the set of potential namestrings is known and finite.
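A toy version of that precomputed-dictionary idea takes only a few lines. The candidate list here is hypothetical; a real attempt would draw on a far larger set of known names, for example domain names observed in passive DNS.

```python
import uuid
from typing import Optional

# Hypothetical candidate namestrings (a real table could hold billions)
candidates = ["www.farsightsecurity.com", "www.example.com", "mail.example.org"]

# Precompute Type 5 UUID -> namestring; this works because uuid5()
# always returns the same UUID for the same namespace and name
lookup_table = {uuid.uuid5(uuid.NAMESPACE_DNS, name): name for name in candidates}

def reverse_lookup(u: uuid.UUID) -> Optional[str]:
    """Return the namestring behind u, if it was among the candidates."""
    return lookup_table.get(u)
```

The lookup succeeds only when the original name was in the candidate set; a Type 4 UUID, or a Type 5 UUID built from an unlisted name, simply returns None.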

V. UUIDs in Security Information Exchange (SIE) Channels

Farsight SIE channels are used to share near-real-time security traffic, including DNS traffic. For example, SIE Channel 204 contains "Data from Farsight’s global sensor array that has been deduplicated, filtered and verified." (A list of available channels can be found here.)

We pulled ten million Channel 204 records from a leased SIE blade server using nmsgtool, saving just the RRnames that have the appropriate pattern.

This took less than ten minutes, and yielded 12,351 records (about 0.124% of the total observations we pulled):

$ time nmsgtool -C ch204 -c 10000000 -J - | jq -r '.message.rrname' | \
grep -E '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' > \
uuid-pattern-rrnames.txt
real	9m21.825s
user	6m7.292s
sys	0m37.948s

$ wc -l uuid-pattern-rrnames.txt 
12351 uuid-pattern-rrnames.txt

We then visually confirmed that those names were of the expected UUID-containing pattern:

$ more uuid-pattern-rrnames.txt
01ef8657-5088-4e2d-a90c-f135796b4abc-pdata-v4.unique.k.fastly-insights.com.
aa6a8b17-18e2-4cec-a6d3-b9526ef30673-pdata-v4.unique.k.fastly-insights.com.
99de0c1d-c56c-4942-ac6e-3d924687a038-pdata-v4.unique.k.fastly-insights.com.
[...]
210a718b-10f8-425d-9918-d399a97e6a67-pdata-v4.unique.k.fastly-insights.com.

Top effective 2nd-level domains that were seen in 50 or more RRnames looked like:

$ 2nd-level-dom < uuid-pattern-rrnames.txt | sort | uniq -c | sort -nr
   6141 fastly-insights.com
   1088 bugfender.net
    401 permutive.app
    348 aadrm.com
    303 wix-code.com
    272 verkada-lan.com
    231 mts.ru
    168 seondnsresolve.com
    158 elliq.co
    153 filesusr.com
    139 msappproxy.net
    135 azure.com
    124 mysimplestore.com
     99 godaddy.com
     91 lever.co
     88 rlets.com
     85 thirdptop.com
     85 cloudflareresolve.com
     79 pipedrive.email
     60 petzila.com
     53 brightspace.com
     [etc]

There were at least a few comment-worthy things to note:

  • 451 domain names under remotewd.com had UUID-format labels. Given that, you might wonder why remotewd.com does not appear in the list of top effective 2nd-level domains shown above. The reason is that remotewd.com appears in the PSL (Public Suffix List), and is thus an effective top-level domain in its own right. This means that those 451 remotewd.com names are each tallied separately, rather than being aggregated into a single effective 2nd-level domain large enough to make the top table.

  • Some UUIDs had been "customized" or "encapsulated" by having text prepended or appended to the UUID. For example, we saw unusual-looking domains such as the following:

    dropthishost-c9ca947d-3c30-48db-89a2-33f6a36a67a0.biz.
    dropthishost-09a9fadf-de97-41e7-8dea-fae21b069c4a.biz.
    dropthishost-01c4781b-3816-4081-b1c9-3b7e9df16d9a.biz.
    dropthishost-01c4781b-3816-4081-b1c9-3b7e9df16d9a.biz.
    dropthishost-426903b8-79aa-4e10-b80e-0fc520637ac7.biz.
    dropthishost-a24631d1-6cdc-4eb9-a099-76c4a3fb0d38.biz.
    dropthishost-21a416d3-dd34-4c4b-a2cd-fdc484cc21b6.biz.
    dropthishost-e57a0268-a3d3-449d-b424-98ed9cb07d29.biz.
    dropthishost-e57a0268-a3d3-449d-b424-98ed9cb07d29.biz.
    dropthishost-9ccc43ef-27f0-4717-9767-3dcf5453a8cd.biz.
    dropthishost-81b53cb7-6640-4ea9-ba71-d28f39a069cc.biz.
    dropthishost-dbf7973f-9dc1-46ab-bc10-3b3a29695906.biz.
    dropthishost-a9f2037f-e43b-4640-8993-db18303e93a0.biz.
    dropthishost-a08e8b8d-2efa-4ee4-baa0-76fc107c370a.biz.
    [etc]

All of the domains of this sort that we checked in DNSDB appear to point at 216.194.64.193.
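The remotewd.com effect noted above comes from how an "effective 2nd-level domain" is computed: find the longest public suffix at the end of the name, then keep one more label to its left. The sketch below is our assumption about what a tool like 2nd-level-dom does, not its actual code; PUBLIC_SUFFIXES is a tiny hypothetical stand-in for the real Public Suffix List, and the PSL's wildcard and exception rules are ignored.

```python
# Tiny hypothetical stand-in for the Public Suffix List
PUBLIC_SUFFIXES = {"com", "net", "ru", "biz", "remotewd.com"}

def registered_domain(name: str) -> str:
    """Return the 'effective 2nd-level domain' for a DNS name."""
    labels = name.rstrip(".").lower().split(".")
    # Scan from the longest candidate suffix to the shortest, so the
    # first hit is the longest matching public suffix
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # Keep the public suffix plus one label to its left
            return ".".join(labels[max(i - 1, 0):])
    # No known suffix: fall back to the last two labels
    return ".".join(labels[-2:])
```

With remotewd.com in the suffix set, a name like (hypothetically) 01234567-89ab-4cde-8f01-23456789abcd.remotewd.com. is its own effective 2nd-level domain, while every fastly-insights.com name collapses into the single entry fastly-insights.com.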

What do we see if we dig into the UUIDs themselves? Are they all of a particular type, for example?

We'll extract just the apparent UUIDs from the file by saying:

$ grep -oP '([0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})' \
uuid-pattern-rrnames.txt > just-uuids.txt
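For readers who would rather stay in Python, here is a rough equivalent of that grep step. The function name extract_uuids is hypothetical. One deliberate difference: DNS names are case-insensitive, so the sketch lowercases each name before matching, whereas the lowercase-only grep pattern would skip UUIDs written in uppercase hex.

```python
import re
import uuid
from typing import List

# Same 8-4-4-4-12 pattern used with grep above
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def extract_uuids(rrname: str) -> List[uuid.UUID]:
    """Return every UUID-shaped substring of rrname as a uuid.UUID."""
    # Lowercase first because DNS names are case-insensitive
    return [uuid.UUID(m) for m in UUID_RE.findall(rrname.lower())]
```

Prefixes and suffixes such as "dropthishost-" or "-pdata-v4" are ignored, because findall returns only the matched 36-character substring.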

We'll then check the status of each of those UUIDs with a little Python3 script:

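#!/usr/bin/env python3
# checkit.py: print a UUID, its variant, and its version
import sys
import uuid

myuuid = uuid.UUID(sys.argv[1])

# Shorten the RFC 4122 variant string ("specified in RFC 4122") for
# compact, comma-separated output; other variants print in full
myvariant = myuuid.variant.replace("specified in RFC 4122", "4122")

# .version is only meaningful for RFC 4122 UUIDs; it is None otherwise
myversion = myuuid.version

print(myuuid, ",", myvariant, ",", myversion)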

For example:

$ ./checkit.py 3ee99252-8700-414d-a1ef-e3d4e77a63f3
3ee99252-8700-414d-a1ef-e3d4e77a63f3 , 4122 , 4

Considering all 12,351 UUIDs in our sample, 12,292 (99.5% of our sample) were RFC4122 Type 4 "random" UUIDs; the remaining 0.5% consisted of a mix of other UUID types. Clearly "random" UUIDs dominate the UUID usage we observed.
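Running checkit.py once per UUID works, but for thousands of UUIDs a single pass is quicker. The sketch below (tally is a hypothetical helper of ours) counts (variant, version) pairs over an iterable of UUID strings, such as the lines of just-uuids.txt, using the same shortened "4122" label as checkit.py:

```python
import collections
import uuid
from typing import Iterable

def tally(lines: Iterable[str]) -> collections.Counter:
    """Count (variant, version) pairs over lines holding one UUID each."""
    counts: collections.Counter = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        u = uuid.UUID(line)
        variant = "4122" if u.variant == uuid.RFC_4122 else u.variant
        counts[(variant, u.version)] += 1
    return counts
```

For example, opening just-uuids.txt and calling tally() on the file object, then .most_common() on the result, lists the most frequent UUID types first; dividing each count by the total gives percentages like the ones above.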

VI. Conclusion

We hope you've found this little introduction to UUIDs to be of some interest.

We encourage you to explore all the interesting and unusual phenomena waiting to be discovered in Farsight's Security Information Exchange and DNSDB!

Joe St Sauver, Ph.D. is a Distinguished Scientist and Director of Research for Farsight Security, Inc.