What is Globbing?

I. Introduction

Today Farsight Security announced DNSDB 2.0 Flexible Search for DNSDB API. Flexible Search offers powerful new search capabilities that enhance DNSDB API and make it easy to do the DNSDB searches you've always wished you could make.

Early Adopter Access is available on August 19th, 2020 and General Availability on October 20th, 2020. If you're interested in applying for Early Adopter Access, please contact support@farsightsecurity.com.

Flexible Search will be bundled at no charge for paid DNSDB API customers (and customers given access to DNSDB API under a grant from Farsight), but will NOT be included as part of DNSDB Community Edition, the free, entry-level version of our flagship solution.

II. Search Syntax Options

Flexible Search is a "finding aid" that supplements and enhances (but does not replace) Standard DNSDB API.

Flexible Search offers users three search syntax modes in DNSDB Scout, and two otherwise.

  • Keyword Search: easily search for a brand name or domain name – just type in a word or string of characters to match.

    Keyword Search is meant to provide an easy starting point for novice searchers. [Available In DNSDB Scout Web Edition Only]

  • Regular Expressions: Regular expressions are the industry-standard way of expressing patterns. Regular expressions support simple keyword searches, but also give you the most power when you want or need to begin making more complex pattern searches.

    To learn a little about regular expressions in general now, before Flexible Search is actually available to you, see the companion article "What's a Regular Expression?" (or the comparative table below in Section VI of this article).

  • Globbing: Globbing is the other pattern matching option that will be available in DNSDB 2.0. We offer globbing as an option for those who may prefer it, but please note that it is simultaneously:

    o More syntactically complex when it comes to doing basic keyword searches, and

    o Much more limited when it comes to supporting non-trivial pattern matching.

Nonetheless, in this article we are going to take a closer look at globbing since it is an available option for those who may prefer it (most users who aren't familiar with either regular expressions or globbing should focus on learning to work with regular expressions).

III. Globbing's Very Limited "Bag of Tricks"

Let's explain the very limited pattern matching "tricks" that globbing supports. In general, Farsight's glob implementation follows standard Un*x glob(7) semantics, but not what's sometimes referred to as "extended globbing." This means that recognized characters are:

  • Question Mark Character. If you enter a question mark character (?), that special globbing character stands for "match exactly any one character here."

  • Asterisk Character. The asterisk (*) character is the globbing wildcard. It means "match any zero or more characters here."

  • Square Bracket Characters. The square bracket characters ([]) are used to "bookend" a range of characters, any of which will match at that location.

    We discuss square bracket character range intricacies further in Section V.

  • A Backslash Followed By a Special Character (question mark character, asterisk character, left square bracket character or backslash character):

    Normally special characters do special things, as described above in this section.

    However, when preceded by a backslash character, the backslash "escapes" the following special character, meaning that the character then gets "taken literally" as what it looks like, rather than being empowered with special magical matching capabilities.

  • Everything Else. Any other characters you enter as part of a globbing pattern get matched exactly as written (except that characters are NOT case sensitive).

  • Globbing Patterns Are "Anchored" Front and Back By Default. This means that if you search for just the string poodle you will NOT find matching names with that string in the middle of the name.

    You'd ONLY match a string:

    • which starts with a p
    • and which is followed by oodl
    • and which ends in an e

    If you DO want to match strings that may be in the middle of a domain name, you MUST remember to put an asterisk at the START of that pattern and an asterisk at the END of that pattern.

    Forgetting to include starting and ending asterisks is the most common thing people overlook/forget when doing glob searches!
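The anchoring rule is easy to try out locally. The following is a minimal sketch that uses Python's standard-library fnmatch module as a stand-in; it is NOT Farsight's implementation, and the helper name glob_match and the sample names are hypothetical. Python's fnmatch follows the same question mark, asterisk and square bracket rules described above, but it does not treat a backslash as an escape character, so escaping is not shown here.

```python
from fnmatch import fnmatchcase

def glob_match(pattern: str, name: str) -> bool:
    """Approximate a case-insensitive, fully anchored glob match."""
    # Lowercase both sides to mimic case-insensitive matching.
    return fnmatchcase(name.lower(), pattern.lower())

# Hypothetical names, invented purely for illustration.
names = ["poodle", "poodle.example.com.", "www.mypoodles.example."]

for pattern in ["poodle", "poodle*", "*poodle*"]:
    hits = [n for n in names if glob_match(pattern, n)]
    print(f"{pattern!r:12} -> {hits}")
```

Only the pattern with an asterisk at both ends finds poodle in the middle of a name; the bare pattern matches nothing but the exact string.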

IV. Some Specific Globbing Examples

Let's apply the rules we've just described.

  • Use Globbing to Match An Exact String. For example, if you wanted to match just the exact domain name phloem.uoregon.edu. you should enter exactly that:

      phloem.uoregon.edu.
    

    That said, normally you WON'T know an exact domain name string to search for. If you already knew the exact name of interest, you should just use Standard DNSDB (rather than using Flexible Search).

    Note that when doing a globbing match, any dots in the pattern are treated as actual LITERAL dots.

  • Use Globbing to Match A Simple Keyword. You can find domains that include a brand name (such as toyota) anywhere in the name. To do so, you'd enter the brand name with an asterisk on each side of it:

      *toyota*
    
  • Matching A Slightly More Complicated Globbing Wildcard Pattern. Assume you want to match any domain that includes the string north followed by anything (or nothing) and then the string bank

      *north*bank*
    
  • Square Bracket Alternation Example. Assume you want to match any domains that begin with ns followed by 1-9 or a-f followed by anything from the dot net domain:

      ns[1-9a-f]*.net.
    
  • Predefined Square Bracket Classes. You can also use predefined square bracket character classes as a shorthand. For example:

      [[:alnum:]]
    

    is functionally equivalent to:

      [a-z0-9]
    

    We generally find that it's as easy/short to just explicitly list the characters you want to match rather than using a predefined square bracket class.

  • BACK-Anchored Searches. If you only want to find matches that end with a specific literal string (such as .com.), end your search pattern with that specific literal string [Note: all domains in Flexible Search end with a formal dot]:

      *north*bank*.com.
    

    Important: to ensure the match ends with that literal string, do NOT end the glob pattern with an asterisk.

  • FRONT-Anchored Searches. Similarly, if you want to only find names that begin with the string ns, followed by anything (or nothing), then the literal string server, followed by anything, do NOT begin the glob pattern with an asterisk:

      ns*server*
    
  • Glob Patterns That Match A Specific Number Of Characters. For example, if you wanted to match names that contain the literal string south followed by exactly any two characters followed by west you could use the globbing pattern:

      *south??west*
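
The example patterns in this section can be checked against a handful of names before you spend a real query on them. This is a minimal sketch, again using Python's fnmatch module as an approximation of the glob rules rather than Farsight's own matcher; the sample names and the helper name matches are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the examples in this section.
patterns = [
    "phloem.uoregon.edu.",
    "*toyota*",
    "*north*bank*",
    "ns[1-9a-f]*.net.",
    "*north*bank*.com.",
    "ns*server*",
    "*south??west*",
]

# Hypothetical names, invented purely for illustration.
names = [
    "phloem.uoregon.edu.",
    "toyota-dealer.example.com.",
    "northernbank.example.org.",
    "northbank.example.com.",
    "nsa.example.net.",
    "ns0.example.net.",
    "nameserver.example.net.",
    "ns1.server.example.",
    "south--west.example.",
    "southwest.example.",
]

def matches(pattern: str, name: str) -> bool:
    # Lowercase both sides to mimic case-insensitive matching.
    return fnmatchcase(name.lower(), pattern.lower())

for pattern in patterns:
    print(pattern)
    for name in names:
        if matches(pattern, name):
            print("    ", name)
```

Note that ns0.example.net. is not matched by ns[1-9a-f]*.net. (zero is outside the bracket range), and southwest.example. is not matched by *south??west* (the two question marks each require a character).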
    

V. Square Bracket Intricacies

Square brackets act as "bookends" around a range of alternative characters at a specific location, any of which will work as a match option there.

This can be expressed a number of ways, including:

  • Specific Individual Characters.

      [aeiou567]
    
  • Character Ranges:

      [0-9]
      [d-r]
    
  • Negated Ranges: For example, use a leading exclamation point within the brackets to match anything EXCEPT…

      [!s-z]
    
  • Ranges that Include a Dash: If you want to include the dash character itself as a possible literal match, make that character first in the range you define:

      [-pmnoq]
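
Each of the bracket forms above can be tried against a single character. The following sketch uses Python's fnmatch module, which handles these four forms the same way as described here; it is only an approximation of the behavior described in this article, and the test characters are arbitrary choices.

```python
from fnmatch import fnmatchcase

# (bracket pattern, single test character, expected result)
checks = [
    ("[aeiou567]", "e", True),    # listed individually
    ("[aeiou567]", "8", False),
    ("[0-9]", "4", True),         # inside the range
    ("[d-r]", "s", False),        # just past the end of the range
    ("[!s-z]", "a", True),        # negated: anything except s-z
    ("[!s-z]", "t", False),
    ("[-pmnoq]", "-", True),      # leading dash is a literal dash
    ("[-pmnoq]", "r", False),
]

for pattern, char, expected in checks:
    result = fnmatchcase(char, pattern)
    print(f"{pattern:12} {char!r} -> {result} (expected {expected})")
```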
    

VI. Globbing vs. Regular Expression Feature Crosswalk Table

The regular expression rules are similar to (but different from) the globbing rules:


    MATCH TYPE                GLOBBING              REGULAR EXPRESSION

    EXACT MATCH               phloem.uoregon.edu.   ^phloem\.uoregon\.edu\.$
    SIMPLE STRING             *toyota*              toyota
    MID STRING WILDCARD       *north*bank*          north.*bank
    BACK ANCHORED SEARCH      *north*bank*.com.     north.*bank.*\.com\.$
    FRONT ANCHORED SEARCH     ns*server*            ^ns.*server
    MATCH EXACT # OF CHARS    *south??west*         south..west
    ALTERNATIVE MATCHING      ns[1-9a-f]*.net.      ^ns[1-9a-f].*\.net\.$

Summarizing the specific differences shown in that table:

  • Matching any single character:
    Globbing: Match any one character with a question mark (?)
    Regular Expression: Match any one character with a dot (.)

  • Matching zero or more characters:
    Globbing: Match zero or more characters with an asterisk (*)
    Regular Expression: Match zero or more characters with dot asterisk (.*)

  • Matching just specific characters listed in square brackets:
    Globbing: List the characters to match in square brackets
    Regular Expression: Same

  • Literal dot character:
    Globbing: Just enter the dot
    Regular Expression: Enter a backslash followed by the dot

  • Front anchor:
    Globbing: Front anchored UNLESS the pattern begins with an asterisk
    Regular Expression: To explicitly front anchor the pattern, you MUST begin it with a caret (^) character

  • Back anchor:
    Globbing: Back anchored UNLESS the pattern ends with an asterisk (if there's NOT an asterisk, any domain name should end with a formal dot)
    Regular Expression: To explicitly back anchor the pattern, you MUST end it with a dollar sign ($) character (and for any domain name, the dollar sign should be preceded by a backslash dot)
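
The crosswalk rules can also be applied mechanically. The following is a minimal sketch of a glob-to-regular-expression translator written for this article's rules only; the function name glob_to_regex is hypothetical, it is not part of any Farsight tool, and it does not handle predefined classes such as [[:alnum:]]. Because globs are anchored at both ends, the sketch always emits ^ and $, so *toyota* becomes ^.*toyota.*$, which matches the same names as the shorter unanchored regular expression toyota.

```python
import re

def glob_to_regex(glob: str) -> str:
    """Translate a basic glob into an equivalent anchored regular expression."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            # A backslash escapes the next character: take it literally.
            out.append(re.escape(glob[i + 1]))
            i += 2
            continue
        if ch == "?":
            out.append(".")       # any one character
        elif ch == "*":
            out.append(".*")      # zero or more characters
        elif ch == "[":
            start = i + 1
            negate = glob.startswith("!", start)
            if negate:
                start += 1
            # The set needs at least one member, so search from start + 1.
            end = glob.find("]", start + 1)
            if end == -1:
                out.append(re.escape(ch))   # unterminated: a literal "["
            else:
                body = glob[start:end].replace("\\", "\\\\")
                if body.startswith("^"):
                    body = "\\" + body      # a leading "^" is literal here
                # Glob negation "!" becomes regular expression negation "^".
                out.append("[" + ("^" if negate else "") + body + "]")
                i = end
        else:
            out.append(re.escape(ch))       # literal, so "." becomes "\."
        i += 1
    # Globs are anchored front and back, so add explicit anchors.
    return "^" + "".join(out) + "$"

for glob in ["phloem.uoregon.edu.", "*toyota*", "ns[1-9a-f]*.net.", "*south??west*"]:
    print(f"{glob:22} {glob_to_regex(glob)}")
```

The translated pattern can then be used with re.search(glob_to_regex(pattern), name, re.IGNORECASE) to approximate the case-insensitive matching described earlier.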

VII. Summary

That's about it for globbing for now. Now you know the rudiments of entering glob-format queries.

If you're using regular expressions, however, you can do lots more than just the comparatively rudimentary stuff globbing supports.

Joe St Sauver Ph.D. is a Distinguished Scientist and Director of Research with Farsight Security®, Inc.