Introduction to Reputation Systems
By Kelly Molloy
Paraphrasing Wikipedia, a reputation system computes and/or publishes a judgment, often in the form of a numeric score or grade, for a set of objects within a community or domain based on a collection of opinions from other objects or entities. For example, you may have seen mail or web traffic that was given a "low score" and rejected by a reputation system. This happens frequently on the Internet, yet many people do not know what a reputation system is or how it arrives at its verdict. In this introductory article, I will explain what a reputation system is and how it works.
IP- and Domain-based Reputation
There are many different kinds of reputation systems in active use today. Some rate the trustworthiness and "spamminess" of individuals, such as the seller ratings at eBay or the upvote/gold system at Reddit. Others rank and rate retail businesses based on user reviews. This article focuses only on reputation systems that deal with IP and domain reputation.
In Reputation We Trust
Reputation systems such as these tell you if the IP or domain that you're accepting (or about to accept) traffic from is considered trustworthy, untrustworthy, or somewhere in between. It is important to note that a reputation system is not intended to tell you if a particular message is spam or not, but rather to assign a degree of trust to its source. Other anti-spam systems may take this reputation into account when rendering a verdict. Reputation systems themselves are macro; they deal with the overall behavior of traffic from the IP or domain, not with individual messages.
You Can't Escape Your Past
It is simple in theory: reputation systems consider past behavior to be predictive of future behavior, much like a credit score. If you've paid your bills on time in the past, you're likely to continue to do so in the future. But remember that a credit score doesn't consider factors like your level of education or what kind of car you drive; a PhD with a Mercedes could have a lower credit score than a high school graduate with a Yugo. Those factors are not reliable predictors of your willingness and ability to pay a debt. Online reputation systems are much the same: you need to use relevant inputs to receive a meaningful output. In this context, "meaningful" depends on what is considered bad; systems that are intended to identify IPs or domains that disseminate malware will use different inputs than those intended to identify systems sending email spam.
Choosing which inputs are relevant is a large part of producing a reliable and useful reputation score. Some common inputs are:
- How old is the domain?
- When and where was the domain registered?
- When was traffic from the domain first observed?
- How many domains have resolved to the IP in a particular quantum of time?
- Do forward and reverse DNS lookups match?
- Has the IP or domain name been recently listed in a blocklist?
- Have any of your systems recently received spam or malware from this IP or mentioning this domain name?
- Does it do what it says on the tin? That is, does mx.example.com send SMTP traffic? Does www.example.com answer on port TCP/80?
- Are the IPs surrounding this IP good citizens, or is this a bad neighborhood? For domains, are other domains owned by the same registrant good or bad? What about the registrar?
- Is the domain registration information privacy protected?
- Is this IP or domain behaving consistently with how it has behaved in the past? Has there been a drastic change in volume or in the type of traffic?
- Is the geolocation of the IP in an area known for network malfeasance?
- Is the IP IPv4 or IPv6?
- What ASN does the IP belong to?
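Several of the inputs listed above can be gathered with ordinary DNS lookups. As one illustration, the forward and reverse DNS question could be checked roughly as follows. This is a minimal sketch, not how any particular reputation system does it: the function names are made up for this example, and the two lookup helpers are passed in as parameters so that they can be swapped for cached or recorded data.

```python
import ipaddress
import socket


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of address strings that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the address.
    return {info[4][0] for info in infos}


def forward_reverse_match(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True if ip has a PTR name and that name resolves back to ip."""
    hostname = reverse(ip)
    if hostname is None:
        return False
    target = ipaddress.ip_address(ip)
    for addr in forward(hostname):
        try:
            if ipaddress.ip_address(addr) == target:
                return True
        except ValueError:
            # Skip anything that does not parse as an IP address.
            continue
    return False
```

Calling forward_reverse_match("192.0.2.10") with the default helpers performs live lookups, so the answer depends on the resolver in use at the time.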
Different systems can and do vary greatly in what they find germane, depending on what they're trying to accomplish. Most commonly, these and other factors are weighted according to the creator's goals and are regularly reviewed for efficacy and false positives. It is not unusual for a single input to become less efficacious over time and get switched out for a new, more effective metric.
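To make the idea of weighting concrete, here is a small sketch of how yes/no answers to questions like those above might be combined into a single score. The signal names, the weights, and the 0 to 100 scale are all assumptions chosen for illustration; real systems choose and tune their own.

```python
# Hypothetical signals and weights, for illustration only.
# Each signal is phrased so that True is the favorable answer.
WEIGHTS = {
    "domain_older_than_30_days": 15,
    "fcrdns_match": 20,
    "not_on_blocklist": 35,
    "no_recent_spam_seen": 30,
}


def reputation_score(signals, weights=WEIGHTS):
    """Return a score from 0 to 100.

    signals maps signal names to booleans. A signal that is missing
    is treated as unfavorable, which mirrors the cautious default
    many systems apply to unknown sources.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(w for name, w in weights.items() if signals.get(name, False))
    return round(100 * earned / total, 1)


if __name__ == "__main__":
    observed = {
        "domain_older_than_30_days": True,
        "fcrdns_match": True,
        "not_on_blocklist": False,
        "no_recent_spam_seen": True,
    }
    print(reputation_score(observed))
```

Because the weights are passed in as a plain mapping, retiring an input that has stopped being useful amounts to removing its entry and adding a replacement, without touching the scoring function.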
Reputation Front Runners
Some reputation systems make their verdicts publicly known; the best-known examples are Cisco's SenderBase and Return Path's Sender Score. Both track domain reputation as well as IP reputation. If you are monitoring your own IPs, a change in SenderBase or Sender Score is definitely worth investigating. Sometimes they can react more quickly to a malware infection than your abuse@ alias.
Mind Your P's and Q's
If you do have an issue, such as a malware infection or a spamming customer or business unit, what can you expect to happen to your reputation? As soon as a sensor (a server sharing data with the reputation system) sees your unwanted or malicious traffic, your reputation score will plummet. It is in your best interest to remedy the issue as quickly as possible, but don't expect your score to recover immediately. Just as your credit score doesn't recover immediately after a late payment, your reputation score can take time to rebuild. It may be tempting to decide you'll just stop using the IP for a while, but that will starve the reputation system of the new, benign information it needs to drive out the bad information. Keep the IP in use even though it may be a bit of a painful process. Additionally, don't expect a newly commissioned IP or domain to have a great reputation immediately. On the Internet, new IPs and domains are often considered guilty until proven innocent. It can take a while for good information to disseminate.
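The pattern described here, a sharp drop followed by a slow recovery, can be illustrated with a toy model. The sketch below is an assumption made for illustration and does not describe how any real reputation system updates its scores; the penalty and recovery_rate values are arbitrary.

```python
def update_score(score, observation_good, recovery_rate=0.05, penalty=0.6):
    """Return the new score (0 to 100) after one sensor observation.

    A bad observation removes a large fraction of the current score.
    A good observation closes only a small fraction of the gap to 100.
    """
    if observation_good:
        return score + recovery_rate * (100.0 - score)
    return score * (1.0 - penalty)


def replay(score, observations):
    """Apply a sequence of observations (True means good) in order."""
    for good in observations:
        score = update_score(score, good)
    return score


if __name__ == "__main__":
    after_incident = replay(90.0, [False])
    # Taking the IP out of service produces no observations at all,
    # so the score stays exactly where the incident left it.
    print("idle:  ", replay(after_incident, []))
    # Keeping the IP in use with clean traffic rebuilds the score slowly.
    print("in use:", replay(after_incident, [True] * 10))
```

In this model, an idle IP never improves because the score only changes when a sensor sees traffic, which is the reason for keeping the IP in service after the problem is fixed.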
I hope you now understand what reputation systems are, what they do, what kind of data they consume, and how they react to bad information. In the next article, we will talk about what kind of data you can use to build your own reputation system.
Kelly Molloy is a Senior Program Manager for Farsight Security, Inc.
Read the next part in this series: Building a Reputation System From Available Data