Sensor Strategy, Monitoring Platforms
Tools and techniques we use, plus a summary of platforms we like
Friends, in this monthly update we focus on two distinct areas of interest. First, we dig into the details of our sensor infrastructure, giving you, the audience, a look at what is involved in choosing, deploying, and managing well over 200 systems across more than 70 different networks. Our second focus is a handful of distributed debugging, monitoring, and measurement platforms outside of our own. Oftentimes we are presented with a project idea that is either not well suited to our infrastructure or is already being done elsewhere. We summarize a few of those complementary projects we think you should know about. As always, we close with an organization update.
Sensor Strategy
The majority of systems we operate are commonly called “sensor” nodes. A sensor node’s primary purpose is to monitor Internet activity seen in the wild while limiting its own interaction and traffic origination. A sensor node sits somewhere between a darknet collector and a honeypot. As long as these nodes are free to observe unsolicited traffic aimed at their public addresses, we can collect, process, measure, and relay what we see to the community. Sensors are responsible for most of the Signals data today.
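To make the idea concrete, here is a minimal sketch of the kind of passive observation a sensor performs: logging unsolicited inbound TCP SYNs seen on the wire. This is an illustrative toy, not our actual collector, and it assumes a Linux host with root privileges.

```python
#!/usr/bin/env python3
# Toy passive sensor: log unsolicited inbound TCP SYNs.
# Illustrative only (not our actual collector); Linux + root required.
import datetime
import socket
import struct

ETH_P_IP = 0x0800  # capture IPv4 frames only
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_IP))

while True:
    frame, _ = sock.recvfrom(65535)
    ip = frame[14:]                      # strip 14-byte Ethernet header
    if len(ip) < 20 or ip[9] != 6:       # protocol 6 = TCP
        continue
    ihl = (ip[0] & 0x0F) * 4             # IP header length in bytes
    tcp = ip[ihl:]
    if len(tcp) < 20:
        continue
    flags = tcp[13]
    if (flags & 0x12) == 0x02:           # SYN set, ACK clear: unsolicited
        src = socket.inet_ntoa(ip[12:16])
        dport = struct.unpack("!H", tcp[2:4])[0]
        print(datetime.datetime.utcnow().isoformat(), src, "->", dport)
```

A real sensor does considerably more (capture, normalization, relay), but the essence is the same: listen quietly and record what arrives unasked.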
The base OS and installed software on sensors are standardized to simplify setup and maintenance. We use Debian and Ansible almost exclusively. No OS is perfect, but we find Debian the most convenient, widely available, and simplest Linux distribution to use in heterogeneous environments. Ansible is a relatively straightforward, agentless system management toolkit that adequately meets our needs. Those needs are modest: a sensor’s network usage is light, making high-speed links or high-limit usage caps unnecessary. We have sensors running on systems with a single CPU, less than 512 MB of RAM, and less than 10 GB of disk. Such a setup poses no problem for our preferred OS or management tool. We have larger systems too, but sensors can run with minimal resources to keep costs down.
We maintain a list of server hosting providers in a GitHub repo. There you will find many familiar names and probably many not-so-familiar ones. How do we go about choosing providers? Early in the life of Dataplane.org, cost was the major determining factor. We hunted for deals and often found them. We have several systems that cost under $10 US/year, which is cheaper than the current going rate for an IPv4 address alone. We remain cost conscious, but as we have become more serious and seasoned, we have also become more discerning. In our relatively short existence we have seen many providers come and go, and when they go, you usually have to make an effort to get a refund for services not delivered. Over time we have become more adept at selecting, or avoiding, certain providers.
Factors that influence our hosting provider selection include:
Unique or desired origin (geographic, network, IPv4 address)
Support for “consumer-safe” payment methods (e.g., PayPal)
Provider history, reputation, and value
Fraud detection practices (i.e., we avoid providers that request government-issued ID)
Most of the well-known, large hosting providers satisfy all of the points above to a relatively high degree. The so-called low-end providers, usually one- or two-person operations that provision and sell VMs on a handful of dedicated servers, are the ones you have to be careful with. If a provider has been in operation for less than six months and is offering a deal that sounds too good to be true, the payment method and attention to security practices become very important.
Thankfully, our sensor diversity and simplicity mean no one sensor is of any particular importance. If a sensor goes down or a provider goes away, it is a minor inconvenience, never an emergency. This means we continue to value low-cost sensors as long as they adequately satisfy our needs (origin, payment type, provider standards).
One final note. We have had offers from kind people to run a sensor for us. We appreciate these offers, but we prefer to enter into official provider-customer relationships; we would rather not rely on favors for support and uptime when it comes to our infrastructure.
Measurement and Monitoring Platforms
The Internet community has come to depend on community-supported and non-profit organizations to foster and undertake certain types of activities that would otherwise lack support. We want to highlight four platforms that we feel address a variety of Internet measurement and monitoring needs. If you are interested in what Dataplane.org does, you may find these platforms complementary to our work.
Censored Planet is an academic research platform led by Professor Roya Ensafi at the University of Michigan. The platform aims to monitor and measure Internet censorship without requiring distributed software or hardware installations. Instead, the project infers censorship remotely using novel techniques involving open resolvers, echo services, and IP ID prediction. Access is automatically granted to those with a valid .edu email address, but an account can also be requested manually, presumably granted with valid justification and within the limits of acceptable use.
NLNOG Ring is a network of Linux systems shared among network operators. If you have an ASN and a spare IPv4/IPv6 address on which to run a VM, the NLNOG Ring operators will be happy to add a node to their distributed shell system. This then grants you access to all the other systems in the “ring.” The intent is for network operators to be able to perform basic network troubleshooting and debugging from systems outside their own network.
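As an illustration of the outside-in debugging the ring enables, here is a small sketch that runs the same DNS check from several ring nodes over SSH. The host names are invented for the example (real nodes follow a <participant>NN.ring.nlnog.net naming pattern), and actual access requires ring membership and SSH key setup.

```python
#!/usr/bin/env python3
# Sketch: run the same DNS lookup from several ring nodes over SSH.
# Host names below are invented for illustration; real nodes follow a
# <participant>NN.ring.nlnog.net pattern and require ring membership.
import subprocess

NODES = ["example01.ring.nlnog.net", "example02.ring.nlnog.net"]
TARGET = "www.dataplane.org"

for node in NODES:
    result = subprocess.run(
        ["ssh", node, f"dig +short {TARGET}"],
        capture_output=True, text=True, timeout=30,
    )
    print(f"{node}: {result.stdout.strip() or result.stderr.strip()}")
```

Comparing the answers across vantage points quickly surfaces problems, such as inconsistent DNS responses or reachability issues, that are invisible from inside your own network.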
OONI, the Open Observatory of Network Interference, is another censorship-oriented monitoring and measurement platform. Unlike Censored Planet, OONI consists of probes installed by volunteers, with permission, on systems they control, in countries all over the world. OONI probes measure several specific applications, such as Tor, Telegram, and WhatsApp, to evaluate whether they function uninhibited. A wealth of data from the probes is available to explore.
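For a sense of how that data can be explored programmatically, here is a small sketch that pulls a few recent Telegram measurements from OONI’s public API. The endpoint and field names reflect our reading of the api.ooni.io documentation and may change, so verify against the current docs before relying on them.

```python
#!/usr/bin/env python3
# Sketch: fetch a few recent Telegram measurements from OONI's public
# API. Endpoint and field names are our reading of api.ooni.io and
# should be checked against the current documentation.
import json
import urllib.request

url = ("https://api.ooni.io/api/v1/measurements"
       "?test_name=telegram&limit=5")
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

for m in data.get("results", []):
    print(m.get("measurement_start_time"),
          m.get("probe_cc"),
          m.get("anomaly"))
```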
RIPE Atlas may be the most extensive distributed monitoring platform in this lineup. The RIPE NCC-managed platform consists of hardware- and software-based probes run by volunteers around the globe. Several built-in tests are performed regularly, and user-specified tests can be created for specialized analysis projects. This platform is very popular with the Internet measurement community due to the breadth of its tests and the size of its network. Practically anyone can participate, but you must obtain “credits” to perform your own tests. You earn credits by running a probe, and probe operators can transfer excess credits to anyone with an account on the system.
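To show what a user-specified test looks like in practice, here is a sketch that schedules a one-off ping through the RIPE Atlas REST API. The API key and target below are placeholders, the request spends credits when it succeeds, and the exact fields should be checked against the current Atlas API documentation.

```python
#!/usr/bin/env python3
# Sketch: schedule a one-off ping via the RIPE Atlas v2 API.
# YOUR_API_KEY and the target are placeholders; creating a
# measurement costs credits.
import json
import urllib.request

body = {
    "definitions": [{
        "type": "ping",
        "af": 4,
        "target": "example.org",
        "description": "one-off ping sketch",
    }],
    "probes": [{
        "requested": 5,        # five probes...
        "type": "area",
        "value": "WW",         # ...selected worldwide
    }],
    "is_oneoff": True,
}

req = urllib.request.Request(
    "https://atlas.ripe.net/api/v2/measurements/?key=YOUR_API_KEY",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))     # response includes the new measurement ID
```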
The State of Dataplane.org
Our lawyer has begun filing the paperwork necessary for Illinois and US federal non-profit status. We expect a little back and forth over the next few weeks as that process progresses, separated by periods of waiting. We don’t expect to have much to say about this process for a while. However, one thing we know is that we will likely need to find either another lawyer or additional lawyers. No, we are not firing our lawyer, nor are we unhappy with our lawyer’s services. We will be interacting with people and organizations from around the globe, as well as navigating complex cyber-specific activities. This will require legal services beyond what our current lawyer alone can provide. It is unclear how quickly this expansion will take place, but we anticipate needing additional help as soon as we begin to develop fundraising campaigns.
On the technical front, we are primarily focused on three big work items. First is the API. Since releasing the preview, we have been working on needed improvements, particularly to performance. Second, we are deploying new back-end systems that will house our collection engines, databases, and reporting tools. This need became particularly acute when our current singly homed back-end lost IP connectivity for approximately 24 hours this past weekend. Third is the public repository of Signals data. We hope this will become a valuable and unique new resource for researchers.
You know where to find us. Feel free to reach out via email, Twitter, or Slack (request an invite if you need one).