In August 2023 the Dataplane.org sensor network began observing a unique pattern of source address-spoofed DNS scanning activity. The most peculiar thing about these scans is that the source IP addresses were faked to a neighbor address of the target. We call this Destination-Adjacent Source Address (DASA) Spoofing. In the vast majority of these DASA spoofed scans the adjacency was within the same IPv4 /30 or /31 prefix as the destination.
We have also seen IPv6-based DASA spoofed scans, but there the adjacency range has been as large as a /48 encompassing the destination address. In almost all cases these DNS-based DASA spoofed messages are A resource record (RR) recursion-desired queries. Is this a new Internet surveying technique, some sort of novel attack, or perhaps just a broken scanner configuration? This is the story of what we saw, why it intrigued us, what we did to triangulate where it was coming from, and a demonstration of our network intelligence (NETINT). We’re convinced we’re well poised to help the community produce interesting analysis such as this, and we want to help foster a new future of Internet measurement and monitoring.
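The adjacency test described above can be sketched in a few lines of Python. This is only an illustration, not the detection logic used on the sensors: the function name is_dasa and the choice of prefix length passed in are assumptions, and the addresses are documentation examples.

```python
import ipaddress


def is_dasa(src: str, dst: str, prefix_len: int) -> bool:
    """Return True if src is a different address inside dst's enclosing prefix.

    Hypothetical helper: prefix_len is the adjacency range to test,
    for example 30 or 31 for IPv4, or 48 for IPv6.
    """
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    # Different address families or identical addresses are not "adjacent".
    if src_ip.version != dst_ip.version or src_ip == dst_ip:
        return False
    # strict=False masks off the host bits, giving the prefix that contains dst.
    net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
    return src_ip in net


print(is_dasa("192.0.2.2", "192.0.2.1", 30))                  # True
print(is_dasa("192.0.2.5", "192.0.2.1", 30))                  # False
print(is_dasa("2001:db8:0:1::53", "2001:db8:0:ffff::1", 48))  # True
```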
Is Someone Trying to Map Our Infrastructure?
In a prior newsletter we explained how our UDP-based DNS signal feeds may report spoofed source addresses and we’ve stressed our feed data is not intended for use in block lists. We continue to publish these feeds because we believe even spoofed queries can be valuable NETINT. We developed the DNS TCP-based signal feed to help highlight differences in sources for this reason.
We knew the queries we were now seeing were not only historically unusual, but practically impossible to be legitimate, because we often had control over both destination and source addresses. What was going on… and could we do something about it?
The DASA spoofed addresses were being published in our associated DNS signal feeds. We noticed this almost immediately. Why were neighbor addresses of some of our sensors starting to show up in our feeds?! Was this some sort of sensor exposure attack?
Digging Into the Queries
From August 2023 to January 2024 we observed DASA spoofed DNS queries across our infrastructure, although the overall volume was relatively modest. The scans appeared to be run on an ad-hoc basis, where we observed no scanning activity for days at a time. The query domains and host name prefix were also inconsistent over the duration of activity.
The DNS messages resembled what we’d expect to see from an open resolver surveying project. The host name portion of the query included a hex-encoded target address and sometimes a date string. This is usually done to match targets that answer the queries sent, to limit the effects of caching, and to match targets to any forwarders that may have been used to query authoritative servers for the query name.
We observed DNS A RR recursion-desired queries with one of at least four distinct query name structures. If our vantage point (VP) had the IPv4 address of 192.0.2.1 one form of the query name was as follows:
c0000201-19700101.l.secshow.net
The first label combined the target IPv4 address (192.0.2.1) in hex (c0000201) and a date formatted as YYYYmmdd, separated by a dash. The encoded date was often a few days before the scan arrived. The suffix zone (l.secshow.net) was fixed.
Another form of a query name we saw was:
c0000201-19700101-1000ps-rate.l-rate.secshow.net
Here the name is the same except for the “-1000ps-rate” string appended to the first label and the use of a different subdomain.
We observed some queries with a name in the form of:
c0000201.d1201.savme.xyz
The first label in this name is only the hex-encoded target IPv4 address, followed by the fixed d1201.savme.xyz suffix.
Yet another query name was of the form:
c0000201-bai1du.com
Again the destination IPv4 address was encoded in the first label followed by the “-bai1du” string, but notice these were essentially all invalid domains in the .com zone.
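To make the four query name structures concrete, here is a minimal Python sketch that recovers the target address, and the date string when present, from the first label. The regular expression and the function name decode_qname are our own illustrative assumptions; they cover only the forms shown above and handle IPv4 only.

```python
import ipaddress
import re
from typing import Optional, Tuple

# First label: 8 hex digits, optionally followed by "-YYYYmmdd",
# then either the end of the label or another "-suffix".
LABEL_RE = re.compile(r"^([0-9a-f]{8})(?:-(\d{8}))?(?:-|$)", re.IGNORECASE)


def decode_qname(qname: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (target IPv4 address, date string or None), or None if no match."""
    first_label = qname.split(".", 1)[0]
    match = LABEL_RE.match(first_label)
    if match is None:
        return None
    # Interpret the 8 hex digits as a 32-bit IPv4 address.
    target = ipaddress.IPv4Address(int(match.group(1), 16))
    return str(target), match.group(2)


for name in (
    "c0000201-19700101.l.secshow.net",
    "c0000201-19700101-1000ps-rate.l-rate.secshow.net",
    "c0000201.d1201.savme.xyz",
    "c0000201-bai1du.com",
):
    print(name, "->", decode_qname(name))
```

Each of the four example names decodes to 192.0.2.1; only the first two carry a date string.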
The secshow.net domain was registered through HiChina, an Alibaba subsidiary, savme.xyz through NameCheap, and none of the *-bai1du names appear to have been registered. We considered registering a few random *-bai1du.com names, pointing them at our name servers, and watching to see what we’d get, but decided this was not worthwhile.
We had to wonder, were these registrars involved in this spoofed scanning activity, did someone behind the scanning have access to those authoritative server logs, was the scanning incorrectly set up, or was this something else we could not yet explain? Why would a registrar be involved in this type of source-spoofed activity? That would be highly unusual and generally frowned upon. We assumed these scans were not associated with the registrars, but we did not try to verify with them. We didn’t feel this activity warranted more from anyone other than ourselves to satisfy our curiosity.
We did perform some rudimentary analysis on the domain names and associated Internet resources. For example, we found a few zones that would answer under secshow.net, but names in the “l.secshow.net” zone only seemed to provide an answer when the scans were active. The answer seemed to point at a generic Ubuntu system hosted in the China Education and Research Network (CERNET.edu.cn).
Now What?
When we shared our concerns with some of our advisors, we suggested we might wish to limit the availability of some public data in case this was a sensor exposure attack. One response to this suggestion was that maybe this was what an attacker would want, to discourage sharing our NETINT publicly. Another colleague suggested that the DASA spoofing scanning technique might be intentional to limit response traffic sent back onto the Internet, something an Institutional Review Board (IRB) overseeing research experiments might appreciate. We liked this explanation, but how would a researcher then evaluate their findings if they didn’t presumably have access to the registrar’s authoritative servers?
We found scant few reports of this activity elsewhere [1] and no satisfying explanation for the activity. We aren’t ruling out the possibility that something more interesting than a benign experiment was afoot, but at this stage we found no reason to be alarmed by it.
We decided to take our analysis one step further. Using our distributed infrastructure we wanted to see if we could confirm if the spoofed packets were originating from where we suspected.
Source Address Spoofing Triangulation
It stands to reason we might be able to narrow down the true origin if there was one system emitting DASA spoofed packets and the initial IP TTL remained consistent. Our initial analysis of spoofed packets seemed to suggest these premises were reasonable assumptions. Our vantage points are fairly well distributed in many unique networks around the globe so we thought this was a good opportunity to see if we could triangulate by IP TTL distance from each of our sensors that saw these packets. That is, which of our vantage points saw the shortest hop count from the true origin as told by the IP TTL value received.
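The hop count calculation can be sketched as follows. This is a simplified illustration, not our production analysis: it assumes the sender used one of the common default initial TTL values (64, 128, or 255) and picks the smallest default that is not below the received value. The function name estimate_hops is hypothetical.

```python
# Common operating system defaults for the initial IP TTL (an assumption;
# a scanner could set any value it likes).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(received_ttl: int) -> int:
    """Estimate the hop distance to the true origin from a received IP TTL."""
    if not 1 <= received_ttl <= 255:
        raise ValueError("IP TTL must be between 1 and 255")
    # Assume the smallest common default that could have produced this TTL.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= received_ttl)
    return initial - received_ttl


print(estimate_hops(49))   # 15, assuming an initial TTL of 64
print(estimate_hops(113))  # 15, assuming an initial TTL of 128
```

If the initial TTL is consistent across all packets, as our premise requires, only the relative differences between vantage points matter, so a wrong guess at the initial value shifts every estimate by the same amount.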
Based on existing evidence we suspected the scans were originating from a Chinese academic institution as part of a research project. Many of our sensors saw the scans so we plotted sensor geo-located IP addresses on a map and colored each location according to the calculated hop count distance away from the true origin. Our first plot attempt is shown in Figure 2.
From this plot it is hard to say for sure, but vantage points in Eastern Europe, Eastern United States, and New Zealand appear furthest away. East Asia and the west coast of the Americas appear nearer the origin. This doesn’t rule out our guess that the packets originated from within China, but the evidence is far from conclusive. It may be that today’s Internet routing landscape and our large number of vantage points make visualizing the source difficult. We decided to implement a low-pass filter, mapping only a smaller set of vantage points with the lowest IP TTL distance. This revised plot is shown in Figure 3 below.
This new rendering seems to have narrowed things down considerably. The true source would appear to be in East Asia. We see both limitations and possibilities with this analysis. It is difficult to get as close as we’d like to every possible true origin, but we can eliminate many regions, locations, and even networks from consideration. There is still plenty of room for uncertainty, but we are cautiously optimistic this analysis demonstrates that our capabilities and platform can be used to uncover exciting new Internet insights.
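The low-pass filtering step, keeping only the vantage points with the lowest hop count, might look like the sketch below. The vantage point names and hop counts are invented purely for illustration and are not our measurements; the function name nearest_vantage_points and the keep parameter are likewise assumptions.

```python
from typing import Dict, List, Tuple


def nearest_vantage_points(hops_by_vp: Dict[str, int], keep: int) -> List[Tuple[str, int]]:
    """Return the `keep` vantage points with the smallest estimated hop count."""
    # Sort by hop count, then by name so that ties are ordered deterministically.
    ranked = sorted(hops_by_vp.items(), key=lambda item: (item[1], item[0]))
    return ranked[:keep]


# Invented example data: vantage point label -> estimated hops from the origin.
sample = {
    "vp-tokyo": 9,
    "vp-seattle": 12,
    "vp-auckland": 19,
    "vp-newyork": 20,
    "vp-warsaw": 21,
}

print(nearest_vantage_points(sample, keep=2))
# [('vp-tokyo', 9), ('vp-seattle', 12)]
```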
Conclusions
It may be, as we initially feared, that a destination-adjacent source address (DASA) spoofing technique was being used to expose sensors. Even if true, this information leak would be a mere blip in Internet phenomena the community need concern itself with. More importantly, we quickly observed the activity, mitigated it, told the community about it, and leveraged our platform to produce additional insights.
We have tentatively concluded these scans were a human-driven, imperfect experiment, and probably part of an academic research effort based in China.
Our public signal feeds are just a glimpse of what data we have available and what we can do. Our distributed network is full of possibilities, and we hope it has sparked the interest of our readers. We have taken some initial steps to partner with researchers and funding benefactors to do more, putting what we have and can do to good use. We are eager to help usher in the next generation of network intelligence (NETINT) such as Internet trend analysis, early traffic anomaly detection, and geographically diverse network behavior research. If you’re conducting academic research or have funding ideas, please reach out.
Dataplane.org is a U.S. 501(c)(3) non-profit. Unrestricted donations are tax deductible.
Update 2024-03-06: Colleague and passive DNS inventor Florian Weimer wisely noted something we missed: lots of networks are not ingress filtering packets with their own source addresses. In fact, we have observed that between 80% and 90% of our sensors received these spoofed packets! His comment made us realize there was another explanation for these packets that we had failed to consider. This may have been a quantitative measurement exercise to identify networks that do not perform that ingress filtering.