SAD DNS and NLnet Labs DNS software

Update 18 November 2021: we are aware of the follow-up paper published by the researchers. The text below remains accurate for Unbound users. Please note that Unbound 1.13.2 and newer have IPv6 PMTU disabled for UDP.

During the ACM CCS 2020 conference, held November 9-13, researchers from UC Riverside in the US and Tsinghua University in China presented a clever new variant of the DNS cache poisoning attack that they call "SAD DNS". It exploits a side channel exposed by most operating system network stacks.

This is yet more proof that DNS cache poisoning remains a threat, and all the more reason to invest in the only real protection against this type of attack: deploying DNSSEC.

Just patchwork – Of course any mitigation against this latest attack on unprotected DNS is just patchwork in the end. The real solution to this type of attack is to guarantee the integrity and authenticity of DNS messages using DNSSEC. As a starting point, if you have not done so yet, we strongly recommend that you enable DNSSEC validation on your DNS resolver.

To explain the attack in a nutshell, the authors of the paper exploit the fact that there is a shared rate limit for ICMP responses. An attacker can leverage this to infer which source port a resolver used to send a DNS query to an authoritative name server. This effectively reduces the number of potential guesses an attacker needs to make in order to fool a resolver into accepting a forged DNS response from over 4 billion (65,536 ports times 65,536 query IDs) to just two times 65,536 (first the port, then the query ID). To make the attack even more effective, the paper's authors also combine it with an attack on Response Rate Limiting implemented in authoritative name servers to "mute" them temporarily, further increasing the chances of a forged DNS packet reaching the resolver under attack before the "real" response arrives. An excellent in-depth explanation of the attack is provided in this blog post from Cloudflare, and more intrepid readers may just want to read the paper.
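As a rough illustration of the side channel, the toy model below simulates a host with a shared budget of ICMP replies and shows how an off-path observer could use it to find an open UDP port. It is a sketch under stated assumptions, not network code and not the researchers' tooling: the budget of 50 replies per window is modelled on the Linux default burst size (a figure not given in this post), and KNOWN_CLOSED_PORT and all function names are hypothetical.

```python
# Toy model of a shared ICMP rate limit used as a side channel.
# Nothing here touches the network; all names and numbers are illustrative.

ICMP_BUDGET = 50        # assumed shared budget of ICMP replies per time window
KNOWN_CLOSED_PORT = 1   # assumed: a port the attacker knows to be closed


def verification_probe_answered(open_ports, probed_ports):
    """Model one time window of the attack.

    The attacker sends ICMP_BUDGET spoofed UDP probes (padded with probes to
    a known-closed port). Every closed port costs the host one ICMP reply;
    an open port stays silent. The attacker then sends one probe from its
    own address and checks whether any budget was left to answer it.
    """
    padded = list(probed_ports)
    padded += [KNOWN_CLOSED_PORT] * (ICMP_BUDGET - len(padded))
    replies_used = sum(1 for port in padded if port not in open_ports)
    return replies_used < ICMP_BUDGET


def find_open_port(open_ports, candidates):
    """Scan candidates in batches, then narrow down inside a leaking batch."""
    candidates = list(candidates)
    for start in range(0, len(candidates), ICMP_BUDGET):
        batch = candidates[start:start + ICMP_BUDGET]
        if not verification_probe_answered(open_ports, batch):
            continue  # budget exhausted: every port in this batch is closed
        for port in batch:
            if verification_probe_answered(open_ports, [port]):
                return port
    return None


# Search space with and without knowledge of the source port.
PORTS = 2 ** 16
QUERY_IDS = 2 ** 16
BLIND_GUESSES = PORTS * QUERY_IDS            # port and query ID guessed together
GUESSES_WITH_KNOWN_PORT = PORTS + QUERY_IDS  # port first, then query ID

print(find_open_port({34567}, range(1024, 65536)))  # prints 34567
print(BLIND_GUESSES)                                # prints 4294967296
print(GUESSES_WITH_KNOWN_PORT)                      # prints 131072
```

The narrowing step here is a simple linear pass over the leaking batch, chosen for readability. The point is only that the shared budget turns "was an ICMP reply sent?" into a yes/no answer about ports the attacker cannot observe directly.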

What you can do

In the remainder of this blog post, we briefly discuss the impact of this attack, and what you can do as a user of NLnet Labs DNS software.

What should I do if I operate a DNS resolver?

Regardless of what software you use, first and foremost, you should update your operating system. The authors of the paper have worked with (among others) the Linux kernel development team. A patch is available, has been integrated in the 5.10 Linux kernel, and is likely to be backported to other versions of the kernel by the various Linux distributions.

If you cannot easily patch or upgrade your kernel, you could also consider implementing firewall rules that block outgoing ICMP Port Unreachable messages. Note, though, that such a wide block may have unwanted side effects.

How does this affect Unbound?

This attack does not directly affect Unbound, as the vulnerability is in the network stack of the underlying operating system. Unbound already implements mitigations against cache poisoning attacks such as query ID and port randomisation, and also supports detection of unwanted responses from authoritative name servers.

Nevertheless, the paper specifies two properties of Unbound that may give an attacker an advantage when executing this attack against a system running Unbound:

  1. Unbound has a retransmission timeout (RTO) approach that provides attackers with a wide(r) window in which to successfully execute the attack. While the authors of the paper suggest that this is something that could be changed, we note that this does not really mitigate the attack. Real mitigation is a matter of patching the network stack, blocking outgoing ICMP messages or, even better, deploying DNSSEC. In addition to this, Unbound's RTO strategy also has advantages that make it more likely that hard-to-resolve names actually get resolved. So changing this behaviour is not without penalties and does not solve the underlying mechanism of the attack.
  2. Unbound does not use connected UDP sockets. As the paper states, calling connect() on UDP sockets makes the attack harder to execute. Note that we say "harder", because the authors also describe a variant of the attack that will work for connected UDP sockets. We are studying the impact of using connected UDP sockets in Unbound. Again, this may incur a performance penalty, and again, it does not mitigate the attack. If we do switch to connected sockets, this will be part of an upcoming Unbound release, and of course, we recommend that you always update your DNS software to the latest version to stay safe.
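To make the second point more concrete, the sketch below shows what calling connect() on a UDP socket does in general: the operating system only hands the application datagrams that come from the connected peer's address and port. This is a minimal loopback demonstration of standard socket behaviour, not Unbound code. The roles "authoritative", "off_path" and "resolver" and the function name are our own labels for the example.

```python
import socket


def connected_udp_demo():
    """Return the payloads a connected UDP socket delivers to the application."""
    loopback = "127.0.0.1"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as authoritative, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as off_path, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as resolver:
        # Let the operating system pick a free port for each socket.
        authoritative.bind((loopback, 0))
        off_path.bind((loopback, 0))
        resolver.bind((loopback, 0))

        # connect() on UDP sends nothing; it only records the expected peer.
        resolver.connect(authoritative.getsockname())
        resolver.settimeout(0.5)

        # One datagram from an unrelated sender, one from the connected peer.
        off_path.sendto(b"forged", resolver.getsockname())
        authoritative.sendto(b"genuine", resolver.getsockname())

        received = []
        try:
            while True:
                received.append(resolver.recv(512))
        except socket.timeout:
            pass  # nothing more queued for this socket
        return received


print(connected_udp_demo())  # prints [b'genuine']
```

Note that this filter matches on the sender's address and port only. A forged packet that spoofs the authoritative name server's address and port would still be delivered, so the query ID remains necessary. This is consistent with connected sockets making the attack harder rather than impossible.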

How does this affect NSD?

While the goal of the attack is to poison the cache of a resolver, the authors of the paper also perform attacks against authoritative name servers to increase the likelihood of the attack succeeding. In particular, they attempt to "mute" authoritative name servers by triggering a mechanism called "Response Rate Limiting" (RRL) by aggressively sending queries to the authoritative name server. The goal of this "muting" attack is to make it less likely that a legitimate response to the query the attacker wants to forge the response to arrives at the resolver before the attack succeeds.

So if you operate an authoritative name server for a domain and you are using NSD, an attacker may attempt to "mute" your name server during an attack on a resolver. In the paper, the attackers make assumptions about the extent to which they need to mute an authoritative name server for the attack to succeed. In many cases they quote a mute rate of 66% or higher. So what does NSD do?

By default, NSD has Response Rate Limiting enabled (see rrl-ratelimit in the nsd.conf manual page). When the rate limit is exceeded, NSD does not respond to every query. It still responds to some of them, and these responses have the TC bit set, forcing the resolver to fall back to TCP. How many queries get such a response is configurable (see rrl-slip in the nsd.conf manual page). By default, NSD will answer 1 in 2 queries, so an attacker could achieve a "mute" rate of 50% with the default settings. If you are really concerned or see evidence of this attack in action, you can configure NSD to respond with a truncated response to every query above the rate limit by setting rrl-slip to 1. We recommend against doing this by default, though, as leaving some queries unanswered also limits certain types of abuse in denial-of-service attacks.
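The relation between rrl-slip and the achievable "mute" rate can be sketched with a small calculation. This is a simplified model, not NSD's implementation: it assumes every query is already above the rate limit and that exactly one in every rrl-slip of those queries gets a truncated response. The function name and the sample size are hypothetical, and only positive rrl-slip values are modelled.

```python
def mute_rate(rrl_slip, over_limit_queries=1000):
    """Fraction of over-limit queries left unanswered in this toy model."""
    if rrl_slip < 1:
        raise ValueError("this sketch only models rrl-slip values of 1 or more")
    # Every rrl_slip-th over-limit query gets a truncated (TC) response.
    answered = sum(
        1 for n in range(1, over_limit_queries + 1) if n % rrl_slip == 0
    )
    return 1 - answered / over_limit_queries


print(mute_rate(2))  # default setting, prints 0.5
print(mute_rate(1))  # every over-limit query answered, prints 0.0
```

In this model the default of 1 in 2 stays below the mute rate of 66% or higher that the paper often assumes, while a setting of 1 removes the muting effect entirely at the cost of the protection that leaving queries unanswered provides.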

Summary and takeaways

  • SAD DNS is a new variant of DNS resolver cache poisoning attack.
    Key takeaway: the attack itself is not new, but the suggested variant circumvents existing short-term mitigations making the attack more likely to succeed.
  • The underlying vulnerability is in the networking stack of operating systems.
    Key takeaway: update your kernel, or mitigate the attack by blocking outgoing ICMP Port Unreachable messages.
  • There are short-term stopgap measures (kernel patches, firewall rules), but these do not solve the fundamental flaw in DNS that allows for cache poisoning.
    Key takeaway: the only real solution is full-scale deployment of DNSSEC. If your resolver does not validate yet, turn that on now. If you have not signed your domain, seriously consider doing so.
  • The vulnerability is not in DNS software itself.
    Key takeaway: while we, and other open source vendors, are looking at further hardening of our software, the real long-term solution is to deploy DNSSEC.