NSD4 TCP Performance
By Wouter Wijngaards
For NSD 4 the TCP performance was optimised, with different socket handling compared to NSD 3. This article discusses a TCP performance test for NSD 4. In previous blog contributions, general (UDP) performance was measured and memory usage was analysed for NSD 4.
The TCP performance was measured by taking the average qps reported by the dnstcpbench tool from the PowerDNS source distribution. (Thanks for a great tool!) The timeout was set to 100 msec. On FreeBSD the system sends connection resets when a TCP connection cannot be established, and in this situation the tool overreports the qps. To mitigate this the qps was scaled back by multiplying by the fraction of succeeded tcp queries. The scaled back qps is close to the median qps that is also reported by the tool. For Linux, such scaling was not performed, and the average and median are close together.
You can click to enlarge these charts:
The highest TCP queries per second performance on Linux is about 14k qps by Yadifa, and then followed by NSD 4. On FreeBSD performance is higher, about 16k qps, and NSD 4 is fastest, with Knot and then Yadifa following with about 14k qps. Notice how Bind performs at 12k qps on FreeBSD, and 8k qps on Linux. PowerDNS remains at about the same speed. NSD 4 has higher TCP qps than NSD 3, on Linux and on FreeBSD.
On FreeBSD, software that cannot handle the load produces connection errors. The number of connection errors goes down when more threads are used by NSD 3, and Knot. For other software the thread count does not really influence this connection error count, but it does increase the qps performance. For Yadifa the qps performance degrades substantially when more threads are used, and it has a large number of connection errors because of that. In general the connection errors are caused by a lack of performance, and a TCP connection cannot be established. Linux apparently deals differently with this (turns it into timeouts), this may cause some qps reporting differences between the OSes. In both cases the charts represent the average successful TCP qps.
The same pattern as for UDP can be seen with the number of threads, for NSD 4, the best Linux performance uses 2 cpu, and performance increases better and higher on FreeBSD, but the optimum here is 3 cpu instead of 4 cpu for UDP. Other software similarly benefits from more CPU power. It turns out that the PowerDNS option to get extra distribution threads adds UDP workers and not TCP workers, this is why performance does not scale up in these charts for PowerDNS. Yadifa performance goes down for both Linux and FreeBSD when more threads are in use.
The zone served in these experiments is a synthetic root zone (as used in previous tests), with 1 million random queries for unsigned delegations. PowerDNS uses its zonefile backend. The same test system(s) as used in the previous measurements are used, a PowerEdge 1950 with 4 cores at 2 GHz is running the DNS server.