
Achieving 100k connections per second with Elixir


SOURCE: https://stressgrid.com/blog/100k_cps_with_elixir/
Summary

At the 14th-minute mark, the latency jumps from single-digit milliseconds to 1 second. To understand this bottleneck, we need to take a quick dive into the architecture of Ranch. Ranch maintains a pool of acceptors to enable more throughput when handling new connections.

The 90th percentile now stays below 3. But it is too early to celebrate: at the 15th-minute mark, 90th percentile latency jumps to 1 second again. The breaking point in the connection rate graph is less pronounced, but remains at about 70k connections per second. We hit another bottleneck.

To understand this bottleneck, we need to understand how TCP/IP is implemented inside the Linux kernel. When the client wants to establish a new connection, it sends a SYN packet. Furthermore, a proposed Linux kernel patch introduced the SO_REUSEPORT socket option, which makes it possible to open many listener sockets on the same port, so that new connections are load-balanced across those sockets when they are accepted.

To find out if the SO_REUSEPORT socket option would help, we created a proof-of-concept application in which we ran multiple Ranch listeners on the same port, with SO_REUSEPORT set using the raw setopts option. This number never gets close to 1k, meaning we’re no longer dropping SYN messages. 90th percentile latency confirms our findings: it remains consistently low throughout the test. When zooming in, 90th percentile latency measures between 1 and 2 milliseconds. We also observed much better CPU utilization, which resulted from less contention and fairer load balancing when accepting new connections. Finally, the connections per second rate reaches 99k, with network latency and available CPU resources contributing to the next bottleneck.

By analyzing the initial test results, proposing a theory, and confirming it by measuring against modified software, we were able to find two bottlenecks on the way to getting to 100k connections per second with Elixir and Ranch.
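The SO_REUSEPORT behavior described in the summary can be illustrated at the socket level. The following is a minimal sketch in Python rather than Elixir and Ranch, so it is not the proof-of-concept application from the article; it only shows that several listener sockets can share one port once the option is set on each of them. The function name `open_reuseport_listeners` and its parameters are hypothetical, and the sketch assumes a Linux kernel with SO_REUSEPORT support (3.9 or later).

```python
import socket


def open_reuseport_listeners(count, host="127.0.0.1", port=0, backlog=128):
    """Open `count` listening sockets bound to the same host:port.

    Hypothetical helper for illustration. With port=0 the kernel picks a
    free port for the first socket, and the remaining sockets reuse it.
    """
    listeners = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # SO_REUSEPORT must be set before bind() on every socket sharing the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(backlog)
        if port == 0:
            # Remember the kernel-assigned port so later sockets bind to it too.
            port = sock.getsockname()[1]
        listeners.append(sock)
    return listeners


if __name__ == "__main__":
    socks = open_reuseport_listeners(4)
    print("listeners:", len(socks), "port:", socks[0].getsockname()[1])
    for s in socks:
        s.close()
```

Each socket in the returned list has its own accept queue, and the kernel distributes incoming connections among them. That is the property the article relies on when running multiple Ranch listeners on the same port.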

As reported by Stressgrid.