UDP Load Balancing with Keepalived

About halfway into my four month internship as a platform developer at 500px, I was faced with the problem of load balancing UDP packets. The rate limiting service that I had been writing was ready to ship: the code was finished, the Chef cookbooks were written, and the servers were provisioned. The only thing left on my plate was to come up with a load balancing solution. I needed to make sure that the rate limiter didn’t get overwhelmed with packets. There was no precedent in our codebase; this was our first service serving a large amount of UDP traffic. I had a clean slate to choose whatever implementation I wanted.

A couple of options quickly surfaced. The first was Keepalived, a load balancer built on top of the virtual router redundancy protocol (VRRP) and the Linux Virtual Server (LVS). The other, more modern choice was everyone’s favourite proxy server, nginx.

Initially, nginx was the more attractive choice simply because of better (i.e. existent) documentation. In fact, if you don’t need robust health checking, nginx is the simpler solution. However, if you need something deeper than “can I ping this server?”, you’ll have to look elsewhere. Since our rate limiter exposed an HTTP status endpoint for more sophisticated health checks, we chose to move forward with Keepalived.
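
For comparison, the nginx route would have looked something like the sketch below, using the stream module (nginx 1.9.13 or newer). The listen and upstream ports here are placeholders, not our real ones; and note that the open source stream module only offers passive health checks via max_fails, with active health checks reserved for the commercial product.

# Hypothetical nginx equivalent; port 8125 is a placeholder
stream {
    upstream rate_limiter {
        # passive health checks only: mark a server down after 3 failures
        server 10.1.1.21:8125 max_fails=3 fail_timeout=10s;
        server 10.1.1.22:8125 max_fails=3 fail_timeout=10s;
    }

    server {
        listen 8125 udp;
        proxy_pass rate_limiter;
        proxy_responses 0;   # assuming the service never replies over UDP
    }
}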

A quick read through our Chef cookbooks showed that we already had a keepalived.conf file ready to go. Two HAProxy load balancers were using Keepalived as a failover mechanism (a well-documented HAProxy failover pattern). Basically, a VRRP-controlled IP address floated between the two machines, starting on the master and moving to the backup in case of a failure. Five different microservices were being managed under this virtual IP; it made sense to use the same address to refer to the rate limiter as well. Only a few extra lines of configuration were needed. The outcome looked a little something like this:

# Old configuration defining the VRRP virtual IP address
vrrp_instance VI_1 {
    state MASTER          # or state BACKUP
    interface eth0
    virtual_router_id 50
    priority 100          # highest priority will become the master
    advert_int 1
    virtual_ipaddress {
        10.1.1.20
    }
}

# Added configuration, defining two real servers under the same
# virtual IP from above
virtual_server 10.1.1.20 {
    delay_loop 2
    lb_algo rr            # round robin
    lb_kind dr            # direct routing
    protocol UDP

    real_server 10.1.1.21 {
        HTTP_GET {
            url {
                path /status
                status_code 200
            }
            connect_timeout 10
        }
    }

    real_server 10.1.1.22 {
        HTTP_GET {
            url {
                path /status
                status_code 200
            }
            connect_timeout 10
        }
    }
}

A full reference for the Keepalived configuration syntax can be found in the keepalived.conf man page.

These settings define one virtual IP 10.1.1.20 as well as a set of two real servers load balanced under that virtual IP. Notice how easy it is to configure an HTTP health check! I tested out the changes on a couple virtual machines and was pleased to find that everything worked perfectly. Packets alternated between VMs just like I had told them to. It was time to ship.
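
For anyone reproducing this, the quickest way to see what LVS is actually doing is ipvsadm on the load balancer; it isn’t shown above, but it’s the standard tool for inspecting the virtual server table that Keepalived manages:

$ ipvsadm -L -n          # list the virtual server and its real servers
$ ipvsadm -L -n --stats  # per-real-server packet counters, handy for
                         # confirming the round-robin distribution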

Unfortunately, nothing is ever that easy. Almost immediately after running Chef Client on the production load balancers, the site went down. All of the services behind those two load balancers were unreachable using the virtual IP. Needless to say we rolled back quickly, but we were confused. The old configuration hadn’t been changed. Why would introducing new, independent functionality break our load balancers?

As it turns out, it didn’t. We had some deeper problems. After a day of digging through logs and documentation with my team, we came upon the problem. Upon examining a few of our servers, we found that their ARP entries for the virtual IP address pointed to the backup load balancer as opposed to the master. Packets arrived at the wrong doorstep and the secondary load balancer didn’t know what to do with them. It simply looked at their addresses, said “this isn’t for me”, and sent them away into the abyss.
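
Spotting this on an affected machine is a one-liner; the MAC address cached for the virtual IP should be the master’s:

$ ip neigh show | grep 10.1.1.20   # or: arp -n | grep 10.1.1.20
                                   # in our case the lladdr was the backup's MAC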

Armed with this knowledge, we dug a little deeper into the inner workings of Keepalived. Whenever a server transitions to the master state, it sends a gratuitous ARP message declaring that it is now the owner of the virtual IP address. When we made our configuration changes, we triggered a failover, and the backup temporarily took control. When the master came back up, it should have sent a gratuitous ARP message to reclaim the virtual IP. It never did: we had pinned Keepalived to a version with a bug around exactly this behaviour. Ashamed of our ops faux pas, we upgraded to a newer version. We could have shipped at this point, but we were troubled. Other dark networking problems might have been waiting, ready to take our site down once again.
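
As an aside, the gratuitous ARP that Keepalived sends on a MASTER transition can also be sent by hand with arping from iputils, run on the machine that should own the virtual IP; it’s a handy way to nudge stale ARP caches while debugging:

# Broadcast an unsolicited/gratuitous ARP for the virtual IP out of eth0
$ arping -c 3 -U -I eth0 10.1.1.20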

It turns out that caution was to our benefit. After some more research, we came across something called “the ARP problem”. The ARP problem occurs when using LVS with direct routing or IP tunnelling. Since all of the machines in the LVS setup (i.e. the load balancers and the real servers) believe that they own the virtual IP address, a client making an ARP request can receive the MAC address of one of the real servers instead of the load balancer. Put simply, there was a significant chance that clients of the rate limiter would bypass the load balancer and hit one of the rate limiters directly, rendering all of our work useless.

Luckily, there is a simple solution to the ARP problem. Linux allows dummy network interfaces to be added to a machine; a dummy interface carries an IP address without responding to ARP requests for it. All we had to do was add a dummy interface holding the virtual IP address to each rate limiter server, and the ARP problem was avoided. Here’s how we did it:

$ modprobe dummy numdummies=1
$ ifconfig dummy0 10.1.1.20 netmask 255.255.255.0
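
The same thing expressed with iproute2, for machines without ifconfig (equivalent to the two commands above, same address and netmask):

$ ip link add dummy0 type dummy
$ ip addr add 10.1.1.20/24 dev dummy0
$ ip link set dummy0 up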

Our setup was starting to look pretty solid. There was one more thing we wanted to fix before launch: in the event of a failover, we wanted to make sure connections to the load balancer wouldn’t get dropped. Rate limiting is an important piece of our infrastructure, and accuracy is a key trait. Keepalived lets you enable the LVS sync daemon through a configuration option; the sync daemon watches the master’s connection state and continuously replicates it to the backup, so nothing is lost if a failover occurs. Our final configuration looked like this:

# Brand new config enabling the LVS sync daemon on VI_1 through the
# eth0 interface
global_defs {
    lvs_sync_daemon eth0 VI_1
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 50
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.1.1.20
    }
}

virtual_server 10.1.1.20 {
    delay_loop 2
    lb_algo rr
    lb_kind dr
    protocol UDP

    real_server 10.1.1.21 {
        HTTP_GET {
            url {
                path /status
                status_code 200
            }
            connect_timeout 10
        }
    }

    real_server 10.1.1.22 {
        HTTP_GET {
            url {
                path /status
                status_code 200
            }
            connect_timeout 10
        }
    }
}
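
Once this is deployed, ipvsadm can confirm that the sync daemon is running and that connection state is actually reaching the backup; these commands aren’t part of the configuration above, just a way to check it:

$ ipvsadm -L --daemon   # shows the master/backup sync daemon and its interface
$ ipvsadm -L -n -c      # dumps the connection table; entries seen on the master
                        # should also show up on the backup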

This is the final solution running in production today. It took some research and some understanding of lower-level networking, but the setup has been rock solid: we now use this configuration for two different microservices with no issues. There are plenty of pitfalls to avoid, but Keepalived turned out to be a great choice for load balancing UDP traffic.

