Peering automation at Facebook

Traffic on the internet travels across many different kinds of links. A fast and reliable way to exchange traffic between different networks and service providers is through peering. Initially, we managed peering via a time-intensive manual process. Reliable peering is essential for Facebook and for everyone’s internet use. But there is no industry standard for how to set up a scalable, automatic peering management system. So we’ve developed a new automated method, which allows for faster self-service peering configuration. We’re sharing a few best practices we have learned in automating our public peering in a hope of wider adoption of our approach in the internet community.

How does this work? Take Facebook, for example. Your friend has just posted a video of an extremely cute cat, and you are about to watch it. Let us follow the path of the cat video before it reaches your device:

Peering can give the video a shorter path to your device. Compare two routes it might take:

Option A: Often the slower, less reliable, higher-latency route:

You see your friend’s post with a cute cat video, and you click on it and can’t wait to watch it! Before the video reaches your device, Facebook needs to send it to your ISP using the best-performing, shortest route available. There might be many other networks (commonly referred to as transit networks) between Facebook and your ISP. They might be interconnected in suboptimal locations with potential capacity constraints, causing the awesome cat video to reach you slowly. Nobody wants to watch a buffering cat video!

Option B: Often the faster, more reliable, more direct route:

You clicked the cat video to watch the cute cat! Before the video even reaches your device, Facebook’s traffic controller realizes there is a fast, direct way to your ISP, without other networks in the middle. The cat video travels through a peering exchange, a common meeting point where lots of different types of networks interconnect by establishing Border Gateway Protocol (BGP) sessions between their routers. These peering sessions allow them to directly exchange bits, including cat videos, which helps improve the quality, performance, latency, and reliability of the user experience. 
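
As a toy model of the choice between Option A and Option B, the sketch below picks whichever candidate route crosses the fewest networks, measured by AS-path length. It is a simplification under stated assumptions: real BGP best-path selection weighs other attributes, such as local preference, before AS-path length, and this is not a description of Facebook's traffic controller. The route labels and ASNs are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    via: str                    # illustrative label for where traffic leaves our network
    as_path: tuple[int, ...]    # networks crossed to reach the destination, nearest first


def pick_route(routes: list[Route]) -> Route:
    """Toy rule: prefer the route that crosses the fewest networks."""
    if not routes:
        raise ValueError("no route to the destination")
    return min(routes, key=lambda route: len(route.as_path))


# Documentation ASNs (RFC 5398): 64500 is the viewer's ISP, 64501/64502 are transit networks.
via_transit = Route("transit", (64501, 64502, 64500))   # Option A
via_exchange = Route("peering exchange", (64500,))        # Option B
print(pick_route([via_transit, via_exchange]).via)        # peering exchange
```

With a direct peering session, the ISP's routes arrive with only the ISP's own ASN in the path, which is why the exchange route wins here.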

Why we automated public peering

Across the industry, configuring peering manually is known to be a painfully slow, inefficient, error-prone process. The challenge grows as networks connect to new internet exchanges (IXs) and connect multiple routers to each IX.

Before developing our automated system, we faced the same struggle. Peers would email us to request peering sessions. Next, one of our Edge engineers would verify the email and check our mutual traffic levels. To confirm that the traffic levels were appropriate, the engineer had to check numerous internal dashboards, reports, and rulebooks, as well as external resources such as the potential peer’s PeeringDB record. The engineer would then use a few internal tools to configure BGP sessions, reply to the peer, and wait for the peer to configure their side of the network.

This approach had several problems. First, there was no centralized place to see incoming peering requests or the status of existing peering sessions. Requests could arrive over email or through several other out-of-band systems, and Edge engineers had to track, parse, and hand-verify every one. Next, for each request, the engineer had to manually launch and monitor an in-house tool and then, once it finished, type a response to the peer. At last count, we estimate that this process took more than nine hours per week, roughly a whole workday each week spent on a needlessly manual process.

Our solution

We are thrilled to announce that peers can now request their own public peering sessions through our facebook.com/peering page. 

PeeringDB OAuth

PeeringDB is an open source, not-for-profit, volunteer-run database of networks and their peering network information, verified and vetted by PeeringDB administrators. We believe there is value in the PeeringDB database, and, along with many others in the industry, we support it through sponsorship. Since most peering networks already maintain their PeeringDB records as a source of truth for other networks to consume, we see PeeringDB’s OAuth service as an opportunity to standardize a peering management authentication method. 

To ensure that peering requests made on our peering page come from an authorized person, we require the requester to log in with their PeeringDB account, authenticating through PeeringDB’s OAuth service on behalf of their network’s organization. The peer does not need any other credentials; no Facebook account is required. Once authenticated, the peer will see a list of all their network’s existing public peering sessions with Facebook and can submit new requests.
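
The authorization check can be sketched as follows. After the standard OAuth 2.0 authorization-code exchange (omitted here), an application fetches the user's profile from PeeringDB and decides which networks the user may act for. The profile shape assumed below (a `networks` list whose entries carry `asn` and a `perms` bitmask) and the permission bit values are assumptions to verify against PeeringDB's OAuth documentation, and requiring update rights is a hypothetical policy, not necessarily Facebook's.

```python
# Assumed permission bits on each network entry in the PeeringDB OAuth profile.
PERM_READ = 1
PERM_UPDATE = 2


def networks_user_may_manage(profile: dict, required: int = PERM_UPDATE) -> list[int]:
    """Return the ASNs the authenticated PeeringDB user may submit requests for.

    `profile` is the decoded JSON profile, assumed to contain a "networks" list
    of {"asn": int, "perms": int} entries (hypothetical shape, see lead-in).
    """
    asns = {
        net["asn"]
        for net in profile.get("networks", [])
        if (net.get("perms", 0) & required) == required  # all required bits set
    }
    return sorted(asns)
```

Deriving the allowed ASNs from the identity provider, rather than from a form field the requester fills in, is what lets a request be trusted without any further authentication.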

After requesting sessions, our internal process takes over. All the peer has to do is await our automated emails and configure their side of the network. 

We have also set up a monitoring system to sort our peering mailbox. If the system detects a peering request, it automatically replies with instructions directing the peer to our peering page. Of course, we still monitor the inbox to respond to any nonpeering inquiries or support requests. But this system has significantly reduced the time spent combing through email and verifying requests.
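
One simple way to sort such a mailbox is a keyword heuristic: count how many distinct peering signals a message contains and auto-reply only above a threshold, leaving everything else for a person. The sketch below assumes that approach; the patterns, threshold, and reply wording are hypothetical and not taken from Facebook's system.

```python
import re
from typing import Optional

# Hypothetical signals that a message is a peering request (illustrative only).
SIGNALS = (
    re.compile(r"\bpeering\b", re.IGNORECASE),
    re.compile(r"\bASN?\s?\d+\b", re.IGNORECASE),
    re.compile(r"\b(?:BGP|sessions?)\b", re.IGNORECASE),
)

# Hypothetical wording for the automatic reply.
AUTO_REPLY = (
    "Thanks for reaching out. Public peering requests are now self-service: "
    "please submit yours at facebook.com/peering."
)


def looks_like_peering_request(subject: str, body: str, threshold: int = 2) -> bool:
    """Return True when at least `threshold` distinct signals appear in the message."""
    text = f"{subject}\n{body}"
    hits = sum(1 for pattern in SIGNALS if pattern.search(text))
    return hits >= threshold


def triage(subject: str, body: str) -> Optional[str]:
    """Return the auto-reply for likely peering requests; None leaves the message for a human."""
    if looks_like_peering_request(subject, body):
        return AUTO_REPLY
    return None
```

Requiring more than one signal keeps a stray mention of "peering" in an unrelated support thread from triggering the auto-reply.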

Once received, the request goes to an auditing queue. If the request is approved, another service launches a workflow to set up peering. First, it queries PeeringDB and our internal tables to validate and collect the mutual peering information, such as IP addresses and max-prefix settings. Next, it configures the sessions on Facebook’s routers. After that, it emails the peer to confirm that the BGP sessions are ready on Facebook’s side and waiting for the peer to turn up their side. The workflow then checks daily whether the sessions are established. On the second, third, seventh, and 13th days, it emails the peer a reminder to configure the sessions. As soon as the workflow detects that all sessions have been established, it sends a final confirmation email. At that point, the peer should see the new sessions listed as active in the table on facebook.com/peering.
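
A minimal sketch of two steps in this workflow: building the session list from PeeringDB data, and the daily check with its reminder schedule. It assumes PeeringDB `netixlan` records with `ix_id`, `ipaddr4`, and `ipaddr6` fields and a `net` record with `info_prefixes4` and `info_prefixes6`. The function names, the `Session` type, and the `our_ix_ids` set of exchanges we are present at are hypothetical; auditing, router configuration, and email delivery are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Days after configuration on which the post says reminder emails go out.
REMINDER_DAYS = frozenset({2, 3, 7, 13})


@dataclass(frozen=True)
class Session:
    ix_id: int                  # PeeringDB exchange ID
    peer_ip: str                # the peer's address on the exchange LAN
    max_prefix: Optional[int]   # from the peer's PeeringDB record; None if unpublished


def candidate_sessions(peer_net: dict, peer_netixlans: list[dict],
                       our_ix_ids: set[int]) -> list[Session]:
    """One session per peer address on every exchange both networks are present at."""
    sessions = []
    for rec in peer_netixlans:
        if rec["ix_id"] not in our_ix_ids:
            continue  # no shared exchange, nothing to configure
        for addr_key, limit_key in (("ipaddr4", "info_prefixes4"),
                                    ("ipaddr6", "info_prefixes6")):
            addr = rec.get(addr_key)
            if addr:  # skip address families the peer has not registered
                sessions.append(Session(rec["ix_id"], addr, peer_net.get(limit_key)))
    return sessions


def daily_action(days_since_configured: int, established_ips: set[str],
                 sessions: list[Session]) -> str:
    """Decide what the daily check should do for one approved request."""
    if sessions and all(s.peer_ip in established_ips for s in sessions):
        return "send_final_confirmation"
    if days_since_configured in REMINDER_DAYS:
        return "send_reminder"
    return "wait"
```

Keying the daily check on peer addresses keeps it independent of how session state is read from the routers.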

Creating an industry standard

Since launch, we have received more than 170 peering requests and approved 149 of them. As a result, we have automatically pushed more than 1,400 public peering sessions — for a time savings of more than eight hours per week.

With PeeringDB OAuth, we are able to check the validity of peering request submissions and automate more steps in the peering turn-up process. Based on our experience using this system, we recommend leveraging PeeringDB OAuth as the industry standard in other public peering automation applications and implementations.

Building on our public peering automation success, we are investigating ways to automate our private network interconnects (PNIs). Private peering is the larger-volume counterpart to public peering, and we hope to offer a self-service option later this year. We are also exploring the possibility of using PeeringDB OAuth as an alternative login service for other services we offer our network partners. 

The post Peering automation at Facebook appeared first on Facebook Engineering.

Source: Facebook