How we scaled Wisembly’s infrastructure: moving from our Elephant to RabbitMQ

Hi, I’m Guillaume, CTO and co-founder of Wisembly, a SaaS solution that facilitates interactions during your big meetings and events. We recently launched a beta of our new product, Solid, to help you make your everyday meetings more productive and actionable.

This is the story of how we improved our performance by changing and adding elements to our stack over time. I’ll focus in particular on how using RabbitMQ on the backend to communicate between different stacks and servers made our lives easier.

Where we once were

Here are the building blocks of our tech team’s philosophy: start small, DRY (Don’t Repeat Yourself) and YAGNI (You Ain’t Gonna Need It). Back in 2012, when we implemented real-time websocket communication with Node.js and Socket.io, we had a pretty small stack: everything ran full-stack on Symfony2, with MySQL as our only data store and some tiny bits of Backbone.js here and there to power up our application. One year later, at the Symfony2 Live Paris 2013, I presented these slides explaining how we implemented Elephant in raw PHP to communicate between our Symfony2 backend and our remote socket.io push server.

https://medium.com/media/dfdfc12635e346b3ceb1b2838fda5808/href

As I said, our stack was pretty minimal at the time, and we didn’t feel the need to complicate it for the sake of the socket.io push server. So we looked at websockets and found a neat way to implement them, connecting and emitting events with our open-source library. It did the job, we open-sourced something cool (more than 500 stargazers now and still active!), and we were on our way to living happily ever after 🙂… Or were we?

Where we are now

Quite recently, as the business grew, more and more push events were being sent every minute across the various customer meetings we handle daily. For example, we have a specific feature for very interactive seminars where more than 500 users can answer a live poll. Oftentimes, all the attendees submit their answers within the same 10-to-20-second window, right after the presenter launches the poll. Aside from some MySQL concurrency problems (deadlocks), which we tackled with the help of our friend Redis (that’s another story we might tell another time), we also had to deal with server pressure, as you can guess.

Our API response times were pretty shitty, skyrocketing from an average of 125 ms to ~350 ms, sometimes even more! This put a lot of pressure on the php-fpm workers, which had to fork again and again, spawning children up to the configured limit to handle the sudden workload. It jeopardized other clients who were using the solution at the same time.

We ran blackfire.io and dug deeper into our New Relic data to spot bottlenecks, then looked at our code to find where we could improve performance. We quickly noticed that the external calls to the push server, made through cURL with Elephant, were costly in both time and processes!

Sh*t! We suddenly realized that even if we improved code here and there, added caching or moved things to Redis to reduce concurrency and improve throughput, the push event communication layer built on our beloved and loyal Elephant would still be a burden!

Looking at our stack

When we stumbled upon this problem, we looked for ways to overcome it by taking a closer look at our stack. Since 2012, we had gained two powerful allies: Redis and RabbitMQ. Both are great at publishing events with data and at communicating between servers.

We finally chose Rabbit because we had recently been using it intensively with Symfony and the great Swarrot library. A few minutes and 50-ish new lines later in our node pusher app, thanks to amqplib, we had connected our node server to our Rabbit queue and added the ability to receive events through websockets AND Rabbit.

In one more hour, we created a RabbitTransport with the exact same interface as our ElephantTransport. One simple Symfony config parameter later, the push events emitted by our API were no longer sent through Elephant but put in a Rabbit queue instead. And like magic, they were correctly consumed by the remote push server, and it was blazing fast.
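To give an idea of what those 50-ish lines might look like, here is a minimal sketch of a Rabbit consumer for a node push server. It is not our actual code: the function name `consumePushEvents`, the `broadcast` callback and the JSON message shape are assumptions made for illustration. Only `assertQueue`, `consume`, `ack` and `nack` are real amqplib channel methods.

```javascript
// Sketch: feed events from a RabbitMQ queue into an existing broadcast path.
// `channel` is expected to behave like an amqplib channel, obtained e.g. via:
//   const amqp = require("amqplib");
//   const conn = await amqp.connect("amqp://localhost");
//   const channel = await conn.createChannel();
async function consumePushEvents(channel, queue, broadcast) {
  // Make sure the queue exists before consuming from it.
  await channel.assertQueue(queue);

  await channel.consume(queue, (msg) => {
    // amqplib passes null when the broker cancels the consumer.
    if (msg === null) return;

    let parsed;
    try {
      // Assumed message format: JSON {"event": "...", "payload": {...}}.
      parsed = JSON.parse(msg.content.toString());
    } catch {
      // Reject malformed messages without requeueing them.
      channel.nack(msg, false, false);
      return;
    }

    // Hand the event to whatever already emits to connected sockets.
    broadcast(parsed.event, parsed.payload);

    // Acknowledge only once the event has been handed over.
    channel.ack(msg);
  });
}
```

Passing the channel in, rather than connecting inside the function, keeps the websocket path untouched and makes the consumer easy to exercise with a fake channel.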
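The idea of swapping transports behind a shared interface can be sketched as follows. Our real implementation lives in PHP inside Symfony; this JavaScript version is only an illustration, and the `send` method, the `createTransport` factory, the `pushTransport` config key and the `socketClient` object are all hypothetical names. `sendToQueue` is the real amqplib channel method.

```javascript
// Hypothetical sketch: two transports exposing the same `send(event, payload)`.
class ElephantTransport {
  constructor(socketClient) {
    this.socketClient = socketClient; // stand-in for the socket.io client
  }

  send(event, payload) {
    // Talks directly to the remote push server.
    this.socketClient.emit(event, payload);
  }
}

class RabbitTransport {
  constructor(channel, queue) {
    this.channel = channel; // amqplib-like channel
    this.queue = queue;
  }

  send(event, payload) {
    // Assumed message format: JSON {"event": "...", "payload": {...}}.
    const body = Buffer.from(JSON.stringify({ event, payload }));
    this.channel.sendToQueue(this.queue, body);
  }
}

// A single config value decides which transport the API uses.
function createTransport(config, deps) {
  switch (config.pushTransport) {
    case "rabbit":
      return new RabbitTransport(deps.channel, config.queue);
    case "elephant":
      return new ElephantTransport(deps.socketClient);
    default:
      throw new Error(`Unknown push transport: ${config.pushTransport}`);
  }
}
```

Because callers only ever see `send`, switching back to the old transport is a one-line config change, which makes this kind of migration cheap to try.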

Performance improvements

Alright, after 3 long years of faithful and loyal service, could we get rid of Elephant, our baby, in only 3 short hours? Let’s dive into the numbers.

First of all, the end-to-end process (the time between the API call and the reception of the broadcast event) was not affected, and even slightly improved: push events are received in less than 90 ms (listening users still receive them before the user who made the action receives the API response!). This does not vary a bit, even with 1000 events fired in a second to a single user. Good.

Then, looking only at the server side, let’s try to send 100 “consequent” push events to our push server. On a standard development machine (1.8 GHz dual-core i7, 4 GB RAM):

  • SocketTransport (with Elephant): 100 events took 8936 ms
  • AmqpTransport (with RabbitMQ): 100 events took 521 ms

No typo here, you read that right: about 17 times faster! We did not bother measuring CPU consumption, as these results were more than enough.

Quite frankly, we expected the PHP cURL implementation that powers Elephant’s core websocket communication to be greedier than the lower-level RabbitMQ implementation (in production we use the PECL C extension for Rabbit, which speeds things up even more), but not to that extent…

Last thing to check: the push server’s ability to consume events stored in a single queue as fast as they come in. For now, and unless we deal with rooms of more than 600 users, node/amqplib/socket appears to pull events off the queue like a charm. Once we reach a limit, we’ll think about adding more push consumers to keep up the pace.
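For anyone wanting to run a similar comparison, a rough timing harness could look like the sketch below. It is not the script we used: `timeSends`, `speedup` and the `benchmark.event` name are made up for illustration, and `transport` is any object with a `send(event, payload)` method. The last two lines simply redo the arithmetic on the figures quoted above.

```javascript
// Time how long it takes to push `count` events through a transport.
// `transport.send` may be synchronous or return a promise.
async function timeSends(transport, count) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < count; i++) {
    await transport.send("benchmark.event", { index: i });
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6; // nanoseconds to milliseconds
}

// Ratio between a baseline timing and a candidate timing.
function speedup(baselineMs, candidateMs) {
  return baselineMs / candidateMs;
}

console.log(speedup(8936, 521).toFixed(1)); // prints 17.2
console.log(8936 / 100, 521 / 100); // prints 89.36 5.21 (ms per event)
```

In other words, each event cost roughly 89 ms through the socket transport versus roughly 5 ms through the queue.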
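One possible way to notice that limit early is to watch the queue depth. The sketch below is an assumption about how that could be done, not something we run today: `checkBacklog` and the `maxBacklog` threshold are hypothetical, while `checkQueue` is the amqplib channel method that reports a queue's `messageCount` and `consumerCount`.

```javascript
// Sketch: report whether the push queue is backing up.
// `channel` is expected to behave like an amqplib channel.
async function checkBacklog(channel, queue, maxBacklog) {
  // checkQueue reports the messages waiting in the queue and the
  // number of consumers attached to it, without changing the queue.
  const { messageCount, consumerCount } = await channel.checkQueue(queue);
  return {
    messageCount,
    consumerCount,
    // Flag the queue once more messages are waiting than we tolerate.
    fallingBehind: messageCount > maxBacklog,
  };
}
```

Called on a timer, a `fallingBehind` flag that stays true across several checks would be a reasonable signal that the single push consumer can no longer keep up.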

API mean response time < 100 ms after deploy, external calls (in green) reduced

Where we’re heading

This story showed us that living without Rabbit and using Elephant to keep the stack simple, back when we were 2–3 devs, was acceptable and allowed us to scale our application and business in a simple way. As usage grows, the codebase ages and the stack improves, refactoring parts of the application becomes necessary. But it was all possible thanks to the awesome work we did on Rabbit over the last few months.

Speaking of Rabbit, we now also use it to power other parts of our application. We use it to talk to various APIs and to make things asynchronous (emails, exports, stats, document conversion, heavy jobs). For example, in Solid we are able to give you stats and insights about how long you spend in meetings each year by asynchronously crunching data from the Google or Office365 calendar APIs and pushing you the result once it is done.

Two years ago, we were fine living without Rabbit. Today, we couldn’t do without it.

If you’d like to see it in action, you can sign up for Solid here!

This article is part of the publication Unexpected Token powered by eFounders — startup studio. All eFounders’ startups are built by extraordinary CTOs. If you feel like it could be you, apply here.


How we scaled Wisembly’s infrastructure : moving from our Elephant to RabbitMQ was originally published in Unexpected Token on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: eFounders