PG Bouncer And Ruby on Rails

In this article I would like to share the newest addition to our technology stack “PG Bouncer” Background We at housing.com, have a service oriented architecture with most of our services backed by a RoR stack, deployed on a Unicorn web server, and an array of data stores like PostgreSQL(primary), Elasticsearch, Aerospike, Redis, Mongo etc. Why Unicorn? […]

In this article I would like to share the newest addition to our technology stack “PG Bouncer”

Background

We at housing.com, have a service oriented architecture with most of our services backed by a RoR stack, deployed on a Unicorn web server, and an array of data stores like PostgreSQL(primary), Elasticsearch, Aerospike, Redis, Mongo etc.

Why Unicorn?

The limiting factor of the Ruby concurrency model is the Global Interpreter Lock. This feature of Ruby prevents more than 1 thread of Ruby from executing at any given time per process. Unicorn is a Rack HTTP server that uses forked processes to handle multiple incoming requests concurrently.Unicorn forks multiple OS processes within each dyno to allow a Rails app to support multiple concurrent requests without requiring them to be thread-safe.

Connection To PostgreSQL

Exactly one persistent connection to the database is established per worker process (since every worker process serves one request at a time). This design comes with following flaws:

  1. On an average, the time spent by an application on database query is around 10–15% which means the persistent connections created are idle for the remaining 85–90% of the time. The idle connections do have a memory overhead and should be avoided.
  2. Any fluctuation or load on PostgreSQL freezes the application. But since our application depends on other datasources as well, ideally, it should continue to serve the requests that do not depend on PostgreSQL, thereby minimizing the impact on user experience.

Enters PG Bouncer

PgBouncer is an open-source, lightweight, single-binary connection pooler for PostgreSQL. It can pool connections to one or more databases (on possibly different servers) and serve clients over TCP and Unix domain sockets. PgBouncer maintains a pool of connections for each unique user, database pair.

We had 3 options for where to deploy this layer:

  1. PostgreSQL backend
  2. An intermediate layer
  3. Application servers

Adding PgBouncer on the database instance itself led to performance issues since load on database would impact the connection pool as well. Adding an intermediate layer would add an additional hop thereby adding latency and a maintenance overhead. So we went ahead with option 3 and since PgBouncer is a very lightweight process, it has minimal impact on the RoR application.

Installation & Configuration

Steps to install pgbouncer can be found here. The basic configuration file for PgBouncer is the pgbouncer.ini . A sample configuration file looks like this

pgbouncer.ini

There are two sections in the config:

  1. databases : This contains key=value pairs where the key will be taken as a database name and the value as a libpq connection string style list of key=value pairs. We have one entry for each database connection(connection pool)
  2. pgbouncer : This contains all the generic settings for pgbouncer

The detailed list for all the settings can be found here. However, lets dive into some of the key configurations.

auth_type : Pgbouncer supports multiple auth types for client authecntication. Some of them are plain, md5, trust, cert, hba. We used host based authentication. A sample hba.conf file would look like :

default_pool_size: How many server connections to allow per user/database pair. Can be overridden in the per-database configuration with pool_size property

query_wait_timeout: Maximum time queries are allowed to spend waiting for execution. If the query is not assigned to a server during that time, the client is disconnected. It also helps when the server is down or database rejects connections for any reason.

pool_mode: PgBouncer can be configured with three different types of pooling:

  • Session pooling: once the client gets one of the connections in the pool assigned it will keep it until it disconnects (or a timeout is reached).
  • Transaction pooling: once the client gets a connection from the pool, it keeps it to run a single transaction only. After that, the connection is returned to the pool. If the client wants to run other transactions it has to wait until it gets another connection assigned to it.
  • Statement pooling: in this mode, PgBouncer will return a connection to its pool as soon as the first query is processed, which means multi-statement transactions would break in this mode.

Transaction pooling suited our needs but it comes with its own issues like no support for prepared statements. The entire list of features supported for various pooling modes can be found here

The two properties pool_size and query_wait_timeout in conjunction resolve our two problems mentioned earlier. For one of our applications running 15 worker process on a unicorn server, we configured the pool size of 8 and a query_wait_timeout of 50ms. This has helped us reducing the number of idle connections to half and in rare cases of latent or unresponsive db, 7 of our worker process are free to process requests that interact with ES or cache and are not dependent on Postgres.

database.yml

Finally we make the above changes to the application’s database.yml and voila!!


PG Bouncer And Ruby on Rails was originally published in Engineering @ Housing/Proptiger/Makaan on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: Housing.com