At Localytics, one of the tools we use for data processing is the Snowflake data warehouse. We connect to Snowflake in a couple different ways, but our main data retrieval application is a Ruby on Rails API. To accomplish this we use a combination of unixODBC (an open-source implementation of the ODBC standard), Snowflake’s ODBC […]
At Localytics, one of the tools we use for data processing is the Snowflake data warehouse. We connect to Snowflake in a couple different ways, but our main data retrieval application is a Ruby on Rails API. To accomplish this we use a combination of unixODBC (an open-source implementation of the ODBC standard), Snowflake’s ODBC driver, and our own ODBC ActiveRecord adapter for Ruby on Rails. This sequence of tools allows us to take full advantage of ActiveRecord’s query generation and general ease-of-use while still enjoying all the benefits of a fully cloud-enabled data warehouse such as Snowflake.
First, a bit of background on the ODBC standard. ODBC is a common interface through which you can connect to multiple backend databases in the same manner. In this way it enables users to write code now and maintain the ability to migrate later, while also mitigating the pain of learning each DBMS’s idiosyncrasies. You connect to a data store through an ODBC adapter, which implements the ODBC interface for that specific DBMS.
For example, the following code will execute the query
SELECT id, name FROM users on any database without you needing to make changes to the code, just by passing in a data store name (DSN) as the first command-line argument to this script.
DSNs are a string of key-value pairs representing the connection configuration. They correspond to an entry in an
odbc.ini file that you configure. You can then reference the configured DSN using an implementation of ODBC (e.g., unixODBC) to connect to an ODBC DBMS like Snowflake. For example, in your
odbc.ini file you might have:
[LocalyticsProductionSnowflake] Driver = SnowflakeDSIIDriver; Locale = en-US; Server = yoursnowflakeaccount.snowflakecomputing.com; Port = 443; Account = yoursnowflakeaccount; Database = PRODUCTION; Schema = PRODUCTION; Warehouse = QUERY_WH; Role = QUERY; SSL = on; Query_Timeout = 270; uid = ...; pwd = ...;
The configuration above operates under the assumption that you’ve previously installed the adapter for each type of DBMS to which you’re attempting to connect.
unixODBC is relatively straightforward on *NIX-based machines (on Windows ODBC actually ships with the OS by default). Run whichever package manager your machine uses (e.g.,
yum, etc.) to install
unixodbc-dev (to get the headers needing for linking). Fortunately Snowflake provides great documentation on how to handle the Snowflake-specific steps of getting ODBC set up, so follow those instructions as well.
Once you do, make sure to take full advantage of the
isql utility that comes with
unixODBC, as it can be invaluable for debugging.
isql will drop you into an SQL terminal connected to any given DSN; for example:
[17:38:30] ~ $ isql LocalyticsProductionSnowflake +---------------------------------------+ | Connected! | | | | sql-statement | | help [tablename] | | quit | | | +---------------------------------------+ SQL> SELECT COUNT(*) FROM fact_events WHERE app_name = ‘Localytics Test’; +----------+ | COUNT(*) | +----------+ | 226975 | +----------+ SQLRowCount returns 1 1 rows fetched SQL>
Once you’re comfortably set up with
unixODBC and Snowflake’s adapter, you can configure your Ruby on Rails app to connect to Snowflake like you would any other data store. First, add the
odbc_adapter gem to your
Gemfile like so:
gem ‘odbc_adapter’, ‘~> 5.0.3’
bundle install to download the gem to your system. (Note that the major and minor version of the gem are linked to the dependent Rails version, so if your app is not yet running Rails
5.0.x, you’ll need to specify
3.2.0). Then, edit your
config/database.yml to specify the Snowflake connection for a given environment, like so:
snowflake: adapter: odbc dsn: LocalyticsProductionSnowflake
This tells Rails to use those connection settings when running in production mode. The final step is to register Snowflake as a valid connection option within the
odbc_adapter gem. By default,
odbc_adapter ships with support for
PostgreSQL. Fortunately, it also ships with the ability to register you own adapters as well. To accomplish this, add the following code to an initializer, e.g.
This code does a couple of things. It tells the
odbc_adapter gem that if when ODBC reports back the connected DBMS’s type it matches the
/snowflake/ regex, to use the subsequent block to create a class to act as the adapter. We’re then using the PostgreSQL adapter as the superclass, because the syntax is close enough so as it work. Finally, it handles the Snowflake-specific setup of turning off prepared statements, quoting column names correctly, and forcing strings to come back in UTF-8 encoding.
Once you’ve configured the
odbc_adapter gem, you can take advantage of it by connecting your models to that connection. First, create a model that corresponds to a table in your Snowflake schema. For instance, in our production schema we have a table called
fact_events. Second, call
establish_connection to tell
ActiveRecord to connect to the correct database configuration from
database.yml. For example:
class FactEvent < ApplicationRecord establish_connection(:snowflake) end
Note that if all of your models are going to be reading and writing from Snowflake for a given environment (development, production, etc.) then you can name the connection after the environment and the
establish_connection call becomes unnecessary. With these models in place, you can perform any of the normal
This configuration works for us, and we’ve been happily running this code in production since January of 2017. That being said, there are still a couple of things that we’d like to build into our adapter to make it even better.
Currently, every project that uses Snowflake needs the initializer mentioned above because the
odbc_adapter gem doesn’t come with Snowflake support baked in. At the moment subclassing the PostgreSQL adapter works for us, but we’d like to fully support Snowflake’s driver so that we can take advantage of some of the more advanced UDF capabilities that Snowflake has to offer.
The latest version of Rails was recently released, so in order to upgrade our applications we need to go through and ensure that our adapter works with all of the new capabilities of the latest version of ActiveRecord.
Our adapter supports prepared statements for the PostgreSQL adapter, but it’s explicitly turned off for MySQL and Snowflake. We’d like to take advantage of caching prepared statements to cut down on memory allocations and generally improve performance by enabling it for these two adapters.
Snowflake is a great option for a cloud-based data warehouse, and solved a lot of problems that we’ve had with previous solutions to the problem of storing massive amounts of data. By being ODBC compliant, it enables us to connect using all of our favorite tools with minimal setup. If you also would like to use Snowflake with Ruby on Rails, feel free to install our
odbc_adapter gem and give it a shot. When you do please share your experience, approach, and any feedback in a gist, on a blog, or in the comments.