Build an Uptime Monitoring System in Ruby with GCE, Cloud Storage, and PubSub

Uptime monitoring involves checking the availability of websites, APIs, and servers. The monitor probes a given endpoint within a specified interval to determine whether it is available. The goal is to achieve the contracted level of availability, as specified in the system's SLA, and determine the difference when the contract isn't met.
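To make the SLA comparison concrete, here is a minimal sketch in plain Ruby. It assumes probe results are collected as an array of booleans (one per probe, true meaning the endpoint responded). The method names and the 99.9% target are illustrative assumptions, not part of any library.

```ruby
# Illustrative helpers (hypothetical names): compute measured availability
# from probe results and compare it with an SLA target.

# results: array of booleans, one per probe (true = endpoint was up).
def availability_percent(results)
  return 0.0 if results.empty?

  (results.count(true).to_f / results.size * 100).round(3)
end

# Returns how many percentage points the measurement falls short of the
# target, or 0.0 when the target is met.
def sla_shortfall(results, target_percent: 99.9)
  [target_percent - availability_percent(results), 0.0].max.round(3)
end

probes = [true] * 997 + [false] * 3   # 3 failed probes out of 1,000
puts availability_percent(probes)     # 99.7
puts sla_shortfall(probes)            # 0.2
```

In a real deployment these numbers would come from the metrics Prometheus stores, but the arithmetic is the same.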

In this article, we'll build an uptime monitoring system based on Prometheus blackbox_exporter. While it might be trivial to build a custom HTTP monitoring system, building a wrapper around the exporter enables us to access many other probe techniques and quickly monitor other elements of our system.

This article covers the use of several technologies, and I'll describe each component before diving into the details of the uptime system.

What is Google Compute Engine (GCE)?

Compute Engine is Google's cloud computing service, similar to AWS's EC2 offering. GCE is secure and customizable enough to fit various workloads, ranging from small machines (supporting up to 32 vCPUs and 128 GB of memory) to standard machines (supporting up to 224 vCPUs and 896 GB of memory) and other high-end machines for intensive workloads. It provides compute on demand, so capacity can scale with your needs.

GCE supports different mechanisms for app deployment, including containers, instance templates, and managed instance groups. For the purpose of this article, we'll bundle our Ruby uptime monitor into a Docker container for deployment.

What is Cloud Storage?

Google Cloud Storage is a highly available object-storage service similar to AWS's S3 service. Cloud Storage provides many storage features that enable several use-cases for modern apps. To get started with Cloud Storage in Ruby, we’ll use the google-cloud-storage gem to authenticate, as well as upload and download files from Cloud Storage:

require 'google/cloud/storage'

def upload_file bucket_name:, file_path:, file_name: nil
 storage = Google::Cloud::Storage.new
 bucket = storage.bucket bucket_name

 file = bucket.create_file file_path, file_name
end

def download_file bucket_name:, file_path:, file_name: nil
 storage = Google::Cloud::Storage.new
 bucket = storage.bucket bucket_name
 file = bucket.file file_name

 file.download file_path
end

Note: You need to set GOOGLE_APPLICATION_CREDENTIALS in your environment to point to the right service account key. All Google client gems look for this environment variable for authorization; otherwise, you’ll need to pass auth-specific parameters to Google::Cloud::Storage.new. If your app is running in a GCE VM, however, the client libraries can pick up the credentials of the VM's attached service account automatically.

What is Cloud PubSub?

Cloud PubSub is a publish/subscribe messaging service provided by Google Cloud. This form of communication is used to facilitate asynchronous service-to-service communication, similar to AWS's SNS. Building systems with asynchronous communication can help improve our system's performance, scalability, and reliability. To get started with Cloud PubSub in Ruby, we’ll use the google-cloud-pubsub gem to authenticate, publish, and listen for events:

require 'google/cloud/pubsub'

def publish_message topic_id:, message: nil
 pubsub = Google::Cloud::Pubsub.new
 topic = pubsub.topic topic_id

 topic.publish_async message do |result|
  raise "Failed to publish message" unless result.succeeded?
  puts "Message published asynchronously"
 end

 topic.async_publisher.stop.wait!
rescue StandardError => e
 puts "Received error while publishing: #{e.message}"
end

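# wait_time is in plain seconds (Integer#seconds would require ActiveSupport).
def receive_message subscription_id:, wait_time: 200
 pubsub = Google::Cloud::Pubsub.new

 subscription = pubsub.subscription subscription_id
 subscriber = subscription.listen do |received_message|
  puts "Received message: #{received_message.data}"
  received_message.acknowledge!
 end

 subscriber.start
 # Let the subscriber work in the background for wait_time seconds, then stop it.
 sleep wait_time
 subscriber.stop.wait!
end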

Note: The authentication described for Cloud Storage also applies here.

When leveraging Cloud Storage and PubSub together, we can build very interesting solutions. Often, we want to upload an object, track updates over its life cycle (create, update, delete), and take specific actions based on certain events. If this still seems abstract, let's explore two use-cases:

  • Image Service. Let’s say that we want to create something similar to Cloudinary that provides image and video storage and performs transformations on the data. While Cloud Storage can help store and version the data, with PubSub, we can listen for events from a bucket and perform certain types of pre-processing on the data, even before the customer requests a pre-processed version.
  • Distribute Configuration Files. A common problem in infrastructure engineering is rolling out configurations to several servers and providing easy rollbacks. Imagine that we want a central server responsible for server configurations, and we want to update the configuration once and distribute it to a fleet of servers. By using Cloud Storage and Cloud PubSub, we can build agents on our servers that listen through PubSub to get object notifications and take action based on these events. Furthermore, if it turns out to be a bad change (wrong configuration changes are a common reason for downtime 😩 ), we can perform a rollback with object versioning.
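The configuration-distribution idea can be sketched without any cloud dependency. Cloud Storage notifications delivered through PubSub carry an eventType attribute (for example, OBJECT_FINALIZE or OBJECT_DELETE), and the JSON payload includes the object's name. In the minimal sketch below, the method name, the returned symbols, and the sample bucket name are hypothetical; it only shows how an agent might decide what to do with each event.

```ruby
require 'json'

# Hypothetical agent logic: map a bucket notification to an action.
# `attributes` mirrors the PubSub message attributes and `data` the JSON
# payload of a Cloud Storage notification.
def action_for(attributes, data, watched_files: ['blackbox.yml'])
  object_name = JSON.parse(data)['name']
  return :ignore unless watched_files.include?(object_name)

  case attributes['eventType']
  when 'OBJECT_FINALIZE' then :apply_new_config    # object created or overwritten
  when 'OBJECT_DELETE'   then :keep_current_config # nothing to download
  else :ignore
  end
end

payload = JSON.generate('name' => 'blackbox.yml', 'bucket' => 'demo-configurations')
puts action_for({ 'eventType' => 'OBJECT_FINALIZE' }, payload)  # apply_new_config
puts action_for({ 'eventType' => 'OBJECT_DELETE' }, payload)    # keep_current_config
```

Keeping this decision in a pure function makes it easy to test without publishing real messages.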

In this article, we'll build a Ruby wrapper for Blackbox Exporter using the second use-case described above. The wrapper will run the exporter in one process and run another process to watch for configuration changes from a bucket in GCP, and then live reload the exporter. Are you ready? Let's have fun!

What is Blackbox Exporter?

Blackbox Exporter is an open-source tool built by the Prometheus team to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. The exporter should be deployed alongside a Grafana and Prometheus deployment. The complete setup looks like the following:

System Architecture

The Blackbox wrapper probes all configured endpoints, and Prometheus scrapes the exporter like any other target. Then, Grafana retrieves data from Prometheus to be graphed. We run the exporter binary like blackbox_exporter --config.file blackbox.yml. Blackbox Exporter also allows us to live reload the exporter with a new configuration without shutting down the binary and restarting it. This can be very useful when scraping endpoints with intervals measured in seconds.
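To illustrate how a scrape reaches the exporter: Prometheus requests the exporter's /probe endpoint and passes the endpoint to check as the target parameter and the probe configuration as the module parameter. The sketch below builds such a URL. The helper name is hypothetical, and the host, the port (9115 is the exporter's default), and the http_2xx module (taken from the exporter's sample configuration) are assumptions for a local setup.

```ruby
require 'uri'

# Builds the URL that would be requested to probe `target`.
# Host, port, and module name are assumptions; adjust them to your setup.
def probe_url(target, mod: 'http_2xx', exporter: 'http://localhost:9115')
  uri = URI.join(exporter, '/probe')
  uri.query = URI.encode_www_form([['target', target], ['module', mod]])
  uri.to_s
end

puts probe_url('https://example.com')
# http://localhost:9115/probe?target=https%3A%2F%2Fexample.com&module=http_2xx
```

Opening this URL in a browser or with curl against a running exporter is a quick way to check a module before wiring it into Prometheus.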

BlackboxWrapper Service Specs

Before deep-diving into the code, let's highlight the service specs:

  • The BlackboxWrapper service will run two processes.
    • The first process runs blackbox_exporter binary.
    • The second process listens for bucket changes from GCP and reloads the first process.
  • The service will be deployed as a Docker image, which will enable us to package the service alongside the blackbox_exporter binary.

Blackbox wrapper service spec
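The two-process design can be tried out with a stand-in child before involving the real binary. In this minimal sketch (it assumes a Unix-like system where fork is available), the child plays the role of the exporter and reports each SIGHUP it receives, while the parent plays the role of the wrapper. The method name and the messages are illustrative only.

```ruby
# Stand-in for the exporter: a child process that counts SIGHUP "reloads".
# The real wrapper execs blackbox_exporter instead of running this loop.
def reload_demo
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    writer.sync = true
    reloads = 0
    trap('HUP')  { writer.puts "reloaded #{reloads += 1}" }
    trap('TERM') { exit!(0) } # exit! skips at_exit hooks inherited from the parent
    writer.puts 'ready'
    loop { sleep 1 }
  end

  writer.close
  reader.gets                 # wait until the child has installed its handlers
  Process.kill('HUP', pid)    # ask the child to "reload"
  reply = reader.gets.chomp
  Process.kill('TERM', pid)
  Process.wait(pid)
  reply
ensure
  reader&.close
end

puts reload_demo   # reloaded 1
```

The parent waits for the child's ready line before signalling, which avoids sending SIGHUP before the handler exists.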

Let's Start Building

First, create an app directory and then enter the directory.

mkdir blackbox-wrapper && cd blackbox-wrapper

As with any standard Ruby application, we'll use Bundler to manage our wrapper's dependencies. Create a Gemfile:

source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

ruby '2.7.2'

gem 'google-cloud-storage'
gem 'google-cloud-pubsub'
gem 'rake'
gem 'pry'

Then run bundle install.

Now we'll create a file to hold our code: app.rb.

This file will act as the entry point to our service. Since we will be deploying our app in a container, this file will be specified in the CMD instruction in our Dockerfile later on.

touch app.rb

Creating the Dockerfile

While some items have been omitted from this file on purpose, the code below highlights the critical components necessary for this article:

FROM ruby:2.7.2

RUN mkdir /app
WORKDIR /app
COPY . .

# Install other dependencies
...

# Download & Install blackbox exporter
RUN curl -SL \
    https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-386.tar.gz | \
    tar xvz -C /tmp && \
    mv /tmp/blackbox_exporter-0.18.0.linux-386/blackbox_exporter /usr/local/bin && \
    mkdir /etc/blackbox && \
    mv /tmp/blackbox_exporter-0.18.0.linux-386/blackbox.yml /etc/blackbox/

# Specify entry point.
CMD ["bundle", "exec", "ruby", "app.rb" ]

From the above, we should note the following:

  • We used a Ruby image – ruby:2.7.2 – as a base image with Ruby installed.
  • We installed blackbox_exporter binary and moved it to a directory accessible from our PATH.
  • We specified the entrypoint of the container to run app.rb on container start up.

Building The Wrapper Service

This is our Ruby service that glues everything together. In app.rb, place the following:

require 'rubygems'
require 'bundler/setup'
require 'json' # JSON.parse is used when processing notifications
require "google/cloud/pubsub"
require "google/cloud/storage"

CONFIG_BUCKET = ENV['BUCKET_NAME']
TOPIC = ENV['PUBSUB_TOPIC']
TOPIC_SUBSCRIPTION = ENV['TOPIC_SUBSCRIPTION']

class ProcessNotification

  def initialize(file, attr, blackbox_exporter)
    @file = file
    @attr = attr
    @blackbox_exporter = blackbox_exporter
  end

  def call
    return if @attr['eventType'] == 'OBJECT_DELETE'

    @blackbox_exporter.write @file
    @blackbox_exporter.reload
  end
end

class BlackBoxExporter
  CONFIG_FILE = '/etc/blackbox/blackbox.yml'

  def initialize
    @blackbox_pid = nil
  end

  def start
    return unless @blackbox_pid.nil?

    @blackbox_pid = fork do
      exec('blackbox_exporter', '--config.file', CONFIG_FILE)
    end
  end

  def write(file)
    file.download CONFIG_FILE
  end

  def reload
    # Send SIGHUP signal
    Process.kill('HUP', @blackbox_pid)
  end

  def shutdown
    Process.kill('KILL', @blackbox_pid)
  end
end

class Subscriber
  class NotificationConfigError < StandardError
  end

  SUPPORTED_FILE_TYPES = ['blackbox.yml']

  def initialize(blackbox_exporter)
    @pubsub = Google::Cloud::Pubsub.new
    @storage = Google::Cloud::Storage.new
    @subscription_name = ENV['TOPIC_SUBSCRIPTION']  # Retrieve a subscription
    @bucket = @storage.bucket CONFIG_BUCKET
    @subscription = @pubsub.subscription @subscription_name
    @blackbox_exporter = blackbox_exporter
  end

  def listen
    create_notification_config

    puts "Starting subscriber"

    @subscriber = @subscription.listen do |received_message|
      process_notification(received_message)
    end

    @subscriber.on_error do |exception|
      process_exception(exception)
    end

    @subscriber.start
  end

  def process_notification(received_message)
    data = received_message.message.data
    published_at = received_message.message.published_at
    attributes = received_message.message.attributes

    puts "Data: #{data}, published at #{published_at}, Attr: #{attributes}"
    received_message.acknowledge!

    parsed_data = JSON.parse(data)
    file_name = parsed_data['name']
    return unless SUPPORTED_FILE_TYPES.include?(file_name)

    file = @bucket.file file_name
    process_notification = ProcessNotification.new(file, attributes, @blackbox_exporter)
    process_notification.call
  end

  def process_exception(exception)
    puts "Exception: #{exception.class} #{exception.message}"
  end

  def shutdown
    @subscriber.stop!(10)
  end

  def create_notification_config
    topic = @pubsub.topic TOPIC

    notifications = @bucket.notifications
    # Nothing to do when exactly one notification config already exists.
    return if notifications.count == 1

    # Otherwise, remove whatever exists and create a single fresh config.
    notifications.each(&:delete)
    @bucket.create_notification topic.name
  rescue StandardError => e
    raise NotificationConfigError, e.message
  end
end

class BlackboxWrapper
  def initialize
    @blackbox_exporter = BlackBoxExporter.new
    @subscriber = Subscriber.new(@blackbox_exporter)
  end

  def start
    @blackbox_exporter.start
    @subscriber.listen

    at_exit do
      @blackbox_exporter.shutdown
      @subscriber.shutdown
    end

    # Block, letting processing threads continue in the background
    sleep
  end
end

blackbox_wrapper = BlackboxWrapper.new
blackbox_wrapper.start

While the above is a lot of code, let's break it down, starting from the bottom:

  • BlackboxWrapper: This class is the entry point to our service. The #start method does the following:
    • Starts the blackbox_exporter binary in a separate process to start probing endpoints.
    • Starts the subscriber, which listens for bucket changes in background threads.
    • Calls sleep in the main process to keep the app running indefinitely.
  • How does the BlackBoxExporter work?
    • The #start method forks a child process and uses the exec kernel method to run the blackbox_exporter binary in it.
    • The #reload method sends the SIGHUP signal to live reload the blackbox_exporter binary with the new configuration. As you may have noted from the ProcessNotification class, a new configuration file is written to the configuration file location before the exporter is reloaded.
  • How does the Subscriber work?
    • The #listen method starts by creating a NotificationConfiguration. A NotificationConfiguration is a rule that specifies three things:
      • A topic in PubSub to receive notifications.
      • The event that triggers notifications to be sent. The Cloud Storage documentation lists the event types that can trigger notifications.
      • The information contained within notifications.
    • The #create_notification_config method also ensures that there's just one NotificationConfiguration; otherwise, it deletes the existing ones and creates a new one. This ensures that notifications are sent just once.
    • The #listen method also calls @subscription.listen to start listening for notifications from the bucket to which we're subscribed. Note that this runs indefinitely in background threads while the main thread sleeps.
    • The #process_notification method is called for every notification update sent. Note that we have SUPPORTED_FILE_TYPES, which we use to identify files in the bucket we care about and do nothing about the rest.
  • ProcessNotification: This is responsible for processing notifications, downloading the updated configuration, writing it to a file, and reloading the blackbox_exporter binary.
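The "exactly one NotificationConfiguration" rule is easy to get subtly wrong, so it can help to exercise it against in-memory stand-ins before touching a real bucket. The sketch below is only that: FakeBucket, FakeNotification, and ensure_single_notification are hypothetical names that mimic the notifications, create_notification, and delete calls used in the service.

```ruby
# In-memory stand-ins (hypothetical) for a bucket and its notification
# configs, used to exercise the reconciliation logic without calling GCP.
class FakeNotification
  def initialize(bucket)
    @bucket = bucket
  end

  def delete
    @bucket.notifications.delete(self)
  end
end

class FakeBucket
  attr_reader :notifications

  def initialize
    @notifications = []
  end

  def create_notification(_topic_name)
    FakeNotification.new(self).tap { |n| @notifications << n }
  end
end

# Leaves the bucket with exactly one notification config.
def ensure_single_notification(bucket, topic_name)
  existing = bucket.notifications.dup # copy, since deleting mutates the list
  return existing.first if existing.size == 1

  existing.each(&:delete)
  bucket.create_notification(topic_name)
end

bucket = FakeBucket.new
3.times { bucket.create_notification('config-changes') }
ensure_single_notification(bucket, 'config-changes')
puts bucket.notifications.size   # 1
ensure_single_notification(bucket, 'config-changes')
puts bucket.notifications.size   # 1
```

Running the method twice shows that it is idempotent: a second call leaves the single existing config in place.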

Running the Service Locally

To run the service locally and test it, run the following in the root of the app directory:

export BUCKET_NAME='{insert-bucket-name}'
export PUBSUB_TOPIC='{insert-pubsub-topic}'
export TOPIC_SUBSCRIPTION='{insert-subscription-name}'
export GOOGLE_APPLICATION_CREDENTIALS='{insert-path-to-service-key-json}'

bundle exec ruby app.rb
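A missing variable only surfaces later as a confusing nil error from the client libraries, so a small pre-flight check can save time. This is a minimal sketch; the constant and method names are hypothetical and are not part of app.rb as shown above.

```ruby
# Names of the variables app.rb reads from the environment.
REQUIRED_ENV = %w[BUCKET_NAME PUBSUB_TOPIC TOPIC_SUBSCRIPTION].freeze

# Returns the names of required variables that are unset or blank in `env`.
def missing_env(env, required = REQUIRED_ENV)
  required.select { |name| env[name].to_s.strip.empty? }
end

# Example with a plain hash standing in for ENV.
missing = missing_env({ 'BUCKET_NAME' => 'demo-configurations', 'PUBSUB_TOPIC' => '' })
puts "Missing: #{missing.join(', ')}" unless missing.empty?
# Missing: PUBSUB_TOPIC, TOPIC_SUBSCRIPTION
```

In the service itself, the same check could be called with ENV at the top of app.rb, aborting with a clear message when the list is not empty.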

Deploying our Service to Google Compute Engine

Like many aspects of the cloud, there are many ways to achieve the same result, but modern software engineering encourages CI/CD processes for several good reasons. As such, we will focus on deploying our service from GitHub Actions using setup-gcloud.

Let's set up our deployment file (.github/workflows/deploy.yml).

name: Build and Deploy to Google Compute Engine

on:
  push:
    branches:
    - main

env:
  PROJECT_ID: ${{ secrets.GCE_PROJECT }}
  GCE_INSTANCE: ${{ secrets.GCE_INSTANCE }}
  GCE_INSTANCE_ZONE: us-central1-a
  BUCKET_NAME: demo-configurations
  PUBSUB_TOPIC: demo-configurations-bucket-notifications
  TOPIC_SUBSCRIPTION: demo-bucket-changes-subscription

jobs:
  setup-build-publish-deploy:
    name: Setup, Build, Publish, and Deploy
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@<version>

    # Setup gcloud CLI
    - uses: google-github-actions/setup-gcloud@<version>
      with:
        version: '290.0.1'
        service_account_key: ${{ secrets.GCE_SA_KEY }}
        project_id: ${{ secrets.GCE_PROJECT }}

    # Configure Docker to use the gcloud command-line tool as a credential
    # helper for authentication
    - run: |-
        gcloud --quiet auth configure-docker

    # Build the Docker image
    - name: Build
      run: |-
        docker build --tag "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA" .

    # Push the Docker image to Google Container Registry
    - name: Publish
      run: |-
        docker push "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA"

    - name: Deploy
      run: |-
        gcloud compute instances update-container "$GCE_INSTANCE" \
          --zone "$GCE_INSTANCE_ZONE" \
          --container-image "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA" \
          --container-env "BUCKET_NAME=$BUCKET_NAME,PUBSUB_TOPIC=$PUBSUB_TOPIC,TOPIC_SUBSCRIPTION=$TOPIC_SUBSCRIPTION"

Note that the --container-env flag is set in the deploy step, which passes the necessary environment variables from the GitHub Actions workflow (its env values and secrets) to the container.

Secrets & Environment Variables

Next, we'll set up secrets for GitHub Actions.

Action Secrets

We set the environment variables for our container with the --container-env flag. Since we are setting them from GitHub Actions, we can use secrets for sensitive data and plain env variables for non-sensitive data.

Creating GCP Resources

Let's create a bucket in the GCP console.

Create Bucket

We'll also create a PubSub topic in the GCP console.

Create Topic

Grant the pubsub.publisher IAM role to the Cloud Storage service agent in the console. Each project has an associated Cloud Storage service account that is responsible for some background actions, such as PubSub notifications; the Cloud Storage documentation explains how to find it.

IAM

Finally, we create a subscription in the GCP Console.

Create Subscription

Voila! 🎉 Our service has been deployed successfully.

Conclusion

If you’ve made it this far, you deserve a cookie 🍪 . I think this is the first version of a potentially great solution with multiple optimizations to be achieved. For example, we could achieve the following:

  • Deploy the blackbox_exporter as a serverless function to support multiple regions, which is ideal for uptime monitoring, and deploy a master server responsible for updating the bucket configuration in Cloud Storage.
  • Building on the previous point, we could abstract this into an app that integrates with popular cloud providers to achieve the same functionality, making it cloud-agnostic. P.S.: Popular cloud providers (GCP, AWS, and Azure) provide similar functionality across their services.
  • In the next article, we'll build on this solution to provide rollbacks with cloud-storage object versioning, which will enable us to recover from updating the configuration with incorrect updates.

  • Deploying with Docker simply solves the packaging problem for us, but as you may already know, there are various ways to package services. I chose Docker in this article for the sake of simplicity.

Glossary

  • Prometheus is an open-source systems monitoring and alerting toolkit. It includes a server that scrapes and stores time-series data, client libraries for instrumenting application code, and an alert manager to handle alerts.
  • Grafana is a visualization system that allows you to query, visualize, alert on, and understand your metrics, regardless of where they are stored.
  • Blackbox Exporter is an open-source tool built by the Prometheus team to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.

Source: Honeybadger