Improving Performance with Flame Graphs

Recently we had reason to investigate the performance of one of our most commonly used endpoints. We find New Relic to be a great tool for identifying and exploring performance problems, but what about when performance isn’t exactly a problem per se, but we’d like to optimize it all the same? Often the key to […]

Recently we had reason to investigate the performance of one of our most commonly used endpoints. We find New Relic to be a great tool for identifying and exploring performance problems, but what about when performance isn’t exactly a problem per se, but we’d like to optimize it all the same?

Often the key to understanding can be in effective visualization of the problem, so in this short post we’ll explore how to do just that using Flame Graphs.

Generating the Graph

We won’t go through this in detail, because it’s very well covered in this good blog post from 2016. We found the following development environment setup process effective:

# Gemfile
gem "ruby-prof-flamegraph"

# development.rb
config.cache_classes = true

# Controller action
profile = RubyProf.profile do
  # Action code here
end

File.open("ruby-prof-profile", "w+") do |file|
  RubyProf::FlameGraphPrinter.new(profile).print(file)
end

Fire the action locally and then convert the output to a graph using FlameGraph:

cat ruby-prof-profile | ./flamegraph.pl --countname=ms --width=1600 > flame.svg

Note: Stackprof is a popular alternative to ruby-prof.

Reading the Graph

Here’s part of the graph for the endpoint in question:

Flame Graph
Flame Graph

Time is represented horizontally, and the call stack is represented from bottom to top. This block of code is iterating over an array of cars and serializing a photo URL for each one, as well as some other attributes.

The parts of interest are highlighted in blue. Oddly enough, they were all calls to a simple #config method which was necessary to generate each of our photo URLs. This gets called hundreds of times by this endpoint, but it’s just a hash so that shouldn’t be a problem, right? Here’s the code:

def self.config
  SHARED_CONFIG.deep_merge(ENV_SPECIFIC_CONFIG)
end

Unfortunately, it is a problem because it’s doing a #deep_merge of our increasingly large environment config into our system-wide config every time it’s called.

Once we’ve spotted it, the memoization fix is simple:

def self.config
  @_config ||= SHARED_CONFIG.deep_merge(ENV_SPECIFIC_CONFIG)
end

Note: Take care when memoizing class methods in this way in a threaded environment: some other thread-safe store might be more appropriate for you.

Caveats

  • In development, application performance isn’t necessarily going to behave the same as it will in production. For example, we noticed local performance issues with Time#utc.
  • Bear in mind the impact that adding a call-stack profiler to your production environment might have. Tools like rbspy might be a better way to go.
  • Absolute timing isn’t going to be as useful as the relative time spent in methods, since the profiler itself has an impact.

Source: Drivy