Expression Evaluators

Bugs are fun but this time I want to talk about win. Recently I finished work on a recursive descent expression evaluator in the Bandcamp code. (That’s a mouthful and I enjoy slinging that whenever I can—it’s up there with binary space partitions.) An expression evaluator is something that can take a sequence of characters […]

Bugs are fun but this time I want to talk about win. Recently I finished work on a recursive descent expression evaluator in the Bandcamp code. (That’s a mouthful and I enjoy slinging that whenever I can—it’s up there with binary space partitions.) An expression evaluator is something that can take a sequence of characters like

2 + 2

and figure out that it means to add two and two and give up four. Expression evaluators are obscure but oddly enough I’ve written at least five in my professional life.

The 1st brush was when I was 25, new at Adobe and began to seriously dig into the C programming language. In the appendix of K&R I found the grammar for the entire language in a formal notation called “modified BNF”. It described, line by line, every element of the language beginning with the highest thing, the mysteriously named translation-unit, and breaking it down from there:

  translation-unit external-declaration


  declaration-specifiersopt declarator declaration-listopt compound-statement

In this grammar a thing is defined in terms of simpler things. And these simpler things are subsequently defined in terms of even simpler things. And so on, downward, until you get to the atoms of a language, basic elements like integer-constants and character-constants. This is a conventional device in Computing.

Can we take a moment to worship at the altar of K&R? In 189 thin pages and eight quick chapters you have everything there is to know about the C programming language. The tutorial that’s chapter 1 is a model of concise, practical writing. Please for the love of God everyone who wakes up and thinks I’ll write a computer book! internalize and become one with this chapter. I mean the entire book.

So I wrote a program that understood C based on this grammar. It didn’t evaluate the language, but it could parse it. Meaning it was code that understood the rules for the arrangement of the keywords and symbols in the language. Programming is all about reference and so of course self-reference pops up from time to time—I had written a program in C that understood C.

You think C’s an obscure language that should be put to pasture? You owe everything you’re doing on a computer right now to C. Your operating system is written in C. Your warm and fuzzy language that’s not C is written in C (and C is written in C—ha!). Photoshop is written in C (okay C++ but we regret ever touching the ++). Apache is written in C. The syntax of your favorite language is heavily influenced by C. The world runs on C. #thatsawesome #fistpump

The 2nd and 3rd times I wrote an expression evaluator were when I wrote some plug-ins for Photoshop a few years later. Photoshop has an external programming interface that programmers can use to write little (and big) plug-ins that live in the app and do algorithmic things to images. When Photoshop invokes a plug-in the code is given the pixels of the image, the code messes with those pixels and voilá an effect. A butt simple plug-in could zero the green and blue values of every pixel, effectively putting the image through a red filter. Or it could halve every value making the image darker. It could peek at nearby pixels; that’s how blurring is done. Once you realize pixels are just numbers you can go crazy with the effects.

Fellow Adobeite Joe Ternasky had the idea of making a meta plug-in—a plug-in for making plug-ins—where you could enter expressions into boxes:

R = [    ]
G = [    ]
B = [    ]

Then when the plug-in runs it applies those expressions to each pixel in the image.

R = r
G = 0
B = 0

would be the red filter.

R = r/2
G = g/2
B = b/2

would be the darkener. All the normal math operators are available, as are several functions to give the pixel’s original values, coordinates, its distance and angle from the center, etc., and the values of a few user-defined sliders. If you were enough of a math-head you could write your own plug-ins to do the canonical resize, rotate, blur, and then go crazy with whatever other effects you imagined. The plug-in was called Filter Factory.

It had a fan club.

Filter Factory needed an expression evaluator so Joe and I wrote one. Then for fun we rewrote it as a set of expression compilers that turned the expressions into raw machine language. Joe did the 68K compiler and I did the PowerPC. The filters consequently ran just as fast as the ones built into Photoshop. It was cool. This is when I began to understand recursive descent expression evaluators.

Fast forward to Bandcamp. The 4th expression evaluator I wrote was last summer for our HTML templates. A template might have lines like this:

<a href="{{band.stats_url}}">stats</a>

When the page is served the {{curly brace}} things are filled in by the Liquid template engine we use. Liquid can do conditionals—the stats link on a band’s page for example is shown only if you’re logged in:

{% if user.logged_in %}
<a href="{{band.stats_url}}">stats</a>
{% endif %}

The user never sees the curly braces or the conditionals. The HTML is put together on the server and the custom page with the {{blanks}} filled in and the conditionals figured out is sent down to you. Liquid is too limited for some of the really awesome stuff we do in Bandcamp, but because it’s Ruby and open source we’ve been able to hack on it. I added a full-on expression evaluator. Now we can have smarter conditionals and counters.

{% if user.logged_in && user.admin_level >= 10 %}
{% let items_shown = items_shown + 1 %}

This expression evaluator was a breeze to write in Ruby. It’s only a page of code and doesn’t cheat by calling eval().

The 5th expression evaluator. This week I’ve been improving the system we use to distribute files around among the servers that together make Bandcamp. All of the files on Bandcamp (images, audio tracks, etc.) are kept in a big virtual file system called the zstore. It looks like one really big hard drive that’s made from the hard drives of all the machines. The files are spread across all the servers, and no single server has a complete set of all the files. There is redundancy so that if one machine dies we’ve got copies of the files someplace else.

For example, the transcoding servers hold most of the lossless audio files so that they’re immediately available for encoding into whatever format is needed. The downloader machines keep the audio files around for quick-start downloading. The webapp machines covet the public files (cover art mostly, in various sizes and formats). But any file can be anywhere. Our zstore code can act like every file is always available on every machine, and if a file isn’t where it’s needed the zstore copies it first behind the scenes. I wrote the zstore as one of the original integral pieces of Bandcamp, way back in ancient history (2008). I’ve studied distributed file systems and implemented a few before. I was amped to write one for Bandcamp.

Why did I write one instead of using an existing DFS? That’s worth its own post. The bottom line is that distributed file systems are still thumb-sucking toddlers, Computer Science speaking. We’re still figuring out how they should work: dealing with connectivity events, ensuring redundancy, providing the most-recent copy of a file when it’s needed, archiving, versioning, etc. These are hard. By focusing on Bandcamp’s specific needs I was able to write a DFS that does just what we need, and it does it well.

In addition to copying files between machines on demand, there’s code that proactively copies files around to anticipate needs and reduce delays. For example when a user uploads cover art for an album it first goes into the zstore on the webapp machine that handled the upload. The other webapp machines then see it and grab copies for themselves.

I’ll take one.

Me too.

I’m in.

Each machine needs to know what files it should proactively copy. I was reworking this code last week and decided an expression evaluator was the bee’s knees for this. I extended the Liquid evaluator with strings and built-in variables and extended comparisons. Now each zstore machine has an expression like this:

morbo:    tag == 'lossless' && (id % 100) >=  0 && (id % 100) <= 47
linda:    tag == 'lossless' && (id % 100) >= 48 && (id % 100) <= 96
calculon: tag == 'lossless' && (id % 100) >= 97 && (id % 100) <= 99
devil:    public == true
santa:    public == true

Whenever a file is added or modified in the zstore each of the machines looks at its expression and decides if it should make a local copy. What the rules above mean is that morbo, linda and calculon all want the lossless audio files, and each claims some percent of them (because no machine is large enough to hold all of them). The terms tag and id are built-in variables that resolve to attributes of the file. Devil and santa want all the files marked public because they’re webapp machines.

(The part of the expression (id % 100) >= 0 && (id % 100) <= 47 essentially partitions the entire set of files into 100 slots based on the file ids. A file id is a large random integer assigned when a file is created; it’s never changed or re-used. The % operator means modulo. I recently learned about a persistent partitioning scheme that uses a much higher number of slots and then distributes groups of slots to particular machines. This has the advantage of creating less havoc when a machine disappears. TODO.)

Doesn’t that look a bit like Filter Factory?

The code for this latest incarnation of Joe’s Recursive Descent Expression Evaluator is less than 200 lines of Ruby. You can take a look at it here:

bc-expression at google code

Start about half way down at parse_comparison_op. One of the interesting things about a top-down grammar is that it’s easy to implement that way, too. See if you can follow the code from there to the bottom of the source. (By the way, does anyone want to turn this into a gem? TXTMEOK) I find it fascinating when the same Computing concepts—the expression evaluator in this case—come up over and over again in wildly different contexts. I love this about programming.

You may be asking what’s recursive about all of this? Think about parentheses in an expression. If the topmost item in the grammar is an expression, and you descended the grammar through all the valid syntax, at the very bottom you’d come to a term. A term is the simplest unit that has a value. It’s either a constant (42), a variable (id, tag, public), or… well, take a look:


  comparison || comparison
  comparison && comparison

  additive == additive
  additive != additive
  additive < additive
  additive > additive

  multiplicative + multiplicative
  multiplicative - multiplicative

  term * term
  term / term
  term % term

  ( expression )  # Hey!

Source: Bandcamp