Modifying FFMPEG to Support Transparent GIFs


A sticker (left) is just a GIF (right) with transparent pixels.

Here at GIPHY, we differentiate between GIFs and Stickers in our business language, as the two products are served to different searches and customers. However, we still use the GIF format to store stickers – all they really are is GIFs with transparent pixels.

The distinction exists in our engineering toolchain as well – some tools struggle to correctly support Stickers (usually due to transparency). One of those tools is FFMPEG, an extremely popular package for working with video in a wide variety of formats. At GIPHY, we use FFMPEG to process all uploaded content – including GIFs and Stickers.

FFMPEG is fast, high quality, and open source. Until recently, however, FFMPEG did not properly support writing GIFs that are animated and have transparent pixels. Although we have other tools at our disposal, such as Gifsicle, we want to use as many of the same tools as possible, especially ones that are powerful, popular and fast like FFMPEG. For this reason, we decided to tackle the issue ourselves.

Why FFMPEG Doesn’t Support Transparencies

To understand the issue FFMPEG had writing transparent GIFs, you need to understand exactly how transparency works in the GIF format, and how FFMPEG was handling it.

In a GIF, any pixel can take on any one of 256 colors defined in a palette. Optionally, one of those colors can be transparent. Non-transparent pixels are 100% opaque, meaning the GIF format does not support an “alpha” channel in the way that some formats, like PNG, do. If the GIF is a single frame (i.e., a still image), how to handle the “transparent” color is pretty obvious: the tool rendering the GIF just needs to know not to draw that pixel. However, it turns out that if there are multiple frames (i.e., an animated GIF), there are several possible ways to handle transparency. These are called “disposal methods.”

The first disposal method is called “Restore to background color” in the GIF specification. We’ll call it RTB, for short. It does what you’d expect: when a new frame is drawn, the area of the image is set to be completely transparent and only non-transparent pixels get drawn on top of that.

The second disposal method is called “Do not dispose” (DND for short). Using this method, GIF viewers instead build each new frame using the last frame as the starting point, so transparent pixels serve the purpose of preserving the color of the pixel from the previous frame. For example, in the following GIF, a lot of pixels don’t change color between frames, so we can use transparency for all the unchanged pixels.

Transparency can be used to optimize GIFs like this.

When using the DND method, only pixels that were transparent in the previous frame can be made transparent again in the current frame. DND could be used for Stickers if the pixel transparency doesn’t change from frame to frame, but that’s an unusual use-case for animation.

Most stickers have different transparent pixels from frame to frame. This sticker is a rare exception.

Although this method isn’t generally useful for stickers, it can be very useful in cases where GIF encoders want to use different sets of colors on different frames (which can improve quality), and for reducing file size, because runs of transparent pixels are easier to compress than a bunch of different colors.

The problem in FFMPEG was that it only supported the DND method, so single frame GIFs and GIFs without transparency both worked fine, but GIFs that had multiple frames and transparency did not. If you are thinking that clearly the solution was to add support for the RTB method, you’re right, and that’s just what we did.

(There are some other disposal methods. You can read the full spec if you are curious, and this article has more non-technical info on disposal methods and how they work.)
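As an aside, the disposal method itself lives in bits 2–4 of the “packed fields” byte of each frame’s Graphic Control Extension, next to the transparent-color flag in bit 0. Here is a minimal sketch of how that byte is assembled; the helper function is ours, for illustration only, and not part of FFMPEG:

```c
#include <stdint.h>

/* Disposal methods defined by the GIF89a specification. */
enum {
    GIF_DISPOSAL_NONE       = 0, /* no disposal specified */
    GIF_DISPOSAL_DO_NOT     = 1, /* "Do not dispose" (DND) */
    GIF_DISPOSAL_BACKGROUND = 2, /* "Restore to background color" (RTB) */
    GIF_DISPOSAL_PREVIOUS   = 3, /* "Restore to previous" */
};

/* Build the "packed fields" byte of a Graphic Control Extension:
 * bits 2-4 hold the disposal method, bit 0 the transparency flag. */
static uint8_t gce_packed_fields(int disposal, int has_transparency)
{
    return (uint8_t)(((disposal & 0x7) << 2) | (has_transparency & 0x1));
}
```

For an RTB frame with a transparent color, this yields 0x09: disposal method 2 shifted left two bits, plus the transparency flag.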

How We Fixed It

The reduction in file size that comes from the DND method can be significant, so we want to be able to use it when possible, but also use RTB when needed. The solution we came up with was to write a function that analyzes each frame to determine which method to use, and then write the frame using the correct method.

The function for analyzing is fairly straightforward: after some setup, it simply loops through all the pixels in the buffer, and compares the color of each pixel to the transparent color. As soon as it finds a transparent pixel, it knows that the frame is translucent (i.e., partially transparent) and returns. If it is unable to find a transparent pixel, it knows the entire frame is opaque.

// returns true if any of the pixels are transparent
static int is_image_translucent(AVCodecContext *avctx,
                                const uint32_t *palette,
                                const uint8_t *buf, const int linesize)
{
    GIFContext *s = avctx->priv_data;
    int trans = s->transparent_index;
    int x, y;

    if (trans < 0) {
        return 0;
    }

    // walk the buffer row by row, honoring the stride (linesize),
    // which may be larger than the image width
    for (y = 0; y < avctx->height; y++) {
        for (x = 0; x < avctx->width; x++) {
            if (buf[x] == trans) {
                return 1;
            }
        }
        buf += linesize;
    }
    return 0;
}

A slightly more elaborate check is possible (e.g., we could see if the pixel is transparent in the previous frame as well), but for the vast majority of cases this works well.
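For illustration, that more elaborate check might look something like the following sketch (this is our own example, not code from FFMPEG): a pixel only forces the RTB method if it is transparent in the current frame but was opaque in the previous one.

```c
#include <stdint.h>

// Hypothetical sketch: returns true only if some pixel is transparent in
// the current frame but was opaque in the previous frame. `trans` is the
// palette index of the transparent color (negative if there is none), and
// both buffers hold `n` palette indices.
static int has_new_transparency(const uint8_t *cur, const uint8_t *prev,
                                int trans, int n)
{
    int p;

    if (trans < 0)
        return 0;

    for (p = 0; p < n; p++) {
        // transparent now, but opaque before: DND can't represent this
        if (cur[p] == trans && prev[p] != trans)
            return 1;
    }
    return 0;
}
```

Frames that fail this check could still be written with the DND method, since their transparent pixels were already transparent in the previous frame.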

Once we added this function, we also needed to create a method for writing translucent frames, and rename the existing frame-writing function to indicate that it only works for opaque frames. This change is a lot of code, but it’s fairly straightforward. You can find the new method in the GitHub diff.

The only hard part that remained was getting FFMPEG to store the correct frame disposal method in the GIF file itself, so that the right method would be used when rendering the GIF. In principle, this is pretty simple, but the way FFMPEG divides up the work makes it a bit tricky. The raw frame data and frame metadata are written in two very different places, and communicating between those two places is not straightforward.

FFMPEG is structured this way to allow a clean separation between the container (which defines the overall structure of the file) and the codec (which defines how the actual image data is compressed). Since some containers support many codecs and many codecs can go in more than one container, this structure makes FFMPEG extremely flexible. However, in the case of GIFs, which don’t have as clear a distinction between the codec and the container, it can get confusing, and the way FFMPEG splits the two is somewhat arbitrary.

To resolve the issue, we had two options. The first was to move all the relevant code into the file that handles the codec. There’s a good case to be made that the disposal method and other frame header data are part of the codec and therefore should be moved. However, we decided to go for a more surgical approach that would have less impact on the code overall. We added a new type of “side data,” which is a structure in the FFMPEG code that is specifically designed to allow the codec and container code to communicate. Since side data is logged, tested and compared, this option required changing existing tests to accommodate the new data. In retrospect, it’s still not completely clear which approach would have been best.

The complete changes we made are on GitHub.

Try This At Home

Now that we’ve made the changes needed to support writing GIF Stickers with FFMPEG, we thought it would be fun to demonstrate one way it might be useful, even if you don’t have any Stickers handy. So, we wrote a shell script that takes video input, analyzes it, and outputs a GIF Sticker. This script is purely a proof of concept — we only tested it on a few green-screened images — but hopefully it’s good enough to show how this might be useful. If you are looking for something more robust and production-ready, check out unscreen.com, which offers a service that does this using much more sophisticated analysis.

Above are some green-screen GIFs (left) and the stickers we made from them (right) using our simple script.

For these examples to work, you’ll need ImageMagick and a version of FFMPEG recent enough to include our changes installed on your machine. The script uses FFMPEG’s built-in ability to identify colors and replace them with transparency. Now that GIF transparency is supported in FFMPEG, the only hard part is figuring out what color (or, as we’ll see, range of colors) should be removed. The rule of thumb we came up with is as follows:

  1. Segment the corners of the first frame of the image. (Which corners we use is an argument passed to the script at the command line.)
  2. Find the average color of the corners.
  3. Get some additional statistics on the image to establish the full range of colors we need to remove.
  4. Replace the range of colors with transparency.

STEP 1: Segment the corners

To do all this in shell, we start by pulling out the first frame of the image using FFMPEG and save it as a PNG:

ff=${WORKDIR}/frame1.png
ffmpeg -i "$input" -vframes 1 -an -ss 0.0 $ff

The next step is to cut out the corners we want:

 convert $ff -crop ${CROPPERCENTAGE}%x+0+0 ${WORKDIR}/crop1.png

STEPS 2 AND 3: Find the Average Color and Get some additional Statistics

Analyzing the corners with ImageMagick to get the numbers we need is a bit of a trick, but once you combine all of the corner pieces into one image using montage and set up the arguments correctly, you can get all the data you need.

#now montage the corners into one:
montage -geometry ${width}x${height}+0+0 -tile 1x ${WORKDIR}/crop?.png ${WORKDIR}/montage.png 

#get stats for the montaged image
fmt="%[fx:int(255*mean.r)] %[fx:int(255*standard_deviation.r)]"
fmt="$fmt %[fx:int(255*mean.g)] %[fx:int(255*standard_deviation.g)]"
fmt="$fmt %[fx:int(255*mean.b)] %[fx:int(255*standard_deviation.b)]"
fmt="$fmt %[fx:int(255*mean)] %[fx:int(255*standard_deviation)]"
vals=(`convert ${WORKDIR}/montage.png -intensity average -format "${fmt}" info:-`)
for i in 0 1 2 3 ; do
	ave[$i]=$(( ave[i] + vals[i*2] ))
	dev[$i]=$(( dev[i] + vals[i*2+1] ))
done

STEP 4: Replace the range of colors with transparency

Now that we know what colors to remove, we can finally build our Sticker. This is not as straightforward as you might think, because outputting a GIF with FFMPEG is a two-step process. In the first step, we need to analyze the colors in the image and find the 256 colors that can best represent the GIF. This set of colors is called the “palette.”

ffmpeg -v error -i "${input}" -filter_complex "[0:v]chromakey=$hexcolor:$similarity[a];[a]palettegen[b]" -map "[b]" $WORKDIR/palette.png || die "Can't make palette"

Then we use the palette to actually output the GIF.

ffmpeg -v error -i "${input}" -i $WORKDIR/palette.png -filter_complex "[0:v]chromakey=$hexcolor:$similarity[trans];[trans][1:v]paletteuse[out]" -map "[out]" -y "$output" || die "can't make final video"

The -filter_complex "[0:v]chromakey=$hexcolor:$similarity[trans];[trans][1:v]paletteuse[out]" part of the command to FFMPEG sets up two filters: the first is a chromakey filter that replaces colors close to $hexcolor (which is the average color from the step above) with transparency. The second filter, paletteuse, converts RGB color to indexed color using the palette we built in the previous step.

One thing we haven’t explained is where that $similarity value comes from or what it does. It tells FFMPEG how close a color needs to be to the average color in order to be replaced with transparency. We use the standard deviation we found from the corner images, multiplied by a “spread” value given at the command line, which is really just a “fudge factor” for our hastily made script that doesn’t always work perfectly. Some more tweaking with the statistics output by ImageMagick might eliminate this fudge factor and make the script easier to use, but for demonstration purposes, we thought this was a good place to start.
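Concretely, the computation might look something like this sketch (the variable names and values are ours, for illustration; the real script’s details may differ). The ImageMagick stats come back in the 0–255 range, while chromakey expects a similarity between 0.0 and 1.0, so we scale by the spread and then normalize:

```shell
# Hypothetical sketch: derive chromakey's similarity from the corner stats.
dev=12      # standard deviation of the background color (0-255 range)
spread=3    # "fudge factor" passed on the command line
# scale the deviation by the spread, then normalize to chromakey's 0.0-1.0 range
similarity=$(awk -v d="$dev" -v s="$spread" 'BEGIN { printf "%.4f", d * s / 255 }')
echo "$similarity"
```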

Here’s the complete script if you want to try it yourself.

What Remains

As we alluded to above, the script is not perfect — among other things, it still requires some fiddling to get the right value that removes all the background and none of the foreground, but we thought it was a good way to show off something cool that you can’t do easily with another command-line tool.

There’s another problem, though: if you look at the Sticker we made, you’ll notice some glitches. We’ll talk in a future blog post about what those are and how we fixed them.

— Bjorn Roche, Head of Engineering, Management

Source: GIPHY