Major social and cultural events mean “all hands on deck” here at GIPHY, as millions of people across the world use GIFs to express themselves as these events unfold. Award shows, championship games, and holidays (such as New Year’s Eve) require resources across the company — from our Editorial team “live-GIFing” video feeds in real-time to our SRE team making sure everything is running smoothly during big traffic spikes.
These major events never fail to introduce new memes and cultural phenomena, like catchphrases, quotes, and video clips, that become popular GIPHY searches. Likewise, existing memes are often recontextualized to provide commentary on, or a reaction to, the new phenomena. In both cases, we have to identify these emerging search trends and understand the user intent driving the trend so we can return the most relevant content.
Given the anticipation and uncertainty surrounding the 2020 Presidential Election, we fully expected a multi-day event rife with intense and oscillating emotions and no shortage of new search trends. To prepare for the undoubtedly epic week, we overhauled our internal Trending Search tools and developed new cross-team workflows to create a search experience synchronized in real time with the fast-evolving zeitgeist.
Search query analysis is one of the primary methods we use to understand how GIPHY users express themselves during major events. Our most common approach is retroactively examining the search queries we received while the event took place to discover trends in the data, then sculpting a narrative around these trends that explains what people were thinking and how they were feeling. Given the sheer volume of search requests we get in a single day, we are able to identify aggregate emotional states for “society-at-large,” and we’ve published articles demonstrating this.
Discerning trends in real-time, however, has historically been more of a challenge since it takes considerable time and effort to process and sift through such a large amount of data. We do track real-time trending searches, and we make these findings available to the public through our trending searches API endpoint, but these results are generated from a relatively small sample set of search data, and do not always provide actionable results for internal use.
The Signals Team, which includes myself, took on the task of transforming our internal Trending Searches tool to be more comprehensive, accurate, and helpful in understanding what content our users want. Our team extracts vital “signals” from GIPHY’s vast data stores to improve existing products or create new ones. We primarily use machine learning techniques to do so, such as unsupervised learning for recommendations and CNNs for computer vision. My colleague Taras Schevchenko was the tech lead for updating our Trending Searches tool, and did everything from developing the algorithms we used to building an internal web interface.
Our first step in this process was to revisit what we consider to be a “trending” search, as we had to know exactly what we were looking for in the data before we could start any real work. We discussed this topic with other teams, like Editorial and Search, and reached a consensus definition. Qualitatively speaking, we’re interested in search queries demonstrating a clear relationship to happenings in pop culture, nascent memes, or major events involving sports, entertainment, or politics. Statistically speaking, we’re looking for queries whose recent volume counts reflect a significant change relative to the recent past.
To identify volume spikes like this, we treat search volume data for a given query as stationary time-series data generated by a stochastic process. Time-series data is a sequence of numbers ordered successively over time, and data can be considered stationary if the numbers “look the same” at any given point in time, i.e. there are no predictable patterns or seasonality, and statistical properties such as the mean and variance stay constant throughout the timeline. A stochastic process is a process that produces numbers with some inherent randomness.
So search volume for most queries is best understood as a series of random numbers fluctuating over time driven by a static source, which is user demand for that query. While the exact number of times people search for a particular query in a time-period isn’t predictable per se, as long as demand remains static you can estimate an expected value for volume using an autoregressive model.
If the current volume is significantly different from the expected value output by the autoregressive model, usually with respect to some threshold, we can assume something has changed in the process generating the data, i.e. that user demand has dramatically increased (or decreased). When this happens, the data is no longer stationary, and this spike indicates the emergence of a trend.
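To make the idea concrete, here is a minimal sketch, assuming a first-order autoregressive model fit by least squares and a spike threshold measured in residual standard deviations. The function names (`fit_ar1`, `is_spike`), the model order, the threshold of 4.0, and the synthetic data are all illustrative assumptions, not the production model or its settings.

```python
import random
import statistics


def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + phi * x[t-1] + noise.

    Returns (intercept, phi, residual_std).
    """
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = statistics.fmean(prev), statistics.fmean(curr)
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (x - mean_curr) for p, x in zip(prev, curr))
    phi = cov / var_prev if var_prev else 0.0
    intercept = mean_curr - phi * mean_prev
    residuals = [x - (intercept + phi * p) for p, x in zip(prev, curr)]
    return intercept, phi, statistics.stdev(residuals)


def is_spike(history, current, threshold=4.0):
    """Flag `current` if it sits more than `threshold` residual standard
    deviations away from the one-step-ahead AR(1) forecast."""
    intercept, phi, resid_std = fit_ar1(history)
    expected = intercept + phi * history[-1]
    z_score = (current - expected) / max(resid_std, 1e-9)
    return abs(z_score) > threshold, expected


# Synthetic "stationary" volume: noise around a fixed level of demand.
rng = random.Random(0)
history = [rng.gauss(100, 5) for _ in range(200)]

print(is_spike(history, 104.0))  # an ordinary fluctuation
print(is_spike(history, 400.0))  # demand has clearly changed
```

The threshold controls the trade-off: a lower value surfaces more low-volume queries but also more noise, while a higher value only catches the most dramatic changes.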
As part of the update, we now look at an 8-day window of non-sampled search activity to examine our time-series data. As mentioned above, most of our query volume data is stationary. We do, however, have a number of queries whose volumes are non-stationary because they have repeatable trends, usually a daily or weekly periodicity, and we want our model to account for these patterns when calculating an expected mean for these queries. For example, “good morning” queries spike every morning, and searches for “TGIF” spike every Friday. Technically, autoregressive models are not as reliable for non-stationary data, but they work well enough for our purposes, as the model helps highlight these repeating trends and lets us know if a trend breaks out of its normal pattern.
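One simple way to account for a repeating pattern is to compare the latest bucket only against buckets at the same point in earlier cycles. The sketch below assumes hourly counts over an 8-day window and a daily period. The helper names, the bucket size, the threshold, and the synthetic counts are hypothetical and chosen for illustration.

```python
import statistics

HOURS_PER_DAY = 24


def seasonal_baseline(counts, period=HOURS_PER_DAY):
    """Mean and standard deviation of the buckets that fall at the same
    phase as the latest bucket in each earlier cycle."""
    # Step backwards one period at a time, starting one period before the end.
    same_phase = counts[-1 - period::-period]
    return statistics.fmean(same_phase), statistics.stdev(same_phase)


def breaks_pattern(counts, period=HOURS_PER_DAY, threshold=4.0):
    """True if the latest bucket deviates strongly from its seasonal baseline."""
    mean, std = seasonal_baseline(counts, period)
    z_score = (counts[-1] - mean) / max(std, 1e-9)
    return abs(z_score) > threshold


# Synthetic query with a daily peak: quiet for 23 hours, then a
# large spike in the last hour of each day.
counts = []
for day in range(8):
    counts.extend([100] * 23 + [1000 + 2 * day])

print(breaks_pattern(counts))  # the usual daily peak

counts[-1] = 3000              # the peak is far larger than on earlier days
print(breaks_pattern(counts))
```

With a weekly period (`period=7 * HOURS_PER_DAY`), an 8-day window contains only one earlier bucket at the same phase, so a weekly comparison would need a different dispersion estimate than the standard deviation used here.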
We tested many variations of the model described above by backtesting it on search logs dating back to the presidential debates and college football games. We examined the results of the various models until we found one that did a balanced job of identifying spikes in both popular, high-volume queries and unique, low-volume queries. With this, we were ready to take on the live election!
We began trend analysis on Tuesday, November 3rd, when in-person voting officially began. While we expected election-based searches to be prevalent amongst trending searches, we were floored by just how dominant and diverse these types of queries were. Searches related to individual candidates (even Kanye West) had massive volume increases as compared to the previous day, up to 400 times normal volume. Searches around voting in general, such as “i voted,” “did you vote” and “waiting in line,” saw percentage increases in the thousands. There was a similar surge in queries expressing anxiety, like “hyperventilating,” “nail biting,” “stressed out,” “drinking wine” and “self-care.” Likewise, queries expressing anticipation were up tremendously, such as “hold onto your butts,” “fasten your seatbelts,” “let’s get ready to rumble,” “helm’s deep” and “raising head to look.” It was fascinating to see, and we felt like we had tapped directly into some sort of shared cultural consciousness.
Sentiments swirled as in-person voting wrapped up and it became clear the election was going to last multiple days. For the period of November 4th, 5th and 6th, we saw lots of queries commenting on the ballot counting process, some humorous (“Count Von Count,” “Zootopia sloth,” “slow motion”), some reactionary (“stop the count,” “cheating,” “land doesn’t vote”), and some emotional (“fat lady sings,” “don’t give me hope”). Anxiety was still in full force (“what is happening?,” “stress eating,” “Canada watching US news”), and the memorable quote “Looks like I picked the wrong week to stop sniffing glue,” a running gag from the classic comedy Airplane!, picked up significant volume as people turned to their vices to cope.
As vote totals began reflecting a shift in the electoral college, GIPHY users turned to the Arrested Development meme “I’m afraid I just blue myself,” recontextualizing it as a funny commentary on many states changing from red to blue. Searches for individual states and cities trended heavily during this time as well. Georgia-related searches were abundant when that state turned blue (“John Lewis dancing,” “Stacey Abrams,” “Georgia on my mind,” “Outkast,” “the South got something to say,” “Designing Women”), and Pennsylvania-related searches similarly spiked when that state turned as well (“flipadelphia,” “flip Always Sunny,” “It’s Always Sunny in Philadelphia,” “Philadelphia freedom,” “Philly Phanatic,” “Gritty”).
Late Saturday, November 7th, it became clear that Joe Biden was highly favored to win the election, and our search trends reflected this news. Searches for the GIF of Vince Carter saying “it’s over, it’s OVER” skyrocketed, as did queries like “so relieved,” “sweet relief” and “it’s finally over,” as people expressed thankfulness for some closure. Searches for “Frodo it’s done,” referencing the final moments of the Lord of the Rings film trilogy, also shot upwards as people needed a GIF that captured exhausted relief, like the release of a heavy burden.
Searches for scenes from the Star Wars movie franchise were also extremely popular for expressing a sense of triumph: “Return of the Jedi ending,” “Death Star explodes,” “Star Wars celebration,” “Ewok party,” and even “Admiral Ackbar” for the skeptics. More general celebratory searches, such as “dancing in the streets,” “party in the USA,” “popping champagne,” “we are the champions,” and “brand new day” also had tremendous boosts.
Having real-time trending data unlocked a series of new cross-team workflows within GIPHY. Eugene Kong, from our Data Science team, tracked trending queries throughout the week, rolled them up into 12 categories, and compared the trends against general search volume. We found that GIPHY’s overall search volume doubled during election week, peaking on Saturday, 11/7, when the election was called for Biden. Over 50% of all the trending searches we identified during this time period were election related, and nearly 20% of those searches were related to Donald Trump, whether positive, negative, or neutral in sentiment. Eugene plotted the number of trending searches for each of the 12 identified categories over the course of the week.
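A roll-up like this can be sketched in a few lines. The keyword rules, the three category names, and the sample rows below are hypothetical, chosen only to illustrate the aggregation. They do not represent the actual 12 categories or how queries were assigned to them.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules; the real categories are not reproduced here.
CATEGORY_KEYWORDS = {
    "voting": ("vote", "ballot", "waiting in line"),
    "anxiety": ("stress", "nail biting", "hyperventilating"),
    "celebration": ("champagne", "party", "dancing"),
}


def categorize(query):
    """Return the first category whose keywords appear in the query."""
    text = query.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def daily_category_counts(trending):
    """Map each day to a Counter of category -> number of trending queries."""
    counts = defaultdict(Counter)
    for day, query in trending:
        counts[day][categorize(query)] += 1
    return counts


def category_share(counts, category):
    """Fraction of all trending queries that fall into `category`."""
    total = sum(sum(per_day.values()) for per_day in counts.values())
    hits = sum(per_day[category] for per_day in counts.values())
    return hits / total if total else 0.0


# Illustrative (day, query) rows using queries mentioned in this post.
trending = [
    ("11/3", "i voted"),
    ("11/3", "did you vote"),
    ("11/4", "stress eating"),
    ("11/7", "popping champagne"),
]

counts = daily_category_counts(trending)
for day, per_day in sorted(counts.items()):
    print(day, dict(per_day))
print(category_share(counts, "voting"))
```

The per-day counters map directly onto a chart with one line or bar series per category.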
The resulting chart is fascinating, as it captures the week’s rollercoaster of emotions and interests and reinforces the narrative created with the example searches listed above. We can see the week began with lots of broad searches about the election and voting in general, followed by a spike in searches expressing negative emotions (like anxiety), and ended with lots of searches about individual states, remarks on the actual results, and finally a big spike in searches related to Trump himself.
Other teams used this data as well. Our Partnerships team was able to alert our media partners when their content was getting increased views due to being included in search results for trending queries. Our Editorial team used the endpoint to monitor trends as well. For example, they noticed that the query “So You’re Saying There’s a Chance” was gaining traction and, realizing this query was a slight variation of the quote “so you’re telling me there’s a chance” from the ’90s hit Dumb and Dumber, they ensured users could also find relevant content for the misquote. The team was able to make similar adjustments for dozens of queries throughout the week. The ability to tweak in-demand search results in real time can make huge improvements in our core metrics, like click-through rate (CTR), as users are more likely to click on a GIF when search results better match their expectations.
All in all, we’ve been very pleased with our updated Trending Searches tool and how it performed during the election. Our next step is to pass these insights on to our users and integrations through the GIPHY API. Our primary focus now will be on battle-hardening our infrastructure and improving the product based on internal feedback, with the goal of deploying the updated data stream to the Trending Searches API endpoint in early 2021.
— Nick Hasty, Director of Machine Learning, Signals Team @GIPHY