Classifying Popularity Trends of Internet Memes with Machine Learning

When you find an amusing image on the internet, you may often feel inclined to send it to your friends. This social process shares something in common with the spread of disease — people “infecting” others within their network to spread the “contagion,” or in this case, meme. However, this phenomenon does not fit well into the classic susceptible-infectious-recovered (SIR) model, as there may be a second spike in the spread of an internet meme that the SIR framework cannot adequately represent.
“If you want to capture spikes and re-spikes, many memes would violate epidemiological assumptions,” Pengcheng Xiao of Kennesaw State University said. In a contributed presentation during the Third Joint SIAM/CAIMS Annual Meetings, which are being held this week in Montréal, Québec, Canada, Xiao presented work with student William Little on the classification of meme popularity trends via a multistep machine learning approach.
The project began in 2018, when Xiao was mentoring two students on a final project to investigate the trajectories of meme spread. These students found four categories of temporal patterns: (i) smooth decay after an initial peak, (ii) spiky decay with occasional resurgence, (iii) leveling off to maintain a certain level of popularity, and (iv) long-term growth.
“Can we get more data, and can we validate this?” Xiao asked. That question was the impetus for his recent work with Little. The researchers began by collecting more than 2,000 memes from the internet database Know Your Meme, obtaining the 400 most popular memes from each of the following five categories: (i) catchphrase; (ii) character; (iii) exploitable, i.e., with utility to be adapted for a variety of contexts; (iv) popular culture; and (v) viral video. For each individual meme, they downloaded time series data on usage from Google Trends, then preprocessed the data by smoothing, normalizing, and setting the analysis window to a timeframe of 43 months.
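To make the preprocessing step concrete, here is a minimal Python sketch of smoothing, normalizing, and windowing a single monthly popularity series. The article does not specify the smoothing method, the normalization, or how the 43-month window is anchored, so the centered moving average, the scaling to a peak of 1, and the choice to keep the first 43 months are all assumptions made for illustration, as is the function name `preprocess`.

```python
import numpy as np

def preprocess(series, window_months=43, smooth_width=3):
    """Window, smooth, and normalize one monthly popularity series.

    Assumptions (not from the article): the window keeps the first
    `window_months` values, smoothing is a centered moving average,
    and normalization rescales the series so its peak equals 1.
    """
    if smooth_width % 2 == 0:
        raise ValueError("smooth_width must be odd for a centered average")
    x = np.asarray(series, dtype=float)
    if x.size < window_months:
        raise ValueError("series is shorter than the analysis window")

    # Fix the analysis window to a common length for every meme.
    x = x[:window_months]

    # Centered moving average; edge values are repeated so the
    # output keeps the same length as the windowed input.
    pad = smooth_width // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(smooth_width) / smooth_width
    smoothed = np.convolve(padded, kernel, mode="valid")

    # Rescale so that series with different absolute interest levels
    # become comparable by shape alone.
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed

# Hypothetical input: a quick rise followed by a slow decline.
raw = np.concatenate([np.linspace(0, 100, 10), np.linspace(100, 5, 50)])
curve = preprocess(raw)
```

Putting every series on the same length and scale matters because the clustering that follows compares curves point by point.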

To interpret this preprocessed data, Xiao and Little developed a machine learning pipeline with two stages (see Figure 1). First, the framework uses k-means clustering in an unsupervised fashion to find clusters pertaining to different temporal patterns in the popularity data from Google Trends. “After that, we train our model—a support vector classification—with the data here,” Xiao said. The supervised support vector classification algorithm uses the clusters identified by k-means as labels to develop a predictive model for classifying a meme’s evolution in popularity.
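The two-stage idea can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the researchers' implementation: the four synthetic curve shapes, the noise level, the RBF kernel, and the train/test split are all invented for the example, and the real pipeline runs on preprocessed Google Trends series.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 43)  # 43 monthly points, rescaled to [0, 1]

# Synthetic stand-ins for the four temporal patterns (assumed shapes).
shapes = [
    np.exp(-5 * t),                                            # smooth decay
    np.exp(-5 * t) + 0.5 * np.exp(-((t - 0.6) / 0.08) ** 2),   # decay with a resurgence
    0.5 + 0.5 * np.exp(-8 * t),                                # leveling off
    t,                                                         # long-term growth
]
X = np.vstack([s + 0.03 * rng.standard_normal((50, t.size)) for s in shapes])

# Stage 1 (unsupervised): group the curves into four clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Stage 2 (supervised): treat the cluster assignments as class labels
# and train a support vector classifier to predict them.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Once trained, the classifier can assign a new meme's curve to a pattern with `clf.predict` without rerunning the clustering.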
As expected, the analysis showed four clusters of temporal patterns, validating the 2018 work (see Figure 2). Furthermore, for each type of temporal pattern, it is possible to examine the distribution of different types of memes that follow that evolution (see Figure 3). These content-pattern relationships reveal the percentage of the memes in each temporal cluster that fall into the five content-based categories.
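A breakdown of this kind can be computed as a row-normalized cross-tabulation. The short pandas sketch below uses a handful of made-up rows purely to show the calculation; the cluster and category assignments are hypothetical and do not reflect the study's results.

```python
import pandas as pd

# Toy data only: each row is one meme with its temporal cluster
# and its content category (assignments are invented).
memes = pd.DataFrame({
    "cluster": [
        "smooth decay", "smooth decay", "smooth decay", "smooth decay",
        "long-term growth", "long-term growth",
    ],
    "category": [
        "catchphrase", "viral video", "viral video", "character",
        "character", "exploitable",
    ],
})

# normalize="index" divides each row by its total, so every row
# gives the share of each content category within one cluster.
shares = pd.crosstab(memes["cluster"], memes["category"], normalize="index") * 100
```

Each row of `shares` sums to 100, which makes clusters of different sizes directly comparable.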

Given the wealth of information on the internet, there are several interesting ways in which this research could evolve in the future. “We’re trying to get some data from hashtags,” Xiao said. He would also like to obtain data with a higher temporal resolution, as Google Trends only provides analytics on a monthly scale. In the longer term, Xiao hopes to look at causal inference with physics-informed neural networks to investigate the mechanisms that underlie the formation of these temporal patterns.
The spread of information and trends on the internet presents fascinating opportunities for research at the intersection of the social and computational sciences. Continuing work in this area will allow researchers to better understand and predict what goes viral and what will have staying power in the online cultural conversation.
About the Author
Jillian Kunze
Associate editor, SIAM News
