Time Series · Music · R + Web Audio
What if a hit song
was a time series?
Apply the same ARIMA / GARCH stack quants use for asset returns to a non-financial domain: track popularity on Spotify. Built on 586,672 tracks across 1950–2021. Plus a Web Audio synth that lets you hear what the audio features actually sound like.
Tracks
586,672
Kaggle Spotify dataset
Artists
1,104,349
joined onto every track
Years covered
1950–2021
72 annual data points
Audio features
13
dance, energy, valence, tempo, …
Abstract
The ARIMA and GARCH machinery developed for asset returns is, at its core, a stack of stationarity tests, autoregressive structure identification, and conditional-variance modelling. Nothing about it is specifically financial. This paper applies the same stack to a very non-financial series: the daily mean popularity of every Spotify track in a 586,672-row public dataset, 1950 through 2021. We test stationarity, fit auto.arima on the level series, refit a per-genre ARIMA on the 14 most-populated genres, and stack a GARCH(1,1) / EGARCH(1,1) on the residual returns. The output is a forecast of popularity by genre and a risk/reward map of which genres are growing fast and which are quietly volatile.
Team: Mickias Ambaye · Henry Gamboa · Aryan Khokhar · Chen-Yu Liu · Bhuvan Sai Reddy Ashwatha Reddy
Part I · The dataset
1.What we are actually modelling
Each track in the dataset carries a popularity score that Spotify recomputes daily as a function of recent play counts, recency-weighted. The score is non-stationary at the individual-track level (most tracks decay, a few go viral), but if we aggregate to the population level where is the set of tracks released in period , we get a smooth, low-dimensional series that summarises how the average track from each cohort scores on the platform's modern popularity algorithm.
Alongside , the dataset carries Spotify's 12 audio features per track: danceability, energy, valence (positivity), tempo, loudness, acousticness, instrumentalness, liveness, speechiness, key, mode, and time signature. We treat these as exogenous covariates in the EDA and as the input vector for the audio synth in this section.
Interactive · Audio features → procedural sound
Below: drag the danceability, energy, valence, and tempo sliders. The card on the right computes a quick representative loop using the Web Audio API. Major key for high valence, minor for low. Brighter filter for high energy. More notes per bar for high danceability. Press play.
Studio · Audio features
Hear what high-energy, low-valence really means
Feature controls
Try a real track
Procedural preview
Sliders change the loop live: the next bar reads the current values, so moving the danceability or valence dot reshapes the rhythm and key on the next beat. Same slider values always produce the same loop.
Dataset peek
Top tracks by popularity score
Peaches (feat. Daniel Caesar & Giveon)
drivers license
Astronaut In The Ocean
Save Your Tears
telepatía
Blinding Lights
Leave The Door Open
The Business
Streets
These are the top 9 by Spotify's 0–100 popularity score in the dataset snapshot (June 2021). Each card shows the four audio features the synth above maps to sound. Dataset mean across all 586,672 tracks for reference: dance 0.564, energy 0.542, valence 0.552, tempo 118.5 bpm.
Part II · From raw tracks to a time series
2.Aggregation
We collapse the 586,672 tracks into a yearly mean-popularity series, plotted below. The line shows the platform's well-documented recency bias: the modern popularity algorithm scores recent releases far higher than 1970s catalogue. By 2019 the average track had a popularity score of , compared to in 1950. The 2021 dip is an artefact: the dataset was snapshotted mid-year, and 2021's tracks had less time to accumulate plays.
3.Stationarity
ARIMA needs the series (or some difference of it) to be stationary. We run an Augmented Dickey-Fuller test on the raw level series and on the first-difference . The level series fails ADF (visible trend); the first difference passes at the 5% level, so we set for the integration order. The auto.arima search confirms the same choice independently.
Interactive · Decomposition explorer
Below: the same series broken into trend + seasonal + residual via a classical additive decomposition. Toggle each component on or off to see how much of the variance each one carries.
Studio · Decomposition
Trend + Seasonal + Residual = popularity
Dashed white is the raw . The bold green is the sum of whichever components are toggled on above. With all three components on, the fit should sit on top of the dashed line. Drop the trend and the fit collapses to the mean.
Part III · ARIMA
4.The model class
An ARIMA() model expresses the -differenced series as a linear combination of its own lagged values and lagged shocks:
where is the lag operator and is white noise. ACF and PACF plots of the differenced series suggest two candidate orders: a parsimonious (1, 1, 1) and a slightly richer (2, 1, 1). We let R's auto.arima do the AIC search and confirm the parsimonious model.
5.Forecast and uncertainty bands
The forecast point estimate is the conditional mean , and the standard 80% / 95% bands widen as : each additional period of forecast horizon adds independent noise variance, so the band scales like . The studio below lets you slide manually and watch how the forecast shape changes.
Interactive · ARIMA playground
Below: drag p, d, q. Watch the in-sample fit and the forecast tail. Higher AR order (p) sticks closer to recent observations. Higher MA order (q) smooths past shocks. d shifts what you are forecasting — levels vs first differences.
Studio · ARIMA playground
Drag (p, d, q), watch the forecast tail
Order (p, d, q)
(1, 1, 1)
3 parameters
Forecast +10 yr
29.7
point estimate
95% band width
185.8
grows as √h
Residual σ
2.11
on Δ^1 P_t
Model order
Zoomed to last 15 observed years plus 10 forecast years. Past about 5–10 years the 95% band is so wide the forecast is essentially fiction — that is the honest message of the widening green cone, and why the lookback slider defaults to a tight window where the model is still believable. Set p = 1, d = 1, q = 1 to match what auto.arima picks on the real training set; push d to 2 and the forecast flattens because over-differencing kills the trend signal.
Part IV · GARCH for popularity volatility
6.Why a vol model
The ARIMA residuals show heteroskedastic clustering: large absolute residuals tend to be followed by more large absolute residuals. That violates the constant-variance assumption baked into the standard ARIMA error term, and it is the standard motivation for stacking a conditional-variance model on top. The GARCH(1,1) specification with student-t innovations is the workhorse:
We fit three competing specs: plain sGARCH(1,1) with t innovations, EGARCH(1,1) with normal innovations, and EGARCH(1,1) with t innovations. The EGARCH adds a leverage term that lets negative shocks generate more future variance than positive ones, which is exactly the pattern observed in popularity returns: tracks that suddenly lose popularity are followed by more volatile periods than tracks that gain it. The Student-t EGARCH wins on AIC.
7.Per-genre risk vs reward
Refitting ARIMA per genre, then computing the forecast mean and the return-series standard deviation, gives a clean two-dimensional signature for each genre: forecasted growth on the y-axis, volatility on the x-axis. The desirable quadrant is upper-left: high growth, low volatility. The studio below plots all 14 most-populated genres.
Interactive · Genre risk/reward
Below: each circle is a genre. X-axis is volatility (the SD of its return series), Y-axis is forecast growth, circle size is track count. Hover to highlight.
Studio · Genre risk vs reward
Where in the quadrant is each genre?
Pop and hip-hop lead growth with moderate volatility. Viral rap is the highest-vol entry on the board (one-month meme cycles). Classical and jazz cluster bottom-left: low growth, low vol — the catalogue quadrant. K-pop is the volatile outlier.
Part V · Conclusion
8.Findings
The same stack quants use for asset returns ports cleanly onto Spotify popularity. The level series is non-stationary by visual inspection and by ADF (p > 0.05); the first difference is stationary; auto.arima converges on ARIMA(1, 1, 1) as the parsimonious overall fit; per-genre models are dominated by simple (1, 0, 1) or (0, 1, 1) structures with a handful of outlier orders. Stacking a Student-t EGARCH(1,1) on the residual returns wins on AIC against both plain GARCH and normal-EGARCH, which says the popularity-return series has both fat tails and leverage — the same pair of features that motivate using these specs on equity returns.
9.What this is and what it is not
This is not a music-industry forecasting service. The dataset truncates at 2021 and the popularity score is a moving target redefined by Spotify's ranking engine. What it is, is a clean demonstration that ARIMA and GARCH are tools, not financial instruments: they apply to any unevenly-clustered, autocorrelated time series with a meaningful first difference. The audio synth in Part I extends the same point in the opposite direction — Spotify's 12 audio features are a feature vector you can do other things with, including procedural sound generation that has nothing to do with forecasting at all.
10.References
- [1]Box, G. E. P. & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day.
- [2]Hyndman, R. J. & Khandakar, Y. (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software, 27(3).
- [3]Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31, 307–327.
- [4]Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2), 347–370.
- [5]Pandya, M. (2021). Spotify Tracks Dataset. Kaggle public release.
R code
Copy-pasteable pipeline
Six tabs: load and clean, build the time series, fit auto.arima on the aggregated series, refit per genre, stack GARCH / EGARCH on the residual returns, and produce the risk/reward scatter. All five team members are credited on the original notebooks.
# Spotify dataset: 586,672 tracks, 1,104,349 artists# Public Kaggle release (Maharshi Pandya, "Spotify Tracks Dataset") library(readr)library(tidyverse)library(lubridate)library(stringr) artists <- read_csv(300/85">"spotify/artists.csv")tracks <- read_csv(300/85">"spotify/tracks.csv") # Parse types and clean the [list]-of-IDs columnstracks$release_date <- as.Date(tracks$release_date)tracks$duration <- tracks$duration_ms / 1000tracks <- tracks %>% mutate(id_artists = gsub("\\[|\\]|39;300/85">", "", id_artists)) # Join artist genres and popularity onto each track rowtracks <- tracks %>% left_join( artists %>% select(id, name, genres, popularity, followers), by = c(300/85">"id_artists" = 300/85">"id") ) %>% rename( track_name = name.x, track_popularity = popularity.x, artist_name = name.y, artist_genres = genres, artist_popularity = popularity.y, artist_followers = followers ) # Drop 71 untitled tracks, drop tracks with no release datetracks <- tracks %>% filter(!is.na(release_date), !is.na(track_name)) # Resolve each artist's primary genre by mode countprimary_genres <- tracks %>% mutate(artist_genres = str_replace_all(artist_genres, "\\[|\\]|39;300/85">", "")) %>% separate_rows(artist_genres, sep = 300/85">",") %>% mutate(artist_genres = trimws(artist_genres)) %>% filter(artist_genres != 300/85">"", artist_genres != 300/85">"Unknown") %>% count(id_artists, artist_genres) %>% group_by(id_artists) %>% slice_max(order_by = n, n = 1) %>% select(id_artists, primary_genre = artist_genres) tracks <- tracks %>% left_join(primary_genres, by = 300/85">"id_artists") %>% mutate(primary_genre = coalesce(primary_genre, 300/85">"Unknown")) Mickias Ambaye + team · 2026