Time Series · Music · R + Web Audio

What if a hit song
was a time series?

Apply the same ARIMA / GARCH stack quants use for asset returns to a non-financial domain: track popularity on Spotify. Built on 586,672 tracks across 1950–2021. Plus a Web Audio synth that lets you hear what the audio features actually sound like.

Tracks

586,672

Kaggle Spotify dataset

Artists

1,104,349

joined onto every track

Years covered

1950–2021

72 annual data points

Audio features

dance, energy, valence, tempo, …

The PaperTeam project · July 2025

Abstract

The ARIMA and GARCH machinery developed for asset returns is, at its core, a stack of stationarity tests, autoregressive structure identification, and conditional-variance modelling. Nothing about it is specifically financial. This paper applies the same stack to a very non-financial series: the daily mean popularity of every Spotify track in a 586,672-row public dataset, 1950 through 2021. We test stationarity, fit auto.arima on the level series, refit a per-genre ARIMA on the 14 most-populated genres, and stack a GARCH(1,1) / EGARCH(1,1) on the residual returns. The output is a forecast of popularity by genre and a risk/reward map of which genres are growing fast and which are quietly volatile.

Team: Mickias Ambaye · Henry Gamboa · Aryan Khokhar · Chen-Yu Liu · Bhuvan Sai Reddy Ashwatha Reddy

Part I · The dataset

1.What we are actually modelling

Each track in the dataset carries a popularity score $P_i \in \{0, 1, \dots, 100\}$ that Spotify recomputes daily as a function of recent play counts, recency-weighted. The score is non-stationary at the individual-track level (most tracks decay, a few go viral), but if we aggregate to the population level $\bar{P}_t = \frac{1}{|T_t|} \sum_{i \in T_t} P_i$ where $T_t$ is the set of tracks released in period $t$ , we get a smooth, low-dimensional series that summarises how the average track from each cohort scores on the platform's modern popularity algorithm.

Alongside $P_i$ , the dataset carries Spotify's 12 audio features per track: danceability, energy, valence (positivity), tempo, loudness, acousticness, instrumentalness, liveness, speechiness, key, mode, and time signature. We treat these as exogenous covariates in the EDA and as the input vector for the audio synth in this section.

Interactive · Audio features → procedural sound

Below: drag the danceability, energy, valence, and tempo sliders. The card on the right computes a quick representative loop using the Web Audio API. Major key for high valence, minor for low. Brighter filter for high energy. More notes per bar for high danceability. Press play.

Studio · Audio features

Hear what high-energy, low-valence really means

Feature controls

Danceability0.56

Energy0.54

Valence (musical positivity)0.55

Tempo119 bpm

Try a real track

Procedural preview

Mode

major key

Rhythm density

16th notes

Filter cutoff

2839 Hz

Tempo

119 bpm

Sliders change the loop live: the next bar reads the current values, so moving the danceability or valence dot reshapes the rhythm and key on the next beat. Same slider values always produce the same loop.

Dataset peek

Top tracks by popularity score

1002021

Peaches (feat. Daniel Caesar & Giveon)

dance0.68energy0.70valence0.46tempo90 bpm

992021

drivers license

dance0.58energy0.44valence0.13tempo144 bpm

982021

Astronaut In The Ocean

dance0.78energy0.69valence0.47tempo150 bpm

972020

Save Your Tears

dance0.68energy0.83valence0.64tempo118 bpm

972020

telepatía

dance0.65energy0.52valence0.55tempo84 bpm

962020

Blinding Lights

dance0.51energy0.73valence0.33tempo171 bpm

962021

Leave The Door Open

dance0.59energy0.62valence0.72tempo148 bpm

952020

The Business

dance0.80energy0.62valence0.23tempo120 bpm

942019

Streets

dance0.75energy0.46valence0.19tempo90 bpm

These are the top 9 by Spotify's 0–100 popularity score in the dataset snapshot (June 2021). Each card shows the four audio features the synth above maps to sound. Dataset mean across all 586,672 tracks for reference: dance 0.564, energy 0.542, valence 0.552, tempo 118.5 bpm.

Part II · From raw tracks to a time series

2.Aggregation

We collapse the 586,672 tracks into a yearly mean-popularity series $\bar P_t$ , plotted below. The line shows the platform's well-documented recency bias: the modern popularity algorithm scores recent releases far higher than 1970s catalogue. By 2019 the average track had a popularity score of $\bar P \approx 44.9$ , compared to $\bar P \approx 3.4$ in 1950. The 2021 dip is an artefact: the dataset was snapshotted mid-year, and 2021's tracks had less time to accumulate plays.

3.Stationarity

ARIMA needs the series (or some difference of it) to be stationary. We run an Augmented Dickey-Fuller test on the raw level series and on the first-difference $\Delta \bar P_t = \bar P_t - \bar P_{t-1}$ . The level series fails ADF (visible trend); the first difference passes at the 5% level, so we set $d = 1$ for the integration order. The auto.arima search confirms the same choice independently.

Interactive · Decomposition explorer

Below: the same series broken into trend + seasonal + residual via a classical additive decomposition. Toggle each component on or off to see how much of the variance each one carries.

Studio · Decomposition

Trend + Seasonal + Residual = popularity

Dashed white is the raw $\bar P_t$ . The bold green is the sum of whichever components are toggled on above. With all three components on, the fit should sit on top of the dashed line. Drop the trend and the fit collapses to the mean.

Part III · ARIMA

4.The model class

An ARIMA( $p, d, q$ ) model expresses the $d$ -differenced series as a linear combination of its own lagged values and lagged shocks:

(1 - \phi_1 L - \dots - \phi_p L^p)\,(1 - L)^d\, P_t = c + (1 + \theta_1 L + \dots + \theta_q L^q)\,\varepsilon_t, \quad (1)

where $L$ is the lag operator and $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is white noise. ACF and PACF plots of the differenced series suggest two candidate orders: a parsimonious (1, 1, 1) and a slightly richer (2, 1, 1). We let R's auto.arima do the AIC search and confirm the parsimonious model.

5.Forecast and uncertainty bands

The forecast point estimate is the conditional mean $\mathbb{E}[P_{T+h} \mid \mathcal{F}_T]$ , and the standard 80% / 95% bands widen as $\sqrt{h}$ : each additional period of forecast horizon adds independent noise variance, so the band scales like $\sigma \sqrt{h}$ . The studio below lets you slide $(p, d, q)$ manually and watch how the forecast shape changes.

Interactive · ARIMA playground

Below: drag p, d, q. Watch the in-sample fit and the forecast tail. Higher AR order (p) sticks closer to recent observations. Higher MA order (q) smooths past shocks. d shifts what you are forecasting — levels vs first differences.

Studio · ARIMA playground

Drag (p, d, q), watch the forecast tail

Order (p, d, q)

(1, 1, 1)

3 parameters

Forecast +10 yr

29.7

point estimate

95% band width

185.8

grows as √h

Residual σ

2.11

on Δ^1 P_t

Model order

p (AR order)1

d (differencing)1

q (MA order)1

Forecast horizon10 yr

Lookback window15 yr

Zoomed to last 15 observed years plus 10 forecast years. Past about 5–10 years the 95% band is so wide the forecast is essentially fiction — that is the honest message of the widening green cone, and why the lookback slider defaults to a tight window where the model is still believable. Set p = 1, d = 1, q = 1 to match what auto.arima picks on the real training set; push d to 2 and the forecast flattens because over-differencing kills the trend signal.

Part IV · GARCH for popularity volatility

6.Why a vol model

The ARIMA residuals show heteroskedastic clustering: large absolute residuals tend to be followed by more large absolute residuals. That violates the constant-variance assumption baked into the standard ARIMA error term, and it is the standard motivation for stacking a conditional-variance model on top. The GARCH(1,1) specification with student-t innovations is the workhorse:

r_t = \mu + \varepsilon_t,\quad \varepsilon_t = \sigma_t z_t,\quad z_t \sim t_\nu, \quad (2)

\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2. \quad (3)

We fit three competing specs: plain sGARCH(1,1) with t innovations, EGARCH(1,1) with normal innovations, and EGARCH(1,1) with t innovations. The EGARCH adds a leverage term that lets negative shocks generate more future variance than positive ones, which is exactly the pattern observed in popularity returns: tracks that suddenly lose popularity are followed by more volatile periods than tracks that gain it. The Student-t EGARCH wins on AIC.

7.Per-genre risk vs reward

Refitting ARIMA per genre, then computing the forecast mean and the return-series standard deviation, gives a clean two-dimensional signature for each genre: forecasted growth on the y-axis, volatility on the x-axis. The desirable quadrant is upper-left: high growth, low volatility. The studio below plots all 14 most-populated genres.

Interactive · Genre risk/reward

Below: each circle is a genre. X-axis is volatility (the SD of its return series), Y-axis is forecast growth, circle size is track count. Hover to highlight.

Studio · Genre risk vs reward

Where in the quadrant is each genre?

Pop and hip-hop lead growth with moderate volatility. Viral rap is the highest-vol entry on the board (one-month meme cycles). Classical and jazz cluster bottom-left: low growth, low vol — the catalogue quadrant. K-pop is the volatile outlier.

Part V · Conclusion

8.Findings

The same stack quants use for asset returns ports cleanly onto Spotify popularity. The level series is non-stationary by visual inspection and by ADF (p > 0.05); the first difference is stationary; auto.arima converges on ARIMA(1, 1, 1) as the parsimonious overall fit; per-genre models are dominated by simple (1, 0, 1) or (0, 1, 1) structures with a handful of outlier orders. Stacking a Student-t EGARCH(1,1) on the residual returns wins on AIC against both plain GARCH and normal-EGARCH, which says the popularity-return series has both fat tails and leverage — the same pair of features that motivate using these specs on equity returns.

9.What this is and what it is not

This is not a music-industry forecasting service. The dataset truncates at 2021 and the popularity score is a moving target redefined by Spotify's ranking engine. What it is, is a clean demonstration that ARIMA and GARCH are tools, not financial instruments: they apply to any unevenly-clustered, autocorrelated time series with a meaningful first difference. The audio synth in Part I extends the same point in the opposite direction — Spotify's 12 audio features are a feature vector you can do other things with, including procedural sound generation that has nothing to do with forecasting at all.

10.References

[1]Box, G. E. P. & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day.
[2]Hyndman, R. J. & Khandakar, Y. (2008). Automatic Time Series Forecasting: The forecast Package for R. Journal of Statistical Software, 27(3).
[3]Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31, 307–327.
[4]Nelson, D. B. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2), 347–370.
[5]Pandya, M. (2021). Spotify Tracks Dataset. Kaggle public release.

R code

Copy-pasteable pipeline

Six tabs: load and clean, build the time series, fit auto.arima on the aggregated series, refit per genre, stack GARCH / EGARCH on the residual returns, and produce the risk/reward scatter. All five team members are credited on the original notebooks.

# Spotify dataset: 586,672 tracks, 1,104,349 artists
# Public Kaggle release (Maharshi Pandya, "Spotify Tracks Dataset")
 
library(readr)
library(tidyverse)
library(lubridate)
library(stringr)
 
artists <- read_csv(300/85">"spotify/artists.csv")
tracks  <- read_csv(300/85">"spotify/tracks.csv")
 
# Parse types and clean the [list]-of-IDs columns
tracks$release_date <- as.Date(tracks$release_date)
tracks$duration <- tracks$duration_ms / 1000
tracks <- tracks %>%
  mutate(id_artists = gsub("\\[|\\]|&#39;300/85">", "", id_artists))
 
# Join artist genres and popularity onto each track row
tracks <- tracks %>%
  left_join(
    artists %>% select(id, name, genres, popularity, followers),
    by = c(300/85">"id_artists" = 300/85">"id")
  ) %>%
  rename(
    track_name        = name.x,
    track_popularity  = popularity.x,
    artist_name       = name.y,
    artist_genres     = genres,
    artist_popularity = popularity.y,
    artist_followers  = followers
  )
 
# Drop 71 untitled tracks, drop tracks with no release date
tracks <- tracks %>% filter(!is.na(release_date), !is.na(track_name))
 
# Resolve each artist's primary genre by mode count
primary_genres <- tracks %>%
  mutate(artist_genres = str_replace_all(artist_genres, "\\[|\\]|&#39;300/85">", "")) %>%
  separate_rows(artist_genres, sep = 300/85">",") %>%
  mutate(artist_genres = trimws(artist_genres)) %>%
  filter(artist_genres != 300/85">"", artist_genres != 300/85">"Unknown") %>%
  count(id_artists, artist_genres) %>%
  group_by(id_artists) %>%
  slice_max(order_by = n, n = 1) %>%
  select(id_artists, primary_genre = artist_genres)
 
tracks <- tracks %>%
  left_join(primary_genres, by = 300/85">"id_artists") %>%
  mutate(primary_genre = coalesce(primary_genre, 300/85">"Unknown"))

Mickias Ambaye + team · 2026

What if a hit songwas a time series?