Stranger Things Tracker

paused

Analyzing character screen time and dialogue across all seasons of Stranger Things

Tech Stack

Python, SQLite, BeautifulSoup, matplotlib, pandas

TL;DR

Official scripts → SQLite database → animated visualization

S1E1 - The Vanishing

INT. WHEELER BASEMENT - NIGHT

[MIKE, DUSTIN, LUCAS, and WILL huddle around the game board]

MIKE

Something is coming. Something hungry for blood.

DUSTIN

A demogorgon?!

LUCAS

We're so screwed if it's the demogorgon.

MIKE

It's not the demogorgon.

[MIKE rolls the dice. They clatter across the board.]

DUSTIN

An eleven?! Oh my god, an eleven!

WILL

Does the demogorgon get us?

Official Duffer Brothers script book

stranger_things.db
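sqlite> WITH shares AS (
  -- Compute every character's share first, so the per-season
  -- total covers all speakers rather than only the filtered one.
  SELECT
    season,
    character_name,
    ROUND(
      100.0 * SUM(line_count) /
      SUM(SUM(line_count)) OVER (
        PARTITION BY season
      ), 1
    ) AS pct
  FROM dialogue_summary
  GROUP BY season, character_name
)
SELECT season, pct
FROM shares
WHERE character_name = 'MIKE'
ORDER BY season;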

season | pct
-------|------
   1   | 13.2%
   2   |  7.3%
   3   |  7.1%
   4   |  5.1%

Mike's dialogue share: S1 13%, S2 7%, S3 7%, S4 5%

Mike's declining dialogue share

Bar chart race: dialogue share over time

What We Built — proving a theory with data

Phase 1

The Hypothesis

Mike Wheeler runs the show in Season 1. He's the leader, the one making plans, the emotional center. But by Season 4? He feels like a background character — tagging along in California while everyone else gets the big moments.

I wanted to prove it with data. Not vibes, not Reddit threads — actual dialogue counts from the official scripts. So I bought the Duffer Brothers' script books and started parsing.

The verdict: Mike went from 13% of all dialogue in S1 to just 5% in S4. Dustin, meanwhile, quietly became the show's biggest talker.

Phase 2

The Parsing Challenge

The script books are epub files — basically HTML in a zip. Should be easy to parse, right? Wrong. Each season uses a completely different HTML structure. Season 1 uses one format, Season 2 uses CSS classes, Season 3 changes everything again.
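One way to cope with per-season markup is a small registry that maps each season to its own extractor. The sketch below covers only a class-based layout like the one described for Season 2. The class names `speaker` and `dialogue`, the `PARSERS` registry, and the sample markup are all hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup to keep the example dependency-free.

```python
from html.parser import HTMLParser


class ClassBasedScriptParser(HTMLParser):
    """Collect (speaker, line) pairs from <p class="speaker"> / <p class="dialogue">.

    The class names are assumptions, not the script books' real markup.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._field = None    # "speaker", "dialogue", or None when outside both
        self._speaker = None  # most recent speaker cue seen
        self._buffer = []     # text chunks of the paragraph being read

    def handle_starttag(self, tag, attrs):
        css = (dict(attrs).get("class") or "").split()
        if tag == "p" and "speaker" in css:
            self._field = "speaker"
        elif tag == "p" and "dialogue" in css:
            self._field = "dialogue"

    def handle_data(self, data):
        if self._field:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Inline tags such as </i> must not close the paragraph.
        if tag != "p" or self._field is None:
            return
        text = " ".join("".join(self._buffer).split())
        if self._field == "speaker":
            self._speaker = text
        elif self._speaker:
            self.pairs.append((self._speaker, text))
        self._field, self._buffer = None, []


def parse_class_based(html):
    parser = ClassBasedScriptParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# One extractor per season; only the class-based one is sketched here.
PARSERS = {2: parse_class_based}

sample = (
    '<p class="speaker">DUSTIN</p>'
    '<p class="dialogue">An eleven?! Oh my god, <i>an eleven!</i></p>'
)
pairs = PARSERS[2](sample)
```

Keeping each season's quirks behind one function means a new layout only needs a new registry entry, not changes to the rest of the pipeline.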

Then there are the character names. "EL" vs "ELEVEN" vs "JANE". "CHIEF HOPPER" vs "HOPPER" vs "JIM". Multi-speaker dialogue boxes where three characters talk at once. The parser handles all of it with regex patterns and alias mapping.
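A minimal sketch of what regex-plus-alias normalization could look like. Only the name variants listed above come from the scripts. The choice of canonical names, the parenthetical pattern, the multi-speaker separators, and the function name `normalize_speakers` are assumptions made for illustration.

```python
import re

# Hypothetical alias table: variant -> canonical name.
ALIASES = {
    "EL": "ELEVEN",
    "JANE": "ELEVEN",
    "CHIEF HOPPER": "HOPPER",
    "JIM": "HOPPER",
}

# Assumed: cues may carry parentheticals such as "(V.O.)" or "(CONT'D)".
PAREN = re.compile(r"\s*\([^)]*\)")
# Assumed separators for multi-speaker cues: "/", "&", "," or the word "AND".
SPLIT = re.compile(r"\s*(?:/|&|,|\bAND\b)\s*")


def normalize_speakers(cue: str) -> list[str]:
    """Turn a raw speaker cue into a list of canonical character names."""
    cleaned = PAREN.sub("", cue).strip().upper()
    names = [n.strip() for n in SPLIT.split(cleaned) if n.strip()]
    return [ALIASES.get(n, n) for n in names]
```

For example, `normalize_speakers("MIKE / DUSTIN / LUCAS")` yields three names, so a shared line can be credited to each speaker, while `normalize_speakers("CHIEF HOPPER")` collapses to `["HOPPER"]`.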

Final count: 15,178 dialogue lines across 4 seasons, 517 unique characters, all normalized into SQLite.
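To make "normalized into SQLite" concrete, here is one possible layout. The table names `characters` and `dialogue_lines`, their columns, and the `add_line` helper are assumptions. Only `dialogue_summary` and its `season`, `character_name`, and `line_count` columns appear in the query shown earlier, and it is modeled here as a view, which may not match the real database.

```python
import sqlite3

# Hypothetical schema: one row per character, one row per spoken line.
SCHEMA = """
CREATE TABLE characters (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE dialogue_lines (
    id           INTEGER PRIMARY KEY,
    season       INTEGER NOT NULL,
    episode      INTEGER NOT NULL,
    character_id INTEGER NOT NULL REFERENCES characters(id),
    text         TEXT NOT NULL
);
CREATE VIEW dialogue_summary AS
SELECT d.season, c.name AS character_name, COUNT(*) AS line_count
FROM dialogue_lines AS d
JOIN characters AS c ON c.id = d.character_id
GROUP BY d.season, c.name;
"""


def add_line(conn, season, episode, name, text):
    """Insert one dialogue line, creating the character row if needed."""
    conn.execute("INSERT OR IGNORE INTO characters (name) VALUES (?)", (name,))
    (char_id,) = conn.execute(
        "SELECT id FROM characters WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO dialogue_lines (season, episode, character_id, text) "
        "VALUES (?, ?, ?, ?)",
        (season, episode, char_id, text),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# A few lines from the S1E1 excerpt, used as sample rows.
for name, text in [
    ("MIKE", "Something is coming. Something hungry for blood."),
    ("DUSTIN", "A demogorgon?!"),
    ("LUCAS", "We're so screwed if it's the demogorgon."),
    ("MIKE", "It's not the demogorgon."),
]:
    add_line(conn, 1, 1, name, text)

rows = conn.execute(
    "SELECT character_name, line_count FROM dialogue_summary "
    "ORDER BY character_name"
).fetchall()
```

Storing each name once in `characters` keeps alias fixes to a single row, and deriving the summary as a view means the counts never drift from the underlying lines.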

Key Findings

Dialogue lines: 15,178

Characters: 517

Mike's share (S1→S4): -62%

Most lines: Dustin (#1)

Tech Stack

Python, SQLite, BeautifulSoup, ebooklib, matplotlib, pandas

Pure Python data pipeline. Parse epubs with BeautifulSoup, store in SQLite with proper normalization, animate with matplotlib. The bar chart race uses custom frame interpolation for smooth rank transitions.
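The frame interpolation could work roughly as follows: between two keyframes, both the bar length and the bar's vertical slot are blended linearly, so a bar slides past its neighbor instead of jumping. This is a sketch under assumptions, not the project's actual code. The function names, the linear easing, and the toy values for the placeholder characters "A" and "B" are all made up for illustration.

```python
def ranks(values):
    """Map each name to its rank (0 = largest value)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered)}


def interpolate_frames(start, end, steps):
    """Return steps + 1 frames blending two keyframes.

    Each frame maps name -> (value, y_position). The y position is the
    blended rank, which takes fractional values while bars swap places.
    """
    start_rank, end_rank = ranks(start), ranks(end)
    frames = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the first keyframe, 1.0 at the second
        frames.append({
            name: (
                start[name] + (end[name] - start[name]) * t,
                start_rank[name] + (end_rank[name] - start_rank[name]) * t,
            )
            for name in start
        })
    return frames


# Toy keyframes (hypothetical numbers): "A" starts on top, "B" overtakes it.
frames = interpolate_frames({"A": 10.0, "B": 5.0}, {"A": 4.0, "B": 8.0}, steps=2)
```

Each frame can then be drawn with matplotlib's `barh`, using the blended rank as the y coordinate and the blended value as the bar width.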