TL;DR
Official scripts → SQLite database → animated visualization
INT. WHEELER BASEMENT - NIGHT
[MIKE, DUSTIN, LUCAS, and WILL huddle around the game board]
MIKE
Something is coming. Something hungry for blood.
DUSTIN
A demogorgon?!
LUCAS
We're so screwed if it's the demogorgon.
MIKE
It's not the demogorgon.
[MIKE rolls the dice. They clatter across the board.]
DUSTIN
An eleven?! Oh my god, an eleven!
WILL
Does the demogorgon get us?
Official Duffer Brothers script book
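sqlite> SELECT season, pct
FROM (
  -- Compute every character's share first, so the
  -- denominator is the whole season, not just Mike.
  SELECT
    season,
    character_name,
    ROUND(
      100.0 * SUM(line_count) /
      SUM(SUM(line_count)) OVER (
        PARTITION BY season
      ), 1
    ) AS pct
  FROM dialogue_summary
  GROUP BY season, character_name
)
WHERE character_name = 'MIKE'
ORDER BY season;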
season | pct
-------|------
1 | 13.2%
2 | 7.3%
3 | 7.1%
4 | 5.1%
Mike's declining dialogue share
Bar chart race: dialogue share over time
What We Built — proving a theory with data
The Hypothesis
Mike Wheeler runs the show in Season 1. He's the leader, the one making plans, the emotional center. But by Season 4? He feels like a background character — tagging along in California while everyone else gets the big moments.
I wanted to prove it with data. Not vibes, not Reddit threads — actual dialogue counts from the official scripts. So I bought the Duffer Brothers' script books and started parsing.
The verdict: Mike went from 13% of all dialogue in S1 to just 5% in S4. Dustin, meanwhile, quietly became the show's biggest talker.
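The share figures above are one character's line count divided by the season total. Here is a minimal sketch of that calculation using Python's built-in sqlite3 module. It assumes the `dialogue_summary` table has the `season`, `character_name`, and `line_count` columns that appear in the query; the rows inserted here are toy values for illustration, not the real counts.

```python
import sqlite3


def dialogue_share(conn, character):
    """Return {season: percent of that season's lines spoken by `character`}."""
    rows = conn.execute(
        """
        SELECT d.season,
               ROUND(100.0 * SUM(d.line_count) / t.total, 1) AS pct
        FROM dialogue_summary AS d
        JOIN (SELECT season, SUM(line_count) AS total
              FROM dialogue_summary
              GROUP BY season) AS t ON t.season = d.season
        WHERE d.character_name = ?
        GROUP BY d.season
        ORDER BY d.season
        """,
        (character,),
    ).fetchall()
    return dict(rows)


# Toy data (made up for this sketch, NOT the real script counts).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dialogue_summary "
    "(season INTEGER, character_name TEXT, line_count INTEGER)"
)
conn.executemany(
    "INSERT INTO dialogue_summary VALUES (?, ?, ?)",
    [
        (1, "MIKE", 30), (1, "DUSTIN", 20), (1, "LUCAS", 50),
        (2, "MIKE", 10), (2, "DUSTIN", 40), (2, "LUCAS", 50),
    ],
)
shares = dialogue_share(conn, "MIKE")
print(shares)
```

The season totals are computed in a subquery over all characters and only then joined to the filtered rows. If the filter ran first, the denominator would contain only Mike's lines and every share would come out as 100%.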
The Parsing Challenge
The script books are epub files — basically HTML in a zip. Should be easy to parse, right? Wrong. Each season uses a completely different HTML structure. Season 1 uses one format, Season 2 uses CSS classes, Season 3 changes everything again.
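Because an epub is a zip archive, the HTML can be pulled out with the standard library before any parsing starts. A rough sketch, assuming the documents are UTF-8 and end in `.html`, `.xhtml`, or `.htm`; the helper name `iter_epub_html` is hypothetical, and the in-memory archive below stands in for a real script book.

```python
import io
import zipfile


def iter_epub_html(epub_file):
    """Yield (member_name, html_text) for each HTML document in an epub."""
    with zipfile.ZipFile(epub_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                # Assumption: the documents are UTF-8 encoded.
                yield name, archive.read(name).decode("utf-8")


# Build a tiny stand-in archive in memory (not a real script book).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/chapter1.xhtml", "<p>MIKE</p>")
buffer.seek(0)

pages = list(iter_epub_html(buffer))
print(pages)
```

Sorting by member name is a simplification. The reading order of a real epub is defined by its package manifest, so a full parser would follow that instead. Each season's HTML would then go to its own format-specific extraction step.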
Then there are the character names: "EL" vs "ELEVEN" vs "JANE", "CHIEF HOPPER" vs "HOPPER" vs "JIM". Some dialogue boxes are multi-speaker, with three characters talking at once. The parser handles all of it with regex patterns and alias mapping.
Final count: 15,178 dialogue lines across 4 seasons, 517 unique characters, all normalized into SQLite.
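The alias mapping and multi-speaker handling could look something like the sketch below. The choice of canonical names ("ELEVEN", "HOPPER"), the separators between speakers, and the handling of parentheticals such as "(V.O.)" are all assumptions for illustration; the real parser's rules are not shown in this write-up.

```python
import re

# Assumption: ELEVEN and HOPPER are the canonical names.
ALIASES = {
    "EL": "ELEVEN",
    "JANE": "ELEVEN",
    "CHIEF HOPPER": "HOPPER",
    "JIM": "HOPPER",
}

# Parentheticals such as "(V.O.)" or "(CONT'D)" after a speaker name.
PARENTHETICAL = re.compile(r"\s*\([^)]*\)")
# Assumed separators for multi-speaker cues: "/", ",", "&", or the word AND.
SPEAKER_SPLIT = re.compile(r"\s*(?:/|,|&|\bAND\b)\s*")


def normalize_speakers(raw):
    """Turn a raw speaker cue into a list of canonical character names."""
    cleaned = PARENTHETICAL.sub("", raw).upper().strip()
    names = [part.strip() for part in SPEAKER_SPLIT.split(cleaned) if part.strip()]
    return [ALIASES.get(name, name) for name in names]


print(normalize_speakers("Chief Hopper (V.O.)"))
print(normalize_speakers("MIKE, DUSTIN and LUCAS"))
```

Returning a list even for a single speaker means a multi-speaker box can be credited to each character with the same code path.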
Key Findings
15,178 dialogue lines
517 characters
-62% Mike's share (S1→S4)
#1 Dustin (most lines)
Tech Stack
Pure Python data pipeline. Parse epubs with BeautifulSoup, store in SQLite with proper normalization, animate with matplotlib. The bar chart race uses custom frame interpolation for smooth rank transitions.
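One way to get smooth rank transitions is to interpolate both the bar length and the bar's vertical position between two keyframes. The sketch below uses plain linear interpolation, which is an assumption; the project's actual interpolation may differ. The function names and the placeholder characters "A" and "B" are hypothetical.

```python
def ranks(values):
    """Map each name to its rank, where 0 is the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered)}


def interpolate_frames(start, end, steps):
    """Return steps + 1 frames blending keyframe `start` into `end`.

    Each frame maps name -> (value, y_position). The y position moves
    linearly from the old rank to the new one, so bars slide past each
    other instead of jumping.
    """
    start_rank, end_rank = ranks(start), ranks(end)
    frames = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the first keyframe, 1.0 at the second
        frame = {}
        for name in start:
            value = start[name] + t * (end[name] - start[name])
            y = start_rank[name] + t * (end_rank[name] - start_rank[name])
            frame[name] = (value, y)
        frames.append(frame)
    return frames


# Placeholder keyframes (illustrative values, not real dialogue shares).
frames = interpolate_frames({"A": 10.0, "B": 6.0}, {"A": 4.0, "B": 8.0}, steps=2)
for frame in frames:
    print(frame)
```

Each frame's values and y positions could then be drawn as horizontal bars, for example with matplotlib's `Axes.barh`, one frame per animation tick.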