Every Time You Type a Sentence and Your Phone Finishes It, You're Running a 1948 Math Paper
Your phone's predictive text isn't magic. It's applied information theory, invented by Claude Shannon in 1948 to solve a fundamental problem: how to measure information.
Key Takeaways
- Shannon established the 'bit' as the fundamental unit of information
- His entropy formula measures uncertainty, the quantity predictive text works to reduce
- Modern neural networks implement Shannon's 77-year-old mathematical framework
- Predictive text saves the average user an estimated 20+ hours of typing annually
Root Connection
Claude Shannon's 1948 paper 'A Mathematical Theory of Communication' didn't just invent information theory; it created the framework that makes predictive text possible.
Timeline
1948: Claude Shannon publishes 'A Mathematical Theory of Communication' in the Bell System Technical Journal
1950: Shannon builds Theseus, a maze-learning mechanical mouse and an early machine learning system
1960s: Information theory becomes the foundation of digital communication
1990s: Early predictive text systems use Shannon's entropy concepts
2007: The iPhone ships with an autocorrecting touchscreen keyboard, putting Shannon's theory in your pocket
2016: Neural network-based predictive text (Gboard, SwiftKey) brings Shannon's math to a modern implementation
2026: Predictive text saves users an estimated 20+ hours of typing per year, all thanks to a 1948 math paper
When you type 'I'm going to the' and your phone suggests 'store,' you are running a 1948 math paper. Not a polished, recent, app-store-distributed math paper. A 77-year-old, 79-page monograph, first published in the Bell System Technical Journal, by a 32-year-old mathematician named Claude Shannon. That paper is called 'A Mathematical Theory of Communication.' It is arguably the most important single document in the history of digital technology, and almost nobody outside of electrical engineering has read it.
Predictive text, spell check, autocomplete, text-to-speech, machine translation, the compression that makes streaming video possible, the error correction that makes deep-space probes work, the entire modern practice of machine learning — all of it sits on top of Shannon's framework.
THE PAPER THAT INVENTED INFORMATION
“Information is the resolution of uncertainty. Every time your phone predicts your next word, it's reducing uncertainty, exactly as Shannon described.”
Claude Shannon joined Bell Labs during World War II, working on cryptographic systems for secure military communications. By the time he returned to unclassified research in 1946, he had been thinking about a deceptively simple question for years: what, mathematically, is information?
Before Shannon, information was a squishy concept. Engineers knew that telegraph, telephone, and radio channels had different capacities, but there was no precise way to compare them. There was no unit of information. There was no way to prove that a given message could or could not be transmitted through a given channel. Communication theory was, essentially, folk engineering.
“Shannon didn't just measure information. He gave us the tools to compress it, transmit it, and now, predict it.”
Shannon's 1948 paper changed that overnight. It defined information mathematically as the reduction of uncertainty. It introduced the 'bit' — binary digit — as the fundamental unit of information (the term was actually coined by Shannon's Bell Labs colleague John Tukey, but Shannon popularized it). It proved that any communication channel has a maximum capacity, measured in bits per second, and that messages can be transmitted reliably below that capacity regardless of how noisy the channel is, provided the right coding is used.
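To see what channel capacity means in practice, here is a minimal sketch of the Shannon-Hartley capacity formula from that paper, applied to a classic example. The bandwidth and signal-to-noise figures below are illustrative assumptions, not numbers from the article:

```python
import math

def channel_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley theorem: the maximum reliable data rate, in bits
    per second, of a bandwidth-limited channel with Gaussian noise."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative assumption: an analog phone line passes about 3 kHz of
# bandwidth at roughly 30 dB signal-to-noise (a power ratio of 1000).
snr = 10 ** (30 / 10)  # convert 30 dB to a linear power ratio
print(f"{channel_capacity(3000, snr):,.0f} bits/second")  # ~29,902
```

That ceiling of roughly 30 kilobits per second is one reason analog dial-up modems plateaued near those speeds for years: the theorem says no amount of engineering cleverness can push a channel past its capacity.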
The paper introduced entropy as the quantitative measure of uncertainty. High entropy means unpredictability — like random coin flips, where the next outcome carries a full bit of information. Low entropy means predictability — like the letter after 'Q' in English, which is almost always 'U,' and therefore carries very little information. Entropy is measured in bits. English prose, Shannon calculated, has an entropy of roughly 1.0 to 1.5 bits per letter, even though the alphabet could in principle encode about 4.7 bits per letter (log2 of 26). The difference is the redundancy of natural language — the part that makes prediction possible.
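A small sketch of the entropy formula itself, H = -sum of p times log2(p) over all outcomes, makes those numbers concrete. The function and examples here are mine, not Shannon's or the article's:

```python
import math

def entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy: H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin flip: maximum uncertainty, exactly 1 bit.
print(entropy_bits([0.5, 0.5]))                  # 1.0

# 26 equally likely letters: the log2(26) ceiling mentioned above.
print(entropy_bits([1 / 26] * 26))               # ~4.70

# A letter after 'Q': one outcome nearly certain, so little information.
print(entropy_bits([0.99] + [0.01 / 25] * 25))   # ~0.13
```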
WHY PREDICTIVE TEXT IS SHANNON APPLIED
The principle behind your phone's autocomplete is directly derived from Shannon. When you type a sequence of words, the system computes a probability distribution over possible next words, given everything you have typed so far. It then picks the word with the highest probability, or shows you the top three, as most modern keyboards do. The uncertainty in that distribution is exactly what Shannon called the entropy of the source.
Early predictive text systems in the 1990s used simple Markov models: count how often each word follows each other word in a large corpus of text, and use those counts to estimate next-word probabilities. T9, the predictive text system that shipped on Nokia phones from 1999 onward, was a direct implementation of this idea. It let users enter words with fewer keypresses by exploiting the fact that, given a sequence of digits on a numeric keypad, there is usually only one likely word that matches.
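As an illustration (not T9's actual code), here is a minimal sketch of that 1990s-style approach: a bigram Markov model that counts which word follows which, then surfaces the most probable next words the way a suggestion bar does. The toy corpus and function names are assumptions for demonstration:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(counts: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    """Return the k most probable next words, like a keyboard's suggestion bar."""
    return [w for w, _ in counts[word.lower()].most_common(k)]

corpus = "i'm going to the store and then going to the gym before going home"
model = train_bigrams(corpus)
print(suggest(model, "the"))    # ['store', 'gym']
print(suggest(model, "going"))  # ['to', 'home']
```

Real systems trained these counts on corpora of millions of words, but the mechanism, estimating the probability of the next word from observed frequencies, is the same.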
Modern predictive text is the same idea at dramatically larger scale. Google's Gboard and Apple's keyboard use transformer-based neural networks — descendants of the same family of models that power ChatGPT — to predict the next word given a longer context. The model is much larger. The training data is much larger. The computations are different. But the framework, the question being answered — what is the probability distribution of the next word given the previous ones — is the same one Shannon posed in 1948.
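The link runs through the training objective, too; this is an observation of standard practice rather than a claim from the article. These networks are typically trained to minimize cross-entropy, which is Shannon's measure of how many bits of surprise the model suffers when the true next word arrives. A minimal sketch, with a hypothetical model output:

```python
import math

def cross_entropy_bits(predicted: dict[str, float], actual_next: str) -> float:
    """Bits of surprise when the model gave probability p to the word that
    actually came next; training drives this number down."""
    return -math.log2(predicted[actual_next])

# Hypothetical distribution after the context "I'm going to the":
predicted = {"store": 0.6, "gym": 0.25, "park": 0.15}
print(cross_entropy_bits(predicted, "store"))  # ~0.74 bits: confident, cheap
print(cross_entropy_bits(predicted, "park"))   # ~2.74 bits: a surprise costs more
```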
SHANNON THE INVENTOR
Shannon was not a pure theorist. He built things. Theseus, a mechanical mouse he constructed in 1950, could navigate a maze, remember the path, and traverse it again faster on subsequent runs. It is frequently cited as one of the first machine learning systems. In a 1950 paper, Shannon sketched what would become the core algorithm used by essentially every chess engine through the Deep Blue era. With MIT mathematician Edward Thorp, he built a wearable computer for predicting roulette in casinos. And he kept juggling machines of his own design at home.
At Bell Labs, he rode a unicycle through the halls while juggling.
Shannon won both the IEEE Medal of Honor and the National Medal of Science in 1966, and the inaugural Kyoto Prize in 1985. He died in 2001, after spending the last decade of his life with Alzheimer's disease, an illness that, tragically, eroded the memory of the man who had mathematically defined information itself.
WHY IT MATTERS
Every act of digital compression, transmission, and prediction in the modern world runs on Shannon's framework. The ZIP file, the MP3, the H.265 codec, the WiFi protocol, the cellular radio in your pocket, the error correction code that keeps Voyager 1 in contact with Earth across 15 billion miles of space, the neural network that finished your last sentence — all of them are applications of information theory, with the fundamental concepts and units defined in a single Bell Labs paper in 1948.
Shannon's math is essentially the same age as the transistor, which was invented down the hall at Bell Labs in December 1947 and announced publicly only weeks before Shannon's paper appeared. It is older than the integrated circuit, older than the first programming language, older than the internet by a generation. It has outlasted every implementation built on top of it.
Your phone predicted your next word this morning. It used a 77-year-old equation to do it. The equation is still accurate. The math was right the first time.
(Sources: Claude E. Shannon, 'A Mathematical Theory of Communication,' Bell System Technical Journal, July & October 1948; Claude E. Shannon, 'Prediction and Entropy of Printed English,' Bell System Technical Journal, 1951; James Gleick, 'The Information: A History, A Theory, A Flood,' Pantheon, 2011; Jimmy Soni & Rob Goodman, 'A Mind at Play: How Claude Shannon Invented the Information Age,' Simon & Schuster, 2017; IEEE Information Theory Society historical archive)