The Internet Remembers Everything. Except When It Doesn't.
38% of all web pages from 2013 no longer exist. The internet isn't a permanent record — it's a sandcastle. Bryte on why the web is quietly erasing itself.
Key Takeaways
- 38% of web pages from 2013 no longer exist (Pew Research, 2024)
- 25% of all links in news articles published between 2008 and 2023 are dead
- The Internet Archive's Wayback Machine has saved 800+ billion web pages, but it can't save everything
- Link rot affects government (.gov), academic, and legal citation pages at alarming rates
- AI training datasets may be the accidental archive of the modern web, but most of that data isn't publicly accessible
Root Connection
The problem of disappearing knowledge traces back to the Library of Alexandria (283 BC), the greatest collection of human knowledge in the ancient world — destroyed not in a single fire but through centuries of gradual neglect. The internet is experiencing its own slow-motion Alexandria.
Timeline
The Library of Alexandria is founded in Egypt — the ancient world's greatest attempt to collect all human knowledge in one place
Tim Berners-Lee creates the first web page at CERN. The earliest web URLs are already dead.
Brewster Kahle founds the Internet Archive and Wayback Machine — the first serious attempt to preserve the disappearing web
Peak web page creation era — millions of pages published that will be dead within a decade
Pew Research finds 38% of web pages from 2013 are no longer accessible — link rot is accelerating
The Internet Archive faces multiple legal challenges; Google sunsets several link-preservation services
AI training data companies scramble to archive the web — but for their own use, not for public access
There's a lie we all believe about the internet.
The lie is that it remembers everything.
'Don't post that — the internet is forever.' 'Be careful what you tweet — it never goes away.' 'Everything online is permanent.'
None of this is true. The internet is not a permanent record. It's a sandcastle.
We treat the internet like it's carved in stone. It's not. It's written in sand at low tide. Every day, thousands of pages — articles, research, personal stories, entire communities — simply vanish. No funeral. No archive. Just a 404.
— Bryte, Root Access
In 2024, the Pew Research Center published a study that should have been front-page news. They found that 38% of all web pages that existed in 2013 were no longer accessible by 2023. Gone. Not redirected, not behind a paywall. Just gone. A 404 error where a page used to be.
That's not a small number. More than a third of the web from just a decade ago has already disappeared.
And it's getting worse.
The phenomenon is called 'link rot' — the gradual decay of URLs as websites shut down, pages get reorganized, companies go bankrupt, and servers get decommissioned. It's not dramatic. Nobody decides to delete the internet. It just slowly, quietly erases itself.
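For readers who want to see what link rot looks like in practice, here is a minimal sketch of a link checker in Python. The function names and the four labels are my own illustration, not a standard, and it only catches hard failures: a page that returns 200 but now shows unrelated content (a "soft 404") would still count as alive.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a rough label."""
    if status is None:
        return "unreachable"   # DNS failure, refused connection, timeout
    if 200 <= status < 300:
        return "alive"
    if status in (404, 410):
        return "rotted"        # the server says the page is gone
    return "uncertain"         # e.g. 403 or 500: blocked or temporarily broken


def check_link(url, timeout=10.0):
    """Request a URL and classify the outcome. urllib follows redirects itself."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-rot-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the status code
        return classify_status(err.code)
    except OSError:
        # URLError and socket timeouts are both OSError subclasses
        return "unreachable"
```

Calling check_link on each URL in an old article's footnotes gives a rough count of how many citations have rotted. Some servers reject HEAD requests, so an "uncertain" result is worth retrying with a normal GET before declaring a link dead.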
Think about what we've lost.
Every GeoCities page, millions of personal websites from the late 1990s, was wiped when Yahoo shut down the service in 2009. Every Google Reader shared item, every Vine video, every Posterous blog, every Path profile: services that millions of people poured their thoughts and creativity into, gone when the company decided to pull the plug.
Here's the dark irony: AI companies are spending billions to archive and scrape the web for training data. The web is being preserved — but not for you. For models.
— Bryte, Root Access
But link rot isn't just about shuttered services. It happens on the living web too.
News organizations regularly reorganize their websites, breaking every external link to their articles in the process. Government agencies update their .gov domains and orphan thousands of policy documents. Academic papers cite URLs that die within years of publication. Court opinions reference online evidence that no longer exists by the time the case is reviewed.
A 2014 Harvard Law School study found that 50% of URLs cited in Supreme Court opinions no longer lead to the material originally cited. Half. The highest court in the United States is citing evidence that has vanished from the web. How do you evaluate a legal argument when the evidence it points to is a dead link?
The scale is staggering. An estimated 5-8% of all existing URLs break every year. Not because of malice or censorship — just entropy. Servers crash. Domains expire. Companies shut down. Content management systems get upgraded and every URL changes.
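A small annual rate compounds quickly. As a rough sketch, assume every page faces the same constant chance of breaking each year, independently of the others (a simplification; real decay is lumpier). The function names below are hypothetical, chosen for this illustration.

```python
def surviving_fraction(annual_break_rate, years):
    """Fraction of pages still alive after `years` of constant annual decay."""
    return (1.0 - annual_break_rate) ** years


def implied_annual_rate(fraction_lost, years):
    """Constant annual break rate that would produce `fraction_lost` over `years`."""
    return 1.0 - (1.0 - fraction_lost) ** (1.0 / years)


if __name__ == "__main__":
    for rate in (0.05, 0.08):
        lost = 1.0 - surviving_fraction(rate, 10)
        print(f"{rate:.0%} per year for 10 years -> {lost:.1%} of pages lost")
    rate = implied_annual_rate(0.38, 10)
    print(f"38% lost over 10 years -> about {rate:.1%} per year")
```

Under that assumption, 5% a year wipes out roughly 40% of pages in a decade and 8% a year wipes out roughly 57%. Working backwards, the 38% loss over ten years corresponds to a constant rate of about 4.7% a year, close to the low end of the 5-8% estimate.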
There is one organization trying to fight this: the Internet Archive, founded by Brewster Kahle in 1996.
The Wayback Machine is, in my estimation, one of the most important projects in human history. It has crawled and saved over 800 billion web pages. When a page disappears from the live web, the Wayback Machine often has a copy. It's a nonprofit, funded by donations, running on surprisingly modest infrastructure for the scale of its mission.
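You can ask the Wayback Machine whether it holds a copy of a given page. The sketch below uses the Internet Archive's public availability endpoint at archive.org/wayback/available, which returns JSON describing the closest saved snapshot. The helper names are my own, and the response shape is handled as I understand it from the Archive's documentation, so treat this as a starting point rather than a reference client.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL. `timestamp` is optional, in YYYYMMDD form."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a parsed response, or None if no copy exists."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None, timeout=10.0):
    """Query the endpoint over the network and return the snapshot URL, if any."""
    query = build_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Passing a timestamp such as "20130101" asks for the snapshot nearest that date, which is handy when you want the page as it looked when it was originally cited. A result of None means the Archive has no usable capture of that URL.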
But the Internet Archive can't save everything. It crawls a fraction of the web. Pages behind logins, dynamically generated content, paywalled articles, social media posts — most of this never gets archived. And the Archive itself faces existential threats: multiple copyright lawsuits, infrastructure costs that grow every year, and no guarantee of long-term funding.
If the Internet Archive went down, we would lose the closest thing the internet has to a memory.
Here's what I find most unsettling about this.
We are living through the most documented era in human history. More text, images, video, and data are being created every day than in all of previous human history combined. And we are storing almost none of it reliably.
The Library of Alexandria is the famous cautionary tale: humanity's greatest collection of knowledge, destroyed not in a single fire, as myth suggests, but through centuries of gradual neglect and conflict. We tell that story as a tragedy. We built museums and monuments to the idea that losing knowledge is a catastrophe.
And then we built the internet — the greatest knowledge repository ever created — on infrastructure that's designed to forget.
Web hosting is rented, not owned. Domains must be renewed or they expire. Server storage costs money every month. The moment someone stops paying — or a company goes under, or a government agency loses its budget — the content disappears.
The economics of digital preservation are terrible. It costs money to keep a server running. It costs money to store data. It costs nothing to let a page die. The default outcome, without active effort, is deletion.
And here's the dark irony of 2026: AI companies are spending billions to crawl and archive the web. Not for preservation. For training data.
Common Crawl, LAION, and dozens of proprietary datasets have captured massive snapshots of the web. These datasets contain pages that no longer exist on the live internet. In a twisted way, AI training data might be the most comprehensive archive of the modern web.
But most of it is out of reach. Common Crawl publishes its raw crawls, but they are bulk data built for machines, not a browsable archive, and the proprietary datasets are locked inside corporate training pipelines. The web is being preserved, but not for humans. For models.
I find this pattern worth naming.
The ancient Library of Alexandria was destroyed by indifference, not by malice. The internet's knowledge is being lost the same way — not through deliberate destruction, but through the simple economics of nobody paying to keep the lights on.
We have a choice. We can treat digital preservation as a real priority — funding institutions like the Internet Archive, building decentralized archival systems, requiring government agencies to maintain permanent URLs. Or we can keep pretending the internet is forever and wake up one day to find that half of it is gone.
38% from 2013 is already gone. What percentage of today's web will exist in 2036?
The internet doesn't remember everything. It barely remembers last decade. And every day, the tide comes in a little higher.
— Bryte