We Create More Data in a Day Than Ancient Civilizations Made in a Millennium. We're Losing It Faster Too.
Paper lasts centuries. Microfilm lasts decades. A hard drive lasts five years. We are creating the most documented civilization in history and simultaneously ensuring that most of it will vanish within a generation.
Key Takeaways
- The BBC Domesday Project (1986), a digital survey of Britain involving 1 million contributors, became effectively unreadable within 15 years because its custom LaserDisc format and playback hardware were discontinued
- A 2021 Harvard study found that 25% of the links in New York Times articles were completely inaccessible ('link rot'), rising to 72% for links in articles from 1998
- The Internet Archive's Wayback Machine stores over 800 billion web pages as of 2025, making it the largest web-preservation effort in history. It is also a nonprofit running on donations
- NASA lost the original slow-scan recordings of the Apollo 11 moonwalk; the telemetry tapes that held them were most likely erased and reused, so the 2009 restoration had to rely on broadcast copies
- Vint Cerf warns that without active intervention, the 21st century could be the 'black hole' of recorded history, with more created and less preserved than in any prior era
Root Connection
The Dead Sea Scrolls survived 2,000 years in a desert cave. The BBC's Domesday Project, a digital archive of British life from 1986, became unreadable within 15 years because the format was obsolete. We are building a civilization on formats with the lifespan of a mayfly.
Timeline
1086: William the Conqueror commissions the Domesday Book, a survey of England written on vellum. It is perfectly readable 940 years later.
1956: IBM ships the RAMAC 305, the first computer with a hard disk drive (the IBM 350). Storage begins its shift from durable media (paper, film) to short-lived ones (magnetic, and later optical and flash).
1986: The BBC creates the Domesday Project, a digital multimedia survey of Britain on custom LaserDiscs. By 2002 the discs are effectively unreadable: the players and computers needed to run them are obsolete.
1996: Brewster Kahle founds the Internet Archive (archive.org), the first serious attempt to preserve the web as it changes.
2003: The Library of Congress launches the National Digital Information Infrastructure and Preservation Program, acknowledging that digital formats are disappearing faster than physical ones.
2015: Vint Cerf, 'Father of the Internet,' warns of a 'Digital Dark Age' at the AAAS annual meeting: our era may leave the thinnest historical record of any.
2021: The Internet Archive's Wayback Machine contains over 600 billion web pages. A Harvard study finds that a quarter of the links in New York Times articles are already dead.
2025: AI companies scrape the web for training data, prompting publishers to pull original content behind paywalls and login walls.
In 1986, the BBC undertook the most ambitious digital archiving project in British history.
To mark the 900th anniversary of the Domesday Book — William the Conqueror's 1086 survey of England — the BBC created a modern equivalent: a multimedia survey of contemporary Britain. Over 1 million people contributed. Schools, community groups, and individuals across the country submitted photographs, essays, maps, and data about their local areas. The BBC compiled it all onto two specially formatted LaserDiscs, designed to be read by a custom Philips player connected to a BBC Master computer.
The BBC Domesday Project was celebrated as a triumph of digital preservation. It was the most comprehensive snapshot of a nation ever assembled. It won awards. It was distributed to schools and libraries.
Fifteen years later, it was unreadable.
“We can read a 940-year-old book written on animal skin. We cannot read a 15-year-old LaserDisc. Progress is not always forward.”
The custom LaserDisc format had been discontinued. The Philips players were no longer manufactured. The BBC Master computer was obsolete. The data was all there — physically intact on the disc surfaces — but there was no way to access it. A digital archive created in 1986 had become inaccessible by 2002.
The original Domesday Book, written on vellum with iron gall ink in 1086, is still perfectly readable. It sits in the National Archives in Kew, London. Scholars consult it regularly. Nine hundred and forty years old, and you can read every word.
This is the central paradox of the Digital Dark Age: we are creating more information than any civilization in history, and we are simultaneously ensuring that most of it will not survive.
ROOT — THE PERMANENCE WE LOST
For most of human history, information storage was durable by default.
Clay tablets from ancient Sumer, inscribed around 3100 BCE, are readable today. The medium (clay) and the method (pressing a stylus into wet clay, then letting it dry or firing it) produce an artifact that is essentially permanent under normal conditions. Thousands of these tablets have survived 5,000 years.
“The average web page lasts 100 days. The average clay tablet lasts 5,000 years. We have accidentally made our record-keeping worse.”
Papyrus was less durable than clay but still remarkably long-lived. The Dead Sea Scrolls, written on parchment and papyrus between 250 BCE and 68 CE, survived in desert caves for over 2,000 years. The environmental conditions helped, but the fundamental point stands: information written on organic material can last millennia under stable, dry conditions.
Paper, invented in China around 100 CE and adopted widely in Europe by the 13th century, lasts 500 to 1,000 years under proper archival conditions. Acid-free paper lasts longer. Books printed in the 15th century — the Gutenberg era — are still readable today.
Microfilm, introduced in the 1930s as an archival medium, has an expected lifespan of 500 years if stored at controlled temperature and humidity. It is still the preservation format of choice for many archives and libraries. The Library of Congress maintains millions of microfilm reels.
Then came digital.
A magnetic hard drive has an expected lifespan of 3 to 5 years. Solid-state drives last 5 to 10 years. Optical discs (CDs, DVDs) were marketed as lasting "100 years" but real-world studies show significant degradation after 10 to 25 years, depending on manufacturing quality. Flash memory in USB drives degrades as electrons leak from the floating gates — a process that accelerates in warm environments.
But hardware failure is not the primary threat. Format obsolescence is.
THE FORMAT GRAVEYARD
A file is not just data. It is data structured according to a specific format, readable by specific software, running on specific hardware. If any layer of that stack disappears, the file becomes inaccessible even if the underlying bits are perfectly intact.
The graveyard of dead formats is enormous. Lotus 1-2-3 spreadsheets (.wks). WordPerfect documents (.wpd). RealMedia video (.rm). Flash animations (.swf). QuickTime VR panoramas (.mov with QTVR atoms). HyperCard stacks (.hc). Amiga IFF images. Atari ST music files. Commodore 64 disk images in non-standard sector formats.
Each of these formats was widely used during its era. Each is now difficult or impossible to open without specialized tools. The data is not damaged. It is imprisoned — locked inside a format that the modern software ecosystem no longer speaks.
This is not limited to obscure formats. Microsoft has changed its Office document format multiple times. A Word document from 1997 (.doc) uses a binary format entirely different from the XML-based .docx that Word adopted with Office 2007, and .docx itself has accumulated new features and extensions in later releases. Modern Word can still open old .doc files, but only because Microsoft invests significant engineering effort in backward compatibility. If Microsoft were to stop maintaining that compatibility, or were to cease to exist, billions of .doc files would become progressively harder to read.
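One way to see that a file is "data structured according to a specific format" is to look at its first few bytes. Many formats begin with a fixed signature, or "magic number," which is how tools such as the Unix `file` command guess what a file is. The Python sketch below checks a handful of well-known signatures; the table is a small illustrative subset, and the helper names are hypothetical. Recognizing the signature is only the first layer of the stack: knowing that a .docx is a ZIP container says nothing about how to render the Word XML inside it.

```python
# Sketch: guess a file's format from its leading "magic" bytes.
# SIGNATURES is a small illustrative subset, not a complete registry,
# and identify()/identify_file() are hypothetical helper names.

# (signature bytes, human-readable format name); the first match wins.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE2 compound file (legacy .doc, .xls)"),
    (b"PK\x03\x04", "ZIP container (.docx, .xlsx, and many others)"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"FWS", "Flash SWF (uncompressed)"),
    (b"CWS", "Flash SWF (zlib-compressed)"),
    (b"ZWS", "Flash SWF (LZMA-compressed)"),
    (b"FORM", "IFF container (e.g. Amiga ILBM images)"),
]


def identify(header: bytes) -> str:
    """Return the name of the first signature that prefixes `header`, else 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file; no signature above is longer than 8."""
    with open(path, "rb") as f:
        return identify(f.read(16))


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(f"{p}: {identify_file(p)}")
```

Run against a legacy .doc and a modern .docx, it reports two unrelated containers for what a user thinks of as simply "a Word file."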
DID YOU KNOW?
NASA lost the original high-quality recordings of the Apollo 11 moonwalk. The lunar camera transmitted in a nonstandard slow-scan format, so the live broadcast the world watched on July 20, 1969, was made by pointing a conventional television camera at a monitor showing the slow-scan picture, a conversion that visibly degraded the image. The sharper slow-scan original was recorded on telemetry data tapes. When NASA went looking for those tapes in 2006, its search concluded that they had most likely been erased and reused. The restored footage NASA released in 2009 had to be rebuilt from the best surviving broadcast copies. The best record of one of humanity's defining moments was most likely lost not to disaster but to routine tape recycling.
LINK ROT: THE DISAPPEARING WEB
The problem is not just about old files. It is about the living web.
A 2021 study by Harvard Law School researchers examined the links in more than two decades of New York Times articles and found that 25% of them were completely inaccessible: the pages they pointed to no longer existed. Link rot grew with age, reaching 72% for links in articles from 1998.
The Pew Research Center found comparable decay in a 2024 study: 38% of web pages that existed in 2013 were no longer accessible a decade later, and 23% of news pages and 21% of government pages contained at least one broken link. If that rate holds, more than a third of the web pages that exist today will be gone within ten years.
This is called "link rot," and it is not a bug. It is the fundamental architecture of the web. The web was designed for the present tense. URLs are pointers to current resources, not archives. When a website is redesigned, when a company changes its CMS, when a server is decommissioned, when a startup goes bankrupt — the URLs die. There is no built-in mechanism for the web to remember its own past.
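Link rot is straightforward to measure, which is roughly what the studies above do at scale: request each URL and record what comes back. The Python sketch below uses only the standard library; the `alive`/`dead`/`unknown` labels, the status-code buckets, and the helper names are illustrative assumptions, not the methodology of any particular study. It also cannot detect "content drift," where a URL still loads but now shows different content.

```python
# Sketch of a link-rot checker using only the Python standard library.
# Labels, status buckets, and function names are illustrative assumptions.
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Bucket an HTTP status code into a coarse link-health label."""
    if 200 <= status < 400:
        return "alive"
    if status in (404, 410):
        # 404 Not Found / 410 Gone: the resource behind the URL no longer exists.
        return "dead"
    # e.g. 403, 429, or 5xx: blocked or temporarily failing, not necessarily rotten.
    return "unknown"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request (urllib follows common redirects) and classify the outcome."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "linkrot-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still informative.
        return classify(err.code)
    except OSError:
        # DNS failure, refused connection, or timeout (URLError and TimeoutError are
        # OSError subclasses). Treated as dead here, though an outage may be temporary.
        return "dead"


if __name__ == "__main__":
    import sys

    for u in sys.argv[1:]:
        print(f"{check_link(u):8} {u}")
```

Some servers mishandle HEAD requests (answering 405 Method Not Allowed), so a more careful checker would retry `unknown` results with GET before drawing conclusions.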
The Internet Archive, founded by Brewster Kahle in 1996, is the most significant attempt to fix this. The Wayback Machine crawls the web and saves snapshots of pages. As of 2025, it has archived over 800 billion web pages. It is, functionally, the memory of the internet.
It is also a nonprofit that runs on donations and has been repeatedly targeted by lawsuits, DDoS attacks, and data breaches. The memory of the internet depends on the financial health of a single organization in San Francisco.
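For a reader holding a dead link, the Wayback Machine is often the only recourse, and it can be queried programmatically. The sketch below targets the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`), which takes a `url` and an optional `timestamp` (up to 14 digits, `YYYYMMDDhhmmss`) and returns JSON describing the closest archived snapshot. The helper names are hypothetical, and the endpoint's behavior and rate limits may change, so treat this as a starting point rather than a client library.

```python
# Sketch: ask the Wayback Machine for the archived snapshot of a URL closest to a date,
# via the Internet Archive's availability endpoint. Helper names are hypothetical.
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; `timestamp` is YYYYMMDDhhmmss or a prefix of it, e.g. '2006'."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urllib.parse.urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the snapshot URL out of a parsed response; None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(
    url: str, timestamp: Optional[str] = None, timeout: float = 10.0
) -> Optional[str]:
    """Query the endpoint over the network and return the closest snapshot URL, if any."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    print(find_snapshot("example.com", "2006"))
```

Keeping the query builder and the JSON parsing separate from the network call makes both easy to test offline and easy to reuse in a batch job that first runs a link checker and then looks up only the dead links.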
THE SCRAPING ACCELERANT
The rise of AI has added a new dimension to the problem. Large language models are trained on web data. Companies like OpenAI, Google, and Anthropic have scraped massive portions of the web for training datasets. This has two effects on preservation.
First, it incentivizes content creators to put their work behind paywalls and login walls, removing it from the publicly accessible web. Content that was freely available in 2020 is increasingly locked behind subscriptions in 2026. The open web is shrinking.
Second, AI-generated content is flooding the web with material that has no original author, no editorial oversight, and no archival value. The proportion of the web that is worth preserving is being diluted by content that was generated to game search algorithms.
WHY IT MATTERS
Vint Cerf — one of the inventors of TCP/IP, often called a "Father of the Internet" — gave a speech at the American Association for the Advancement of Science in 2015 warning of a "Digital Dark Age." His argument was simple: future historians may know more about the early 20th century than the early 21st century, because the 20th century's records are on paper and film (which last), while the 21st century's records are digital (which do not).
We take 1.4 trillion photographs per year. We send 300 billion emails per day. We publish millions of web pages per week. And almost none of it is being preserved in a format that will be readable in 100 years.
The irony is sharp. The most documented civilization in history may leave the thinnest historical record.
FUTURE — WHERE THIS GOES (SPECULATIVE)
The preservation community is working on solutions, but none are complete. The Internet Archive continues to expand. The Library of Congress has a dedicated digital preservation program. The LOCKSS project (Lots of Copies Keep Stuff Safe) distributes digital archives across multiple institutions. The Arctic World Archive, located in a decommissioned coal mine in Svalbard, Norway, stores important data on special film reels designed to last 1,000 years.
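The principle behind LOCKSS, that many independent copies protect each other, can be illustrated with a fixity check: hash every replica and let the majority decide which copies are damaged. The Python sketch below is a toy under that assumption, not the actual LOCKSS protocol (which coordinates repeated polls among peer libraries); the replica names and helper functions are hypothetical.

```python
# Toy fixity audit in the spirit of "Lots of Copies Keep Stuff Safe":
# hash each replica, treat the strict-majority digest as authentic, repair the rest.
# This is NOT the real LOCKSS polling protocol; all names here are hypothetical.
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 fixity value for one replica's contents."""
    return hashlib.sha256(data).hexdigest()


def audit(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return (majority digest, sorted names of replicas that disagree with it).

    Raises ValueError when no digest is held by a strict majority, because then
    there is no safe basis for deciding which copy is authentic.
    """
    digests = {name: digest(data) for name, data in replicas.items()}
    counts = Counter(digests.values())
    if not counts:
        raise ValueError("no replicas to audit")
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no strict majority: cannot tell which copy is authentic")
    damaged = sorted(name for name, d in digests.items() if d != winner)
    return winner, damaged


def repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Return a new mapping where every damaged replica is replaced by a majority copy."""
    winner, damaged = audit(replicas)
    good = next(data for data in replicas.values() if digest(data) == winner)
    return {name: (good if name in damaged else data) for name, data in replicas.items()}


if __name__ == "__main__":
    copies = {"library-a": b"Domesday", "library-b": b"Domesday", "library-c": b"Domesdqy"}
    print(audit(copies)[1])
```

With three copies and one silently corrupted byte, the audit flags only `library-c`; with two copies that disagree, it refuses to guess, which is one reason the slogan asks for lots of copies rather than two.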
But these are islands of preservation in an ocean of loss. The vast majority of digital content — personal photos, social media posts, emails, text messages, app data, game saves, voice messages — is stored on devices and platforms with no preservation plan.
Your grandchildren may be able to read a letter your great-great-grandmother wrote in 1890. They may not be able to read the texts you sent last Tuesday.
(Sources: BBC Domesday Project Rescue, CAMiLEON Project, University of Leeds; Vint Cerf, "Digital Vellum," AAAS Address, 2015; Harvard Law School, "Perma.cc Link Rot Study," 2021; Pew Research Center, "When Online Content Disappears," 2024; Library of Congress NDIIPP; Internet Archive Annual Reports; NASA Apollo Television Restoration Project; Jeff Rothenberg, "Ensuring the Longevity of Digital Documents," Scientific American, 1995)
Enjoy This Article?
RootByte is 100% independent - no paywalls, no corporate sponsors. Your support helps fund education, therapy for special needs kids, and keeps the research going.