The Internet is rotting: Are we losing our digital memory?

Every day, thousands of web pages disappear without a trace. And with them, memories, knowledge, and fragments of our history vanish. When everything seems just a click away, it's paradoxical that the World Wide Web (WWW) we call the Internet—that immense digital archive of our global civilization—is silently evaporating.
Nearly 4,000 years ago, a merchant wrote a complaint about defective copper ingots on a clay tablet. That complaint has survived to this day. However, blogs, forums, and personal websites published just fifteen years ago have disappeared. How is it possible that a Bronze Age complaint is more enduring than a post from 2009?
The key lies in the fragility of the internet. Digital content, if not actively preserved, is by nature ephemeral.
Unlike physical media such as clay, papyrus, or paper, websites depend on servers that require maintenance, domains that need to be renewed, and formats that sooner or later become obsolete.
When a server disappears, a domain expires, redirects are mismanaged, or a website relies on obsolete technologies, the result is the same: content becomes inaccessible, and when it finally disappears, no one notices.
This phenomenon is called link rot, and it is ongoing. An analysis of tweets posted between 2007 and 2023 found that 13% of links were broken, and for tweets more than ten years old the figure rose to 30%. In other words, nearly a third of the content linked to a decade ago has become inaccessible... if not gone entirely.
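Measuring link rot on a small scale is straightforward. Below is a minimal sketch of such an audit, assuming nothing more than a list of URLs; the function names are illustrative, and large-scale studies like the one above work from crawl archives rather than live requests.

```python
# A minimal link-rot audit: probe each URL and count failures.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_link(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL still resolves to a 2xx/3xx response."""
    try:
        req = Request(url, method="HEAD",
                      headers={"User-Agent": "link-rot-audit"})
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (HTTPError, URLError, ValueError):
        # 4xx/5xx, dead domain, or malformed URL: the link has rotted.
        return False

def rot_rate(results: list) -> float:
    """Share of probed links that failed, as a percentage."""
    return 100.0 * results.count(False) / len(results) if results else 0.0
```

A HEAD request is enough to detect a "404 Not Found" without downloading the page; note that it cannot distinguish a page that moved from one that vanished, which is why serious studies also compare archived snapshots.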
The silent blackout

In Blade Runner 2049, a massive blackout caused by replicant activists erases all digital records. It doesn't take such an extreme scenario for vast amounts of information to disappear in the blink of an eye, but, as in the film, these erasures are the result of conscious decisions, usually made by private companies. For example, the closure of platforms like Yahoo! Answers, GeoCities, Tuenti, or the Meristation forums meant the loss of millions of texts, images, and conversations that documented part of our lives and our digital culture.
On the other hand, unlike previous administrations that implemented policies to preserve information available on government websites, the Donald Trump administration has systematically removed thousands of pages and official data from agencies such as the Centers for Disease Control and Prevention (CDC), the National Oceanic and Atmospheric Administration (NOAA), and the Environmental Protection Agency (EPA).
These deletions have primarily affected content related to public health, climate change, diversity, and social rights. They have led to a significant loss of public and scientific information and have generated alarm, particularly among the scientific community.
The paradox is evident: our civilization produces more content than ever, but it does so in volatile formats and, furthermore, it is losing it faster than we imagine.
All this is happening while more and more information (parliamentary minutes, official bulletins, scientific articles, and technical manuals, among others) is being published in digital format, often without a physical copy.
Despite this situation, there are efforts to preserve our digital memory. The most well-known is the Internet Archive's Wayback Machine, which has archived billions of web pages since 1996. At the national level, institutions such as the National Library of Spain, or its equivalents in the United Kingdom and Australia, are also working to preserve part of our digital heritage.
What is being done?

Similarly, in the face of mass and deliberate deletions like those carried out by the Trump administration, various organizations are collaborating to archive the threatened information. These initiatives seek to ensure future access to public data, not only for research purposes but also to preserve the historical record.
Of course, it's not a simple task. Today's WWW is much more complex than it was in the 1990s: content is dynamic and interactive, no longer simple HTML documents. Furthermore, archiving social media or multimedia content not only represents an enormous technical challenge, compounded by the obstacles imposed by the platforms themselves, but also raises ethical and legal dilemmas related to user privacy and consent. In other words, not everything can or should be preserved.
Still, we can all contribute: tools like Save Page Now, the Wayback Machine, or Archive.today allow anyone to archive a copy of any web page simply by entering its URL.
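The same services can also be used programmatically. The sketch below assumes the Internet Archive's publicly documented endpoints: the availability API for looking up an existing capture, and the Save Page Now endpoint for requesting a fresh one. The function names are mine, not part of any official client.

```python
# Querying and triggering Wayback Machine captures via public endpoints.
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def latest_snapshot(url: str) -> Optional[str]:
    """Ask the availability API for the closest archived capture of `url`."""
    query = urlencode({"url": url})
    with urlopen(f"https://archive.org/wayback/available?{query}",
                 timeout=10) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None

def save_page_now(url: str) -> str:
    """Build the Save Page Now URL; fetching it requests a new capture."""
    return f"https://web.archive.org/save/{quote(url, safe=':/')}"
```

Visiting the URL that `save_page_now` returns (in a browser or with a GET request) asks the Wayback Machine to archive the page; the service is free but rate-limited.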
Maybe in 4,000 years, no one will find our complaints about faulty ingots, but they will find our recipes, memes, and forum discussions, and with them, a glimpse of who we were.
Ultimately, saying that the WWW is rotting is like saying a forest is rotting: something always dies, but something is also born, since the network is constantly changing. The important thing is to know that we can capture fragments, preserve the essential, and build a more solid digital memory, less vulnerable to technological fluctuations or the decisions of a few companies or governments.
(*) Full professor in the Department of Computer Science, University of Oviedo.
(**) A non-profit organization that seeks to share ideas and academic knowledge with the public. This article is reproduced here under a Creative Commons license.
Four out of ten websites from 2013 no longer exist
This January 1st, the internet as we know it turned 42 years old, and in these more than four decades, users have generated a huge amount of information on the web: in 2023 alone, some 120 zettabytes (ZB) of data were produced, and the figure is expected to reach 181 ZB this year, an increase of roughly 50%, according to data compiled by Statista. To put that figure into perspective, one ZB is equivalent to one billion terabytes (TB), while the SD Ultra Capacity (SDUC) memory-card standard, the largest currently defined, tops out at 128 TB.
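A quick back-of-the-envelope check makes the scale concrete, using the figures above (1 ZB = one billion TB):

```python
# How many maximum-capacity SDUC cards would 181 ZB fill?
ZB_IN_TB = 1_000_000_000   # 1 zettabyte = 10^21 bytes = 10^9 terabytes
data_zb = 181              # projected global data volume (Statista)
sduc_card_tb = 128         # capacity ceiling of the SDUC card standard

cards_needed = data_zb * ZB_IN_TB / sduc_card_tb
print(f"{cards_needed:.2e} cards")  # prints "1.41e+09 cards"
```

That is about 1.4 billion cards, even at the standard's theoretical maximum capacity.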
However, what is uploaded online doesn't always last. You've probably clicked on a link more than once and encountered a "404 Not Found" message, meaning the page you were looking for is no longer there. A 2024 report by the Pew Research Center revealed that digital content is lost over time, even on sites considered trustworthy, such as government portals, news outlets, social media, and Wikipedia.
“The Internet is an unimaginably vast repository of modern life, with hundreds of billions of indexed web pages. But while users around the world turn to the web to access books, images, news, and other resources, this content sometimes disappears,” the document states.
The study analyzed a sample of nearly one million web pages saved between 2013 and 2023 through Common Crawl, an archive service that periodically compiles snapshots of the internet as it existed at different points in time. The findings indicated that 25% of all pages analyzed were no longer accessible by October 2023. Broken down, that figure includes 16% of pages that were down even though their primary domain was still active, and 9% of pages that were inaccessible because their root domain was no longer serving at all.
The analysis also found that the older the page, the more likely it was to have disappeared: of the samples collected in 2013, 38% were no longer accessible by 2023; but even among pages collected in 2021, roughly one in five were no longer usable two years later.
Digital decay doesn't just affect personal pages or low-traffic sites. The Pew Research Center also examined 500,000 local, state, and federal government web pages in the United States, drawn from Common Crawl's March/April 2023 snapshot, and found that, by October 2023, 21% of those pages contained at least one broken link, and 16% of links within the pages redirected to URLs other than the ones they originally pointed to.
For news outlets, the sample also included 500,000 pages from Common Crawl's March/April 2023 snapshot. The pages came from 2,063 websites classified as "News/Information" by audience metrics firm comScore, and it was found that at the time of the study, in October 2023, 23% of the pages had broken links.
Even Wikipedia, one of the most visited sites in the world, has this problem: out of a sample of 50,000 of its English-language pages, 54% had at least one link in their “References” section that redirected to a page that no longer existed.