5 Tools to Help You Search the Archived Internet

The archived internet deserves more recognition. Online security has been a hot-button topic in the tech community recently, with data scandals and privacy policy updates constantly driving the conversation. But keeping the internet a stable and reliable network isn’t all about data security. It’s also about data preservation.

Anything that’s low tech is dismissed as “from the stone age,” but stone is by far the most stable way to record information. Not only will the hard drives and networked routers of today never last a thousand years, but plenty of information online won’t even last the decade. As local newspapers or long-in-the-tooth startups go under, they all leave dead links scattered across the internet, constantly replaced with fresh links that will themselves eventually die.

Wow, sorry, didn’t mean to get too dark there. My point is, memories that you might want to keep are increasingly likely to exist only on the internet — rambling G-Chat conversations with your best friend, say, or your first WordPress blog. If you want to preserve, protect, or search through your online footprint, read on to learn which five online tools can best help you comb through the archived internet.

Archive.is

What It Does

This is the quickest and easiest way to grab a free, high-quality record of an existing webpage.

“This can be useful if you want to take a ‘snapshot’ of a page which could change soon: price list, job offer, real estate listing, or drunk blog post,” the site explains.

You can search through the Archive.is site for previously archived webpages, if you’re interested in tracking a specific Twitter account or tech company. There’s even a draggable bookmarklet that you can add to your bookmarks bar to archive future webpages with a single click.

How You Can Use It

Go to the Archive.is site, paste the URL of your webpage into the bar at the top, and click on the “save the page” button.

Given the social fallout that can come from a single bad tweet, this site can be a useful way to grab verifiable, Photoshop-proof evidence of a tweet or post that will likely be deleted soon. The saved webpage that results won’t have any active elements or scripts (no popups or paywalls, in other words), but should look more or less the same, even down to the same clickable hyperlinks that the original page boasted.

Lumen

What It Does

Data loss on the internet isn’t always due to the natural process of link rot, in which servers or domains become permanently unavailable. Another major cause is legal demands for content removal. While the content removed due to takedowns can’t itself be archived, the legal complaints themselves can be.

Lumen is an online database of takedown notifications. It’s a project from the Berkman Klein Center for Internet & Society at Harvard University, designed to collect digital content removal requests.

“Our goals are to educate the public, to facilitate research about the different kinds of complaints and requests for removal–both legitimate and questionable–that are being sent to Internet publishers and service providers, and to provide as much transparency as possible about the “ecology” of such notices, in terms of who is sending them and why, and to what effect,” the website explains.

How You Can Use It

Type any search term into the site and you’ll likely pull up thousands of results. Use the advanced search functions, and you’ll be able to narrow down the DMCA requests by topic, sender, recipient, tags, country, language, action taken, and date.

The search results page includes an easily scanned list of takedown requests, including details such as who submitted them, on whose behalf, and to whom (the recipient is almost always Google). You might want to use this database if you’re interested in why a seemingly innocent post in your search results or a favorite YouTube video has suddenly disappeared due to a content claim. You’ll get all the information you need to follow up on the takedown with the company that submitted the request in the first place.

Lumen has a feature that allows you to construct a DMCA counter notice, if you’re the one who has been hit with a takedown that you want to contest. You can also report your own takedown notification through the contact information available on the site.

PeekYou

What It Does

Often, the data you’re looking for can’t be located with a normal internet search engine. The “deep web” is one reason: Plenty of archives, from the Social Security Administration’s baby name database to this trove of 19th-century British book reviews, live in online portals that can’t be crawled. Another issue is how spread out the data often is: Local papers, social media accounts, and blog platforms might all hold bits of information, while never revealing the big picture.

People-search engines are designed to comb through these isolated databases, and PeekYou is a great example of a service that combines disparate sources of content to find a wide swathe of information on individuals. You’ll need to know a name and location, a username, or a phone number.

How You Can Use It

Want to finally thank your third-grade English teacher for that mentorship? Return a book to your ex-girlfriend’s cousin? Find out which of your college classmates have an arrest record? The possibilities are endless.

The Wayback Machine

What It Does

The Internet Archive’s Wayback Machine is likely the most universally known archival site. Its 279 billion web pages cover the past 20 years of internet history.

The Internet Archive features millions of books, texts, images, videos, and audio recordings in addition to the webpages, making this the starting point for anyone interested in finding digital ephemera.

How You Can Use It

The Wayback Machine is useful as a portal into a different era: the MySpace homepage on June 10, 2004, is just a click away. And if you’ve operated a website during the past two decades, the site might have logged snapshots of data you had thought was long lost.
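If you'd rather script that trip back in time than click through the calendar, here's a minimal Python sketch. It assumes the Wayback Machine's usual snapshot address pattern (web.archive.org/web/, then a date stamp, then the original URL) and the Internet Archive's public availability endpoint. The function names are made up for illustration, and the sketch only builds addresses and reads an already-decoded response, so it never touches the network on its own.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, day: date) -> str:
    """Address of the capture nearest to `day`.

    The Wayback Machine redirects a date-stamped address like this
    to the closest snapshot it actually holds.
    """
    return f"{WAYBACK_PREFIX}/{day:%Y%m%d}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Query URL asking the availability endpoint for the closest capture."""
    params = urlencode({"url": page_url, "timestamp": f"{day:%Y%m%d}"})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Read the snapshot address out of a decoded availability response.

    Returns None when the response lists no archived capture.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    # The MySpace homepage as captured around June 10, 2004.
    print(snapshot_url("http://myspace.com/", date(2004, 6, 10)))
```

Paste the printed address into a browser to land on the nearest capture. To check first whether a page was ever archived, fetch the address from availability_query with any HTTP client, decode the JSON, and hand the result to closest_snapshot.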

You’ll likely find something worth preserving in the rest of the Internet Archive’s vaults, too, like over 900 classic 70s and 80s-era arcade games, high-res scans of the 1950s science fiction magazine Galaxy or 1983 instructions on how to build a Yugoslavian computer.

The Wayback Machine Downloader

What It Does

Just finding the website data on the Wayback Machine won’t help preserve it for future generations, however. What happens if the Internet Archive loses its funding? You’ll have to download the data today if you want to do your part in preserving it, and for that you’ll need this program.

It’s more technical than the rest of the tools on this list: You’ll need to have the programming language Ruby on your system, and then you can install the downloader using this command:

gem install wayback_machine_downloader

How You Can Use It

Once installed, the downloader will retrieve the latest version of every file the Wayback Machine has for any website that you request with the base URL. You can further filter the data you download with more complex commands: GitHub has additional information on how it all works. Finally, you can ensure that if World War III hits, the world will be able to remember your 2004 DeviantArt account.
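For anyone who wants to run the downloader from a script, here's a rough Python sketch that assembles the command line and hands it to the installed gem. The helper names and the example.com address are placeholders of my own. The --from, --to, and --directory flags are how I recall the project's README describing them, so treat them as assumptions and confirm with wayback_machine_downloader --help before relying on them.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional

TOOL = "wayback_machine_downloader"


def downloader_command(base_url: str,
                       from_timestamp: Optional[str] = None,
                       to_timestamp: Optional[str] = None,
                       directory: Optional[str] = None) -> List[str]:
    """Build the argument list for one download run.

    The flag names are assumed from the project's README; verify them
    against the tool's own --help output.
    """
    cmd = [TOOL, base_url]
    if from_timestamp:
        cmd += ["--from", from_timestamp]   # skip captures before this stamp
    if to_timestamp:
        cmd += ["--to", to_timestamp]       # skip captures after this stamp
    if directory:
        cmd += ["--directory", directory]   # where the files get saved
    return cmd


def run_download(cmd: List[str]) -> int:
    """Run the command, failing early if the gem isn't installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH; install the gem first")
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    cmd = downloader_command("http://example.com",
                             from_timestamp="20040101",
                             to_timestamp="20041231",
                             directory="backup")
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(cmd))
```

Swap the print call for run_download(cmd) once the printed command looks right. Keeping the arguments in a list rather than one long string means a URL with odd characters won't be mangled by the shell.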
