How to Restore a Website from the Wayback Machine – Step-by-Step Guide


Sometimes life happens. Where once there was a website, all you have now is an error page. Your hosting has expired, there are no backups, and you are panicking.

So how can you recover your old website? The Wayback Machine!

For those of you who don’t know: the Wayback Machine is a digital archive of the public-facing web, run by the Internet Archive. The Internet Archive is a San Francisco-based non-profit that’s been running for almost 25 years.

And it might just be your last hope to get your website back.

Tired of dealing with your slow WordPress website? Email me at brian@pagecrafter.com and mention the code #FreeHosting10 for two free months of lightning-fast WordPress hosting. We will even migrate you for free!

If you’re lucky, the Wayback Machine crawled your website in its final days and has a copy sitting there, waiting for you to restore. If you’re somewhat unlucky, that copy will be old. If you’re really unlucky, it will be ancient, badly incomplete, or won’t exist at all.
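Before installing anything, it can help to check whether the Wayback Machine has a snapshot of your site at all, and how recent it is. The Internet Archive offers a public Availability API (https://archive.org/wayback/available?url=...) that returns JSON describing the closest archived snapshot. Here’s a minimal Python sketch that builds that request and reads the response; the helper names are hypothetical, and the code assumes the JSON shape the API documents (an "archived_snapshots" object with a "closest" entry).

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from datetime import datetime

# Public Wayback Machine Availability API endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(site: str, timestamp: str | None = None) -> str:
    """Build the request URL (hypothetical helper)."""
    params = {"url": site}
    if timestamp:
        # Optional YYYYMMDDhhmmss (or a shorter prefix such as YYYYMMDD):
        # asks for the snapshot closest to that moment.
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response: dict) -> tuple[str, datetime] | None:
    """Return (snapshot URL, capture time), or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return closest["url"], captured


def check_site(site: str) -> tuple[str, datetime] | None:
    """Query the live API (needs network access)."""
    with urllib.request.urlopen(availability_url(site), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

If check_site returns None, there is nothing to restore; if the capture date is years old, expect to get that older version of your site back.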

Before you start, understand that even if you are able to restore your website, it probably won’t be the same as it was. The Wayback Machine can only archive what a visitor sees: the HTML, images, stylesheets, and scripts your server sent to the browser. Server-side code like PHP (which WordPress runs on), along with the database behind it, never leaves the server, so it can’t be archived or restored.

So if you were hoping to restore your entire WordPress backend: think again! It isn’t possible.

You can, however, make it look the same as it did for everyone else.

If that’s good enough for you for the time being, or if your site is just basic HTML, then this should work for you!

It’s actually not very hard to do; however, you do need command-line access to your own Linux machine. If you don’t have this, you can always hire somebody to do it or reach out to us to do it for you.

The Wayback Machine Downloader utility

We’re going to be using a great little tool from Hartator, the aptly named “Wayback Machine Downloader”. It’s simple, and it gets the job done.

All of the installation instructions can be found in the project’s README on GitHub, so I won’t reproduce them here.

Basically, you need to install Ruby if you don’t already have it, then install the downloader (it’s distributed as a Ruby gem), and then run a single command to download your site.

With it installed, you literally run a command this simple, with your own site’s address in place of example.com:

    wayback_machine_downloader http://www.example.com
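If you’d rather drive the downloader from a script (for example, to restore several sites), here’s a minimal Python sketch that wraps its command-line interface. It assumes the gem is installed (gem install wayback_machine_downloader) and on your PATH. The --directory and --to flags come from the README’s “Advanced Usage” section; check them against the version you install. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

EXECUTABLE = "wayback_machine_downloader"


def build_command(site: str, directory: Optional[str] = None,
                  to_timestamp: Optional[str] = None) -> List[str]:
    """Assemble the downloader invocation as an argument list."""
    cmd = [EXECUTABLE, site]
    if directory:
        # --directory: where to save files (default is ./websites/<domain>/).
        cmd += ["--directory", directory]
    if to_timestamp:
        # --to: only use files captured on or before this timestamp,
        # e.g. "20190316040039".
        cmd += ["--to", to_timestamp]
    return cmd


def download_site(site: str, **options) -> None:
    """Run the downloader; raises if it is missing or exits with an error."""
    if shutil.which(EXECUTABLE) is None:
        raise RuntimeError(f"{EXECUTABLE} not found: install Ruby, then the gem")
    subprocess.run(build_command(site, **options), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a URL or path contains spaces or special characters.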

This automatically downloads the most recent archived version of every file the Wayback Machine has for your site. By default, the files are saved under ./websites/ followed by your domain name.

And the beauty is: all of the files should be the “original” ones, and not ones with Wayback Machine URLs in them. The directory structure should be intact, and it should be easy to move around.

Make sure you take a look at the “Advanced Usage” section of the README for extra parameters, like specifying where the site should be downloaded to or only using files archived before a certain date.

Even so, I still needed to replace some URLs. External links were still being routed through the Wayback Machine, so I had to fix those.

I found that I had URLs that looked like this:

    https://web.archive.org/web/20190316040039/http://www.example.com/

Obviously, we want those to just go directly to http://www.example.com/, so there was some work to be done!
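The timestamp in that prefix can differ from link to link, and image links often carry an extra modifier such as im_ after the timestamp, so a single literal search may miss some. A regular expression catches every variant at once. Here’s a minimal Python sketch, assuming the rewritten links follow the web.archive.org/web/&lt;timestamp&gt;/ shape (optionally without the https: scheme); the function names are hypothetical.

```python
import re

# Matches Wayback Machine link prefixes such as
#   https://web.archive.org/web/20190316040039/
#   //web.archive.org/web/20190316040039im_/
# i.e. an optional scheme, the timestamp digits, and an optional
# two-letter modifier (im_, js_, cs_, ...).
WAYBACK_PREFIX = re.compile(
    r"(?:https?:)?//web\.archive\.org/web/\d{1,14}(?:[a-z]{2}_)?/"
)


def find_prefixes(text: str) -> set:
    """List the distinct Wayback prefixes present in some HTML/CSS/JS text."""
    return set(WAYBACK_PREFIX.findall(text))


def strip_prefixes(text: str) -> str:
    """Remove every Wayback prefix, leaving the original target URLs."""
    return WAYBACK_PREFIX.sub("", text)
```

Running find_prefixes over a page first is a quick way to see which prefixes you’re dealing with before changing anything.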

I opened the free editor Notepad++ and ran a search and replace across all of the site files to remove the extra part of each link. Here’s how to do that:

  1. View your files, and find the exact URL that’s being used in these links (the full part before the actual, desired link). You’ll need this in step 4.
  2. Open Notepad++ and press Ctrl+F to open the Find dialog.
  3. Click the “Find in Files” tab.
  4. For “Find what:”, specify the entire archive.org URL, for example: https://web.archive.org/web/20190316040039/
  5. For “Replace with:”, put nothing. We are just removing it entirely, so it shouldn’t be replaced with any other content.
  6. Specify the directory where your site files are located.
  7. Press “Replace in Files” to run the replacements.
[Screenshot: the Notepad++ “Find in Files” settings used in our example]

That’s it! Your links should now point straight to the live URLs instead of the archive.
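If you’re not on Windows, or you’d rather script the same “Find in Files” replacement, here’s a minimal Python sketch of it: it replaces one literal string (the archive prefix from step 4) in every text file under the site folder. The function name and the list of file extensions are assumptions; adjust them to match what your download contains.

```python
from pathlib import Path

# Assumed set of text file types worth editing; images and fonts are skipped.
TEXT_EXTENSIONS = {".html", ".htm", ".css", ".js", ".xml", ".txt"}


def replace_in_files(root: str, find_what: str, replace_with: str = "") -> int:
    """Replace a literal string in every text file under root.

    Works on raw bytes so encodings and line endings are left untouched.
    Returns the number of files that were changed.
    """
    if not find_what:
        raise ValueError("find_what must not be empty")
    needle = find_what.encode("utf-8")
    replacement = replace_with.encode("utf-8")
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in TEXT_EXTENSIONS:
            continue
        data = path.read_bytes()
        if needle in data:
            path.write_bytes(data.replace(needle, replacement))
            changed += 1
    return changed
```

For example, replace_in_files("websites/www.example.com", "https://web.archive.org/web/20190316040039/") mirrors steps 4 to 7. It edits files in place, so run it on a copy first.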

In our case, we zipped up the entire site, moved the archive to the new hosting server where the site was going to live, and extracted it there.
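The zipping and extracting can also be done with Python’s standard library, as in the sketch below. It only covers packing and unpacking; getting the .zip onto the new server (SFTP, your host’s file manager, etc.) is left out. The function names and the public_html destination in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def zip_site(site_dir: str, archive_base: str) -> str:
    """Zip the contents of site_dir into archive_base + '.zip'.

    Keep archive_base outside site_dir so the zip doesn't include itself.
    Returns the path of the created archive.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=site_dir)


def extract_site(archive_path: str, dest_dir: str) -> None:
    """Unpack the archive into dest_dir, creating it if needed."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive_path, dest_dir)
```

On your machine you might call zip_site("websites/www.example.com", "site-backup"), then on the server extract_site("site-backup.zip", "public_html"). Because root_dir points at the site folder itself, index.html ends up at the top level of the web root rather than in a nested folder.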

The files should be all ready!

Now you just need to point your domain’s DNS records (usually the A record) at your new hosting server, and you should be up and running.
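DNS changes can take a while to propagate, so it’s handy to check whether your domain already resolves to the new server. Here’s a minimal Python sketch using the system resolver; it only checks IPv4 (A record) results, and your operating system may cache old answers for a while. The function name is hypothetical.

```python
import socket


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if any IPv4 address the system resolver returns matches expected_ip."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # Name doesn't resolve (yet), or there's no network.
        return False
    # Each entry is (family, type, proto, canonname, (address, port)).
    return any(info[4][0] == expected_ip for info in infos)
```

For example, resolves_to("www.example.com", "203.0.113.10") should start returning True once your change has propagated (203.0.113.10 is a placeholder for your new host’s IP address).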

Limitations

As touched on earlier, there are some serious limitations here. Without server-side scripts, a lot of things could break. More specific limitations include:

  • Contact forms probably won’t work
  • If you had a CMS like WordPress, none of the backend functionality will work
  • Anything that requires a database query won’t work
  • Search boxes and other features that send queries to the server probably won’t work
  • It won’t really be able to produce any dynamic content
  • It could be a pain to make changes, especially to repeated content like the header and footer
  • You might end up having to rebuild your entire site
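Since contact and search forms are usually the first things to break, it helps to know where they are. Here’s a minimal Python sketch that scans the downloaded HTML files for &lt;form&gt; tags and reports each form’s action URL; a form that posts to a server-side script (such as a .php file) will need to be rebuilt or replaced. It uses only the standard library’s html.parser, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class FormFinder(HTMLParser):
    """Collect the action attribute of every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <FORM> is caught too.
        if tag == "form":
            # A missing action means the form submits back to the same page.
            self.actions.append(dict(attrs).get("action") or "")


def find_forms(root: str) -> dict:
    """Map each HTML file under root (relative path) to its form actions."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".html", ".htm"}:
            continue
        finder = FormFinder()
        finder.feed(path.read_text(encoding="utf-8", errors="replace"))
        finder.close()
        if finder.actions:
            report[str(path.relative_to(root))] = finder.actions
    return report
```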

So it’s not perfect, but in a pinch it will do the trick. Please reach out to us if you’d like us to take care of this for you.

You may find that you still need to rebuild the site. At least you’ve got your content back, so whether you want to rebuild it the way it was or take this opportunity to redesign it, now might be a good time to do so.

We love redesigning websites, so if you are considering going that route, please reach out. We’d love to hear from you!

 

 

About Brian Johnson

Brian Johnson is a website developer and designer living in Minneapolis, Minnesota with a passion for code and WordPress. He spends his days building WordPress websites for small businesses, developing new code with the online community, and living life.