selfaware soup

Esther Weidauer

Backing Up Websites

Things are disappearing, but you can keep offline copies.

2025-02-02

The new US administration is taking a lot of valuable information offline in what can only be called a fascist purge of knowledge that will likely continue.

Online archives like the Internet Archive’s Wayback Machine are a good resource for still accessing those pages, but ultimately, saving things offline is the safest way to keep them available, since it doesn’t depend on any online service staying up.

Screenshots

Screenshots really aren’t ideal because they often capture only a small part of a page and no longer contain searchable text.

But in a pinch when you don’t have any other tools available, especially if it’s a page with immediately critical information, it’s better than nothing.

Using your browser

All major web browsers have a “Save Page” feature. The keyboard shortcut is usually Ctrl-S or Cmd-S. It attempts to save all files that are necessary to display the page the way it was seen online. However, this doesn’t always work well; images or layout elements may be missing, for example.

An alternative is to “print” the page as a PDF. Your browser’s printing window probably has a “Save as PDF” option somewhere. Usually the shortcut for printing is Ctrl-P or Cmd-P. This even works on some phones where the “Save Page” feature may not be available.

Both of these methods have the downside that they only save the page you’re currently viewing, not the whole website with all its pages.

Using wget

If you’re comfortable with command line tools, or want to learn them, wget is a powerful tool for archiving websites. It has a lot of options, though, and they can be unintuitive. Here is one way of using it that will work for most websites:

wget --mirror --convert-links --page-requisites --adjust-extension --no-parent https://www.selfawaresoup.com/

This will make a copy of https://www.selfawaresoup.com starting from the home page and following any links from there until it finds no more pages.

It will not find pages that aren’t linked to within the website. For those you will need a direct link.
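For such an unlinked page, you can point wget at its URL directly and grab just that page plus the files it needs to display. A sketch; the path shown here is hypothetical, substitute the actual URL you have:

```shell
# Fetch one specific page (not the whole site) along with its
# images and stylesheets, fixing links and extensions for offline use.
wget --page-requisites --convert-links --adjust-extension \
  "https://www.selfawaresoup.com/some-unlinked-page.html"
```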

This is what each of the options does:

--mirror: turns on recursive downloading with time-stamping, so wget fetches the entire site and can update an existing copy on a later run.

--convert-links: rewrites the links in the saved pages so they point at your local files and work offline.

--page-requisites: also downloads the images, stylesheets, and other files needed to properly display each page.

--adjust-extension: saves files with matching file extensions (such as .html) so they open correctly on your computer.

--no-parent: stops wget from following links to directories above your starting point, keeping the download limited to the site you asked for.

If you want to know more details about how to use wget, you can run man wget to read its full manual or check out the online version of the manual.