Backing Up Websites
Things are disappearing, but you can keep offline copies.
2025-02-02
The new US administration is taking a lot of valuable information offline in what can only be called a fascist purge of knowledge that will likely continue.
Online archives like the Internet Archive’s Wayback Machine are a good resource for still reaching those pages, but ultimately, saving things offline is the safest way to keep them available, since it doesn’t depend on any online service staying up.
Screenshots
Screenshots really aren’t ideal because they often capture only a small part of a page and the text in them is no longer searchable.
But in a pinch, when you don’t have any other tools available, especially if it’s a page with immediately critical information, it’s better than nothing.
Using your browser
All major web browsers have a “Save Page” feature. The keyboard shortcut is usually Ctrl-S or Cmd-S. It attempts to save all files that are necessary to display the page the way it looked online. However, this doesn’t always work very well; images or layout elements may be missing, for example.
An alternative is to “print” the page as a PDF. Your browser’s printing dialog probably has a “Save as PDF” option somewhere. Usually the shortcut for printing is Ctrl-P or Cmd-P. This even works on some phones where the “Save page” feature may not be available.
Both of these ways have the downside that they only save the page you’re currently viewing, not the whole website with all its pages.
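If you ever want to script the PDF route for a handful of pages, recent Chrome and Chromium builds can also print a page to PDF from the command line without opening a window. This is only a sketch: the binary name (chromium, google-chrome, etc.) depends on your system and browser version, and page.pdf is just a placeholder output name.
# print a single page to a PDF file (output name is a placeholder)
chromium --headless --print-to-pdf=page.pdf https://www.selfawaresoup.com/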
Using WGET
If you’re comfortable with command line tools, or want to learn them, wget is a powerful tool for archiving websites. It has a lot of options that can be unintuitive, though. Here is one way of using it that will work for most websites:
wget --mirror --convert-links --page-requisites --adjust-extension --no-parent https://www.selfawaresoup.com/
This will make a copy of https://www.selfawaresoup.com, starting from the home page and following any links from there until it no longer finds any new pages.
It will not find pages that aren’t linked to within the website. For those you will need a direct link.
This is what each of the options does:
--mirror: enables a bunch of other options that are useful for creating a full copy of a website. You should leave this out if you only want a specific page, especially if it’s on a very large website with many pages.
--convert-links: changes the internal links of the website so that it works offline and doesn’t link back to the online version.
--page-requisites: downloads any external files (images, stylesheets, scripts) that are necessary for the offline page to work.
--adjust-extension: makes sure that each offline page has a .html extension so you can easily open it for viewing.
--no-parent: limits the link-following to pages within the website, so it doesn’t also download other websites that might be linked to.
https://www.selfawaresoup.com/: the home page URL of the website. Replace this with the site you actually want to back up, or the URL of a specific page if you only want that one.
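For pages that aren’t linked to from anywhere on the site (as mentioned above), you can also skip the recursive mirroring and feed wget a list of URLs directly with the -i option. The file name urls.txt is just an example here; it’s a plain text file with one URL per line:
# download every URL listed in urls.txt, plus the files each page needs to display
wget --page-requisites --adjust-extension --convert-links -i urls.txt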
If you want to know more details about how to use wget, you can run man wget to read its full manual or check out the online version of the manual.
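Two options from that manual worth knowing about when you mirror a large site are --wait and --limit-rate, which slow wget down so you don’t hammer the site’s server. A possible variation of the mirror command from above; the numbers are arbitrary starting points, not recommendations:
# pause one second between requests and cap the download speed at 200 KB/s
wget --mirror --convert-links --page-requisites --adjust-extension --no-parent --wait=1 --limit-rate=200k https://www.selfawaresoup.com/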