I wanted to publish a PHP-based website of mine on a different server, but only as static HTML pages, since the PHP was only used for conveniently generating the layout of some complex image arrangements (long story...)
Anyway:
I couldn't just open the site in my browser and hit "Save As...", because all the links would point to invalid filenames once saved as static HTML. They had to be rewritten. *sigh*
[SOLUTION]
Thanks to the GNU/Linux community and the Free Software people, who always seem to have thought of almost everything - including implementing a comfortable solution - I found a short article on the Linux Journal website about creating an offline copy of a site using "wget", which immediately and perfectly came to the rescue and saved my day.
In order not to lose that valuable information, I thought I'd post it here, too. The command-line arguments I actually used were like this:
Code:
$ wget \
--recursive \
--level=3 \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
WEBSITE_ADDRESS   # e.g. https://whatever.com/dude/...
I like it when a tutorial also explains which arguments it's using, and why:
- --recursive: download the entire website by following its links.
- --level=3: limit the recursion to 3 links deep (instead of the default 5).
- --no-clobber: skip files that have already been downloaded.
- --page-requisites: get all the elements that compose each page (images, CSS and so on).
- --html-extension: save files with the .html extension (newer wget versions call this --adjust-extension).
- --convert-links: convert links so that they work locally, off-line.
- --restrict-file-names=windows: escape characters in filenames so they're also valid on Windows filesystems.
- --domains website.org: don't follow links to other domains.
- --no-parent: don't ascend above the starting directory, so nothing outside the given URL's path gets pulled in.
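To make it a bit more concrete, here's a sketch of what a full run looks like against a made-up site (example.org and the /gallery/ path are placeholders I picked for illustration, not from the article), and how to have a quick look at the result before uploading it:
Code:
$ wget \
    --recursive --level=3 --no-clobber \
    --page-requisites --html-extension \
    --convert-links --restrict-file-names=windows \
    --domains example.org --no-parent \
    https://example.org/gallery/

# Everything lands in a directory named after the host, e.g.
#   example.org/gallery/index.html
#   example.org/gallery/page.php.html   (.html gets appended to PHP pages)
# plus the images/CSS those pages need, with the internal links
# rewritten to point at the local .html copies.

# Preview the static copy locally before uploading it anywhere:
$ xdg-open example.org/gallery/index.html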