Re: Wget
Peter Mount wrote:
> However it takes quite some time (~1.5 hours) but it does work. The only
> down side is that it retrieves not just the current, but every version of
> each page, so it ends up downloading 22Meg.
>
> I don't think it's good for backup purposes, but is useful for making an
> offline copy.
I reckon Wiki's keeping the old pages in place provides an excellent way to
detect and audit changes to the site, which may be helpful one day to
resolve an argument.
By default wget will grab everything it's told to, unless the site defines a
Robots Exclusion Protocol file (robots.txt). FYI, this Protocol is designed
to keep search engine robots and mirroring programs out of non-public parts
of your site. The Protocol is routinely flouted by spam harvesters and other
malicious programs (a misbehaviour you can turn to your advantage by
deploying nasty countermeasures, but I digress...).
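If you're curious how a well-behaved robot decides what it may fetch, here
is a rough sketch using Python's standard urllib.robotparser module. The
robots.txt content, the example.org URLs and the helper name is_allowed are
all made up for illustration; this is not wget's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/index.html",
                "http://example.org/private/notes.html"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "wget", url))
```

A polite mirroring program makes a check like this before each request; a
spam harvester simply skips it.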
Incidentally, wget's --mirror option turns on time-stamping, so it only
downloads the files that are new and/or changed since its previous run. You
got them all this time because it was your first run; therefore all of the
files were "new". Next time it won't be so bad - provided Tom doesn't upload
his data CDs to Wiki first :-)
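The "new and/or changed" test can be sketched roughly as below. This is a
simplified model of a time-stamp comparison, not wget's actual
implementation; the function name needs_download and its parameters are
hypothetical, and the times are plain seconds-since-epoch numbers.

```python
from typing import Optional


def needs_download(local_mtime: Optional[float],
                   local_size: Optional[int],
                   remote_mtime: float,
                   remote_size: int) -> bool:
    """Decide whether a remote file should be fetched again.

    local_mtime and local_size are None when there is no local copy yet.
    """
    # No local copy: this is what happens to every file on a first run.
    if local_mtime is None or local_size is None:
        return True
    # A size mismatch means the copies differ, whatever the time-stamps say.
    if local_size != remote_size:
        return True
    # Otherwise fetch only when the remote copy is newer than the local one.
    return remote_mtime > local_mtime


if __name__ == "__main__":
    print(needs_download(None, None, 1000.0, 2048))      # first run
    print(needs_download(1000.0, 2048, 1000.0, 2048))    # unchanged
    print(needs_download(1000.0, 2048, 2000.0, 2048))    # updated remotely
```

On a first run every file falls into the "no local copy" case, which is why
the whole lot came down.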
cheers,
--
Fraser Farrell
----------------------------------
http://astronomy.trilobytes.com.au
----------------------------------