Re: Wget




On Wed, 13 Aug 2003, Fraser Farrell wrote:

> Peter Mount wrote:
> > However it takes quite some time (~1.5 hours) but it does work. The only
> > down side is that it retrieves not just the current, but every version of
> > each page, so it ends up downloading 22Meg.
> >
> > I don't think it's good for backup purposes, but is useful for making an
> > offline copy.
>
>
> I reckon Wiki's keeping the old pages in place provides an excellent way to
> detect and audit the changes to the site. Which may be helpful one day to
> resolve an argument.

Yes, it's one of the things I like about chiki - it does keep the old
versions, and you can make an earlier version the live one.

> By default wget will grab everything it's told to, unless there's a Robots
> Exclusion Protocol file defined for the site. FYI this Protocol is designed
> to keep search engine robots and mirroring programs out of non-public parts
> of your site. The Protocol is routinely flouted by spam harvesters and other
> malicious programs (a misbehaviour you can turn to your advantage by
> deploying nasty countermeasures, but I digress...).
>
> Incidentally wget's --mirror option only downloads the files that are new
> and/or changed since its previous run. You got them all this time because it
> was your ~first~ run; therefore all of the files were "new". Next time it
> won't be so bad - provided Tom doesn't upload his data CDs to Wiki first :-)

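That's normally true, but I have a feeling it won't work here, as I was
seeing messages from wget saying that there was no date header (presumably
Last-Modified), probably because chiki isn't sending it for some reason.
Without that header wget has nothing to compare against, so it would fetch
every page again.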
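
For anyone who wants to check whether a server sends the header that
timestamp-based mirroring relies on, here is a minimal Python sketch. It
assumes the header in question is Last-Modified; the function names
(can_timestamp, fetch_headers) are made up for illustration and are not
part of wget or chiki.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def can_timestamp(headers):
    """Return True if the headers carry a parseable Last-Modified value.

    `headers` is any mapping of header name to value; names are compared
    case-insensitively, since HTTP header names are not case-sensitive.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("last-modified")
    if not value:
        return False
    try:
        parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # The header is present but is not a valid HTTP date.
        return False
    return True


def fetch_headers(url, timeout=10):
    """Send a HEAD request to `url` and return the response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Calling can_timestamp(fetch_headers(url)) on one of the wiki's page URLs
should show whether a second mirror run has any chance of skipping
unchanged pages. A False result would be consistent with the messages
described above, although it would not explain why the header is missing.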

Peter

--
Peter Mount
peter@retep.org.uk
http://www.retep.org/
http://retep.net/
   Tel: +44 (0) 1622 749439
   Fax: +44 (0) 8701 361620
Mobile: +44 (0) 7903 155887
    IM-MSN: retep207@hotmail.com
IM-AOL/ICQ: retepworld