+Forseti's own home-made wget mirroring bot

Version of 28 May, 02004
Hello Fravia+
With your permission, I will tell you of the most recent (circa Jan. 02004) version of the wget-type bot for searchlores mirroring. A bit of RTFM has enlightened me to a sweet little script for this:

The new mirror script:

-----------------------

#!/bin/sh
cd s/ && mv wget-log.1 wget-log && wget -r -m -b -k www.searchlores.org

-----------------------
Not bad, eh?

The -k flag (--convert-links) makes wget rewrite the links in the downloaded pages so they point at the local copies, which takes care of all the other cruft in the previous versions of the scripts. RTFM for the full explanation of what it does.
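For the curious, here is the very same command with wget's long option names spelled out; the long forms are equivalent to the short flags above:

-----------------------

#!/bin/sh
# identical to the script above, only with long option names
cd s/ && mv wget-log.1 wget-log && \
wget --recursive --mirror --background --convert-links www.searchlores.org

-----------------------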


The following old version of this script is deprecated.

Fravia+, I noticed you have Luc's perl mirroring bot up, and I thought you'd like to have mine as well. It's not quite as long and involved :)

I use wget ( get it at http://www.gnu.org/software/wget/wget.html ) and call it via a shell script to mirror searchlores.org. I also have a script to move the images and other sub-directories after wget gets them, as my mirror does not live at a root directory.

Here they are:

The mirror script:

-----------------------

#!/bin/sh
cd /qu00l.net/html/s/ && mv wget-log.1 wget-log && wget -r -m -b www.searchlores.org

-----------------------

A simple /bin/sh script: the first command sets the working directory; the second overwrites the old log so as not to end up with endlessly numbered wget-logs (I think two will do nicely for comparison); and the third calls wget and instructs it to recurse (-r), mirror (-m) and run in the background (-b).
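Spelled out with comments, the same script reads like this (the behaviour is identical):

-----------------------

#!/bin/sh
# 1. work inside the mirror directory
# 2. keep only the last two logs: a backgrounded wget writes to wget-log,
#    or to wget-log.1 if wget-log already exists
# 3. mirror: recursive (-r), mirror mode (-m), backgrounded (-b)
cd /qu00l.net/html/s/ && mv wget-log.1 wget-log && wget -r -m -b www.searchlores.org

-----------------------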

And the moving script:

-----------------------

#!/bin/sh -
cd /qu00l.net/html/s/
cp ./www.searchlores.org/fiatlu/* ./fiatlu/
cp ./www.searchlores.org/images/* ./images/
cp ./www.searchlores.org/pdffing/* ./pdffing/
cp ./www.searchlores.org/realicra/* ./realicra/
cp ./www.searchlores.org/zipped/* ./zipped/
cp ./www.searchlores.org/protec/* ./protec/

-----------------------

This is the only part of the scripts that may need to change: as you make new child dirs, I add them to this script. It arises because wget places its output into a dir named www.searchlores.org, whereas on the actual site this dir would be /, so everything is copied one directory up and I don't have to write a crazy script to rewrite all the URLs in all of the pages.
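If editing the script for every new child dir gets tiresome, a loop over whatever sub-directories wget pulled down would do the same job; a rough sketch (untested, same paths as above):

-----------------------

#!/bin/sh
# copy every sub-directory of the mirrored tree one level up,
# so new child dirs on the site need no new cp lines
cd /qu00l.net/html/s/
for d in www.searchlores.org/*/ ; do
    name=`basename "$d"`
    mkdir -p "./$name"
    cp ./www.searchlores.org/"$name"/* "./$name/"
done

-----------------------

As an aside, wget also has a -nH (--no-host-directories) flag that suppresses the hostname directory altogether, if you'd rather avoid the copying step.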

I'm using two scripts instead of one because wget works in the background, so chaining with && (which means 'upon successful completion of the last command, do the next') won't work: the shell sees that the wget command has returned, but it has really just gone off to the background.
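For what it's worth, dropping -b makes wget run in the foreground, and then && does wait, so the whole thing could live in one script; a sketch (using -o to keep a log file, since a foreground wget otherwise chatters to the screen, and each run overwrites it):

-----------------------

#!/bin/sh
# one-script variant: without -b, wget runs in the foreground, so the
# cp lines only start once the mirror has actually finished
cd /qu00l.net/html/s/ && \
wget -r -m -o wget-log www.searchlores.org && \
cp ./www.searchlores.org/images/* ./images/
# ...and so on for the other child dirs

-----------------------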

Wget is a great program too: it can grab images, single files, whole sites or just sections, and it's very flexible and configurable.
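A few example invocations, just to give the flavour (the URLs are only illustrative):

-----------------------

# a single file
wget http://www.searchlores.org/index.htm
# just one section of a site, without climbing up into the parent dirs
wget -r --no-parent http://www.searchlores.org/realicra/
# only the images, accepted by file extension
wget -r -A jpg,gif,png http://www.searchlores.org/

-----------------------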

Enjoy :)

Forseti+



(c) 1952-2032: [fravia+ & +Forseti], all rights reserved