how to search
Lesson 10
('light' version)


June 1998

...and build your own search-bots :-)

Based on some original private
emailings from +ORC
Searchengines' strings cracked by
Master Accmailer G.E. Boyd

Preceding lessons:

lesson_5 about general agora http:// retrieving ~ July 1996
lesson_6 about ftping files, agora queries and emailing altavista ~ December 1996
lesson_7 about the W3gate, search spiders, error messages and evaluation of results ~ March 1997
lesson_8 about advanced searching techniques (combing and klebing) ~ November 1997
lesson_9 about "effective" searching techniques (infoseek 'finalised' and dejanews filtering) ~ January 1998


Ported to Fravia's searchlores.org in February 2000

[Never forget the bots!] [The 'pasted stringsearch' method] [FTPmail: mailing and re-mailing :-)] [Is gopher dead?]


To know answers is easy, the difficult part is knowing how to find any answer
~S~ +ORC

Few people know how to search, and even fewer know how to find what they have searched for
~S~ +Alistair

Never forget the bots!

I have decided to summarise some of the must-know techniques for automated searching and data retrieval on the web, for all those readers who keep writing to me that some of the ftpmailers listed in my older lessons don't work anymore. Kids: the Web is quicksand! Lots of sites, servers and bots DISAPPEAR, but this does not matter at all: since you (should) know the sublime art of how to search, you'll always be able to catch the same (or analogous) sites and services elsewhere!
As you already know (since I assume you have read the preceding lessons and have learned the basics of all 'getweb' techniques :-) there are many automated servers out there that will send you pages, files and source code and/or will answer your queries... for free, of course: this is still 'our' web after all, and the evil powers of commercialisation and advertisement don't dominate the net (yet).

As usual, since you're going to work with email, first of all check how much info you are leaking around with your own emails: send right now an email to
write 'test' both in the 'Subject' and in the 'Text' fields and examine with attention what you will get back as automated answer in a couple of seconds from this German echo bot...
OK? Everything ok? Your emailing traces are nice enough? Now let's start this lesson 10...

Let's list the main services we'll deal with:
1)      I wanna get pages, files and images from da net!
                agora@dna.affrc.go.jp           [01]
                agora@kamakura.mss.co.jp        [02]             
                agora@www.eng.dmu.ac.uk         [03]
                w3mail@gmd.de                   [04]
                w3mail@enigma.gex.gmd.de        [04]
                webmail@www.ucc.ie              [05]
2)      I wanna search da net
                getweb@unganisha.idrc.ca        [06]
                getweb@lanic.utexas.edu         [07]
                getweb@usa.healthnet.org        [08]   
                iliad@algol.jsc.nasa.gov        [09] 
                iliad@rosy.tenet.utexas.edu     [09]
3)      I wanna patrol da net
                Email-Queries@Reference.COM     [10]
4)      Oldies but useful
                gophermail@eunet.cz             [11]  
                gopher@dna.affrc.go.jp          [12]    
                http://veronica.psi.net         [13]

[01] the most used one by those who know this stuff
[04] a very powerful one for image retrieval
[06] a beautiful one for searches
[08] very fast, but with a 200,000 bytes weekly quota
[09] iliad has a "get url" or an "iliad query" function
[10] a very powerful 'filter' facility to automatically patrol usenet

Each of the preceding services will teach us a different facet of searching... we'll now examine them (only three in this version, I'll complete the rest soon)

agora@dna.affrc.go.jp [01]

Who knows if these nice people from Japan really grasp how IMPORTANT their fantastic service is for any Internet user? This is the "mother of all agoras", because it's 'speedy quick' and allows the three famous commands: SEND (your target URL's text), SOURCE (your target URL with all its HTML formatting, so that you can browse it off line, which is pretty important if you want to browse a delicate target site 'almost' anonymously :-) and DEEP (one URL plus all the URLs linked from it... yet watch it! You can get hundreds of emails if your target is a page that links to a lot of pages, like my aca300.htm).
Agora allows the retrieval of zipped files as well, btw. If, for instance, you ask for:
send ftp://ftp.crl.com/users/iv/iverham/ua.zip

agora will deliver you Uzi Paz's famous (and invaluable) file on Usenet access, techniques and newsgroups.
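
If you want to prepare such requests by program, here is a minimal Python sketch of the idea. The helper name `agora_body` and the "one command per line" layout are my own assumptions for illustration (the lesson only shows `send` lines); the three commands and the example URL are the ones named above.

```python
# Hypothetical helper (not part of any agora software): builds the text
# of an email body for an agora server, one command per line.
AGORA_COMMANDS = ("send", "source", "deep")  # the three commands named in the lesson


def agora_body(requests):
    """Turn (command, url) pairs into the text of an agora request email."""
    lines = []
    for command, url in requests:
        command = command.lower()
        if command not in AGORA_COMMANDS:
            raise ValueError(f"unknown agora command: {command!r}")
        lines.append(f"{command} {url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The zipped file mentioned in the text above.
    print(agora_body([("send", "ftp://ftp.crl.com/users/iv/iverham/ua.zip")]))
```

Remember the warning about DEEP: the sketch will happily build a `deep` request, but it cannot know how many emails the server will then send you.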

The 'pasted stringsearch' method

So, how do you do a search with an agora? Well, the trick is to search exactly as you would in your own browser... therefore you must first of all learn how to search using your own browser, which many readers still don't know, i.e. the 'pasted stringsearch' method... very useful indeed if until now you have only searched using ready-made search engine forms, like the altavista one below, or if, even more slowly, you have only used the advertisement-overloaded front pages of the search engines themselves :-)
1) copy the following line (highlight it and then CTRL+C)
2) paste it into your browser's "URL" small window (CTRL+V, duh)
3) replace the "bozo" keyword with your search phrase, separating different words with a plus (+) sign, not with blanks... [ida+disassembler+regged] for instance... :-)
4) Press ENTER and up you go... much quicker than accessing altavista's real site, isn't it? Actually it's even quicker than using a form like my own one:

Try both the form and the 'pasted stringsearch' methods for searching on line now... which one is quicker? :-)
Now, the same 'stringsearch' method can be used per email (with an agora server). The advantage in this case is of course NOT speed, it is automation... the following pre-prepared email form can be your first 'home-made' generic search agent... just cut and paste the following block as TEXT into an email to agora@dna.affrc.go.jp and you'll see what I mean (send it after having search-replaced [bots+source] with [your+own+searchstring], duh):
send http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web\
send http://webcrawler.com/cgi-bin/WebQuery?searchText=bots+source
send http://search.dejanews.com/dnquery.xp?QRY=bots+source\
send http://search.dogpile.com/search?q=bots+source&fs=web&ss=stop\
send http://www.excite.com/search.gw?trace=a&search=bots+source
send http://www2.infoseek.com/Titles?qt=bots+source&col=WW
send http://www.lycos.com/cgi-bin/pursuit?cat=lycos&query=bots+source\
send http://www.metacrawler.com/cgi-bin/nph-metaquery?general=bots+source\
send http://search.opentext.com/omw/simplesearch?SearchFor=bots+source\
send http://guaraldi.cs.colostate.edu:2000/search?KW=bots+source\
send http://search.yahoo.com/web/advanced/bin/search?p=bots+source
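
The search-replace step can of course be done by a small script instead of by hand. What follows is only a sketch under a few assumptions: the `{query}` placeholder and the function names are mine, the templates are the four lines of the block above that appear complete, and the search words are assumed to be plain ASCII words that need no further URL escaping.

```python
# Templates taken from the complete lines of the block above;
# "{query}" is a placeholder convention invented for this sketch.
TEMPLATES = [
    "http://webcrawler.com/cgi-bin/WebQuery?searchText={query}",
    "http://www.excite.com/search.gw?trace=a&search={query}",
    "http://www2.infoseek.com/Titles?qt={query}&col=WW",
    "http://search.yahoo.com/web/advanced/bin/search?p={query}",
]


def stringsearch(words):
    """Join the search words with '+' signs, never with blanks."""
    return "+".join(word.strip() for word in words if word.strip())


def send_lines(words, templates=TEMPLATES):
    """Build one agora 'send' line per search engine template."""
    query = stringsearch(words)
    return [f"send {template.format(query=query)}" for template in templates]


if __name__ == "__main__":
    for line in send_lines(["bots", "source"]):
        print(line)
```

Add or drop templates as the engines come and go: the Web is quicksand, and these strings will not stay valid forever.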

See? Now you can automate the whole process: prepare a batch file that will compose your 'agora search' email, say every two days... with (some of) the search engines above and your favourite search strings... and you are set for fishing the deep deep web without much work...
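
As one possible shape for such a batch job, here is a hedged Python sketch that only composes the message, using the standard library's `email.message.EmailMessage`. The sender address is a placeholder, the function name is hypothetical, and actually sending the message (and running the script every two days) is left to whatever mailer and scheduler you have at hand.

```python
from email.message import EmailMessage


def compose_agora_mail(sender, body, server="agora@dna.affrc.go.jp"):
    """Compose (but do not send) a plain-text request for an agora server."""
    msg = EmailMessage()
    msg["From"] = sender  # placeholder address, use your own
    msg["To"] = server
    # Ask for plain 7-bit text so that long URLs are not re-encoded
    # (this works only while the body is pure ASCII).
    msg.set_content(body, cte="7bit")
    return msg


if __name__ == "__main__":
    body = "send http://webcrawler.com/cgi-bin/WebQuery?searchText=bots+source\n"
    message = compose_agora_mail("you@example.org", body)
    print(message)
    # Sending is an assumption about your setup, e.g. with a local SMTP relay:
    #   import smtplib
    #   with smtplib.SMTP("localhost") as smtp:
    #       smtp.send_message(message)
```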
getweb@unganisha.idrc.ca [06]

OK, admittedly the 'pasted searchstrings' method above has got strong competition from the new 'breed' of getweb servers... unganisha, for instance, is a beautiful Canadian robot. The getweb servers make it extremely easy to use any form-based search engine, and moreover have integrated automated facilities for three different search engines: SEARCH ALTAVISTA, SEARCH YAHOO and SEARCH INFOSEEK.
Just email getweb@unganisha.idrc.ca, leave the subject blank and compose the following in your text:

begin

SEARCH YAHOO "automated retrieval" bots

end

Notice the blank lines BEFORE begin, after begin, before end and after end. Since these blank lines are required by some of the getweb systems, you had better get used to using them with EVERY getweb system, just in case. Of course you can substitute SEARCH ALTAVISTA or SEARCH INFOSEEK for the SEARCH YAHOO command above. SEARCH INFOSEEK has two important additional switches that will give more power to your search: NN (search the usenet) and NW (search only among the past MONTH of news). Just email getweb@unganisha.idrc.ca, leave the subject blank and compose the following in your text:

begin

SEARCH INFOSEEK NW "automated retrieval" bots

end

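The blank-line ritual is easy to forget when typing by hand, so you may prefer to let a script build the body. This is a small sketch, assuming lowercase `begin` and `end` as the delimiter words and one command per email; the function names are hypothetical, while the engines and the NN/NW switches are the ones described above.

```python
ENGINES = ("ALTAVISTA", "YAHOO", "INFOSEEK")
INFOSEEK_SWITCHES = ("NN", "NW")  # usenet / past month of news, per the lesson


def search_command(engine, terms, switch=None):
    """Build a getweb SEARCH line such as: SEARCH YAHOO "some phrase" word."""
    engine = engine.upper()
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    parts = ["SEARCH", engine]
    if switch is not None:
        if engine != "INFOSEEK" or switch not in INFOSEEK_SWITCHES:
            raise ValueError("NN and NW are switches for SEARCH INFOSEEK only")
        parts.append(switch)
    parts.append(terms)
    return " ".join(parts)


def getweb_body(command):
    """Wrap a command in begin/end, with a blank line around each of them."""
    return "\n".join(["", "begin", "", command, "", "end", ""]) + "\n"


if __name__ == "__main__":
    print(getweb_body(search_command("infoseek", '"automated retrieval" bots', "NW")))
```
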
Getweb's limits
There are limits on all these automated servers. They vary, and currently lie between 10 and 100 document requests every week OR between 100,000 and 700,000 bytes every week. Of course you can use different email accounts to multiply your allowed quotas. Weekly limits regenerate seven days after you have exceeded them, NOT on Monday morning :-)
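
If your own bots must respect these quotas, the 'seven days after' rule is simple to model. A tiny sketch follows, with hypothetical function names, and assuming you record for yourself the moment a server told you that you were over quota:

```python
from datetime import datetime, timedelta

QUOTA_WINDOW = timedelta(days=7)  # limits regenerate seven days after being exceeded


def quota_resets_at(exceeded_at):
    """Moment at which a quota exceeded at 'exceeded_at' is available again."""
    return exceeded_at + QUOTA_WINDOW


def can_send(now, exceeded_at=None):
    """True if no quota was exceeded, or if the seven days have passed."""
    return exceeded_at is None or now >= quota_resets_at(exceeded_at)


if __name__ == "__main__":
    exceeded = datetime(1998, 6, 10, 15, 0)
    print(quota_resets_at(exceeded))
    print(can_send(datetime(1998, 6, 15, 9, 0), exceeded))
```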

Email-Queries@Reference.COM [10]

The email query service provides a powerful interface that lets you refine queries by author, author's organization, subject, newsgroup or e-mail list.

So, how d'you use it? Well, first of all TRY IT right now with an "on the fly" query...
FIND 'software reverse engineering' WHERE AGE < 14 DAYS
And then send for HELP and learn how to create your own automated filtering bots... here you have a very simple example:
FIND agents 
AND scripts
AND source
AND NOT fan money jobs sell help buy god
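
Filters of this shape are easy to generate from word lists. The sketch below only reproduces the layout of the example above (FIND, then one AND per line, then a single AND NOT line); the function names are hypothetical, the quoting of multi-word phrases is copied from the 'on the fly' query, and anything beyond that in the service's query language should be checked against its HELP reply.

```python
def quote(term):
    """Single-quote multi-word phrases, as in the 'on the fly' query above."""
    return f"'{term}'" if " " in term else term


def reference_query(required, excluded=()):
    """Build a FIND / AND / AND NOT filter laid out like the example above."""
    if not required:
        raise ValueError("at least one required term is needed")
    lines = [f"FIND {quote(required[0])}"]
    lines += [f"AND {quote(term)}" for term in required[1:]]
    if excluded:
        lines.append("AND NOT " + " ".join(quote(term) for term in excluded))
    return "\n".join(lines)


if __name__ == "__main__":
    print(reference_query(
        ["agents", "scripts", "source"],
        ["fan", "money", "jobs", "sell", "help", "buy", "god"],
    ))
```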
Ok, that should be enough for a start... and I believe that if you have never used this service before you'll thank me for this for a long time... more in the 'full' version of this lesson...

FTPmail: mailing and re-mailing :-)

Explained elsewhere on searchlores.org

Is gopher dead?

This is the 'light' version, and I'm sure you have had enough info for today... anyway, you should at least understand that gopher is of course not dead, the www notwithstanding... :-)

Should you want to retrieve large zip files (say, huge MPEG files) that are accessed via a web page (and don't refer to any FTP site... else we would use ftpmail :-) you should by all means learn what gophers are and how to use them. The idea of downloading huge files on line is IMO pretty silly: the vagaries of the web and the number of accesses to *ahem* pretty sensitive files make such downloads a very difficult enterprise at times. Once you have mastered the gopher techniques you'll never download huge files on line again (get them sent to you by an automated bot that will automatically retry to connect every time its connection breaks... isn't that nice?)
Go ahead, enjoy!

(c) fravia+ 1998, work in progress, all rights reserved nevertheless


(c) Fravia 1995, 1996, 1997, 1998, 1999, 2000. All rights reserved