Web-searching Session @
BERLIN, 29/12/2002

(19c3, Berlin ~ Sunday 29 December 2002)

How to find anything on the web
by fravia+

This file dwells @ http://www.searchlores.org/berlin.htm

Scaletta of this session
Searching for disappeared sites    Many rabbits out of the hat    Jeff's example   
Slides    Opera 6.1    Some reading material   

Scaletta
Six steps to web searching perfection
(Wizard searching for dummies)
This workshop has one main aim: to teach young (and not so young) people who are aware of the potential of the web how to find anything they care about.
For free, as the web of old taught us. Without commercial crap, without advertising, without copyrights, without paying. Nothing, to nobody. Knowledge: sons of the light do actually beat black and blue all commercial minions and advertisers of the darkness. May they die tomorrow, pancaked by a truck, or something.
That is one of the reasons I have to use a pseudonym instead of my real name. Indeed some of the searching techniques I will describe may be seen as "non kosher" in many of our euroamerican copyright-obsessed dictatorships.

Six steps?
0) combing & klebing: the structure of the web
1) Beyond google (and the problem with spammers) combing again, regional search engines: "buscadores hispanos"
2) Nomen est omen
3) Guessing, stalking, social engineering ---> database opening
4) Playing with google (yo-yo; 5 words searching; images; long phrase: "'who is that?' Frodo asked, when he got a chance to whisper to Mr. Butterbur")
5) Alternative techniques: Netcraft, Web of the past, minus ".com", minus "tits", compound searches, webferret/copernic/beeline/weblookup/hurricane/atomica(Gurunet)
6) Password wizardry: human beings are human, bob/bob etc. hardcoded passwords and lists
ANYTHING
In fact we will see together today how to find ANYTHING on the web.
Indeed the web is so huge that -with almost no exceptions- searching in the correct way will enable you to find anything you may be looking for. Any image, any music, any book, any document, any data, any software (proprietary or not), any newspaper... that has been published in the history of mankind. Whole national libraries and government archives are going online at this very moment in some god-forgotten country in Africa or Asia.
An incredible wealth of documentation for NGOs and grass-root organisations is available online. Here is a simple EU-related example:

Fetch an OJ on the fly!  ("l" or "c")
(Build a string like 1999l138 or 2001c011)
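A minimal sketch of how such a string could be put together, assuming (from the two examples above only) that it is made of the year, the series letter and the issue number padded to three digits. The function name oj_string is made up for illustration, and nothing here says how the original form turned the string into a link.

```python
def oj_string(year: int, series: str, number: int) -> str:
    """Build an OJ reference string such as '1999l138'.

    Assumed layout (inferred from the examples, not documented here):
    four-digit year + series letter ('l' or 'c') + three-digit number.
    """
    series = series.lower()
    if series not in ("l", "c"):
        raise ValueError("series must be 'l' or 'c'")
    if not 1 <= number <= 999:
        raise ValueError("number must fit in three digits")
    return f"{year:04d}{series}{number:03d}"


print(oj_string(1999, "l", 138))
print(oj_string(2001, "C", 11))
```

The padding is the only detail that is easy to get wrong by hand: issue 11 has to be written as 011.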

See? The power to search and retrieve anything!

I hope to find the time to build similar masks for all sorts of legislation available in the USA, Japan, China and Russia (to start with :-)

In these very five seconds half a million people are putting half a million scanned images on some god-forgotten homepage. Another 350 thousand users are uploading, in THESE five seconds, 350 thousand mp3s somewhere on the web.
Some of these books, of these images, of this music, have never been on the web before. But they will now remain on the Internet for ETERNITY.
NO WAY BACK
Indeed, everything that has been published once will remain on the web for ever and ever (and ever), in copycatted electrons, because the very moment something is there it will be copied. A simple demonstration of this is that you can find things that DO NOT EXIST ANYMORE on the web using one of the many repositories.
One of them is google cache, another one is the wayback machine, but there are many more that will allow you to find data that have been 'pulled off' the web.
SOMEWHERE... WHERE?
Yep, all this stuff is on the web, somewhere, but where?
You will have to use a lot of different techniques and approaches to search the web effectively. As you will see, google, though worthy, is by FAR not the solution for your searches. In order to understand WHY you have to use different tools, you should first of all understand what the web looks like from a searcher's point of view.
Structure
Explain structure
Explain diameter 19: do not despair
There is ONE important thing in this image that I hope you will not forget: the difference between the INDEXED web (a couple of milliards/billions of pages) and the NOT INDEXED web (9 milliards/billions of pages more). So when you are searching with the main search engines, with google, or with fast, or with wisenut, you are just limiting yourself to -in the best case- a FIFTH of the web.
SO, HOW DO YOU SEARCH?
Ok, how do you search for your needles in this huge ocean of commercial hay?
Let's begin with the beginning: usually you do not search for a specific target: you search for people that have searched for that target. If the target has enough signal among the noise you may even search for people that have searched for people that have searched for that specific target... :-)
This approach is called COMBING, and is rather effective. But before explaining it, we will have to understand how the MAIN search engines really work, and WHY they are there. Simply stated, these "free" search engines exist in order to grep what you and millions of other users are searching for.
Anonymity - proxies - free homepages - free email addresses - free search engines
EXAMPLE ~ S.E. DIFFERENCES
Let's take an example: you are interested in a specific camera, how to use it, if it is worth using... whatever. Let's say a nikon F2.
Now, of course, you could search on google for nikon F2:
search?as_q=%22nikon+f2%22&num=100 :8180 results
This is a good, simple query and it is what most people would do. And they may even be happy with it.
Nevertheless a good idea would be to use ANOTHER main engine as well, let's say FAST:
&query=%22nikon+f2%22: 2561 results
Before discussing this, it would not be bad to use at least a THIRD main search engine on such a broad query:
query.dll?q=%22nikon+f2%22: 6807 results
A first thing to understand is that you should ALWAYS use at least two main search engines, when starting a broad query. As you may see if you follow the links above, wisenut, for instance, is more 'asian-centered' than google, which in our case, searching for a japanese camera, would probably be useful.
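The three query fragments above all carry the same phrase in percent-encoded form. As a small sketch, this is how the encoding can be reproduced with Python's standard library. The parameter names (as_q, num, query, q) are copied from the fragments; the host and path of each engine are not given above, so they are left out here as well.

```python
from urllib.parse import quote_plus, urlencode

# The double quotes ask the engine for the exact phrase.
phrase = '"nikon f2"'

# quote_plus turns the quotes into %22 and the space into +
encoded = quote_plus(phrase)
print(encoded)

# Same phrase, three different parameter names, one per engine.
fragments = [
    urlencode({"as_q": phrase, "num": 100}),
    urlencode({"query": phrase}),
    urlencode({"q": phrase}),
]
for fragment in fragments:
    print(fragment)
```

Once the phrase is a variable, sending the same broad query to two or three main engines costs nothing more than a loop.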
Now a normal user would be happy: Woah, 2000 - 6000 - 8000 results! I may browse for ever just here.
In fact, first of all, you CANNOT really see all those results. There is a difference between the number stated by the search engines and the results you may really check yourself.
If you really tried to see ALL those links, you would quickly discover that...
Here is a table I made two months ago, based on another query. As you can see, there is a huge difference between alleged results and results you can investigate:
Yo-yo index


THE YO-YO Index
(Based on the broad query: "advanced searching")


s.e.            Yo-yo index      real max     middle       3/4          Alleged Total
Google          3,82             999          500          750          26100
Altavista       1,27             400          200          300          31834
Lycos           100%             real!        real!        real!        18240~23940
Fast            17,08 (46,31)    4010         2005         2670         23419 (8853)
Wisenut         0,72             300          150          230          41776
Northernlight   n/a (high)       n/a (high)   n/a (high)   n/a (high)   19344
Hotbot          7,28             1397         700          1050         19200
Teoma           2,57             194          100          150          7540
Excite          43,42            1000         500          750          2303
Yahoo           5,13             677          225          450          13200
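The table does not spell out how the yo-yo index is computed. Judging from the rows, it looks like the "real max" (the last result you can actually reach) taken as a percentage of the alleged total. That reading is an inference from the numbers, not a stated definition, and the function name below is made up for illustration.

```python
def yoyo_index(real_max: int, alleged_total: int) -> float:
    """Reachable results as a percentage of the alleged total.

    Assumption: this is how the index in the table was derived.
    """
    if alleged_total <= 0:
        raise ValueError("alleged_total must be positive")
    return round(100 * real_max / alleged_total, 2)


# (real max, alleged total) pairs taken from the table
rows = {
    "Wisenut": (300, 41776),
    "Teoma": (194, 7540),
    "Hotbot": (1397, 19200),
    "Excite": (1000, 2303),
}
for engine, (real_max, alleged) in rows.items():
    print(f"{engine:10s} {yoyo_index(real_max, alleged):6.2f}")
```

For these four rows the formula gives back the table values. A few other rows (Google, Altavista, Fast) come out slightly different in the last digits, so the original figures may have been truncated or based on slightly different counts.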
WHILE WE ARE STILL AT THE MAIN SEARCH ENGINES
Some simple rules:
1. always use more than one search engine! "Google alone and you will never be done!"
2. Always use lowercase
3. Always use MORE searchterms, not only one "one-two-three-four, and if possible even more!"
This is EXTREMELY important. Note that -lacking better ideas- even a simple REPETITION of the same term can give you more accurate results:
nikon: 1,410,000 (alleged) results
nikon nikon: 627,000 (alleged) results
If you are interested in this 'pleonastic' stuff, read The epanaleptical approach.

Since we did not do it before, it's time to use more searchterms now. Here is a "better" query for our target (I will use google, but remember -yourself- to use also OTHER main searchengines when broadsearching, you will be amazed by the non overlapping results).
&q=%22nikon+f2%22+nikkormat
Interesting, eh?
Now look at this query:
q=%22links.html%23nikon%22
You may not recognize the querycodes above... it is just "links.html#nikon". We are already slowly moving away from simple main search engines searching towards combing. In fact I was searching for pages of links that are of interest for my query. I can go further:
"nikon.htm" OR "nikon2.htm"
"nikon*.htm"
Another approach: +nikon nikkor +photo resources
As you see it's commercial infested... there is some need for our yo-yo here
You get the idea.
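If the querycodes shown earlier look cryptic, they can be turned back into readable queries. A small sketch using only Python's standard library; the two strings are the ones that appear above.

```python
from urllib.parse import unquote_plus

querycodes = [
    "%22nikon+f2%22+nikkormat",
    "%22links.html%23nikon%22",
]

# %22 is a double quote, %23 is '#', and '+' stands for a space.
decoded = [unquote_plus(code) for code in querycodes]
for code, text in zip(querycodes, decoded):
    print(f"{code}  ->  {text}")
```

Reading other people's query strings this way is a quick method to learn which operators they used.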
I could also use the netcraft trick: do we happen to have many "nikonsites"?
Woah... 800 site NAMES contain the word nikon! (But many of them will be dormant.)
BUT THE REAL DIFFERENCE
But the real difference between the simple queries we have made above and a good seeker approach is that above we are still just "skimming", or only slightly touching, the relatively small INDEXED part of the web. We are still missing 4/5ths of it! That's the reason you will have to learn at least some rudiments of combing.
The first -simple- combing approach (remember: searching those that have already searched) is to use old glorious USENET!
Usenet
Messageboards
Homepages
Webrings
Local searching (spanish search engines - buscadores hispanos)
getting at the target from behind: netcraft, synecdochial searching.
GRAN FINALE
Guessing
Passwords through google
Database accessing (politically correct)
brute forcing? Guessing / Searching
Bots searching, scrolls, wands
Software reversing: commercial bots capering
ANY QUESTIONS?
Now, at the beginning of our workshop I told you that you can find ANYTHING on the web. My experience has taught me that there is almost always -unfortunately- ONE sad exception. It is almost always next to impossible to find quickly the curious targets that people ask for at the end of my workshops, so do not ask me to find something specific for you now...





SEARCHING FOR DISAPPEARED SITES

http://web.archive.org/collections/web/advanced.html ~ The 'Wayback' machine, explore the Net as it was!


Visit The 'Wayback' machine at Alexa, or try your luck with the form below.


Alternatively learn how to navigate through [Google's cache]!




Search the Web of the past

Weird stuff... you can search for pages that do not exist any more! VERY useful to find those '404-missing' docs that you badly need...
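Without the form, an archived copy can also be requested by writing the address by hand. The sketch below only builds such an address and makes no network request. It assumes the URL pattern the Wayback Machine is known to use (a timestamp of 4 to 14 digits followed by the original address); that pattern is not taken from this session, and the function name is made up for illustration.

```python
def wayback_url(original_url: str, timestamp: str = "2002") -> str:
    """Build a Wayback Machine address for a page.

    Assumption: the archive accepts /web/<timestamp>/<url>, where the
    timestamp is YYYY up to YYYYMMDDhhmmss, and serves the capture
    closest to that moment.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits")
    return f"http://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://www.searchlores.org/berlin.htm", "20021229"))
```

A shorter timestamp such as a bare year is usually enough when you only know roughly when the '404-missing' page was still alive.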





NETCRAFT SITE SEARCH

(http://www.netcraft.com/ ~ Explore 15,049,382 web sites)

VERY useful: you find a lot of sites based on their own name and then, as an added commodity, you also discover immediately what they are running on... verbum sapienti sat est eheh, I mean... A buon conoscitor, poche parole, I mean... a word to the wise is enough...
Search Tips
Example: site contains [searchengi] (a thousand sites eh!)





RABBITS (out of the hat)

Examples of absolute password stupidity: http://www.smcvt.edu/access/ejournal_passwords.htm


The index of approach, MSN: http://search.msn.com/results.asp?f=any&q=%2B%22Index+of%22+%2BName+%2B%22Last+modified%22+%2B%22Parent+Directory%22%0A&FORM=SMCA&cfg=SMCINK&v=1&ba=0&rgn=&lng=&depth=&sort=&d0=&d1=&cf=
Inktomi: http://169.207.238.189/search.cfm?query=%2B%26quot%3BIndex%20of%26quot%3B%20%2BName%20%2B%26quot%3BLast%20modified%26quot%3B%20%2B%26quot%3BParent%20Directory%26quot%3B&first=20&adult=0

PLAYING WITH GOOGLE

mp3 searches:
index of complex:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&q=intitle%3A%22index+of+%2F%22+%22parent+directory%22+intitle%3A%22mp3%22+-filetype%3Ahtm+-filetype%3Ahtml&btnG=Google+Search

index of simple:
http://www.google.com/search?num=30&meta=hl%3D%26lr%3D&q=%2B%22index+%2Bof%2Fmp3%22+%2Bdylan

Andromeda:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&q=%22vivaldi%22+%22powered+by+Andromeda+version%22&btnG=Google+Search

password searches:
Searching entries 'around the web', no specific target, using 'common' passwords:
For instance: bob:bob

For instance: 12345:54321
james:james ~

Searching entries to a specific site (not necessarily pr0n :-):
For instance: "http://*:*@www" supermodeltits

warez searching:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&q=intitle%3A%22index+of+%2F%22+%22parent+directory%22+%2B%22%2A.nfo%22+%2B%22%2A.rar%22+%2B%22%2A.r05%22+%2B%22%2A.r10%22+-filetype%3Ahtm+-filetype%3Ahtml&btnG=Google+Search

Fishing info out of the web:
password3.htm

The above is not 'politically correct', is it? But it works. And speaking of "political correctness", some of you will love the Borland hardcoded password faux pas... Databases are inherently weak little beasts, duh, quod erat demonstrandum.


Jeff's example


well it gets even stupider ... EVERY ---and i do mean EVERY
passmaster login page that I tested out, only needed but one little change in the URL and in IE the dowggone password log pops right up for viewing (in Netscape it works too but asks if you want to save it to disk)

just remove whatever the blahblah.htm ending is in the url and type in its place 'password.log'
and sure enuff pardner --- the log file jumps up showing all users passes and url's ... the thing is once you have the url for these dum things you don't even have to bother logging in with the user or pass... u just use the url that is listed

example:

now just change password.htm too password.log

hell try any of them returned by applet:passmaster in alta ... they all give you wabbits

beertime

it worked on every one i tried ... doh
jeff

SLIDES

(BERLIN, 29/12/2002)

How big?

Short/Long term coverage
Structure

Kosher - non kosher

Web coverage, short term, long term

Main search engines coverage

Opera 6.1 (windoze opensourcing)
Traditionally I conclude my workshops by examining ways to remove advertisements from this FANTASTIC Web-browser, which beats hands down any other browser on the market.
You'll find a Linux version at http://www.opera.com/linux/, where you can either use the ad-infested version or purchase the non-ad one.
Below I'll explain how to eliminate advertisements in Opera 6.1 for windoze, but I would advise you to purchase, as I did, your own version, and help slick, good and fast Opera against the awful Netscapian and Microblowian browsersaurii.

I do not believe that the following information will damage Opera, on the contrary: