hts_0708.htm: How to search the web, by fravia+ (¯`·.¸(¯`·.¸ july/august workshops ¸.·´¯)¸.·´¯)

Web-searching Session @ Hal2001
(Twente, Holland ~ Saturday 11 August 2001)

Web wizard searching techniques, anti-advertisement galore
and software reversing tips
by fravia+
with useful linguistic corrections provided by Anna, Queen of Saba and e=h

Introduction Evaluating results A glimpse of the web
Klebing and other non kosher techniques
The need to fight against advertisement
Eliminating advertisement banners (Opera 5.1)

Introduction
I'm happy to be here. And I would like to thank those who have invited me to hold a workshop to-day.
The title of this workshop is

'Web wizard searching techniques, anti-advertisement galore and software reversing tips'

Since some among the presents will be quite web-savvy, whereas some other will not be so confident with the subject, I'll try to-day to find a path capable to satisfy everyone.
This is at times difficult, so please if something appears obscure just interrupt me with any questions you may have and/or note down the specific lore you are interested in, we'll delve into it deeper on request -if time allows- at the end of this workshop.

Searching the web is of paramount importance, for reasons that are easy to understand: since gathering information is now both cheap and effective, society is changing. When gathering information was expensive it made economic sense for decisions to be made in few central places. Now, with communication costs next to nothing, more and more people can easily be well enough informed to act independently. This is paramount to a revolution of considerable scale: those that learn how to find knowledge on the web ('hidden' as well as 'open') will find eo ipso quite some opportunities to 'move' things on the right paths... or at least to counter the decisions still that are still being made 'in few central places' by the dark side.

The amount of information you can now gather on the web is truly staggering.
Let's see... how many fairy tales do you think human beings have ever written since the dawn of human culture?
How many songs has our race sung?
How many pictures have humans drawn?
How many books in how many languages have been drafted?
The mind shudders, eh?
In fact, already now, every text, every music, every software program, every image, every book is on the web. Moreover: everything that goes on the web will 'remain' there forever, carved in copycatted electrons. It is next to impossible to 'pull down' a text, a whole site or an image, once published. But there are even more important goodies than media products out there. There are SOLUTIONS! Imagine you're an administrator, or a consumer defender, or a lawyer, or a scientist, or simply yourself when seeking a solution, doesn't matter a solution to what, ça c'est égale.
Well, you can bet: the solution to your problem is there, somewhere. Actually you'll probably find MORE THAN ONE solution to your current problem, and maybe you'll be even able to build on what you'll have found to develop another further approach.

I would also like to go beyond powerful searching techniques, if time permits, in order to underline the cosmic power that software reverse engineering can give you. Therefore I will close this workshop demonstrating a simple technique to get rid of advertisement banners planted inside software (using the latest version of the very good 'Opera' browser as example and target). Since this approach does not require any particular software reversing skill and can thus be applied by anyone of normal intelligence, it should help to put a nice stop to a pestering habit that utterly annoys me: unpleasant advertisement inside software.

When learning how to search the web one of the most important, lore is how to evaluate the results you find.

Evaluating results
Before beginning my workshop I feel I owe you an explanation... and at the same time I take the opportunity to introduce a most important web-seeking lore: 'evaluation'
You may wonder - as you always should when attending any conference whatsoever - why you should actually bother to listen to me today.
The importance of evaluating the results of your query should never be underestimated on a web where fake rules, and where scattered gems of knowledge must be dug out from beneath heaps of commercial crap. You MUST learn to evaluate -quickly and effectively- the 'authority' of any Author on any matter or subject, without necessarily knowing thoroughly such specific matter or subject (which, admittedly, would help you a lot in your evaluation :-)

A simple rule of thumb is to calculate 1) how much time the subject has spent on the matter, 2) how consistent he was in his line of investigation (you shouldn't for instance buy a sail-ship from somebody that began assembling them a couple of years ago after having done quite different things all his life long) and 3) how much he is valued among his 'peers'.
Yet you should never forget that even when all the three conditions above are satisfied you could still be easily deceived by carefully masked incompetence.

There are two more helpful parameters you should consider when evaluating a speaker: his 'stolid stubbornness' (or load of prejudices) on the negative side and, on the positive side, how much chances you have to verify his asserted data.
The first point: Author's prejudices, is VERY important when evaluating. Always remember that people who take themselves too seriously have great difficulty in changing their own minds when proved wrong. This applies in general to whomever is commercially 'obfuscated'. In general you should be wary to trust those whose career depends from how they are PERCEIVED, and not from what they really know.
I'll exemplify the second point: possibility to verify asserted data, using the 'librarian' difference between books that have footnotes and books that have endnotes. Endnotes are more cumbersome to check, hence you tend to have to rely more on what the Author says. Footnotes can (could) be immediately checked, hence you can (could) discuss or question step by step what the Author says.
So footnotes are more 'honest' than endnotes. Quite a difference of style, if you think about it.
Fortunately nowadays with the web available to everyone, knowing how to search for references, we can immediately check whatever anybody says on the fly.
This is the same as if we always had had 'footnotes'... for every text we find. De facto it means that we can have a footnotes 'equivalent' even when dealing with Authors that would rather hide behind endnotes (and also with whomever would rather not use any explaining notes at all :-)

Finally there are 'global' characteristics that unveil charlatans, on the web and elsewhere: the Author shows strong arrogance; extreme loquacity; frequent appeal to emotions; repeated appeal to supreme authorities; incomprehensible language; untarnished conviction of his own importance (often with the paranoid corollary of being unjustly persecuted); and an almost absolute certainty that who does not agree with him is either fool or vile.

Back to this workshop: Fundamentally, you are listening to me because I have been present on the web for what in Internet is regarded as a long time (7 years) dealing with themes that are reciprocally intertwinned: web-searching, software reversing, anti-advertisement and anonymity lore.
Should you wonder why these themes are intertwinned, rest assured: one of the aims of this workshop is precisely to clear this point. Moreover I am - in my 'real' life - both a linguist and a historian.
I can assure you that when searching the web being able to understand 5 major languages is an asset, no less than knowing rhetoric and sources-collation.
Finally I could point out to some advanced work that has been made on my sites -thanks to the help and knowledge of other powerful seekers- on matters related to searching. We have for instance introduced this year our 'scrolls', a term that refers to some small -but powerful- searchbots (in this case small scripts written in php) that will allow searchers to find their useful quarries on any search engine they select, for how long as they select and as deep as they select.
For these reasons I have been asked by several friends and European universities to hold this kind of workshops.

All this, of course, does not mean much. The proof, as always, will be in the pudding. It will be up to you to evaluate this workshop after having attended it. Your critical comments and suggestions are welcome.

I won't give - in general - many references supporting my statements. As said, among searchers references should not be necessary. When I state -for instance- that the 'diameter' of the web is around 19 clicks, you have already enough angles to find all necessary references by yourselves, and then some. Thus you'll be able to 'find my missing footnotes' and check or refute such a statement on your own. That's the beauty of websearching... reminds early medieval "Quellenforschung" techniques. Yep, you should search for that lore as well :-)

A glimpse of the web
I cannot even conceive to begin a workshop about web searching without an attempt to show a searcher's "vision" of the web. It is important to understand how big Internet is and how it looks like (from a searcher's point of view). In fact, despite its width and complexity, a searcher will take comfort knowing that the Web 'diameter' will always remain narrow enough to allow any one who is capable to find anything he is looking for, irrespective of the amazing speed, and pace, of Internet's growth.

Now, to say that you can really find anything on the web would be a lie, but -as we will see- such a lie 'lies' less and less far from truth.
I'm not exaggerating. Whole national libraries are going massively on-line in some godforsaken country at this very moment. Tumbling prices for scanners, hard disks and web-connection made this possible. In fact every media product is most probably already on the web... somewhere, albeit buried under huge mountains of mindless futility and commercial crap.

So, how big is the web?

growth

As you can see from the graphic above, the web now measures something between 5 billions (5000 millions) indexed pages and 6 billions (6000 millions) indexed pages. Keeping in mind that the more powerful among the 'main' search engines (best non boolean: webtop and google; best boolean: fast and hotbot) cover on their own one fifth of this 'bulk' you'll easily realize why you'll need other means -and strategies- in order to search the web.
The figures above, however, are just my own collation from various surveys, surveys themselves based on tiny 'segments' of the web that have been object of specific research from 1999 onwards. The error margin is huge and the real figure, i.e. the real dimension of the web, is unfortunately anybody's guess. The two main 'philosophic schools' are at the moment
1) a 'slowing down' hypothesis -still not confirmed- that would give for august 2001 a dimension slightly above 5000 million pages.
2) a 'keeping the pace' hypothesis -still not confirmed- that would give for august 2001 a dimension slightly above 6000 million pages.

Anyway, as said, the diameter of the web is relatively narrow: 19 clicks. Studies have shown that you can go from any page of the web to any other page with a maximum of 19 clicks. Even if the dimensions of the web should further increase with the amazing pace shown in the recent years, it is unlikely that its diameter would ever reach a logarhitmic barrier fixed at 22 clicks.

This is, for obvious reasons, excellent news for seekers. It means, translated, that no matter how big the web is, or will ever be, you'll be able to find your quarry in a reasonable lapse of time... provided you know how to search, that is.

I would be at a loss to specify what exactly such "a reasonable lapse of time" should be. It depends of course on the kind of target (text, image, sound, person or event) and from the language you are searching in (English still commands 86% of the web).

Short term searching and a Long term searching
First of all, as we will see, there are two searching strategies: a "short term" searching and a "long term" searching. The "short" approach allows you to search 3/4 of the web, the "long term" one will give you (almost) the whole indexable web.
Let's say that in general, as a rule of thumb, if you don't find your short term quarry in 15 minutes, this means that your search strategy is wrong. Please note that finding simple 'short term' targets should NOT require more than 5 minutes.

Now, I hear you asking... how exactly do you accomplish that? How do you really find any target in few minutes?
Well, there are various approaches you may be tempted to explore, from the simplest, like combing, i.e. fishing inside usenet's newsgroups, perusing local messageboards, taking advantage of regional searchengines, private homepages repositories and webrings, to the more complex paths of writing your own bots and/or reversing commercial bots.
Each one of these techniques will allow you to gather 'angles' for a long term strategy.

Short term seeking and long term seeking are different beasts.
The various searching techniques that we will deal with in a minute will help you not only to increase the depth of your searches beyond the 33% that -in the best case- the main search engines will allow you to cover if used properly and together.
The various techniques will allow you also to gather various 'angles' onto which you will be able to base your long term searches, in order to 'grasp the guts' of the tantalizing 1/4 of the web that all the techniques we know of, together, will still not cover.

coverage

Yet, before examining the various techniques in depth, we will need to have a look at the structure of the web, which is of paramount importance for seekers, as you will, I'm sure, agree in a couple of minutes.

How does the web 'look like'? From a searcher's standpoint, the web looks like this:

growth

As you can see, there is a huge 'core', or 'bulk', made out of reciprocally linked pages. This is the CORE of the web, strongly interconnected pages, sites, Usenet newsgroups, messageboards, you name it. This is the 'web' as you know it, where you happily browse from link to link smearing all your personal data along, as we'll see later examining some anonymity themes.
It is not easy to represent the web in three dimensions. As a matter of fact the Nucleus is far from being the compact and uniform 'ball' of interconnected sites you are looking at in this image... you should think at a fractal like entity, with almost 'organic' features, with spaghetti-similar 'tubes' that quickly connect some areas while leaving 'link-holes' in many places. It would probably look like a huge chump of Gruyere :-)

As the image behind my shoulders should make clear, only a relatively small part of the Nucleus has been indexed by any search engine.
In fact the main search engines' coverage of the web is extremely limited:

growth

As you can see, the coverage of the bulk by the main search engines (which -note- overlap among themselves for almost the 60-80%) is at best around 25%. It has been calculated that with overlaps among them, the main search engines cover at best a meager third (33%) of the web.

Take note that a significant part of the huge bulk is USENET, huge mass of micro-information, which can frequently give quite useful results, especially for long term seeking (and combing, klebing and stalking) purposes.
There are some specific search engines used for usenet searching, the most famous one was for many years the famous dejavue (deja.com), which has been now acquired by google (under the name 'groups.google.com': http://groups.google.com/advanced_group_search).

So, back to our 'image' of the structure of the web.

For searching purposes the web is divided into parts that link to and/or are linked from the Nucleus.

First of all there's a big area called "Outside linked". This consists of pages and files that the Nucleus links towards, but that do NOT point reciprocally back to the Nucleus. Thus they are 'outside' the reciprocally connected bulk, yet they are not particularly difficult to find. In order to search this part of the web, mostly made of non-hidden databases (imagine an observatory linking to a huge collection of star images, for instance) you'll use the same techniques (power-searching, combing and building your own searchbots) commonly used in order to search the Nucleus.

Then there's another big area called "Outside linkers". The pages located in this area of the web "point" to the Nucleus but are not pointed back from it.
Imagine as an example the "personal links" pages of a scientist: lotta "juicy" links to the Nucleus yet no need for him to publicise their existence. There is -per definition- NO LINK from the Nucleus that you could use in order to browse and peruse back to this part of the Web.
Outside linkers are often VERY important: a page with information you may need is there, somewhere, yet without any link whatsoever that could bring you to it.
The "outside linkers" may hoard knowledge you need but are a part of the web you cannot reach using "normal" search techniques, since no link whatsoever points to them.
There are, fortunately, some techniques that you can apply in order to find them. The most common one being klebing, as we'll see in a minute.

Then we have a big area (or to be more precise, a very big quantity of small areas scattered around the Nucleus, but I didn't know how to draw it) of hidden databases which are connected to the bulk, but that can be accessed (in theory) only conditionally.
In fact these 'hidden databases' contain pages or files that the Nucleus points to, and that may (or may not) point back to the Nucleus. Yet for commercial (or other access-restrictive) reasons visitors of sites located here are supposed to "pay" (or adhere to some "clan") in order to access them. As you may imagine, these pages are NOT mutually linked (but they might point to some "outside linkers" as well).

Fortunately (for us, unfortunately for the commercial bastards) the web was originally built in order to share (and neither to hoard nor to sell) knowledge. And thus the building blocks, the "basic frames" behind the structure of the web are still helpful for whomever want to share.
If I may dare a 'crackers' comparison: exactly as it is pretty easy to break any software protection written in a higher language if you know (and use) some assembly, so it is easy to break any server-user delivered "barrier" to a given database if you know (and can "outflank" and/or exploit) the protocols used by browsers and servers.

Let's simply say that it is relatively easy to access all pages in this "hidden databases" area of the web reversing the (simple) perl or javascript tricks used to keep them "off limits" for zombies and lusers ("losers users"). A sound knowledge of Internet protocols and the ability of Guessing (which can be a very sofisticated Art) represent an incredibly powerful approach in this field.

How to search
So, now that we have examined the various areas of the web, let's begin to discuss which search strategies you should use, and where. Note that my aim is simply to present you a broad palette of the many possible approaches. The most important thing you should heed is that when searching the web you should use MANY different techniques. Of course you may start any broad query with the help of the main search engines.
Please note the plural: search engines: a common mistake is to 'get used' to a single main search engine, say google, and use always and only google henceforth. In fact you should always use AT LEAST a couple of them, preferably a non-boolean search engine (like google or webtop) and a boolean one (like fast, hotbot or altavista).
Yet, as we have seen, the main search engines DO NOT INDEX a huge part of the Web. Moreover the main search engines apply an active censorship to the results that they will give you (try searching for mp3 on anyone of the main search engines if you are not convinced).
If you limit your searches to the results offered by the main search engines you would therefore never have a chance to grasp 'the guts' of your signal. Hence the importance of the different searching techniques that we will now examine.

A good fundamental idea, when searching for something, is to search those who have already searched. In fact you can be sure that, whatever your interest, many others, somewhere and somewhen, will have already searched for it. Finding the results of their work can 'jumpstart' your own search forward, allow you to ride quickly on their shoulders and spare you a long tedious time searching your signal among the noise.

This kind of approach is called 'combing'.

You may comb either for those that have already searched and are willing to share or you may comb for both these and also those that have already searched and are not willing to share the information and knowledge they have found. In the first case the main approach involves usenet searching, regional searching and local searching (private messageboard perusing, private homepages repositories fishing, etcetera), in the second case you must further add less 'kosher' techniques like stalking, klebing, hacking, guessing and luring.

It is now high time to explain some of this terminology.

Usenet searching
Usenet is EXTREMELY useful for combing purposes because:

Many among those that employ usenet know something indeed. I'm always amazed to perceive the wondrous depth of knowledge possessed by many people that actively post on usenet. On the other hand I'm even more amazed when I discover that part of my audiences didn't even know that usenet exists.
Many usenet messages point to useful sites (a simple URL-gathering bot that you lash onto your selected usenet groups can quickly give you gorgeous results)
There's no point in reinventing a wheel that not only already exist in various forms, but will be most of then time FAR superior to the one you would have soon or later invented if you had had the time :-)
Note that there are special usenet services, like green egg, that automatically gather all URLs that appeared inside usenet messages. I'm sure you can understand how important this can be for searchers.

Regional searching
Regional searching means -in its simplest form- the use of geographically specific search engines. Of course a search for -say- Spanish search engines will give you different results if you search first a main search engine for 'spanish search engines' or if you rather search for 'buscadores hispanos'. Knowledge of languages is of paramount importance when performing regional searches. This may require some specific linguistic search on its own, but, believe me, the gems you'll find searching in Russian or Korean search engines for the local equivalent of 'warez' or 'gamez' will compensate you in abundantiam for the time spent in your own linguistic fine-tuning. FTP-fishing in foreign languages (say Chinese, Japanese, Russian, Polish) will also enable you to find some incredible jewels.
You should always remember that there is a very small overlap between main search engines and good (i.e. REALLY geographically specific) regional search engines. This means that you can get through regional engines to some area of the bulk where no main search engine ever dared to adventure.
As a 'global' starting point for regional searching, have a look at searchalot: http://www.searchalot.com/texis/open/internationalframer

Local searching
Encompasses, among other techniques, the following:

Private messageboard perusing
Private homepages repositories fishing
Public counters backtracing
Web-rings searching
Domain-names searching

Let's quickly examine these various 'local searching' approaches.

Private messageboard perusing
Employing this approach is useful because:

Not everyone makes use of usenet. Public and private messageboards abound on the web. You may find them trough the main and regional (and local) search engines or you may find them combing usenet (see above) or you may find them trough referral backtracing from your own loggings (klebing).
Some of these private messageboards are extremely active and are indeed visited by world-level experts.
People that have searched and are not willing to share the knowledge DO lurk actively on private messageboards, often disguising as newbies when seeking specific information. Simple stalking and luring techniques may come quite handy in these cases :-)

La va sans dire that in general you should not waste your own time perusing huge constellations of small messageboards. Ad hoc bots will allow you to grep (i.e automatically gather, collect and collate) quickly and regularly whatever you are looking for. For some examples in php-language (with source code of course) visit the php lab on my site.

Private homepages repositories fishing
This may, or may not bear fruits. Even using good bots, this kind of work can be extremely tedious. I have calculated that there are at the moment more than 1000 (one thousand!) private homepages 'free' repositories à la Geocities on the web, and that only some of them have good ad hoc search engines.
Yet the commonly upheld idea that 'serious' stuff will always necessarily be located on own specific domains and that, conversely, only amateurish stuff can be found on such private homepages repositories has been proved wrong often enough. Jewels and gems, among mountains of dull crap, shine on private homepages.
Furthermore: the inventiveness of their owners, often stimulated by the harsh censorship applied by the 'free' providers, will let you gasp in awe: porn images one pixel wide and one pixel high that will land 'unseen' in your caches, mp3 files with jpg extensions, disguised as modern art, dvd descramblers source code replicated using prime numbers, collections of tricks that -say- will allow the use of a gsm without paying stenographed inside dull image series à la 'me and my girlfriend Sissie visiting Dallas', and a thousand of other interesting tricks, least but not last the javascript acrobatics used in order to nuke and eliminate the compelled silly advertisement banners that these 'free' homepages coerce on the zombies that use them (just to make an example the HTML 'noembed' trick with a double html body line, but there are many more... search and you'll find).

La va sans dire that in general you should not waste your own time perusing huge constellations of private homepages. Ad hoc bots will allow you to grep quickly and regularly whatever info or knowledge you are looking for. For some bots-examples in php-language (with free source code, of course) visit the php lab on my site.

Public counters backtracing and other 'top 100' rankings
Many - usually small - individual sites on the web use public counters and statistical services. These often possess their own search engines and publish their 'top 100' rankings, that you can use to comb sites devoted to a specific topic.
You also have other, non public counter related, 'top 100' listings. These can at times be quite useful especially if they cover parts of the web that the main search engines do not cover. A good example is the russian 'rambler': for mp3... or for free soft and web tools... http://top100.rambler.ru/top100/Software/index.shtml.ru

Web-rings searching
A webring is a set of sites of similar topic, linked in a ring so that you could (in principle) navigate through all of them all the way back to the starting site.
Some rings have hundreds of sites, some one or two. The population shouldn't usually be considered when selecting promising rings, because a single site can give you all the links and contents you need, while a random (well, not so random) sampling of the sites of a huge ring will cover the important contents and links (which are obviously redundant if there are so many sites).
You may start with the RingSurf Directory at
http://www.ringsurf.com/ringsearch
and check Yahoo webrings at
http://search.webring.yahoo.com/search?ringsearch

Domain-names searching
This may ('may' underlined) be of some help when combing. It is commonly made trough netcraft. Try a search for domainnames that include your quarry and you may find, among many 'dormient' sites, i.e. sitenames that have already been bought but have not yet been built, some useful nuggets.
Try searching for the term "search" for instance and you will find thousands of domainnames dedicated to searching techniques AND to searching all kind of subjects :-)
In fact the importance of names in Internet should never be underestimated.

The above is nice and useful in order to search for idividuals that have nothing against sharing information and knowledge. Unfortunately the harsh reality is that many individual are commercially depravated, and believe it is their right to hide information, plundering the web to gather knowledge for free and trying to sell it, taking advantage of the ignorance of the zombies.

A typical case is represented by porn-sellers. As anyone knows, there are many sites on the web that require 'money' in order to access 'porn images'. This does not make any sense whatsoever, if you just think at the terabites of porn-images (truly for all tastes and perversities -) that are published everyday on the myriad of usenet's porn newsgroups for free.

But porn-images are just an example. There are millions of 'closed' (or 'hidden') databases on the web, where you are theoretically required to 'pay' or to adhere to some 'clan' in order to access.

Fortunately the web itself was built in order to share information, and not to hoard it. As a consequence it is relatively easy to access ANY closed database on the web and o retrieve any information or knowledge that has been published on Internet, no matter which precautions have been taken to keep you out of it.

Klebing and other not kosher techniques
It is now time to examine some of the techniques you can use in order to find out those that have searched the same nuggets you are looking for but -alas- do not want you to share their knowledge and -in general- would rather prefer to keep a low profile on a web they are plundering for money.

nonkosher

The most common approach is called 'klebing'.
This is nothing else than using the referrals inside your loggings. Often it is used together with simple luring techniques, like the following: let's say you are extremely interested in a specific topic, and you have already made your combing, and therefore you have already individuated the more useful usenet newsgroups, private messageboards and so on, and have also already collated a list of 'experts' on that specific topic.
Now create a page, or a site, with some of the valuable informations you have gathered, and publicize it on said messageboards. Some of those really interested in that stuff will immediately visit your page, thus leaving a referral in your loggings that may bring you to a 'secret' page of theirs, to another messageboard you did not know of or to the guts of an hidden database. This use of the referrals is in nuce klebing.
As strange as it may seem to you, very few will take the precaution to visit first a neutral page or to kill or fake their referrals trough proxonomitron or any other anonymizer.

Luring
Of course simple luring tricks can be used for all sort of purposes. Luring is for instance often used in order to caper lusers' email accounts, getting their passwords. This requires a two-steps luring approach. Part ONE: You open an account on -say- yahoo, of course, using a proxy and as always filling the questionaries with only faked data -remember that on the web you should always lie, the more the better, when asked for personal data- but instead of chosing a typical johnsmith@yahoo.com account you chose something like passwords@yahoo.com or passwordretrieval@yahoo.com or technicalservice@yahoo.com or whatever on such lines.
Now you are the proud owner of such an account and, part TWO, you visit a messageboard or newsgroup where lusers abound, for instance a 'wannabie hackers' messageboard, and publish there a message similar to the following one:
"Hey, I have discovered an incredible simple way to get other people passwords at yahoo! Simply send to passwords@yahoo.com an email with 'retrieval' as subject and with only three lines of text: your username, your password and the username of the person you want the password of. After a short while you will receive that password per email."
You wouldn't believe how many idiots will fall for such crap and send you their own passwords :-)

Until a few months ago you could REALLY have got an account like passwords@yahoo.com, but due to this kind of abuse terms like password are now been filtered for new usernames... yet this happens only in english... so you can still ask for motdepasses@yahoo.com and the same trick will work, for a smaller audience, in french... or in urdu or in whatever other language you chose.

The need to fight against advertisement

Advertisement has slowly pervaded every aspect of our lives. We live bombarded with images and sounds whose only purpose is to make us react, like Pavlov's dogs or zombies, in order to consume. We not only HAVE TO buy totally useless products: we have been transformed in products ourselves. Our enemies are now installing ads in the railway tunnels... you will have a stroboscopic effect while passing by. They are going to project ads on clouds and onto the evening sky. They have already transformed our whole life, for profit, in an inferno of unsolecited continuous consumistic pushes. Already now you cannot (and definitely in the future you will not) read, sleep, love, think or move around without being bombarded by ads, all intended to deceive you in order to push you to consume something, to buy something, to waste your life for "their" profit.

Let's begin examining which methods we can use to get rid of those useless ads we meet when searching. Ads in the software we use and ads on the site we visit. If you learn just some crumbs of the powerful reversing lore you'll be able to react.
As you'll see there are many possible options, and some of them -incredible but true- do actually seem to work, eheh.

A first crash-course in anti-advertisement should allow you to kill:
1.) Advertisement pushed by alien sites (HOSTS trick)
2.) Advertisement compelled by 'free' homepages providers à la Geocities (NOEMBED trick)
3.) Advertisement compelled by 'free' software you use ('procustian approach' to anner killing, see the OPERA 5.1 example below)

1.) Advertisement pushed by alien sites (HOSTS trick)

A very simple trick is to compile a new HOSTS file with a list of offending 'advertisement pusher' sites.
Windoze allows an effective way of filtering unwanted connections using the hosts file (this file sits in c:\windows and is called simply hosts, with no extension. You can have a look and modify it using any texteditor).. The hosts file is your 'personal DNS server': before accessing any remote computer through a name to address lookup, windows checks the hosts file to see if the name has been defined as an address ALREADY in hosts (note that this is meant to speed up the web connections you use more often: indeed if you know for sure that a site will always have the same IP, you can add it yourself to the hosts file and avoid the connection to the DNS server). Through use of the hosts file we can list the names of all servers we don't wish to contact by letting the server name match a different address. Typically the address used would be the 'bounce-back' localhost address 127.0.0.1.

2.) Advertisement compelled by 'free' homepages providers à la Geocities (NOEMBED trick)
'Free' homepage providers stick their ugly compulsory advertisement code either at the beginning or at the end of your own HTML code.
On services that stick their banner code at the very end of your page, the simplest way to eliminate the popups is to stick a "NOEMBED" tag after your closing "/HTML"
This seems to work the other way round as well: for services that stick their banner code at the beginning of your page, create a double <<head> and <<body> line, and stick a noembed after the first one and a <</noembed> before the second one :-)

3.) Eliminating advertisement banners in Opera 5.1.

As any searcher soon realizes, the most important 'tool of the trade' for a seeker is the BROWSER used in order to search information and knowledge on the Web.
The best tool, by far, is at the moment Opera, a browser that will offer you many advantages vis-à-vis the two browsersaurii Microsoft Explorer and Netscape, last but not least the possibility to kill image loading on the fly when visiting a site, thus speeding up considerably any serious work on the Web.
Unfortunately the programmers of Opera decided to embed in their great product a 'compelled advertisement': a rotating ad-banner on the top-right corner of the browser.
This kind of hideous advertisement is getting more and more common in the current software panorama, hence the importance of teaching anyone (and not just software reversing experts) how to nuke it.
'Opensourcing' windows software is -as you will now see- great fun, and will allow you to exert a considerable amount of power onto the software you use.
In comparison with version 5.0 the programmers at Opera have now, with version 5.1., tried to 'mask' the 'compelled advertisement' routine that their product would like to impose on viewers.
If you would have had a look at the disassembled code of the older version of Opera (5.0), you would immediately find the 'cursed' window, as I pointed out in a conference of mine at the Ecole Polytechnique, in Paris, some months ago:

:406D43 6A54           push 00000054
:406D45 6A40           push 00000040            ; heigth
:406D47 68D8010000     push 000001D8  		; width
:406D4C 81C128FEFFFF   add ecx, FFFFFE28
:406D52 57             push edi
:406D53 51             push ecx
:406D54 56             push esi
:406D55 FF7014         push [eax+14]
:406D58 FF15A0A55500   Call dword ptr [55A5A0]  ;SetWindowPos

Thus there was an extremely easy and simple way to kill the advertisements of this very good browser. This approach is interesting because it can easily be applied to all advertisement infested software:
1.) You simply get height and width of the hideous banner (64*472 pixels, which means in hexadecimal notation 0x40 height * 0x1d8 width)
2.) You disassemble your target using either wdasm 8.9 (quick and simple, but not very powerful) or IDA 4.04 (slow and complex, but extremely powerful).
3.) You search inside the thus produced dead listing (disassembled code) for the most 'uncommon' banner dimension (in this case the width: 1d8)
4.) You immediately land to the relevant part of the code of your target.
5.) Using any hexeditor you substitute the given width with -say- zero, thus annihilating the banner :-)
406D47 68D8010000 push 1D8
with
406D47 6800000000 push 0
and the advertisement is no more.
Of course it would be relatively easy to go further and just find out the routine that allows you to register your target. The 'width' approach above is in fact nonsense from a reversing point of view, yet it will immediately show to those that do not know a zilch what immense power reverse engineering can put in your hands.
In fact this approach is just 'cosmetic': the advertisements, alas, are still loaded in the background, and thus still slow down your browser even if you do not see them.
As I said it would be easy (but not correct vis-à-vis the programmers at Opera) to reverse the code in order to avoid loading the ads from the beginning, yet I just wanted to negate the very 'raison d'être' of the advertisement: be seen. Moreover to kill them in this way you don't need any real reversing skill... even your aunt could kill on these lines almost anyone of the eyesoring advertisements that infest today's software :-)

Most of the time things will be so easy, but at times there will be some (faible) attempts to 'hyde' the relevant code.
For instance recently (starting with the current version 5.1 of the browser) the programmers at Opera have introduced a more 'indirect' approach to our banner's 'magic dimensions'. Heigth and width of the advertisement rotating banners remain unchanged (0x40 * 0x1d8: a typical web-banner relation which is slightly superior to 1:7, in this case 7,375).
Yet this time you wont find any 1D8 nowhere in the code.
Alas for the programmers, it is easy to find out the relevant snippet of the code EVEN WITHOUT RELYING ON THE EXACT DIMENSIONS.
Obviously the 1D8 width is created 'on the fly', thus you can be sure that the ax register is used, most probably loaded with a 'near' value to the final one, and then something will be added to it (most probably 16, 8, 4, 2 or 1).
Searching for "mov eax, 000001D" (without the last number) you will quickly find out that 'our' width 1d8 is now 'created' starting from a 1d4, on its turn loaded through a subroutine ad hoc.
You will easily find the following two lines of code (Opera 5.10, build 902):

:407B4B E8033D0200   call 42B853                 ;call "mov eax, 1D4"
:407B50 8D7004       lea esi, dword ptr [eax+4]  ;add 4 & thus get 1D8

Hence we just need to modify a line of code inside the call 0042B853 routine:

:42B855 B8D4010000   mov eax, 000001D4

with

:42B855 B801000000   mov eax, 00000001

and we'll now have an only 5 pixel wide, definitely not eye-soring... see the small line on the right behind my shoulders? ...mini-advertisement window... Quod erat demonstrandi :-)