"Link farmers: the snake oil seller kind"
by Various Authors
(How to evaluate fake sites)

first published at searchlores in April 2006

An evaluating essay, part of the evaluating results lore.

Well... what should I say? This is an interesting read for any seeker. For sure: this soi-disant 'blogger' is an easy debunking target, a real 'sitting duck silhouette' in front of the 'linkfarmscape'. In fact, the title of this essay could have been "Marcus P. Zillman, M.S., A.M.H.A: the snake oil seller", but after all not everything is that bad: some people may really like his videos, which can be funny and would probably even be quite suitable for kids (if only he promoted his own linkfarms a little bit less).

Anyway, this short essay is in fact just a quick note. Yet this tiny messageboard discussion may prove quite useful for building more general evaluation skills. At the same time -and this is the point of the exercise- this discussion underlines the sad degeneration of google's defenses against spammers and linkfarmers.


Introduction   ~   1st msgbrd discussion   ~   some additions   ~   2nd msgbrd discussion


Introduction   


On the World Wide Web, a link farm is any group of web pages that all hyperlink to every other page in the group. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a search engine (sometimes called spamdexing).
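
The definition above can be turned into a quick check. What follows is only a minimal sketch, assuming the link graph has already been gathered into a dictionary that maps each page to the set of pages it links to; the function name is_link_farm and the example.* hosts are hypothetical, chosen purely for illustration.

```python
from itertools import permutations


def is_link_farm(links, group):
    """Return True if every page in `group` links to every other page in `group`.

    `links` maps a page to the set of pages it links to (an assumed format).
    """
    pages = set(group)
    if len(pages) < 2:
        # A single page cannot form a farm by this definition.
        return False
    # Check every ordered pair (a, b): a must link to b.
    return all(b in links.get(a, set()) for a, b in permutations(pages, 2))


# Hypothetical link graph, for illustration only.
links = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"a.example", "b.example", "d.example"},
    "d.example": {"a.example"},
}

print(is_link_farm(links, ["a.example", "b.example", "c.example"]))  # True
print(is_link_farm(links, ["a.example", "c.example", "d.example"]))  # False
```

Real farms are seldom this tidy: a group where almost every page links to almost every other would fail this strict test, so a seeker's bot would probably want a density threshold rather than an all-or-nothing answer.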

Snake oil, on the other hand, is an expression applied metaphorically to any product with exaggerated marketing but questionable or unverifiable quality.

Spam and linkfarming are per se pretty OLD beasts roaming the searchscape.
Of course it has always been necessary to counter with clever counter-measures the most obnoxious practices of dubious SEOs, low-life link sellers (for a particularly pathetic one see www.linkworth.com), referral and blog comment spammer_slimes and other assorted clowns.
And countering spammers is exactly what serious search engines ARE (and should be) doing all the time. Spammers and link sellers are not only sworn foes of knowledge spreading, and not only bandwidth thieves: they are enemies of any well-functioning search engine. Their very purpose is to make everyone's searches less useful. They are and should be rightly penalized -or even better kicked out altogether from the result lists (SERPs). They MUST be kicked out: no search engine that cares to survive would want SERPs full of useless sites.
So far so good, but AS A CONSEQUENCE of this anti-spamming behavior of the main search engines, spammers are now more and more masking their spam-offer with semi-legitimate content.

We come back to the link farm and snake oil dilemma: the problem, for seekers, arises when both aspects appear indissolubly linked together, because it can be time-consuming to evaluate properly the noise/signal ratio of a given target if it is at the same time BOTH a link farm and a snake-oil seller's podium.

Techniques must be devised to quickly recognize the ‘snake-oilness’ or the ‘link-farmness’ of some results of our (or our own bots’) queries. Who, if not seekers, can develop them?
The ever higher levels of google spamming are getting out of control (as we will now see, this kind of spamming problem seems less pronounced in other search engines). Witness the following discussion.

First messageboard discussion   


(Before this page was created)


http://www.virtualprivatelibrary.com/ (n/t) (06/04/06 03:24:30)

Ungomori



seekers sesame street :-) (06/04/06 15:34:26)
um yea, what is it? did you actually take a look at that page, try to find something useful on it? :)

i took a route down the "artificial-intelligence-resources.info", eheh the AI-links shows 20q.net at the top of the list :) .. while a cool funky AI application, i was kind of expecting to find some more serious research.

it seems to me that this mr zillman is trying to fetch pagerank, hits, blogwhatever, directory and all the rest in order to sell his books (which are, regardless of the subject of the page, linked *everywhere*)

actually it is getting rather interesting, browsing it more, it really looks like the entire site is completely FAKE and GENERATED :) it almost makes me wonder if the guy actually exists :) :) even his personal blog consists of nothing but a) the logo of a website b) a general description of a website, and all posted *exactly* either at 4.05, 4.10 or 4.15 am ... weeeeird :)

lol check out his videos at http://www.informationdetective.com/ :) section one, searchengines .. he should teach at kindergarten or something :) "let's go to google.com, that's g-o-o-g-l-e .. search engines are powered by ROBOTS, but we will call them Bots, that's right, we drop the R and the O, in order to make it shorter .. blablablabla" well that's just hilarious :)

ok maybe this one of his many lists of links is a bit useful:
deepwebresearch.info (making a link of this cause i think it's so far the only thing worth clicking -- ok also the videos are rather funny, a bit like seekers sesame street :) :) )

..arg i'm now hearing him advise people to go "intuitive", "just type what you're looking for in your browser address bar and add .com, .biz, .info (because it gives you IN FOR MA TION *brr*), .edu (yea right that would be shakespeare.edu) etc" .. go intuitive people! the way i hear him speak indicates that he's talking to an audience that wouldn't know the difference between an ad-infested labyrinth of cybersquatters and the info (without the dot) they're actually looking for.

ritz




dammit ritz, (06/04/06 15:44:44)
.., you made me smile ;)

ennan



Re: dammit ritz, (06/04/06 15:52:23)
yeah, that was good :)

/bow ritz

after some time you get a feeling for 'templated' sites. even more.. In fact I feel like I can tell after a few second if a page/site is "alive" or "dead". 'Dead' in the sense of automatically generated, without any life whatsoever behind it, no real content, no unique information. Nothing to do with "professional versus amateurs" or "new versus old" concepts.
An old homepage of Mr.XYZ talking about his dog, not updated for years, still feels alive when you see it. More than any of those .info sites (beware of .info. In itself it's a bad signal already)
See what I'm trying to point out?

loki



Re: Re: dammit ritz, (06/04/06 16:09:21)
> after some time you get a feeling for 'templated' sites. even more.. In fact
> I feel like I can tell after a few second if a page/site is "alive" or "dead".

myeah, i have yet to cultivate such a sense. the more straightforward way, which i used this time:
1) you hit the wall of the frontpage. everything looks interesting, but you don't know where to click or to start. you get a feeling this might be templated. (so far it could also be searchlores.org ;-)
2) now the simple acid test is: just try to find something genuinely useful or interesting on the site..
If you can't, after a few minutes of clicking around, throw the crap away. If you did find something interesting, it's most probably "alive" (and even if it isn't, there's still interesting stuff there, which can't make it all bad). Now you separated the searchlores from the informationdetective :)

Another indicator is simply if they're trying to sell you anything related to or equal to the information you're trying to find. If that's the case, go away because they will usually just tell you only part of the information to whet your appetite and you have to pay to get everything (unless of course you're looking for something which you don't mind paying for)

ritz


Some further additions


First of all, a tad too many "TM" signs all over the place :-)
Could be intended to blind TM-loving-zombies and know-nothing dudes into a false sense of trust: "if everything is trademarked, everything must be perfectly legal, uh?". As we are seeing, this is by far not the case with snake oil sellers. Probably the inverse rule is valid nowadays, and TM signs should be avoided :-)
Another possibility is that they are used to trick google's algos, see below.
In fact note at least a dozen TM-signs, many on the side frames, one even inside the title tag :-)
More generally: the use of trademark signs, patent warnings, copyright hysterics and legal mumbo-jumbo to create an appearance of 'legality' is always a great big red 'caution' sign.
Usually (but by all means not always) people that really produce ideas or concretely spread knowledge don't use this kind of approach so extensively (if they use such notifications at all). For evaluation purposes this kind of repetition is a typical "scammer alarm" and/or "commercial useless site" warning.
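
The 'too many TM signs' alarm lends itself to a crude count. The following is a rough sketch, not a proven detector: the list of markers, the names count_legal_markers and legal_marker_density, and the sample strings are all assumptions made for illustration, and any threshold would have to be found by trial.

```python
import re

# Assumed markers of 'legal mumbo-jumbo': the symbols themselves plus their
# ASCII spellings. The list is illustrative, not exhaustive.
LEGAL_MARKERS = re.compile(r"™|®|©|\(TM\)|\(R\)|\(C\)", re.IGNORECASE)


def count_legal_markers(text):
    """Count trademark, registered and copyright markers in the visible text."""
    return len(LEGAL_MARKERS.findall(text))


def legal_marker_density(text):
    """Markers per 100 whitespace-separated words (0.0 for empty text)."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * count_legal_markers(text) / len(words)


# Hypothetical page snippets, invented for this example.
pushy = "SuperSearch(TM) by MegaLinks(TM) brings you InfoFinder(TM) today"
plain = "My dog likes long walks in the park"

print(count_legal_markers(pushy), round(legal_marker_density(pushy), 1))
print(count_legal_markers(plain), legal_marker_density(plain))
```

Note that the ASCII spellings can misfire (a list item labelled "(c)" would be counted), so a high count is a hint to look closer, nothing more.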

Second, this guy repeats his own "fame and name" much too much. He makes such a song and dance of all that crap that it sounds pretty ridiculous already after a few lines.

Modus operandi: he bought a zillion domains (for criss-cross linkfarming purposes) and has posted 2979 posts on those (seldom used) blogger profiles... geesh.

The gratuitous self-references to his own pamphlets, and the postings signed 'Marcus' on the blogs, are so continuous and overdone that they create a sort of solipsistic vertigo.

Third: What the quack is an M.S., A.M.H.A, actually? There's a whole domain http://www.zillman.info/ that is supposed to explain that to us, but it actually does not. As already observed by ritz, it's just repetitive blabbering and boasting, without anything concrete, apart from the linkfarm itself.
Searching with google, we discover that the snake oil seller -not surprisingly- managed to occupy all positions for M.S., A.M.H.A. This increases the already pretty strong impression that both abbreviations (if they are intended as two separate ones - as the comma after that "M.S." implies) are just pulled from thin air like the rest of the text.
MS could mean Master of Science, but for all we know could also mean "Mediocre Spammer" :-)
Moreover AMHA could be anything: from "American Miniature Horse Association" through "Association of Mental Health Advocates" to "Alma Mater Humanitatis Asinorum".

But fourth, and even more important, there's this sense of "template uselessness" loki spoke about. A fabricated structure intrinsically made just to reference itself, like this one, a bot created merely for self-pushing commercial purposes, stinks of just what it is: a dead bug carcass, forgotten by some s.e. spiders in the middle of the web :-)

Alas. The interesting lesson from all this is a very sad one: if we retry the same search with Yahoo, the spammed results decrease from more than 30,000 to just two (well, six if you widen to http://search.yahoo.com/search?p=%22M.S.%2C+A.M.H.A%22&sm=Yahoo%21+Search&toggle=1&ei=UTF-8&fr=FP-tab-web-t&dups=1) and these two are probably kept mainly for didactic reasons. This means that it is now much too easy to spam google: yahoo filters are MUCH BETTER.
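
For seekers who want to repeat this kind of exact-phrase comparison from their own bots, here is a small sketch of how such a query URL is put together. Only the base address and the p parameter are taken from the URL above; the function name phrase_query_url is hypothetical, the other parameters of the original URL are left out, and there is no guarantee that the engine still answers at that address.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def phrase_query_url(base, phrase):
    """Build a search URL whose `p` parameter holds `phrase` as an exact phrase."""
    # Double quotes ask the engine for the exact string, punctuation included.
    return base + "?" + urlencode({"p": '"%s"' % phrase})


url = phrase_query_url("http://search.yahoo.com/search", "M.S., A.M.H.A")
print(url)
# http://search.yahoo.com/search?p=%22M.S.%2C+A.M.H.A%22

# Decoding the URL again shows what the engine is actually asked for.
print(parse_qs(urlsplit(url).query)["p"][0])
# "M.S., A.M.H.A"
```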

The advice is hence to use Yahoo more nowadays (despite it being a dangerous serial killer of good search engines like Alltheweb/Fast) together with ask (ex-teoma) and MSN (unfortunately both not as good as yahoo, even if not as bad as google) and the various inktomi engines with their wondrous syntax.

fravia+



Second messageboard discussion   


(After this page was created)


Whoa (07/04/06 03:52:22)
Wow; I must admit that I didn't expect a whole essay to be created out of this ;).

so, let me try to answer some stuff asked by ritz:

>"did you actually take a look at that page, try to find something useful on it? >:)"

actually yes. true, there's a lot of crap. but some interesting things exist as well, and not only in "deep web research." (i found some interesting stuff under "research resources", "reference resources" and, interestingly, on "information quality resources" [isn't it ironic that for instance the following: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.html -- comes from a link on the latter?] as well as on a bunch of other pages). so the page passes ritz's second test (imho).
bottom line is: yeah, there's a lot of spammish self-linking and the dude tries to sell some of his crap. i am in no way fond of these, we shall say, activities. but the question i want to ask is-- should we dismiss the whole site in its entirety? or maybe trying to separate the crap from the interesting stuff on it could be a better idea?

Ungomori



Re: spamfilters: G vs. Y (07/04/06 18:21:58)
I'm not sure that your MS AMHA test really does google justice. While it's true that yahoo apparently recognizes AMHA as a spam term, this doesn't really solve the spam problem.

What I mean is that MS AMHA is a unique signal that's generated by a unique target (our beloved Mr. Zillman) and when using such a query I would actually expect a search engine to point me to all of his sites.

The other (worse?) thing is the fact that the presence of AMHA on a page doesn't seem to affect yahoo's rating of that page. So "innocent" queries, like "intuitive searching" resources still lead us to Zillman's sites.

y (not the engine)



Re: Re: spamfilters: G vs. Y (07/04/06 21:25:51)
Yep, you're right.
Yet the question remains: why should we be compelled, with google, to add a
-MS.AMHA
in order to eliminate spam, when Yahoo (apparently) does it automatically?

And who (but us seekers) can anyway be interested in checking spam-centric queries on the web?

If anything, google should present options the other way round (could be quite interesting btw :)
They could add a 'see spam as well' link after 'cache' (that, as we have seen, hardly works any more) and 'similar pages' (or even integrate it in the similar pages)


Btw: have a look at this:
http://zillman.blogspot.com/rss/zillman.xml
the guy is just a script kid imho, maybe now even beginning to take himself seriously, similar to that 'johnny I hackstuff' kid we already discussed

il-li



Re: Re: Re: spamfilters: G vs. Y (07/04/06 22:34:41)
> Yet the question remains: why should we be compelled, with google, to add a
> -MS.AMHA
> in order to eliminate spam, when Yahoo (apparently) does it automatically?


Hm.. the point I was trying to make is that yahoo filters Zillman only if the query contains "M.S., A.M.H.A". So when searching for other stuff (see "intuitive searching" resources) you'll have to add a -MS.AMHA with yahoo as well.

Btw, I don't think that the two results we get from yahoo in fravia's query are there for didactic purposes. I have a sneaking suspicion that the results are the only two pages on the web that contain the string "M.S., A.M.H.A" and do not link back to Zillman's self referential structure.

y





(To be continued)





(c) 3rd Millennium: [fravia+], all rights reserved, reversed, revealed and reviled... LOL