(Courtesy of fravia's advanced searching lores)

(¯`·.¸ Getting a list of ALL the IPs that are hosted at the -say- Isle of Man ¸.·´¯)
by ~S~ Humphrey P.
published at fravia's searchlores in June 2000

Woah! What gems I find on my messageboards! Again and again Humphrey demonstrates his incredible abilities. Cutting through the web like the master surgeon he is, this great seeker delivers in this essay enough data to let my readers digest (and hopefully also burp) new knowledge for at least a couple of months...

How can I get a list of ALL the IPs that are hosted at the Isle of Man (Europe), with the corresponding names? I know the extension is .im, but I need com/net/org/etc. as well.

Well, alltheweb lets you put in Domain Filters: Only Include [im]

And AltaVista has [domain:.im]

You can ask for Isle Of Man with [gov] etc. as well, and sort out the ones you don't want.

But with search engines, your problem is the word 'ALL.' How can you guarantee you've got the sites of all the people on the Isle of Man, when none of the search engines cover all of the web? Let alone all the Isle o' Manners who use Geocities, Angelfire, Altern.org and such...

So, what you really want is what the NIC has down in their list of paying URL customers whose address is Isle Of Man?

Who is the DNS lookup authority? Grab a list. Grep it.
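
If such a list could be had, the "grep it" step might look roughly like this in Python. This is only a sketch: the record format (one domain and one registrant address per line, separated by a tab) and the sample entries are invented for illustration, and nothing here says any NIC actually hands out its list in this form.

```python
import re

# Hypothetical registry dump: "domain<TAB>registrant address", one per line.
# Both the format and the entries are made up for illustration.
SAMPLE_DUMP = """\
example-one.com\t1 Some Street, Douglas, Isle of Man
example-two.net\t22 Other Road, London, England
example-three.org\tPO Box 3, Ramsey, ISLE OF MAN
"""


def grep_registrants(dump, pattern):
    """Return the domains whose registrant address matches pattern (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for line in dump.splitlines():
        domain, sep, address = line.partition("\t")
        # Skip lines that do not have the assumed two-column layout.
        if sep and regex.search(address):
            hits.append(domain)
    return hits


print(grep_registrants(SAMPLE_DUMP, r"isle\s+of\s+man"))
```

The match is made on the address column only, so a domain that merely has "man" in its name would not be picked up by accident.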


Perhaps search engines specific to .IM would be most knowledgeable?

Let's take BluePolar's suggestion to use Netcraft, for instance, to find about how many sites we are talking about...
[site ends with] [.im]


That's very few. Compared to big islands...

I see religion could have a big part in this.

Do you see in any of those 254 any search engines, or directories?

Let's try it on alltheweb, just to see how hard Netcraft is trying.

Here: hold my "Jean-Yves Tadié" for a minute, and blank out everything but:
Domain Filter: Only Include: [im]

"4890 documents found."

The number of documents is different from the number of sites, of course.

(Thread #1. What does Netcraft index? Is it looking at DNS tables, only?)

AllTheWeb popped up Isle of Man government right away. Which suggests it's trying to do some webpage priority ordering.

If we'd use the word 'search' or 'directory' we could pop up something useful...

But, just paging through the...

There! http://www.nic.im/exist.html Isle of Man Domain Registry.

It's in the form of an enquiry. There must be more at that site. One could hope it's the official NIC site for the domain IM. Probably what Netcraft is looking at.

Just for fun, what can AltaVista do?

http://www.altavista.com/cgi-bin/query?kl=XX&pg=aq&text=yes&q=domain%3A im&search=Search
"About 4,215 pages found"

Again, pages are not unique sites.

(Thread #2: there's a way in altavista to compress or combine the multiple page findings. How do you do that in alltheweb? Also, there must be a clever way to get only the xxx.yyy.im part, and not the /zzz/aaa/bbb.htm or whatever.)
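
One possible answer to the second half of that thread, once the result URLs have been saved to a list: keep only the host part of each URL and throw away duplicates. The sketch below uses Python's standard `urllib.parse`; the function name `unique_hosts` and the sample list are illustrative assumptions, not something the search engines themselves provide.

```python
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Collapse page URLs to their host names, keeping first-seen order."""
    seen = {}
    for url in urls:
        # .hostname is the lower-cased host, without port, path or query.
        host = urlsplit(url).hostname
        if host:
            seen.setdefault(host, None)
    return list(seen)


pages = [
    "http://www.gov.im/",
    "http://www.gov.im/zzz/aaa/bbb.htm",
    "http://www.nic.im/exist.html",
    "http://WWW.NIC.IM/what.html",
]
print(unique_hosts(pages))  # ['www.gov.im', 'www.nic.im']
```

Note that a bare "www.gov.im/page.htm" without the "http://" in front yields no host here and is silently skipped, so the input needs full URLs.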

You could get whatever you want to bubble to the top by putting the word in Sort by: []

for instance:
Boolean query: [domain:im]
Sort by: [.gov]
"1,152 pages found"

There's all the www.gov.im/ pages listed first.

Hmmm. About 3,000 pages didn't have anything to do with .gov!? But 1/4 of all of them did. I thought there would be less government in little places.

Hmmm. The www.gov.im doesn't work all the time.

Boolean query: [domain:im]
Sort by: [.nic]
"12 pages found"

As we see on the http://www.nic.im/what.html Isle of Man Domain Registry page, by asking for domain .im, we have picked up the: co.im, ltd.co.im, plc.co.im, net.im, gov.im, org.im, nic.im, and ac.im domains.
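
To see how a pile of .im host names spreads over those sub-domains, something like the following could do the sorting. It is a sketch under assumptions: the zone list is simply the one quoted above from the registry page, the helper name `zone_of` is made up, and the "example" hosts are placeholders rather than real sites.

```python
from collections import Counter

# Sub-domains of .im as listed in the essay (from the registry's what.html page).
ZONES = ["co.im", "ltd.co.im", "plc.co.im", "net.im",
         "gov.im", "org.im", "nic.im", "ac.im"]


def zone_of(host):
    """Return the longest listed zone the host falls under.

    Falls back to 'im' for other .im hosts, and None for anything else.
    """
    host = host.lower().rstrip(".")
    # Try longer zones first so ltd.co.im wins over co.im.
    for zone in sorted(ZONES, key=len, reverse=True):
        if host == zone or host.endswith("." + zone):
            return zone
    if host == "im" or host.endswith(".im"):
        return "im"
    return None


hosts = [
    "www.gov.im",
    "www.nic.im",
    "www.example.co.im",       # placeholder host
    "shop.example.ltd.co.im",  # placeholder host
    "www.advsys.co.uk",        # not under .im at all
]
print(Counter(zone_of(h) for h in hosts))
```

Hosts that come back as None are the ones living outside .im, which is exactly the com/net/org leftover the original question was asking about.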

Of what you asked for, that leaves what? .edu? (I'd bet .gov is hogged by the US .gov.) .com?

What does the www.nic.im say about these? They should be the authority, right?

(Thread #3: Each country takes its own money, right? Keeps its own list... But there's more than one DNS lookup; and the big sites grab a list for their own quick lookup. We should be able to grab a list. Or find a list...)

You can see they say the "domain's designated managers" Advanced Systems Consultants have an http://www.advsys.co.uk address. Ha! They don't live there. It's Bertrand Russell's barber paradox: does the barber who shaves all those, and only those, who do not shave themselves, shave himself?

Brings us all back to "What do you want?"
a) All the domain .IM urls? - Netcraft will give you a list, instead of the inquiry you are getting at www.nic.im
b) All urls with physical address on Isle Of Man? - some DNS lists somewhere. (yes, i'm teasing you. this is probably what you want, isn't it?) (Do you know of some .com, and .edu which are there? Does www.nic.im know about them? Who else knows about them?)
c) All spidered and indexed web site pages which mention Isle of Man? - perhaps a MetaSearch engine.
d) All the people on the Isle of Man who are connected to the internet? have a website? - ah, telephone book? demographic survey? i dunno.

What do you want?

Sounds like SPAM to us!


Now, let's see, where was I... Oh yeah, "Jean-Yves Tadié"

Humphrey P., June 2000

(c) 2000: [fravia+], all rights reserved