|
Presentation:
Investigating on the internet sometimes means looking for
a needle in a haystack. Information is often hard to find,
either because it is rare (few webpages talk about it) or because
it is lost among thousands of other pages which deal with
similar ideas or similar "keywords". I'll try here
to explain my methodology for finding information effectively
on the net.
My advice:
Preparing:
How to select your arsenal of tools
What you need for effective searches is an arsenal
of tools ready to use.
Search tools on the net include:
-
search-engines
Today the most powerful search-engine is definitely
Google.com,
which allows web searching of text, images, news and newsgroup
messages (ex-Deja.com). Even though you should only need other
search-engines once Google's results have been fully exploited,
it is good to know 2 or 3 other search-engines.
In some places it is even compulsory: in China, for instance,
Google is often blocked by the authorities! Abondance.com
(in French) gives you good lists of search-engines & directories,
and their specific features.
Notes:
- Yahoo uses the same index as Google, so don't expect much
from Yahoo if you have access to Google.
- You can also use special software as a search-engine, which
gathers the results from many search-engines. Copernic
is a good example of that.
Tips with Google:
- To save time, don't click directly on the links of Google
results, but (under Windows) right-click and choose "Open
link in a New window"; you can open several results very
quickly, and while you are browsing the first page, the other
results will have time to finish downloading.
You can also change the Preferences of Google (specific to
the computer you're using) so that any click on a result opens
a new window automatically.
- Google offers a cache of webpages, which means that if you
click on a link from Google results and get a "Page
cannot be displayed", you can go back to Google results
and open the "cache" copy of the page to get the information!
This is very powerful and useful.
- With Google you can use several useful tricks to refine
your keywords. These tips are described here.
The most useful are probably the quotation marks "...", which keep
only the pages containing the exact group of words that appears
between them. The minus sign lets you remove all the
pages that contain a given word: for instance, if your results are
polluted by porn pages, just add '-sex' (without the '),
etc.
-
newsgroups & forums
There are different ways to browse Newsgroups. One
is to use a website (like Google in the "Newsgroups"
tab), the other is to use a News client (for instance
Outlook Express offers you both Email and Newsgroup services).
Searching Newsgroups can take a long time; the best approach is
to ask a question in an active newsgroup which deals with the
topic you're looking for information on. This means that you need
to know which Newsgroups cover this topic, which you can work out
from the name of the Newsgroup or from the results of Google.
The problem is about the same for forums: you need to know
the websites which offer active forums on your topic.
I personally use Newsgroups & Forums on regular (not new)
topics, because I know the associated NG & Forums well.
-
e(mail)-penpals
Your network of e-penpals can also be a very good
way to get a question answered... sometimes we don't think
of them! And if you feel that yours cannot help, develop
your penpal network!
-
web chatrooms & software-based chat
To chat with people you can either go to websites
that offer this service (it will depend on the topic of
your research), or use one of the following programs: ICQ, MSN
Messenger, Yahoo Messenger, AOL Instant Messenger, IRC. I personally
use Trillian, which merges all of these into one interface;
it is very convenient.
-
P2P Software
When you are looking for ebooks, video or music,
P2P (Peer-to-peer) programs are effective tools. The most
famous are Napster, Kazaa, WinMX, Gnutella, etc.
and the recent Tesla.
 |
Using Search-engines: Choosing the right keywords
The first problem that most internet users face is bad
or poor results in their searches. Nine times out of ten, the
reason is a bad choice of keywords. Keywords that are too generic
will bring you too many pages with very little connection to what
you're looking for. Too many precise keywords will bring you no
results at all.
To choose the right keywords, you need to:
- try to identify an "effective keyword",
by finding a word which is rare enough to give limited results
but which is 100% related to the information you want.
For instance if you're looking for a table to convert Korean
characters into Latin letters, the word "romanization" is exactly
what you need: it is not very common, but it describes exactly
what you're looking for.
Request: romanization table Korean
- use unambiguous words: choose words that
do not have a double meaning, and if they do, narrow them down
with another word (use "..." if you can, when the two words
form a common combination).
For instance, if you are looking for information on
a company producing TV shows called Case Production, the words
"case" and "production" are too generic on their own.
Request: "case production" entertainment
- think of words that have to be in the page
where the information you're looking for is. You have to imagine
the context where the information can be found.
For instance, if you are looking for the lyrics of "Music"
(Madonna), since both the title and the artist are very common
words, add one of the lines of the song.
Request: music madonna lyrics "I wanna dance with my baby"
- filter by adding/removing keywords in your
request.
For instance, if you're looking for information on a rare
strategy book written by Toshishiro Obata, you may start with
"toshishiro obata", but to remove all the websites
which deal with generic information on this Shinkendo master,
you'll add:
Request: "toshishiro obata" +strategy +book
Note: the + is optional in Google; it is put here to highlight
the fact that you want only the webpages that have the words
"strategy" and "book" in their content.
 |
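The requests shown in the keyword section above can also be assembled
by a small program. Here is a minimal Python sketch of that idea, not a
description of how Google itself works: the function names build_request
and search_url are mine, and the address https://www.google.com/search
with a "q" parameter is an assumption about how the request is passed
to the search-engine.

```python
from urllib.parse import urlencode


def build_request(words=(), phrases=(), required=(), excluded=()):
    """Assemble a search request from plain words, exact phrases ("..."),
    required words (+word) and excluded words (-word)."""
    parts = list(words)
    parts += ['"%s"' % phrase for phrase in phrases]
    parts += ["+" + word for word in required]
    parts += ["-" + word for word in excluded]
    return " ".join(parts)


def search_url(request, base="https://www.google.com/search"):
    """Turn a request into an address (the base address is an assumption)."""
    return base + "?" + urlencode({"q": request})


# The Toshishiro Obata example from the text:
request = build_request(phrases=["toshishiro obata"],
                        required=["strategy", "book"])
print(request)  # "toshishiro obata" +strategy +book
print(search_url(request))
```

The point of keeping the four kinds of keyword separate is that you can
add or remove one word at a time and rebuild the request, which is
exactly the filtering step described in the last tip.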
Surfing:
How to get more information from information
Once you have found information which is related to
what you're looking for, but you want to know more, you can:
- explore the website more deeply, by any means: following
all the links, downloading the whole website with an Offline
Browser and searching the text of its pages locally with Windows
Search, etc. A technique to try is to remove the name
of the file which appears in the web address, in order to
try to get the directory list of the folder where the page
is stored on the server. Sometimes, it shows you other pages
that would not be easy to find by following the links of the
website. You can also remove the last folder name and so on,
to explore all the folders of the website.
- something which costs nothing is to email the webmaster
and to directly ask for the information you're looking for.
It does not work all the time but it is not rare to find very
nice people who will answer you.
- and above all, once you have more details about the information
you're looking for, don't forget that what you have discovered
should be re-used in your search-engine request, in order
to filter the pages again and get more precise results!
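The technique of removing the file name, and then each folder name,
from a web address can be sketched in a few lines of Python. This is
only an illustration: the function name parent_urls is mine, and the
code just builds the addresses to try. It does not download anything,
and whether a server really shows a directory list depends on how it
is configured.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the addresses obtained by removing the last part of the
    path again and again, ending at the root of the website."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    results = []
    while segments:
        segments.pop()  # drop the file name first, then the last folder
        path = "/" + "/".join(segments) + ("/" if segments else "")
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return results


for address in parent_urls("http://www.uog.edu/up/micronesica/indexes/toc.htm"):
    print(address)
```

Each address in the list is one step higher in the website, so you can
try them in order until one of them shows you something new.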
 |
Searching into webpages
This tip is very trivial, but how often I see people who
don't use it... When you look for precise info on the internet,
you often find webpages with long texts. As soon as the webpage
is downloaded (or even before, if part of it is already loaded),
you should use the Find tool of your browser (Control-F
in Internet Explorer) in order to go straight to the word
that interests you and see whether the information on this page
is valuable or whether you can skip it and try another one.
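If you have saved the text of a page, the same "Find" idea can be
reproduced with a short Python sketch. The function name find_in_text
and the context parameter are my own choices for illustration; the
sample text is made up around the article title quoted in the case
study.

```python
import re


def find_in_text(text, word, context=40):
    """Return every occurrence of `word` (ignoring case) together with
    up to `context` characters of text on each side."""
    if not word:
        return []
    hits = []
    for match in re.finditer(re.escape(word), text, flags=re.IGNORECASE):
        begin = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append(text[begin:end])
    return hits


page_text = ("Bibliography. Bwang, A Martial Art of the Caroline Islands. "
             "Other entries follow.")
for hit in find_in_text(page_text, "bwang", context=20):
    print("..." + hit + "...")
```

Seeing a little text around each hit is usually enough to decide
whether the page is worth reading in full.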
 |
Looking into the Source of the page
Looking into the source of the page (Menu View/Source
in Internet Explorer) means looking into the programming
code of the page. It is scary for people who have never done
any programming, but it can be useful. If you know a bit
of HTML, it will for instance enable you to save
any picture, even if it is protected by the website. For people
without IT skills, it can help to find the webmaster's name/email,
in the first lines of the page, or to see the list of
keywords that have been used to register this page in the
search-engines (all this info starts with "<meta name=...>"),
which can help you find more precise keywords.
To find all the email addresses of a page (sometimes
they're hidden in links or forms), just search for the
@ character in the source of the page.
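Here is a minimal Python sketch of the two things to look for in the
source: the "<meta name=...>" information and the email addresses.
The names MetaCollector, EMAIL_PATTERN and inspect_source are mine, the
sample page is invented, and the email pattern is a deliberately simple
approximation that will miss unusual addresses.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the name/content pairs of the <meta name=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Simple approximation of an email address (an assumption, not a standard).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def inspect_source(html):
    """Return the meta information and the email addresses found in a page."""
    collector = MetaCollector()
    collector.feed(html)
    emails = sorted(set(EMAIL_PATTERN.findall(html)))
    return collector.meta, emails


# Invented sample page, for illustration only.
sample = """<html><head>
<meta name="author" content="J. Doe">
<meta name="keywords" content="martial arts, micronesia">
</head><body><a href="mailto:webmaster@example.org">Contact</a></body></html>"""

meta, emails = inspect_source(sample)
print(meta.get("keywords"))
print(emails)
```

The keywords found this way are good candidates to re-use in your next
search-engine request.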
 |
Searching in multiple languages
One limitation of your search is the languages that you
have mastered. But you don't need to speak a language to find
information in that language! Today, many "translation
assistants" can immediately (and for free) translate
a webpage for you. Of course the translation is not very accurate,
and its many mistakes can make it hard to re-use as it is (it
is only an "assistant"), but it can give a good overall
idea of what the webpage is about. And afterwards, why not try
to contact the webmaster for more details, in English!
Translation assistants: AV Babelfish, Systran, Reverso, WordLingo,
Google linguistic tool, etc.
 |
Case study: Bwang, a martial art from Micronesia
Here is a concrete example of an investigation on the internet,
which happened to me a few years ago. At the editorial office of
Karate-Bushido magazine, the editor, Patrick Lombardo, mentioned a
martial art he had heard of in the past and had had no news of
since... it was called Bwang. It was impossible to find anything
in the different resources we had. Back home, I decided to surf the
internet. At that time Google did not exist yet and Altavista
was the most efficient search-engine. But a search on Bwang
(Request: bwang)
did not give anything except links to people called B.Wang
or things like that. To filter out these worthless results, I
added the word "martial" (Request: bwang martial)
and got a few pages. Some of them were things
like the resume of some Mr B.Wang who had practised Aikido
when he was young... and only one page talked about what
interested me. This page was a very simple HTML page, with
no links, only text: a bibliography
(http://www.uog.edu/up/micronesica/indexes/toc.htm,
today the page has changed). In this list of references, one
article was mentioned: "Bwang, A Martial Art of the
Caroline Islands", by William A. Lessa & Carlos G. Velez-I.
This was very interesting information: it meant that
Bwang was the right word, with no spelling mistake. This page
was the only one and was not linked to anything. So from that
point, my methodology gave me several ways to continue. One
was to go back to the search-engine and use the names of
the authors of this article, in order perhaps to contact them.
Another was to ask those of my penpals who know about the subject
(difficult in this case, the topic is too precise, too rare).
What I did first was to look at the source of the page, where
I found the name of the person who created the webpage, but there
was no email. Then I removed the name of the HTML file from
the URL (http://www.uog.edu/up/micronesica/indexes/).
The website was that of the University of Guam, an island
of Micronesia. By going up through the website, I managed to
find lists of people at the University and their email contacts.
Then it was easy: I even managed to find the person who made the
webpage and contacted him. I managed to get a photocopy of
this article from an issue of Micronesica, the journal of the
University of Guam (dating from 1978), for the price of the postage,
and got a very nice article of several pages with technical
pictures of Bwang :-)
Note that the results have changed since that time, because
my website and other websites which have visibly read my story
now appear in the top ranks with info on Bwang. Moreover (and
fortunately) the University of Guam website has been totally
re-designed since then.
|