~ Introduction ~
Version 6.12, August 2004
[lore: collective knowledge or learning on a particular subject]
That's exactly what you will find here: various web searching "lores"
(Note that the plural "lores" does not actually exist in the English language: I have made it up myself for reasons that are explained elsewhere)

This page presents various 'introductions' that aim to give readers a 'taste' of the different possible searching approaches.
[Caetera desiderantur]   [Old Introduction] [Even older introduction]
[Reversing oddities & spammers] [Good seekers are dangerous]
[How many URLs do the search engines cover?]
+fravia's 're-ranking' trilogy [2002]
Part of a cycle of conferences: "Advanced searching: how to find webdiamonds among the commercial Sargassos"
[The yo-yo approach (Tackling the 'down yonder' problem: a discussion about search engines' "depth")]
[The synecdochical searching method (substituting a part for the whole when searching)]
[The epanaleptical approach (and other fuzzy searching tricks)]

[Seven searching snippets] [2003]
Part of a workshop of mine in London:
Learning to transform questions into effective queries
(See the [hint & tips] page if you just want to start "working" quickly)

Caetera desiderantur
(actually: caetera desunt)
I opened www.searchlores.org, in Oz, in February 2000. Searchlores seems fairly popular: my main site alone receives around a million hits per month on average, without counting the (many) hits on mirrors like www.searchlore.org in the States (note the missing "s" after lore), www.fravia.com in Europe, or the other existing ones.

As of 2004 some sections of searchlores - as you will notice - are still missing, in fieri or incomplete.
This section as well: chaotic and incomplete, its purpose should be to give an idea of the variety and richness of our searching techniques. Maybe all these introductions are useless, and you would be better served reading some small specific essays, like the seven searching snippets "Learning to transform questions into effective queries" that I wrote in 2003.

Please note that you will not find any advertisements whatsoever on my sites: no banners to click on, no sponsors to promote, nothing. I don't need your money: I need your own knowledge, I need your feedback. My only hope is that you will, one beautiful day, contribute yourself to the vast wealth of knowledge.

This site is continuously updated, see the "news" section for ad hoc listings.

Some parts of this site are getting obsolete, though: you will have to learn how to evaluate the material you find on the web. One relatively 'ancient' section is this very introduction. In 2000 for instance, when searchlores began, Google was still in its infancy and the search engine of choice was still Altavista.
This does not mean that you should now use only Google, Teoma or Fast (considered nowadays the best search engines): smaller search engines, like Hotbot, can hold interesting surprises, and offer powerful advanced filters that let you pinpoint precisely what you want: domain search, region search, language search, words to include, words to exclude and so on.

An old introduction to web-searching
"Websearching, the sublime art"
(by ~S~ fravia+ 2000)

The web is uncharted and deep. At the time of writing this snippet of mine there are supposed to exist well over 1,500 million indexable pages, expanding exponentially. This is an underestimation: nobody really knows how many there are. The "richest" search engine covers at the moment less than a third of the existing total. (In January 2000 Fast (alltheweb), with 300 million pages, overtook Altavista and its 250 million pages. In May 2000 the Inktomi-powered engines, with 500 million pages, overtook both Altavista, leaving it at 350 million pages, and Fast.) Note that search engines will boast that they 'selected' a smaller number of pages, while in reality they have visited many more (a dual number approach now in vogue to cover the search engines' shortcomings :-). The fact is that their coverage is - even in the best and most optimistic hypothesis - meagre.
As you will see, there are different ways to search the web for nuggets among piles of commercial rubbish, "simple" methods but also other, less simple, paths. There are various possible 'strategic' approaches:
  1. You search yourself - searching
    • using the main search engines
    • using newsgroups
    • using messageboards
    • using maillists
  2. You search people that have already searched - luring, trolling, combing
  3. You follow seekers to where they come from - luring, trolling, klebing
  4. You discover, enter and use "hidden" information databases - seeking, hacking
  5. You write and use YOUR OWN searchbots and let them search for you - programming, algo-reversing
Note also that the PREPARATION phase (topic), the EVALUATION phase and the CONSOLIDATION (grepping) phase of each query are quite important "lores" per se.
You are embarking here on a very long voyage; at the end you'll be what I would like to call, lacking a better definition, a good seeker. As a consequence you will probably be able to find anything you may ever be looking for on the web.

Be warned! This knowledge will inter alia make you quite a dangerous person. You'll realize this perusing my site, if you didn't know it already. This possibility is at the same time the very reason for the existence of this site of mine: indeed I'll try to teach and explain to you some of the main necessary techniques and tricks used by able (and even some "master") seekers all over the web, but at the same time I'll do my best to (try to) keep you safely on what I believe should be a "knowledge path".
My hope is that once in possession of this knowledge, you will remain on our side, helping us spread knowledge for free in a quickly disappearing web of knowledge, which has unfortunately been almost stomped to death by a web infested by those commercial barbarians that you will now find everywhere, zombies and lackeys of the slave-masters who use their sharp (and dangerous) horns of pushed advertisement and money in order to loot, rape and ultimately destroy all minds seeking knowledge.

For these reasons some sections of my site are dedicated to matters that I consider quite relevant for any good searcher. Anyway you are a guest: you are not compelled to read or do anything I wish. It is up to you and you'll decide. I offer some knowledge for free: choose, pick, refuse whatever you will.
My hope is that some of you will help and contribute with their own work. I'm aware of the fact that there is no guarantee, though, never.


OLD INTRODUCTION

An older introduction to web-searching
(Websearching, the sublime art, by fravia+ 1997)

I see it coming... in a few years (actually already now, even if most zombie establishments have not yet realized it) one of the most important jobs will be, of course, websearcher.
We'll have many specialized branches: web-searchers, web-stalkers, web-seekers and so on. Zen and 'feeling' as well as a very broad 'global' knowledge will be required.
It's a good antidote to the hyperspecialisation that has nearly brought the whole silly commercial oriented society we are compelled to live in into a well deserved dead end: only large-minded, capable searchers will be able to keep the 'larger' over-perspective, and will be able to find ANYTHING they need (for free, of course), from Vivaldi's Concerto n.7 in F for four violins and cello (it's on the Web) through the second edition of the Police Criminelle, Technique et Tactique (it's on the Web) to A Western Australian survival kit for writing English (it's on the Web).
For the first time in the history of humanity, as long as you have web access it DOES NOT MATTER ANYMORE (for knowledge purposes) if you are located in a big rich city with huge libraries, good universities and a smart cultural life or if you happen to live in the middle of nowhere in a very poor country! The dream of the lighthouse guardian is now reality!
EVERYTHING is on the Web for free! I mean: any book, any newspaper, any university paper and any image, moreover (soon) any sound, any music, any film!
This means that - amidst mountains of useless garbage - ALL ACCUMULATED KNOWLEDGE is on the Web, free for you to discover and enjoy! If you still don't believe it, just learn how to search, you are in for some surprises!

So, what is a good searcher?
I sincerely hope you will be able to gain here some very handy knowledge, that I believe you will not easily find elsewhere. Anyway, I'm sure that the development of the Web (or at least of the still existing 'sound' part of it, neither commercialised nor brainwashed :-) will more and more underline the importance of these activities.
This whole endeavour is a 'living' workshop, of course, which will flourish gathering more and more additions from my readers (I sincerely hope that some "real" wizard searcher will join my efforts :-)
Hope to hear from you, and receive contributions from many searchers. Remember: we'll gain a lot only if we are able to build on the shoulders of others, letting them build on ours... if you just leech, you lose and we all lose at the same time!

Oddities and reversing spammers

As you probably already know, the various advanced techniques you may use in order to search the web amount to a difficult and ill-understood art.

If you visit the various ad hoc pages for the main search engines, you will be able to study some of their specific 'quirks'.
Some quirks are due to the specific algorithms that the search engines use. Searching is still far from being a completely understood science. There is an 'art' aspect (a 'lore' aspect IMO) that plays a role, as you'll see more often than not.

The imperative of preparing a good advanced query notwithstanding, all searchers like to try a few "quick searches" to test a search engine or a query idea. This happens continuously on our messageboards. Thus oddities are found. Typing in a few terms into a blank box and seeing what comes up can be great fun, since every now and then, sifting through a pile of less relevant material, you may even find some truly interesting results. More often, something appears that makes you wonder where it came from. These 'odd' results are at times worth investigating per se, since they can help you to reverse engineer the algos used by the main search engines.

Here are some quick masks for the three best search engines (always 100 results, safe search off). [Search forms not reproduced here]


Note that this search engine "reverse engineering" is actively performed by thousands and thousands of little commercial bastards, whose only aim is to spam each and every search engine with their pathetic commercial sites for banal profit purposes.

Yet even this kind of verminous activity can be useful for seekers: some of the tricks devised by those commercial hooligans, in order to spam the search engines, can open for us whole horizons of new and useful techniques that we will use (and spread) in order to ELIMINATE those very spamming sites from the web landscape, when searching for knowledge.

In fact we can - and will - use those same tricks REVERSED, in order to cut our queries deep through the spam sites and catch the little (and often rare) gems we are looking for. I'll give an example: the very moment you find in a page images with single pixel width/height (webbugs) that are pointing to the main index page of a given site (an old Architext trick) you know that you are dealing with some evil spammers. And you just need to filter such crap out from your result lists (SERPs) using simple specific filters... Perilli praemium adipiscunt! Eheh :-)
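A minimal sketch of such a web-bug filter, in Python. Everything named here is hypothetical and only for illustration: the class `WebBugFinder`, the helpers `has_webbug` and `filter_serp`, and above all the rule itself (an image whose width or height is 1, wrapped in a link whose path looks like a site's index page). Real spam pages vary, so treat this as a starting point rather than a finished filter.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: these paths are what we treat as "the main index page".
INDEX_PATHS = ("", "/", "/index.htm", "/index.html")


class WebBugFinder(HTMLParser):
    """Collects links that wrap a single-pixel image and point at an index page."""

    def __init__(self):
        super().__init__()
        self.open_links = []  # href of every <a> we are currently inside
        self.bugs = []        # hrefs judged to be web bugs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.open_links.append(attrs.get("href") or "")
        elif tag == "img" and self.open_links:
            # "Single pixel width/height": either dimension set to 1.
            if "1" in (attrs.get("width"), attrs.get("height")):
                href = self.open_links[-1]
                if urlparse(href).path in INDEX_PATHS:
                    self.bugs.append(href)

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links:
            self.open_links.pop()


def has_webbug(html):
    """True if the page contains at least one suspected web bug."""
    finder = WebBugFinder()
    finder.feed(html)
    return bool(finder.bugs)


def filter_serp(pages):
    """Keep only the (url, html) pairs without a suspected web bug."""
    return [(url, html) for url, html in pages if not has_webbug(html)]
```

You would feed `filter_serp` the pages behind your result list and keep what survives; the index-path list is the first thing to tune.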

Other oddities that may appear in your SERPs are due to the fact that all search engines have different defaults and basic features, and thus their specific workings are not always intuitive.

Often these different settings are the culprits that cause those unusual, funny or "false" results.

For example, the default for many Web engines is to OR terms together, then provide results based on relevancy. This combination produces a retrieval that has all terms present in the first hits, and then fewer terms as you move to the bottom of your SERPs. This explains why the last hits do not contain all of your search terms: the terms were "ORed" together, so a page matching only one of them still qualifies. Unless you specifically ask to AND terms together, do not trust your search retrievals to accurately portray the number of hits from your search strategy.
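The effect is easy to reproduce on a toy collection. The sketch below is not how any real engine ranks: it assumes the crudest possible "relevancy" (the number of distinct query terms a document contains), and the function names `or_search` and `and_search` are made up for the example.

```python
def or_search(query, docs):
    """Return documents matching ANY query term, most matching terms first."""
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        hits = len(terms & words)  # how many query terms this document has
        if hits:
            scored.append((hits, doc))
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def and_search(query, docs):
    """Return only the documents containing ALL query terms."""
    terms = set(query.lower().split())
    return [doc for doc in docs if terms <= set(doc.lower().split())]


docs = [
    "advanced web search techniques",
    "search engines and their quirks",
    "a history of the early web",
    "gardening for beginners",
]

print(or_search("web search", docs))   # full match first, partial matches after
print(and_search("web search", docs))  # only the full match
```

On this collection the ORed query reports three hits and the ANDed one a single hit, which is exactly why the hit count of a default query says little about your actual search strategy.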

Another typical default is automatic truncation on each term. So if your search is for web search you will also retrieve documents with the terms "searching," "searcher," and even "web-spiders" in them.
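Automatic truncation amounts to treating every term as a prefix (as if you had typed `term*`). A small sketch, assuming a naive tokenizer that keeps hyphenated words together; the function names are invented for the illustration:

```python
import re


def tokens(text):
    """Lower-case words; hyphenated words such as 'web-spiders' stay whole."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def exact_match(term, text):
    """True only if the term appears as a complete word."""
    return term.lower() in tokens(text)


def truncated_match(term, text):
    """True if any word merely STARTS with the term (implicit 'term*')."""
    term = term.lower()
    return any(word.startswith(term) for word in tokens(text))


text = "A good searcher knows how bots and web-spiders work."
print(exact_match("search", text), truncated_match("search", text))
print(exact_match("web", text), truncated_match("web", text))
```

Both terms fail the exact test and pass the truncated one, so a page like this lands in your results although it contains neither "web" nor "search" as a word.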

Another way of explaining "false" results is by determining exactly what the search engine is searching. Usually, the default is the URL, but sometimes a search engine retrieves texts where your search terms appear anywhere in the document. An address might include your search term, but the actual document may not show your term when retrieved.
Also never forget the quicksand nature of the web: you may retrieve a page that indeed had your term some time ago, but that has been updated in the meantime, whereby your term disappeared.
Seekers can also retrieve disappeared pages.
Pay close attention to any Web engine documentation to clarify just how and what it searches. All "searching mysteries" can be solved if you have the time and will to do so.


Good seekers are dangerous

My hope is that you will learn here how to find every IMAGE, every SOUND and -especially- every SCRIPT or BOOK or SOFTWARE program known to man. As you will learn and understand perusing this site, there is no way anybody can put something on the web and block you from seeing it, given the weaknesses in all security protections currently available. Of course you should respect copyrights, yet you will be quite surprised by the incredible amount of knowledge and sheer information you will be able to gather from those that do not respect them. Keep in mind that a good searcher can develop into a very dangerous fellow, if need be, since no knowledge known to man can be hidden from him. This notwithstanding, I hope you will strive to remain on the correct path and choose to diffuse knowledge instead of hoarding it. Believe me: you'll gain more than anybody else from this approach.

To state things even more clearly: I hope you will learn here how to find ANYTHING you may fancy for free (apart from the considerable time and brainpower required to understand) as long as it is something that can be translated into the virtual world: images, books, ideas, source code, games, sounds, documents, applications, trends...
You are embarking here on a very long voyage. Good luck.
~S~ fravia+, February 2000



How many URLs do the search engines cover?

I discovered a long time ago an interesting trick to find it out, using Northernlight (once upon a time one of the top search engines with Fast, Teoma, Google & Hotbot, this last being a very underrated search engine). You just had to perform the following query:
[http://www.northernlight.com/nlquery.fcg?cb=0&qr=search+or+not+search&orl=2%3A1]
This querystring search or not search gave in March 2001 for Northernlight well over 322 million URLs... and the first positions were quite interesting per se.

The same search on Altavista
[http://www.altavista.com/cgi-bin/query?sc=on&hl=on&q=search+or+not+search&kl=XX&pg=q&Translate=on&text=yes&search=Search]
gave in 2001 "only" 224 million URLs, but this depends on the stop words used in Alta. At the end of 2003 the result was 33 million. But if we search for a OR NOT a on Alta now (end 2003) we get 545 million.
As you can see, both these engines have now slid into insignificance compared with big effective engines (the best in 2003) like Google, Teoma & Fast.
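Why the trick works: a query of the form "X OR NOT X" is a tautology, so an engine that honours the Boolean operators (and does not throw the words away as stop words) must return every page it has indexed, and the hit count becomes the index size. The sketch below only uses Python's standard `urllib.parse` to decode the Northernlight query string quoted above, then checks the tautology on a toy collection; `count_x_or_not_x` is a made-up helper, and I make no claim about what the `cb` and `orl` parameters meant.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.northernlight.com/nlquery.fcg"
       "?cb=0&qr=search+or+not+search&orl=2%3A1")

# Decode the query string: '+' becomes a space, '%3A' becomes ':'.
params = parse_qs(urlsplit(url).query)
print(params["qr"][0])  # → search or not search


def count_x_or_not_x(term, docs):
    """Count the documents satisfying (term OR NOT term)."""
    total = 0
    for doc in docs:
        present = term in doc.lower().split()
        if present or not present:  # always true, whatever the document says
            total += 1
    return total


docs = ["search engines compared", "a page about galaxies", ""]
print(count_x_or_not_x("search", docs) == len(docs))  # → True
```

The stop-word caveat matters: if the engine silently drops "or" and "not", or the term itself, you are counting something else, which may explain the diverging Altavista figures.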

The real problem is WHICH URLs the search engines cover... alas the "largest" search engines cover (at best) only a tiny part of the web. Moreover they DO NOT index the most interesting parts of the web: they index commercial over educational sites, US sites over European sites and 'popular' sites (read sites loved by the zombies) over relatively unknown sites. Moreover in their commercial 'race' to the 'we have indexed one billion pages' tag, the biggest search engines have recently begun to bloat the indexes, including relatively 'useless' pages (say in those "3 billion" there is a whole collection of 2 million pages with the images of 2 million different galaxies... that should have been more correctly considered a single database count).

This said (even if you'll have to learn other searching techniques, as you will see) the main search engines are far from being useless! Note for instance how, inputting a long exact phrase, you'll immediately find a specific page, "cutting" it out of the "pudding" mass. For instance this very page!
Try Teoma's search for
"The web is uncharted and deep. At the time of writing this snippet of mine there are supposed to exist"
Try it also on Google, where you can use the "I'm feeling lucky" option as well...
"The web is uncharted and deep. At the time of writing this snippet of mine there are supposed to exist"
Finally, try it on Fast: "The web is uncharted and deep. At the time of writing this snippet of mine there are supposed to exist"
and see (and compare!) the results by yourself.
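What the double quotes buy you can be shown on a toy collection: an unquoted query is satisfied by any page holding the same words in any order, while the quoted phrase demands them consecutively and in order. This is a sketch under simple assumptions (case and punctuation are ignored); the helper names are invented and no real engine is being described.

```python
import re


def normalise(text):
    """Lower-case the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def all_words_match(query, doc):
    """Unquoted query: every word must occur somewhere in the document."""
    doc_words = set(normalise(doc).split())
    return all(word in doc_words for word in normalise(query).split())


def phrase_match(phrase, doc):
    """Quoted query: the words must occur consecutively, in the same order."""
    return f" {normalise(phrase)} " in f" {normalise(doc)} "


phrase = "The web is uncharted and deep"
this_page = "The web is uncharted and deep. At the time of writing this snippet..."
lookalike = "Deep and uncharted is the web, some say."

print(all_words_match(phrase, lookalike), phrase_match(phrase, lookalike))
print(all_words_match(phrase, this_page), phrase_match(phrase, this_page))
```

The lookalike passes the word test and fails the phrase test; the longer the phrase, the fewer pages can pass, which is why a sentence-long quote usually isolates a single page.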

Yet the Web is a quicksand, and the algos are continuously a-changing, so that our techniques must evolve as well. As you'll realize perusing my site, there is a lot to learn and re-learn in this field. Collective work and the contributions of my readers are the sine qua non to keep abreast. Our ultimate aim, as always is "simply" to spread for free the light of our knowledge to anyone that cares.
[Recent workshops of mine] may give you a more exact idea of what this site is about.



Leaving so soon?


Maybe you have had enough. Maybe this looks too complex and awkward... Maybe you wish to go away... Well, I'll just quote from a nice 'novice computer guide':
Thank you for visiting my page, and I do hope you learn something, even if it's something as simple as learning the short cut key to open that darn file explorer (windows key + e) or the one to minimize all windows at once (windows key + m) or the short cut you can use when closing the window you're currently using (ALT+F4).

Such are the mysterious ways of the web: teaching is easier than learning, learning is harder than forgetting, forgetting is easier than teaching: the Moebius band revolves, we all gain, or lose.


(c) III Millennium: [fravia+], all rights reserved, reversed, deserved, revered, revealed and reviled