My interest in search engines had taken me to some reading on clustering. There are various experiments going on, some of them have even been transformed into products. Tara Calishain does a roundup of various options available. Alex Iskold did an overview of the clustering mechanim and its advantages.
It is really interesting, however, I would like to consider a different perspective. While searching the popular search engines return flat results by default. Flat meaning, there is not other dimension to it, they are results of the phrase search. If the search engine could identify the context in which I wanted to make the search, the results would be much more accurate. Instead of trying to guess the context, clustering provides you with the multiple contexts possible. It is like you search for a phrase, and then you are presented with another phrase by which your results are indexed.
For the technically inclined, checkout this tutorial on clustering. I am not sure if clustering is the best way to add the context. These contexts can be wrong if the phrase is not understood properly. One of the phrases I use regularly to test is book on google or book about google. This phrase usually gives more results from Google Book Search rather than about books on Google. I know natural language search is not very popular, but can considering the phrase rather than just the worlds and their order, together with clustering increase accuracy? Clustering is a popular way of organizing objects in groups, but can something that looks at the semantics help? Just a thought.



January 11th, 2007 at 11:01 am
[...] Martin Fowler illustrates the effect of the natural language homonyms in modeling and design. A book can either mean the literary work or its physical body. What do we mean by book is usually communicated by the context of the conversation. One realisation I have had in my software development experience that it is important to capture this context, and at the same time it is one of the most difficult things to do. Search engines have used clustering to solve a similar problem. [...]