Sunday, December 14, 2008

More search questions

As usual, tough to drop a subject once you start thinking about it.

A couple of things I did not go into in my previous post on search [http://architectguy.blogspot.com/2008/12/search-questions.html] that I think are relevant to the discussion:

- The reputation system needs to be made contextual (yes, here it comes again). The computation of the reputation for a given document needs to take into account the context / motivation the search is taking place in. For example, a document on operating systems may have very little reputation overall, but if it talks about a microkernel architecture, it is highly relevant for those searching for information on software architecture. The reputation of that document should be much higher in the context of a search for information on software architecture than it is in context-less searches.
What that means is essentially that the current focus on just hits is not enough. Google has of course realized that (everything I say here they have probably thought of for years) and has introduced tons of tweaks to the algorithm, including giving more weight to certain referrers. But the key issue remains: the reputation is uni-dimensional and too much information gets lost in that compression.
I think a system that takes into account semantic information provided by the sites (explicit or calculated) may help create multi-dimensional reputation scores that could be leveraged by searches for which the context or motivation is available (again explicitly - which I believe in - or implicitly - which I find tough to imagine at this point).

- Part of the context is "time". When we make searches, the recency of the documents returned is frequently not well managed. The algorithms are tweaked to account for recency, but there is contextual information that should allow the search to be smarter about how it orders what it presents. Again, I believe that is part of the context.

- An issue with personalization is, again, that the motivation for a search is so contextual that taking past searches into account to decide on the answers to future ones may be counterproductive.
This is not new; it has been commented on many times as part of the feedback on large commercial recommendation capabilities (Amazon, TiVo): make a few searches for an exceptional reason (buying a present for a friend with very different tastes from yours, gathering information on a given subject to write a paper you will never look at again, etc.), and mis-targeted recommendations will start coming. These systems cannot guess the context, and hence cannot know when to remember your interactions and when to forget them. And both learning and forgetting are key.
With no context on the motivation for the search, only minimal personalization is possible. I have two Amazon accounts, one personal and one professional, precisely to keep the recommendations for one from polluting those for the other.
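
To make the first point above a bit more concrete, here is a minimal sketch of what a multi-dimensional reputation could look like, written in Python. Everything in it is an assumption made up for illustration: the `Document` class, the topic names, the scores, and the choice of a plain average as the "compressed" context-less score. It is not how any real search engine computes reputation; it only shows how a document can rank low without context and high once the context is known.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    title: str
    # Hypothetical per-topic reputation scores in [0, 1].
    reputation: Dict[str, float] = field(default_factory=dict)

    def overall(self) -> float:
        """Uni-dimensional view: collapse all topics into one number.

        A plain average is an arbitrary choice for this sketch.
        """
        if not self.reputation:
            return 0.0
        return sum(self.reputation.values()) / len(self.reputation)

    def in_context(self, context: Optional[str]) -> float:
        """Reputation for a given search context, if one is supplied."""
        if context is None:
            return self.overall()
        # Unknown topics score 0.0 rather than falling back to the average.
        return self.reputation.get(context, 0.0)


def rank(docs: List[Document], context: Optional[str] = None) -> List[Document]:
    """Order documents by reputation, highest first, for the given context."""
    return sorted(docs, key=lambda d: d.in_context(context), reverse=True)


# Made-up documents and scores, purely for illustration.
docs = [
    Document(
        "Microkernel OS internals",
        {"operating systems": 0.3, "software architecture": 0.9, "general": 0.1},
    ),
    Document(
        "Popular OS review",
        {"operating systems": 0.8, "software architecture": 0.2, "general": 0.9},
    ),
]

context_less = [d.title for d in rank(docs)]
architecture = [d.title for d in rank(docs, "software architecture")]
```

With these made-up numbers, the popular review comes first in the context-less ranking, while the microkernel document comes first once the search is known to be about software architecture. The information that made that possible is exactly what a single compressed score throws away.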

I am certain lots of people more educated than me on these issues have already explored them and offered their opinions on how relevant they are.

I am certain the technical challenges are significant, but I believe the combination of verticalized searches (those for which you have an easy way of specifying the motivation / context for the search), semantically tagged documents, and semantically aware reputation systems will enable better searches, and more targeted monetizable ads.

This has been fun; moving on to other preoccupations.
