Improving Search Relevance

Submitted by guy on Tue, 2005-06-14 01:46.

Search relevance determines the quality of a search engine. An ideal search engine returns a list of results ranked so that the first one is the most reliable and useful result for its target audience.

Google's proprietary PageRank technology determines relevance largely by popularity: "In essence, Google interprets a link from page A to page B as a vote, by page A, for page B." The more weighted votes a page receives, the more popular it becomes in Google's eyes.

For structured data, such as the databases that DBSight searches, PageRank no longer applies because there is no link structure. How, then, can we "calculate" a document's popularity to help improve search relevance?

Thanks to the internet, before buying a product, whether as small as a pocket-size digital camera or as big as an SUV, we nowadays go online and read user reviews. Most of the time we choose the product with the best reviews among a group of similar ones. In other words, in cyberspace, reviews and ratings equal quality and credibility. Just look at eBay: a key component of its ecosystem is its seller rating.

Now it seems natural to augment DBSight with a user rating system. For example, place five stars next to each search result and let the user click one of them to give the document a rating on a 1-to-5 scale after viewing it. A document's quality and popularity can then be reflected in its average rating and the number of users who have voted.
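A minimal sketch of such a rating store, tracking the two signals mentioned above (average rating and vote count) per document. The class and method names are illustrative assumptions, not part of DBSight:

```python
from collections import defaultdict

class RatingStore:
    """Hypothetical per-document store of 1-5 star ratings."""

    def __init__(self):
        # doc_id -> list of star ratings received so far
        self.ratings = defaultdict(list)

    def rate(self, doc_id, stars):
        """Record one user's 1-5 star vote for a document."""
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings[doc_id].append(stars)

    def summary(self, doc_id):
        """Return (average rating, vote count) for relevance boosting."""
        votes = self.ratings[doc_id]
        if not votes:
            return (0.0, 0)
        return (sum(votes) / len(votes), len(votes))
```

At query time, the (average, count) pair could be folded into the ranking score, e.g. as a multiplicative boost on top of the text-match score.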

Just as in PageRank, where "Votes cast by pages that are themselves 'important' weigh more heavily", the rating system could be extended so that votes by certain "power users" weigh more heavily than those by normal users. In a software company, an architect could be such a power user.
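This weighting idea can be sketched as a weighted average over votes, where each user carries a vote weight. The user names and weight values below are illustrative assumptions:

```python
def weighted_average(votes, weights):
    """Weighted average of star ratings.

    votes:   list of (user, stars) pairs
    weights: dict mapping user -> vote weight;
             unknown users default to 1.0 (a normal user)
    """
    total = sum(weights.get(user, 1.0) * stars for user, stars in votes)
    norm = sum(weights.get(user, 1.0) for user, _ in votes)
    return total / norm if norm else 0.0
```

For example, with an architect weighted 3.0, a vote of 5 from the architect and a vote of 2 from a normal user average to (3.0 * 5 + 1.0 * 2) / 4.0 = 4.25, rather than the unweighted 3.5.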

Submitted by bpang on Tue, 2005-06-14 11:47.

Interesting perspective. However, it has a few dependencies. First, you have to institute a system that strongly encourages users to rate the file. Second, how do you prevent people from gaming the system (e.g., writing a lot of good reviews of one file to boost its rating)? Link analysis largely solves this problem, first because of the huge scale of the internet and second through the weighting of each site; maybe you can weight each user in some way too. Third, how do you rank a file with 10 great reviews and 2 bad reviews against a file with 100 reviews, 80 good and 20 bad?
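One standard answer to the third question is a Bayesian (smoothed) average: pull every document's raw average toward a global prior, so a handful of perfect reviews cannot outrank a large, consistently good sample. A sketch, where the prior mean and prior weight are tuning assumptions rather than anything from DBSight:

```python
def bayesian_average(votes, prior_mean=3.0, prior_weight=10):
    """Average of star votes, smoothed toward a prior.

    Acts as if prior_weight phantom votes of prior_mean stars were
    already cast, so documents with few votes stay near the prior.
    """
    n = len(votes)
    return (prior_weight * prior_mean + sum(votes)) / (prior_weight + n)
```

With these parameters, 10 five-star reviews score (30 + 50) / 20 = 4.0, while 80 five-star and 20 one-star reviews score (30 + 420) / 110 ≈ 4.09, so the heavily reviewed file ranks higher despite its lower raw average.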

It's an interesting topic to explore.

Submitted by guy on Tue, 2005-06-14 15:02.

The first issue is in fact a common problem to tackle when building a successful web-based business. It is particularly difficult for a search application, in that users normally leave the search result page immediately after finding the information they are looking for.

The second might not be an issue for enterprise search: it is unlikely that internal users would adversely manipulate rankings on an intranet.