A pseudo press release from Microsoft discusses the work of Marc Najork, a researcher at MSR in Silicon Valley.
“I’m interested in Web search,” Najork explains, “and in particular, I’m interested in the ranking of search results.”
That interest has resulted in the creation of the Scalable Hyperlink Store, a specialized database that distributes a compressed version of the entire Web graph across a series of computers to deliver fast access to hyperlink information.
“There are, customarily, three different ways to rank search results,” Najork says. “One way is to see how well the query terms correlate with what’s on a Web page. You might compare the query to the actual content of the page. That’s been well explored.
“Another way to look at it would be to see what pages are popular with users. Essentially, you get feedback from the actual user population. There has been work done in that area, as well.
“The third possibility is that you examine the link structure of the Web: how Web pages link to one another. If a Web page links to another Web page, then, presumably, it’s saying: ‘This other Web page is a good page; why don’t you visit that?’ People caught up to that a long time ago.”
What hasn’t occurred, though, is a thorough evaluation of both types of algorithms that analyze link structure. Query-independent algorithms, in which Web content is analyzed without taking a specific query into account, rely on the hypothesis that pages to which many other pages link are important and should therefore be elevated in the ranking of results for subsequent queries. This is the model for search technology, circa 2006.
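
To make the query-independent idea concrete, here is a minimal sketch of perhaps the simplest such scheme: counting how many distinct pages link to each page and sorting by that count. It is not the method Najork's group evaluates, and the function name `rank_by_in_degree` and the toy link list are assumptions made up for illustration.

```python
from collections import Counter


def rank_by_in_degree(links):
    """Rank pages by how many distinct other pages link to them.

    `links` is an iterable of (source, target) pairs. This is a hypothetical
    helper for illustration, not part of any real search system.
    """
    # Ignore self-links and count each (source, target) pair only once.
    unique = {(src, dst) for src, dst in links if src != dst}
    pages = {page for edge in unique for page in edge}
    in_degree = Counter(dst for _, dst in unique)
    # Highest in-degree first; ties are broken alphabetically for stability.
    return sorted(pages, key=lambda page: (-in_degree[page], page))


# Toy Web graph (made-up data): three pages link to "c", one links to "d".
toy_links = [("a", "c"), ("b", "c"), ("d", "c"), ("a", "d"), ("b", "c"), ("d", "d")]
ranking = rank_by_in_degree(toy_links)
```

Note that the ranking is computed once from the link graph alone, with no query involved, which is what makes it query-independent; a search engine would combine such a score with query-dependent signals at query time.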
