Challenges in Web Search Engines

More on the Google/Anti-Semitic Site Story
Important and interesting reads from Seth Finkelstein and Danny Sullivan. There is no need to comment on this specific issue again, but here are a couple of comments about the broader issue of search engine manipulation.

Last October, I commented that while most of the press coverage was focusing on paid inclusion (which Google doesn't offer) and paid placement and their potential effects on the web searcher, it was hard to find press coverage noting that organic search results can also be manipulated (yes, even Google's results). This manipulation is the nature of the beast (we should learn to deal with it), and it is another reminder that general web engines are more than just "research tools" in the way a librarian might think of Dialog, LN, Factiva, and many others. Finkelstein correctly points out, "Google ranks popularity, not authority. And popularity is a measure which is vulnerable to many games. Any system of evaluation is subject to manipulation." While link analysis is similar in many ways to citation analysis, tools like ISI's Citation Indexes and ISI's Impact Factors are less susceptible to manipulation (though NOT totally free of it) because they draw on a much smaller, more tightly controlled universe of material.
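To make Finkelstein's "popularity, not authority" point concrete, here is a minimal sketch of textbook PageRank-style link analysis, run by power iteration over a tiny made-up link graph. This is an assumption for illustration only: Google's actual ranking uses many more signals and is not shown here, and the function name, page names, and damping value of 0.85 are hypothetical choices. The sketch compares two equally obscure pages, "target" and "other", and then adds ten fabricated "farm" pages that all link to "target".

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> list of outlinks."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# A tiny hypothetical "honest" web: a, b, c link in a cycle, and two
# obscure pages, "target" and "other", each link to a but receive no links.
base = {"a": ["b"], "b": ["c"], "c": ["a"], "target": ["a"], "other": ["a"]}

# The same web plus ten fabricated pages that exist only to link to "target".
farmed = dict(base)
for i in range(10):
    farmed[f"farm{i}"] = ["target"]

base_rank = pagerank(base)
farmed_rank = pagerank(farmed)

# With no inbound links, "target" and "other" score identically in the base web.
print(base_rank["target"], base_rank["other"])
# After the link farm is added, "target" scores about 9.5 times "other",
# even though nothing about the quality of "target" has changed.
print(farmed_rank["target"], farmed_rank["other"])
```

Nothing in the calculation asks whether the linking pages are trustworthy; it only counts and weights links, which is why cheaply manufactured links can move a page up. A curated citation index limits this by controlling which sources are counted in the first place.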

Let's remember that web engines are also advertising/marketing vehicles. As Danny points out, results appearing in the 20th position are all but invisible to the average searcher. Sullivan's comments remind me of a conversation at a presentation for the book I co-authored with Chris Sherman. A member of the audience told me that Chris and I had failed to mention a large portion of the Invisible Web in our book. After taking a deep breath, I asked her what we had forgotten. She told me that for many searchers, if it's not in the first five or seven results, it's all but invisible. She was right!

The power searcher needs, first, to be aware of this issue and, second, to use advanced search syntax, careful term selection, specialized databases, and other tools to produce more precise result sets. Doing so can help minimize the effects of manipulation. I also think that Teoma's method of determining relevance might be less susceptible to manipulation.

See Also: Challenges in Web Search Engines
This twelve-page paper was written by Dr. Monika Henzinger (Research Director, Google), Dr. Rajeev Motwani (Professor at Stanford) and Dr. Craig Silverstein (Director of Technology, Google). From the abstract: "…article presents a high-level discussion of some of the problems with information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas." Content quality, spam, cloaking, duplicate hosts, and vaguely structured data are some of the topics discussed.

See Also: Web Spam Taxonomy (Full Text, Just Released)
From the abstract, “Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.”
