Web Organization and Searching
Source: Research/Penn State
The Work of Lee Giles
Read about the important and interesting work of Lee Giles at Penn State University. From the article: "Lee Giles is not much interested in surfing. 'Mining' and 'extraction' are terms more to his liking. Giles, the David Reese professor of information sciences and technology at Penn State, has devoted his career to finding better ways to get at information, to wring the most out of it, to marshal it efficiently." A few key passages follow. Make sure to read the entire article.
-
+ "The Web exists as a distributed sort of information base," Giles says, with typical understatement. Unregulated, decentralized, the work of tens of millions of disparate authors, and constantly growing at an ever-accelerating rate, the Web is no easy object to take the measure of. Yet characterizing the Web, understanding its parameters and its behavior, was the first thing Giles set about doing. "What's there, how it is connected, how it changes, who uses it, why they use it: the more you know about these things, the more efficiently you're able to use it," he says.
-
+ In another study, published last year in the Proceedings of the National Academy of Sciences, he and his co-authors challenge the widely held notion that the competition for attention on the Web is purely "winner-take-all," i.e., that new sites on the Web are more likely to attach themselves to sites that already have many links, ensuring that a small number of established sites will always receive a disproportionate share of Web traffic. While this preferential behavior does accurately describe the Web as a whole, Giles and his co-authors write, it varies significantly by the type of site considered. Thus, while a new newspaper or entertainment site might find it difficult competing with similar sites that are already popular, university sites and the pages of individual scientists exhibit a more egalitarian link growth. "The behavior is more complicated than had been thought," Giles says.
-
+ But automatic engines have their limitations, too. For one thing, most current crawlers are unable to recognize "spam," which in this context means unreliable information. "In the unregulated environment of the Web," Giles says, "people claiming to be what they're not is an ongoing problem."
-
+ A more practical solution [than a completely personalized search tool], at least in the short term, is what Giles calls the niche search engine, designed specifically to meet the needs of a group of people with similar interests: employees of a company, say, or members of a profession. By limiting its crawling to a specific subject area, the niche engine can burrow deeper, providing more consistently useful information. A prime example is CiteSeer [aka ResearchIndex], a tool that Giles and Steve Lawrence created for the field of computer and information science.
Note: We completely agree with Dr. Giles. Those of you who read ResourceShelf on a regular basis know that we try hard to provide info about useful specialized and ‘niche’ search tools.
—-
See Also: eBizSearch
Another niche search tool that Giles has developed. It focuses on materials about electronic business. eBizSearch was a Resource of the Week when it was officially launched in January 2003.
–
See Also: Direct to Lee Giles Home Page
Plenty of interesting reading here.
