New Research Paper: Web Content Categorization Using Link Information
A new research paper by Jan Pedersen (Yahoo), Zoltan Gyongyi (Stanford), and Hector Garcia-Molina (Stanford).
From the abstract:
Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques take little advantage of the link structure and rely primarily on text features, as it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a link-based approach to classification, which can be used in isolation or in conjunction with text-based classification. Various large-scale experimental results indicate that link-based classification is on par with text-based classification, and the combination of the two offers the best of both worlds.
Direct to Full Text (PDF; 24 pages)
Source: Stanford Info Lab
