Blocking Blog Spam with Language Model Disagreement
by Gilad Mishne (University of Amsterdam) and David Carmel and Ronny Lempel (IBM Research Lab in Haifa).
6 pages; PDF.
It's tech stuff, but the abstract can clue us non-techies in to the main concepts:
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In contrast to other link spam filtering approaches, our method requires no training, no hard-coded rule sets, and no knowledge of complete-web connectivity. Preliminary experiments with identification of typical blog spam show promising results.
