Improving Within-Network Classification with Local Attributes
Appears in the Workshop on Text-Mining and Link Analysis (Textlink) at the Twentieth International Joint Conference on Artificial Intelligence, January 7, 2007, Hyderabad, India.

Sofus A. Macskassy

Abstract

This paper addresses the use of multiple types of information for classifying networked data in the transductive setting: given a network in which some nodes are labeled, predict the labels of the remaining nodes. One recently developed method for such inference is a guilt-by-association model, which has been developed independently in two different settings. The first setting assumes that the networked data has explicit links, such as hyperlinks between web pages or citations between research papers. The second setting assumes a corpus of non-relational data and creates links based on similarity measures between the instances. Both use only the known labels in the network to predict the remaining labels, but they draw on very different information sources. The thesis of this paper is that combining the two types of links yields a network that carries more information than either type of link by itself. We test this thesis on six benchmark data sets and show that it holds. We further conduct a sensitivity study on how many links should be created, showing that the combined network achieves most of its immediate gain using only a few extra links.
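To make the idea of combining link types concrete, the following is a minimal sketch and not the paper's implementation. It assumes cosine similarity over local attribute vectors, a k-nearest-neighbor rule for creating similarity links, summed edge weights for the combined network, and a simple iterative weighted-vote form of guilt-by-association. All function names, the toy data, and the parameters `k` and `iterations` are hypothetical.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_links(features, k):
    """Link each node to its k most similar other nodes (assumed kNN rule)."""
    edges = {}
    for i, fi in features.items():
        scored = sorted(
            ((cosine(fi, fj), j) for j, fj in features.items() if j != i),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim > 0:
                edges[frozenset((i, j))] = sim
    return edges

def combine(explicit, similar):
    """Union of both edge sets; weights of shared edges are summed (assumption)."""
    combined = defaultdict(float)
    for edge, weight in list(explicit.items()) + list(similar.items()):
        combined[edge] += weight
    return dict(combined)

def guilt_by_association(edges, known, classes, iterations=20):
    """Iterative weighted vote: an unlabeled node takes the weighted
    average of its neighbors' class beliefs; known labels stay fixed."""
    neighbors = defaultdict(list)
    for edge, weight in edges.items():
        a, b = tuple(edge)
        neighbors[a].append((b, weight))
        neighbors[b].append((a, weight))
    belief = {n: {c: 1.0 / len(classes) for c in classes} for n in neighbors}
    for n, label in known.items():
        belief[n] = {c: 1.0 if c == label else 0.0 for c in classes}
    for _ in range(iterations):
        updated = {}
        for n in neighbors:
            if n in known:
                continue
            total = sum(w for _, w in neighbors[n])
            updated[n] = {
                c: sum(w * belief[m][c] for m, w in neighbors[n]) / total
                for c in classes
            }
        belief.update(updated)
    return {n: max(classes, key=lambda c: belief[n][c])
            for n in belief if n not in known}

# Toy data: node "e" has attributes but no explicit links.
features = {"a": [1, 0], "b": [1, 0.1], "c": [0, 1], "d": [0.1, 1], "e": [0.9, 0.2]}
explicit = {frozenset(("a", "b")): 1.0, frozenset(("c", "d")): 1.0}
known = {"a": "x", "c": "y"}

combined = combine(explicit, similarity_links(features, k=1))
print(guilt_by_association(explicit, known, ["x", "y"]))
print(guilt_by_association(combined, known, ["x", "y"]))
```

In this toy example the explicit network alone leaves node "e" unreachable, while a single similarity link per node connects it to labeled data, which loosely mirrors the claim that a few extra links account for much of the gain.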