Abstract
This paper studies the use of multiple types of information for
classifying networked data in the transductive setting: given a
network in which some nodes are labeled, predict the labels of the
remaining nodes. One method recently developed for such inference is a
guilt-by-association model. This method has been independently
developed in two different settings. One setting assumes that the
networked data has explicit links such as hyperlinks between web-pages
or citations between research papers. The second setting assumes a
corpus of non-relational data and creates links based on similarity
measures between the instances. Both settings use only the known labels
in the network to predict the remaining labels, but rely on very different
information sources. The thesis of this paper is that if we were
to combine the two types of links, the resulting network would carry
more information than either type of link by itself. We test this
thesis on six benchmark data sets and show that it holds. We further
study sensitivity to the number of links created, showing that the
combined network obtains most of its immediate gain from only a few
extra links.
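
To make the idea of combining the two link types concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes cosine similarity over feature vectors, a hypothetical parameter k for the number of similarity links added per node, and a simple weighted-vote form of guilt-by-association. The names cosine, combine_links, and guilt_by_association, along with the toy data, are illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine_links(explicit_edges, features, k=1):
    """Union of explicit links and, per node, its k most similar nodes."""
    graph = defaultdict(dict)
    for a, b in explicit_edges:
        graph[a][b] = graph[b][a] = 1.0  # assumed unit weight for explicit links
    for a in features:
        scored = sorted(
            ((cosine(features[a], features[b]), b) for b in features if b != a),
            reverse=True,
        )
        for sim, b in scored[:k]:
            if sim > 0:
                # keep the larger weight if the pair is already linked
                weight = max(graph[a].get(b, 0.0), sim)
                graph[a][b] = graph[b][a] = weight
    return graph


def guilt_by_association(graph, known, nodes, rounds=10):
    """Weighted vote over neighbors, repeated so labels can propagate."""
    labels = sorted(set(known.values()))
    scores = {}
    for n in nodes:
        if n in known:
            scores[n] = {c: 1.0 if known[n] == c else 0.0 for c in labels}
        else:
            scores[n] = {c: 1.0 / len(labels) for c in labels}  # uniform prior
    for _ in range(rounds):
        updated = {}
        for n in nodes:
            total = sum(graph[n].values())
            if n in known or total == 0:
                updated[n] = scores[n]  # labeled or isolated: leave unchanged
                continue
            updated[n] = {
                c: sum(w * scores[m][c] for m, w in graph[n].items()) / total
                for c in labels
            }
        scores = updated
    return {
        n: max(labels, key=lambda c: scores[n][c]) for n in nodes if n not in known
    }


# Toy data: one explicit link (a-b); node c has no explicit links at all.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.1, 0.9], "d": [0.0, 1.0]}
known = {"a": "x", "d": "y"}
graph = combine_links([("a", "b")], features, k=1)
predicted = guilt_by_association(graph, known, list(features))
print(predicted)
```

In this toy example node c has no explicit links, so it can only receive label information through the similarity link to d that the combined network adds.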