Simple Models and Classification in Networked Data
CeDER Working Paper 03-04, Stern School of Business, New York University, NY, NY 10012. 2004.

Sofus A. Macskassy, Foster J. Provost

Abstract

When entities are linked by explicit relations, classification methods that take advantage of the network can perform substantially better than methods that ignore it. This paper argues that studies of relational classification in networked data should include simple network-only methods as baselines for comparison, in addition to the non-relational baselines that are generally used. In particular, comparing more complex algorithms with algorithms that consider only the network (and not the features of the entities) allows one to factor out the contribution of the network structure itself to the predictive power of the model. We examine several simple methods for network-only classification on previously used relational data sets, and show that they can perform remarkably well. The results demonstrate that including network-only classifiers can shed new light on studies of relational learners.
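
To make the idea of a network-only baseline concrete, the following is a minimal sketch of one plausible such classifier: each unlabeled entity takes the most common label among its labeled neighbors, and entity features are never consulted. It is an illustration only and is not necessarily one of the methods examined in the paper; the function name `neighbor_vote`, the adjacency-list representation, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def neighbor_vote(graph, labels, node, default=None):
    """Predict a label for `node` using only the labels of its neighbors.

    `graph` maps each node to a list of neighbors (hypothetical format);
    `labels` maps the nodes whose class is known to that class.
    """
    votes = Counter(labels[n] for n in graph.get(node, ()) if n in labels)
    if not votes:
        # No labeled neighbors: the network gives no evidence for this node.
        return default
    return votes.most_common(1)[0][0]


# Hypothetical toy network (undirected, stored as adjacency lists).
graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a"],
    "d": ["a", "e"],
    "e": ["d"],
}
labels = {"b": "x", "c": "x", "d": "y"}  # entities with known classes

# Classify every entity whose class is unknown.
predictions = {n: neighbor_vote(graph, labels, n) for n in graph if n not in labels}
```

A baseline of this kind uses no entity attributes at all, so any accuracy it achieves can be attributed to the link structure and the known labels alone.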