The many faces of guilt-by-association [pdf]
In Proceedings of the Workshop on Information in Networks (WIN-2009).

Sofus A. Macskassy

Abstract

This past decade has seen a remarkable growth in the availability of networked data and interest in analyzing and understanding these networks. This interest spans a wide variety of fields, each having a different perspective and set of analytic tools they bring to bear. Many of the analytical tools available for large networks evolve around global and local metrics to characterize or describe the network. These methods are often generative such as the exponential models from social network analysis or the Kronecker models that have come out of computer science. We have also seen validation of the small worlds phenomena and in particular the six-degrees separation on data such as LinkedIn. More sophisticated tools such as clustering or relational learning are often only tractable on relatively small networks. We here consider the problem of analyzing the network in the context of within-network classification. In this setup, one is given a homogeneous weighted network that is partially labeled. A homogeneous network consists of one type of entity (e.g., all people or all companies) and has undirected weighted edges that represent the strength of the relation between the two entities (e.g., how often they call each other, cite each other, co-occur in text, etc.). Further, the network has been partially labeled, meaning that some of the nodes have been categorized or tagged. The core theme in current network mining and analysis literature is that many existing sophisticated methods are computationally expensive and not tractable on large networks. However, fast approximate methods can be often be used to address all of the above problems with little to no loss in performance of the final analysis over that of more sophisticated methods.