Abstract
This past decade has seen a remarkable growth in the availability of
networked data and interest in analyzing and understanding these
networks. This interest spans a wide variety of fields, each having a
different perspective and set of analytic tools they bring to bear.
Many of the analytical tools available for large networks evolve
around global and local metrics to characterize or describe the
network. These methods are often generative such as the exponential
models from social network analysis or the Kronecker models that have
come out of computer science. We have also seen validation of the
small worlds phenomena and in particular the six-degrees separation on
data such as LinkedIn. More sophisticated tools such as clustering or
relational learning are often only tractable on relatively small networks.
We here consider the problem of analyzing the network in the context
of within-network classification. In
this setup, one is given a homogeneous weighted network that is
partially labeled. A homogeneous network consists of one type of
entity (e.g., all people or all companies) and has undirected weighted
edges that represent the strength of the relation between the two
entities (e.g., how often they call each other, cite each other,
co-occur in text, etc.). Further, the network has been partially
labeled, meaning that some of the nodes have been categorized or
tagged.
The core theme in current network mining and analysis literature is
that many existing sophisticated methods are computationally expensive
and not tractable on large networks. However, fast approximate
methods can be often be used to address all of the above problems with
little to no loss in performance of the final analysis over that of
more sophisticated methods.