Linking in Social Media Does Not a Community Make
In Proceedings of the Workshop
on Information in Networks (WIN), New York, New York, 2010.
Abstract
Community detection algorithms have received significant attention in
recent years. The most common approaches take a graph (such as a
social network) and split it into k disjoint clusters, where each
cluster supposedly represents a "community" in that graph.
This kind of approach is appropriate when one can reasonably expect
that there is a clear enough signal in the graph, such that the found
communities are likely to represent real sub-communities. For example,
in Zachary's Karate Club, we have personal interactions between
people, and we can identify the two groups that the club eventually
splits into. Voting records in Congress can be split (with some accuracy)
into two clusters based on party affiliation, and
sports networks (team playing against team) can be split into regions.
In particular, this kind of approach works well on small, relatively
well-defined networks in which the edges have a single, appropriate
semantic interpretation. Depending on the
domain, it is also important that the networks are collected and
aggregated over a small timeframe.
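The disjoint-clustering setup described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy graph (two triangles joined by one bridge edge), the helper names `modularity` and `best_split`, and the use of Newman-Girvan modularity as the quality score. The exhaustive search is only feasible on a tiny, clean graph like this one.

```python
from itertools import product

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
NODES = sorted({n for e in EDGES for n in e})


def modularity(labels, edges):
    """Newman-Girvan modularity of a labeling {node: cluster id}."""
    m = len(edges)
    internal = {}  # cluster id -> number of edges inside the cluster
    degree = {}    # cluster id -> sum of node degrees in the cluster
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def best_split(nodes, edges, k=2):
    """Try every assignment of nodes to at most k disjoint clusters."""
    best_q, best_labels = float("-inf"), None
    for assignment in product(range(k), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        q = modularity(labels, edges)
        if q > best_q:
            best_q, best_labels = q, labels
    clusters = {}
    for node, c in best_labels.items():
        clusters.setdefault(c, set()).add(node)
    return best_q, sorted(sorted(c) for c in clusters.values())


score, clusters = best_split(NODES, EDGES, k=2)
print(clusters, round(score, 4))
```

On this toy graph the search separates the two triangles, because the edges carry a single clear meaning and the structure is unambiguous. The search space grows as k to the power of the number of nodes, so this brute-force version is a teaching aid only.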
However, the assumptions the above methods rely on start to break down
when we want to identify communities in online social media such as
Facebook, LinkedIn, Twitter, Digg, the Blogosphere, Flickr, etc. In
these cases, the social graph is an exceedingly large and dynamic
network (thousands if not millions of links and pieces of content are
created every day), where relations between people are not clearly defined,
and where the notion of a community itself may not be well-defined.
Despite these difficulties, being able to identify and characterize
online communities can be incredibly useful across a broad array of
applications. Once communities are found, we can gain deep insight into
what moves them and their constituents, making it possible to rapidly
identify community-specific problems, needs, and interests. Even
modest improvements in solving this problem can yield significant
changes in how government, national security, industry and academia
can use social media.
In this paper we first define the problem more formally, and then
outline possible ways to address the problem, focusing on one approach
in particular.