Linking in Social Media Does Not a Community Make
In Proceedings of the Workshop on Information in Networks (WIN), New York, New York, 2010.

Sofus A. Macskassy and Matthew Michelson

Abstract

Community detection algorithms have received significant attention in recent years. The most common approaches take a graph (such as a social network) and split it into k disjoint clusters, where each cluster supposedly represents a "community" in that graph. This kind of approach is appropriate when one can reasonably expect a clear enough signal in the graph that the communities found are likely to represent real sub-communities. For example, in Zachary's Karate Club, we have personal interactions between people, and we can identify the two groups that the club eventually splits into. Voting records in Congress can (with some accuracy) be split into two clusters based on party affiliation, and sports networks (team playing against team) can be split into regions. In particular, this kind of approach works well on relatively small, well-defined networks whose edges have a single, well-defined, and appropriate semantic interpretation. Depending on the domain, it is also important that the networks are collected and aggregated over a small timeframe. However, the assumptions these methods rely on start to break down when we want to identify communities in online social media such as Facebook, LinkedIn, Twitter, Digg, the Blogosphere, Flickr, etc. In these cases, the social graph is an exceedingly large and dynamic network (thousands if not millions of links and pieces of content are created every day), where relations between people are not clearly defined, and where the notion of a community itself may not be well-defined. Despite these difficulties, being able to identify and characterize online communities can be incredibly useful across a broad array of applications. Once such communities are found, we can gain deep insight into what moves them and their constituents, making it possible to rapidly identify community-specific problems, needs, interests, etc.
Even modest improvements in solving this problem can yield significant changes in how government, national security, industry, and academia can use social media. In this paper, we first define the problem more formally and then outline possible ways to address it, focusing on one approach in particular.