Contextual Linking Behavior of Bloggers: Leveraging text-mining to enable topic-based analysis
To be published in Social Network Analysis and Mining.
Read the official published paper online at Springer.
Abstract
The last decade has seen an explosion in blogging and the blogosphere
is continuing to grow, having a large global reach and many vibrant
communities. Researchers have been poring over blog data with the
goal of finding communities, tracking what people are saying, finding
influencers, and using many social network analytic tools to analyze
the underlying social networks embedded within the blogosphere. One
of the key technical problems with analyzing large social networks
such as those embedded in the blogosphere is that there are many links
between individuals and we often do not know the context or meaning of
those links. This is problematic because it makes it difficult if not
impossible to tease out the true communities, their behavior, how
information flows, and who the central players are (if any). This
paper seeks to further our understanding of how to analyze large blog
networks and what they can tell us. We analyze 1.13M blogs posted
by 185K bloggers over a period of 3 weeks. These bloggers range from
private blog sites to large blog sites such as LiveJournal and
Blogger. We show that we can, in fact, tag links in meaningful ways
by leveraging topic-detection over the blogs themselves. We use these
topics to contextually tag links coming from a particular blog post.
This enrichment enables us to create smaller topic-specific graphs
which we can analyze in some depth. We show that these topic-specific
graphs not only have a different topology from the general blog graph
but also enable us to find central bloggers who were otherwise hard
to find. We further show that a temporal analysis identifies
behaviors in terms of how components form as well as how bloggers
continue to link after components form. These behaviors come to light
when doing an analysis on the topic-specific graphs but are hidden or
not easily discernible when analyzing the general blog graph.
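
The idea of tagging links by topic and analyzing the resulting topic-specific graphs can be illustrated with a minimal sketch. It is not the paper's implementation: the toy posts, the topic labels (assumed to come from a separate topic-detection step), the helper names, and the use of in-degree as a stand-in for centrality are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: each post has an author, a topic label (assumed to be
# the output of some topic-detection step), and the bloggers it links to.
posts = [
    {"author": "alice", "topic": "politics", "links_to": ["bob", "carol"]},
    {"author": "bob", "topic": "politics", "links_to": ["carol"]},
    {"author": "erin", "topic": "politics", "links_to": ["carol"]},
    {"author": "dave", "topic": "cooking", "links_to": ["alice"]},
    {"author": "carol", "topic": "cooking", "links_to": ["alice"]},
]


def build_topic_graphs(posts):
    """Tag every link with the topic of the post it comes from.

    Returns a mapping from topic to a set of (source, target) edges.
    """
    graphs = defaultdict(set)
    for post in posts:
        for target in post["links_to"]:
            graphs[post["topic"]].add((post["author"], target))
    return graphs


def in_degree(edges):
    """Count incoming links per blogger (a simple proxy for centrality)."""
    counts = defaultdict(int)
    for _source, target in edges:
        counts[target] += 1
    return dict(counts)


def most_linked(edges):
    """Return the blogger with the most incoming links (ties: alphabetical)."""
    degrees = in_degree(edges)
    return max(sorted(degrees), key=degrees.get)


topic_graphs = build_topic_graphs(posts)
# The general blog graph is the union of all topic-specific edges.
general_graph = set().union(*topic_graphs.values())

print("general:", most_linked(general_graph))
for topic, edges in sorted(topic_graphs.items()):
    print(topic + ":", most_linked(edges))
```

In this toy data the most-linked blogger in the general graph differs from the most-linked blogger in the "cooking" graph, which mirrors the claim that topic-specific graphs can surface central bloggers that the general graph obscures.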