Abstract
The last decade has seen an explosion in blogging, and the blogosphere
continues to grow, with a large global reach and many vibrant
communities. Researchers have been poring over blog data with the
goal of finding communities, tracking what people are saying, finding
influencers, and using many social network analytic tools to analyze
the underlying social networks embedded within the blogosphere. One
of the key technical problems with analyzing large social networks
such as those embedded in the blogosphere is that there are many links
between individuals and we often do not know the context or meaning of
those links. This is problematic because it makes it difficult, if not
impossible, to tease out the true communities, their behavior, how
information flows, and who the central players are (if any). This
paper seeks to further our understanding of how to analyze large blog
networks and what they can tell us. We analyze 1.24M blog posts written
by 298K bloggers over a period of three weeks. These bloggers range
from those on private blog sites to those on large blog sites such as
LiveJournal and Blogspot. We first characterize the behavior of
bloggers, validating some (but not all) common beliefs about how often
bloggers post, how long their posts are, whom they link to, and how
reciprocal their links are. We then take a look at bloggers from the larger blog
sites to understand whether and how they differ in terms of these
metrics. Finally, we extend our analysis to contextual links, that is,
links considered together with the textual content of the post that
contains them. We identify topics from the textual content of all the
blog posts and use these to tag each link with the topics that were
discussed in the post containing it.
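
As an illustration of the link reciprocity measure mentioned above, the
following minimal sketch computes the fraction of directed blogger-to-blogger
links that are returned. The function name, the edge-list representation, and
the sample bloggers are assumptions made for this example; this is one common
definition of reciprocity and not necessarily the one used in the paper.

```python
def link_reciprocity(edges):
    """Return the fraction of directed links (a, b) for which (b, a) also exists.

    `edges` is an iterable of (source_blogger, target_blogger) pairs.
    Duplicate links and self-links are ignored (an assumption of this sketch).
    """
    links = {(a, b) for a, b in edges if a != b}
    if not links:
        return 0.0
    mutual = sum(1 for a, b in links if (b, a) in links)
    return mutual / len(links)


# Hypothetical sample data: alice and bob link to each other,
# while the other two links are one-directional.
sample_edges = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("dave", "alice"),
]
sample_reciprocity = link_reciprocity(sample_edges)
```

Here two of the four distinct links are reciprocated, so the measure is 0.5.
Counting reciprocated pairs instead of reciprocated links would give a
different value, so the definition should be stated whenever results are
compared.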