Abstract
One challenge for content providers on the Web is determining who
consumes their content. For instance, online newspapers want to know
who is reading their articles. Previous approaches have tried to
determine such audience demographics by placing cookies on users'
systems, or by directly asking consumers (e.g., through surveys).
The first approach may make users uncomfortable, and the second is not
scalable. In this paper we focus on determining the demographics of a
Website's audience by analyzing the blogs that link to the Website.
We analyze both the text of the blogs and the network connectivity of
the blog network to determine demographics such as whether a person
"is married" or "has pets." Presumably bloggers linking to sites also
consume the content of those sites. Therefore, the discovered
demographics for the bloggers can be used to represent a proxy set of
demographics for a subset of the Website's consumers. We demonstrate
that in many cases we can infer sub-audiences for a site from these
demographics. Further, this demonstration of feasibility indicates that very specific
demographics for sites can be generated as we improve the methods for
determining them (e.g., finding people who play video games). In our
study we analyze blogs from more than 590,000 bloggers, collected over
a six-month period, that link to more than 488,000 distinct, external
Websites.