Suspicion scoring based on guilt-by-association, collective inference, and focused data access
International Conference on Intelligence Analysis, 2005

Sofus A. Macskassy and Foster J. Provost

Abstract

We describe a guilt-by-association system that can be used to rank entities by their suspiciousness. We demonstrate the algorithm on a suite of data sets generated by a terrorist-world simulator developed under a DoD program. The data sets consist of thousands of people and some known links between them. We show that the system ranks truly malicious individuals highly, even if only relatively few are known to be malicious ex ante. When used as a tool for identifying promising data-gathering opportunities, the system focuses on gathering more information about the most suspicious people and thereby increases the density of linkage in appropriate parts of the network. We assess performance under noisy prior knowledge (score quality varies by data set under moderate noise) and test whether augmenting the network with prior scores based on profiling information improves the scoring (it does not). Although the level of performance reported here would not support direct action on all data sets, it does recommend the consideration of network-scoring techniques as a new source of evidence in decision making. For example, the system can operate on networks far larger and more complex than could be processed by a human analyst.
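The abstract does not spell out the scoring procedure. As a rough illustration only, the sketch below assumes a simple iterative neighbor-averaging scheme: people known to be malicious (or benign) keep fixed scores, and every unknown person repeatedly takes the mean score of their linked neighbors. A second helper mimics focused data access by picking the most suspicious unknown people as the next targets for gathering links. The function names `suspicion_scores` and `next_to_investigate`, the unweighted averaging rule, and the fixed iteration count are all hypothetical choices, not the authors' system.

```python
from collections import defaultdict


def suspicion_scores(edges, known, iterations=50, prior=0.0):
    """Hypothetical guilt-by-association scorer (illustrative sketch only).

    edges: iterable of (u, v) pairs, treated as undirected links.
    known: dict mapping node -> fixed score in [0, 1] (1.0 = known malicious).
    Unknown nodes start at `prior`; on each pass every unknown node takes the
    mean of its neighbors' scores from the previous pass, while known nodes
    stay clamped to their given values.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(neighbors) | set(known)
    scores = {n: known.get(n, prior) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in known:
                updated[n] = known[n]  # clamp prior knowledge
            elif neighbors[n]:
                # assumed update rule: unweighted mean of neighbor scores
                updated[n] = sum(scores[m] for m in neighbors[n]) / len(neighbors[n])
            else:
                updated[n] = scores[n]  # isolated node: keep its prior
        scores = updated
    return scores


def next_to_investigate(scores, known, k=2):
    """Hypothetical focused-data-access step: return the k highest-scoring
    nodes whose status is not yet known, as candidates for gathering more links."""
    unknown = [n for n in scores if n not in known]
    return sorted(unknown, key=lambda n: scores[n], reverse=True)[:k]


# Usage on a toy chain a-b-c-d-e where a is known malicious and e known benign.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
known = {"a": 1.0, "e": 0.0}
scores = suspicion_scores(edges, known)
print(next_to_investigate(scores, known))  # ['b', 'c']
```

On the chain, the unknown scores settle near 0.75, 0.5, and 0.25 for b, c, and d, so suspicion decays with distance from the known malicious node. A practical system would likely weight links, handle heterogeneous relations, and re-score after each round of newly gathered data; those refinements are omitted here.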