Abstract
This paper describes a system to help intelligence analysts track and
analyze information being published in multiple sources, particularly
open sources on the Web. The system integrates technologies for Web
harvesting, information extraction from natural language text, and
network analytics, and allows analysts to view and explore the results
through a Web application.
One of the difficult problems we address is the entity resolution
problem, which occurs when there are multiple, differing ways to refer
to the same entity. The problem is particularly complex when noisy
data are aggregated over time, no clean master list of entities
exists, and the entities under investigation are intentionally
deceptive. Our system must not only perform entity resolution with
noisy data, but must also gracefully recover when entity resolution
mistakes are subsequently corrected. We present a case study in arms
trafficking that illustrates these issues, and describe how our system
addresses them.
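The abstract does not say how the system records or revises its resolution decisions. One simple way to obtain the "recover when mistakes are corrected" property is to avoid destructively merging records. Instead, the system keeps every same-as decision as a retractable assertion and derives entity clusters from whatever assertions currently stand. The sketch below illustrates that idea under our own assumptions. The class `EntityResolver`, its methods, and the organization names used with it are hypothetical and are not the paper's implementation.

```python
from collections import defaultdict


class EntityResolver:
    """Hypothetical sketch: entities as clusters of co-referring mentions.

    Same-as links are stored as retractable assertions rather than applied
    as destructive merges, so a mistaken link can be withdrawn and the
    clusters rebuilt from the remaining evidence.
    """

    def __init__(self):
        self.mentions = set()   # every surface form seen so far
        self.links = set()      # frozenset({a, b}) pairs asserted to co-refer

    def add_mention(self, m):
        self.mentions.add(m)

    def assert_same(self, a, b):
        # Record a (possibly noisy) decision that a and b name one entity.
        self.mentions.update((a, b))
        if a != b:
            self.links.add(frozenset((a, b)))

    def retract_same(self, a, b):
        # Withdraw a link later judged to be a mistake; mentions are kept.
        self.links.discard(frozenset((a, b)))

    def clusters(self):
        # Rebuild clusters from the current links with union-find.
        parent = {m: m for m in self.mentions}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for link in self.links:
            a, b = tuple(link)
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

        groups = defaultdict(set)
        for m in self.mentions:
            groups[find(m)].add(m)
        # Deterministic output: sorted members, clusters sorted by first member.
        return sorted(sorted(g) for g in groups.values())
```

For example, after asserting that "Acme Air Cargo" and "ACME Air" co-refer, and that "ACME Air" and "Acme Aviation Ltd" co-refer, all three names fall into one cluster. Retracting the second link splits "Acme Aviation Ltd" back out without disturbing the first decision. Rebuilding every cluster on each call keeps the sketch simple. A system that aggregates data continuously would more likely update only the affected clusters.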