Data Mining in the Context of Entity Resolution
Workshop on Data Mining for Business Applications at the 14th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD).

Sofus A. Macskassy and Evan S. Gamble

Abstract

We have encountered several practical issues in performing data mining on a database that has been normalized using entity resolution. We describe here four specific lessons learned in such mining and the meta-level lesson learned through dealing with these issues. The four specific lessons deal with handling correlated values, getting canonical records, getting authoritative records, and ensuring that relations are properly stored. Perhaps the most important lesson learned is that one ought to know what kind of data mining is to be done on the data before designing the schema of the normalized database, so that data specific to the mining is derivable from the database.