An Efficient Sequential Covering Algorithm for Explaining Subsets of Data [pdf]
Proceedings of the 2010 International Conference on Artificial Intelligence (ICAI).

Matthew Michelson and Sofus A. Macskassy

Abstract

Given a subset of data that differs from the rest, a user often wants an explanation as to why this is the case. For instance, in a database of flights, a user may want to understand why certain flights were very late. This paper presents ESCAPE, a sequential covering algorithm designed to generate explanations of subsets that take the form of disjunctive normal rules describing the characteristics ({attribute, value} pairs) that differentiates the subsets from the rest of the data. Our experiments demonstrate that ESCAPE discovers explanations that are both compact, in that just a few rules cover the subset, and specific, in that the rules cover the subset but not the rest of the data. Our experiments compare ESCAPE to RIPPER, a popular, traditional rule learning algorithm and show that ESCAPE's rules yield better covering explanations. Further, ESCAPE was designed to be efficient, and we formally demonstrate that ESCAPE runs in loglinear time.