An Efficient Sequential Covering Algorithm for Explaining Subsets of Data
[pdf]
Proceedings of the 2010 International Conference on Artificial Intelligence (ICAI).
Abstract
Given a subset of data that differs from the rest, a user often wants
an explanation as to why this is the case. For instance, in a
database of flights, a user may want to understand why certain flights
were very late. This paper presents ESCAPE, a sequential covering
algorithm designed to generate explanations of subsets that take the
form of disjunctive normal rules describing the characteristics
({attribute, value} pairs) that differentiates the subsets from the
rest of the data. Our experiments demonstrate that ESCAPE discovers
explanations that are both compact, in that just a few rules cover the
subset, and specific, in that the rules cover the subset but not the
rest of the data. Our experiments compare ESCAPE to RIPPER, a
popular, traditional rule learning algorithm and show that ESCAPE's
rules yield better covering explanations. Further, ESCAPE was
designed to be efficient, and we formally demonstrate that ESCAPE runs
in loglinear time.