Flexible query formulation for federated search
In the Seventh International Workshop on Information Integration on the Web (IIWeb 2009).

Matthew Michelson, Sofus A. Macskassy, and Steve N. Minton

Abstract

One common framework for data integration in practice is federated search. Here an agent queries disjoint sources simultaneously and then clusters the returned records in the absence of unique keys. However, formulating the correct queries to the sources can be challenging because the same query value may be represented differently across sources. For instance, some sources may record a first name as "John" while others use "Jonathan" for the same person. If the underlying sources do not support sophisticated matching, then a single query for "John" will miss many records from the sources that use "Jonathan". This paper presents an approach to formulating queries for federated search that leverages automatically discovered transformations, such as synonyms and abbreviations, to generate the set of possible queries for the given sources. Our preliminary results demonstrate that transformations mined from a subset of sources do apply to a new, distinct source, which makes it possible to expand queries to that source using the discovered transformations.
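To make the idea of transformation-based query expansion concrete, here is a minimal sketch in Python. It is not the authors' method. The transformation table `TRANSFORMATIONS` and the helpers `variants` and `expand_query` are hypothetical, and the rules are hand-written rather than mined from sources. The sketch assumes whitespace tokenization and one-directional rules, and it generates every combination of per-token variants so that each combination can be sent as a separate query to a source that only supports exact matching.

```python
from itertools import product

# Hypothetical transformation table (hand-written for illustration; the paper
# mines such transformations automatically). Each token maps to alternative
# forms it may take in other sources. Rules are assumed to be one-directional.
TRANSFORMATIONS: dict[str, set[str]] = {
    "john": {"jonathan", "j"},  # synonym and initial-letter abbreviation
    "street": {"st"},           # abbreviation
}


def variants(token: str) -> set[str]:
    """Return the token itself plus every known alternative form for it."""
    return {token} | TRANSFORMATIONS.get(token, set())


def expand_query(query: str) -> list[str]:
    """Expand a query into every combination of its tokens' variants.

    Tokens are lower-cased and split on whitespace; variants are sorted so
    the output order is deterministic.
    """
    tokens = query.lower().split()
    options = [sorted(variants(t)) for t in tokens]
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    # Each expanded string would be issued as a separate source query.
    print(expand_query("John Smith"))
```

Running it prints `['j smith', 'john smith', 'jonathan smith']`. The number of generated queries is the product of the per-token variant counts, so a practical system would likely need to prune or cap the expansion for long queries.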