Graph Mining using Graph Pattern Profiles [pdf]
Proceedings of the 2008 International Conference on Artificial Intelligence (ICAI).

Sofus A. Macskassy and Claude C. Nanjo

Abstract

This paper presents our investigation into graph mining methods to help users understand large graphs. Our approach is a two-step process: First calculate subgraph labels and then calculate distribution statistics on these labels. Our approach is flexible in that it can identify a range of patterns from very abstract to very specific (e.g., isomorphisms). The statistics that we calculate can be used to find rare and common patterns, patterns that are (dis)similar to the distribution of induced subgraphs of the same size, patterns that are (dis)similar to each other, as well as variance of graph patterns given a specific set of input node types. We also investigate a method to understand structural characteristics by analyzing clusters that are created by ``collapsing'' overlapping instances of user-specified patterns. We evaluated our approach on two publicly available networks---the Texas CS web-site from WebKB and the internet movie database.