Mining frequent neighborhood patterns in a large labeled graph

/ Abstract

Frequent subgraphs have long been an important target in pattern mining research, where most approaches deal with databases holding a number of graph transactions, e.g., the chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to prune candidate patterns efficiently. In the emerging scenario of single-graph databases, such as Google's Knowledge Graph and Facebook's social graph, the traditional support measure becomes trivial (either 0 or 1). To the best of our knowledge, all attempts to redefine support for a single graph have resulted in measures that either lose DCP or are no longer semantically intuitive. This paper targets pattern mining in the single-graph setting. We propose mining a new class of patterns called frequent neighborhood patterns, which is free from the "DCP-intuitiveness" dilemma of mining frequent subgraphs in a single graph. A neighborhood is a specific topological pattern in which a vertex is embedded, and the pattern is frequent if the portion of vertices sharing it exceeds a given threshold. We show that the new patterns not only maintain DCP, but also admit interpretations as significant as those of subgraph patterns. Experiments on real-life datasets support the feasibility of our algorithms on relatively large graphs, as well as their ability to mine interesting knowledge that prior methods do not discover.
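
The vertex-based notion of support described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a deliberately simplified, star-shaped neighborhood (a pivot label plus a set of distinct labels that must each appear on some neighbor) and takes support to be the fraction of all vertices that match as the pivot. The toy graph, the function `support`, and every name below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy graph: vertex id -> label, plus undirected edges.
labels: Dict[int, str] = {1: "person", 2: "person", 3: "person", 4: "city", 5: "company"}
edges: List[Tuple[int, int]] = [(1, 4), (2, 4), (1, 5), (3, 5)]

# Build an undirected adjacency map.
adj: Dict[int, Set[int]] = {v: set() for v in labels}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)


def support(pivot_label: str, neighbor_labels: FrozenSet[str]) -> float:
    """Fraction of all vertices that carry pivot_label and have, for every
    label in neighbor_labels, at least one neighbor with that label."""
    matches = 0
    for v, label in labels.items():
        if label != pivot_label:
            continue
        seen = {labels[n] for n in adj[v]}  # labels present around v
        if neighbor_labels <= seen:         # every required label is present
            matches += 1
    return matches / len(labels)


# Growing the pattern one label at a time can only shrink the set of matching pivots.
chain = [frozenset(), frozenset({"city"}), frozenset({"city", "company"})]
supports = [support("person", p) for p in chain]
print(supports)
```

Because each vertex is counted at most once and a larger pattern imposes strictly more requirements on the pivot, the supports along the chain are non-increasing. This is the downward-closure behavior that makes threshold-based pruning safe in this simplified setting.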

Journal: Proceedings of the 22nd ACM international conference on Information & Knowledge Management

DOI: 10.1145/2505515.2505530