Fast Influence-based Coarsening for Large Networks
Manish Purohit† manishp@cs.umd.edu B. Aditya Prakash⇤ badityap@cs.vt.edu Chanhyun Kang† chanhyun@cs.umd.edu V. S. Subrahmanian† vs@cs.umd.edu Yao Zhang⇤ yaozhang@cs.vt.edu Computer Science Department, Virginia Tech., USA
Department of Computer Science, University of Maryland - College Park, USA
⇤
†
ABSTRACT
sheer size of today’s large social networks makes it challenging to perform sophisticated network analysis.
Given a propagation graph, possibly learnt from cascade analysis, is it possible to get a smaller nearly di↵usionequivalent representation for it? Getting a smaller equivalent graph will help multiple algorithmic and data mining tasks like influence maximization, immunization, understanding cascade data and data compression. In this paper, we study a novel graph coarsening problem with the aim of approximating a large social network by a much smaller graph that approximately preserves the network structure.
Our primary goal is to find a compact representation of a large graph such that di↵usion and propagation processes on the large graph can be studied by analyzing the smaller representation. Intuitively, most of the edges in a real network are relatively unimportant; hence we propose characterizing and “contracting” precisely such edges in a graph to obtain a coarse representation.
The main contributions of this paper are:
Given a social network, can we quickly ‘zoom-out’ of the graph? Is there a smaller equivalent representation of the graph that preserves its propagation characteristics? Can we group nodes together based on their influence properties?
These are important problems with applications to influence analysis, epidemiology and viral marketing applications.
In this paper, we first formulate a novel Graph Coarsening
Problem to find a succinct representation of any graph while preserving key characteristics for di↵usion processes on that graph. We then provide a fast and e↵ective near-linear-time
(in nodes and edges) algorithm coarseNet for the same.
Using extensive experiments on multiple real datasets, we demonstrate the quality and scalability of coarseNet, enabling us to reduce the graph by 90% in some cases without much loss of information. Finally we also show how our method can help in diverse applications like influence maximization and detecting patterns of propagation at the level of automatically created groups on real cascade data.
(a) Problem Formulation: We carefully formulate a novel
Graph Coarsening Problem (GCP) to find a succinct representation of a given social network so that the di↵usion characteristics of the network are mostly preserved.
(b) E cient Algorithms: We develop coarseNet, an efficient (near-linear time) and e↵ective algorithm for
GCP, using careful approximations. We show that due to our novel scoring technique, the coarsened graph retains most of the di↵usive properties of the original network. (c) Extensive Experiments: We show that coarseNet is able to coarsen graphs up to 90% without much loss of key information. We also demonstrate the usefulness of our approach via a number of interesting applications. A major application we consider in this work is that of influence maximization in the Independent Cascade model. We propose a framework cspin that involves coarsening the graph and then solving influence maximization on the smaller graph to obtain high quality solutions. As the coarsened graph is much smaller than the original graph, the influence maximization algorithm runs orders of magnitude faster on the coarsened graph. Further using real cascade data from Flixster, we show how GCP can potentially help in understanding propagation data and constructing non-network surrogates for finding nodes with similar influence. Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—
Data Mining