A testing based approach to identifying statistically significant communities in social networks

James Wilson, UNC (Statistics & Operations Research)
An important problem in the study of networks is how to divide the vertices of a given network into one or more groups, called communities, such that vertices in the same community are more densely interconnected than vertices in different communities. Many community detection methods assume that every vertex belongs to a well-defined community; however, in many applications, networks contain a significant number of non-preferentially attached “background” vertices that do not belong to any distinct community. In these applications, contemporary detection methods can overfit the network and produce misleading results due to false discovery. To address this issue, we incorporate a criterion for statistical significance based on p-values that measure the strength of connection between a single vertex and a set of vertices by comparison to a reference distribution derived from the configuration random graph model. We propose and investigate a testing-based community extraction procedure that identifies statistically significant communities while distinguishing background vertices. We evaluate the performance and potential use of our method by applying it to the Enron email network as well as the author’s Facebook network. Optimality properties of the extraction method will also be discussed.
September 8, 2014 | 12:30 - 14:00 | 230E Gross Hall
