Please wait a minute...
Tsinghua Science and Technology  2019, Vol. 24 Issue (06): 716-727    doi: 10.26599/TST.2018.9010106
REGULAR ARTICLES     
Local Community Detection Based on Network Motifs
Yunlei Zhang, Bin Wu*, Yu Liu, Jinna Lv
∙ Yunlei Zhang, Bin Wu, Yu Liu, and Jinna Lv are with the Beijing Key Laboratory of Intelligence Telecommunications Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China. E-mail: yunlei0518@bupt.edu.cn; imyuliu@outlook.com; lvjinna@bupt.edu.cn.
Download: PDF (1576 KB)      HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

Local community detection aims to find a cluster of nodes by exploring a small region of the network. Local community detection methods are faster than traditional global community detection methods because their runtime does not depend on the size of the entire network. However, most existing methods do not take the higher-order connectivity patterns crucial to the network into consideration. In this paper, we develop a new Local Community Detection method based on network Motif (LCD-Motif) which incorporates the higher-order network information. LCD-Motif adopts the local expansion of a seed set to identify the local community with minimal motif conductance, representing a generalization of the conductance metric for network motifs. In contrast to PageRank-like diffusion methods, LCD-Motif finds the community by seeking a sparse vector in the span of the local spectra, such that the seeds are in its support vector. We evaluate our approach using real-world datasets across various domains and synthetic networks. The experimental results show that LCD-Motif can achieve a higher performance than state-of-the-art methods.



Key wordscommunity detection      network motifs      local spectral clustering      seed set expansion      random walk     
Received: 19 March 2018      Published: 20 June 2019
Corresponding Authors: Bin Wu   
About author:

Jinna Lv received the BEng and MEng degrees from Zhengzhou University, China, in 2006 and 2009, respectively. She is a PhD candidate at Beijing University of Posts and Telecommunications since 2015. Her research interests include multimedia content analysis, social relation extraction, and social network analysis.

Cite this article:

Yunlei Zhang, Bin Wu, Yu Liu, Jinna Lv. Local Community Detection Based on Network Motifs. Tsinghua Science and Technology, 2019, 24(06): 716-727.

URL:

http://tst.tsinghuajournals.com/10.26599/TST.2018.9010106     OR     http://tst.tsinghuajournals.com/Y2019/V24/I06/716

Fig. 1 Network motifs (13 motifs in three-node motifs).
M𝟔, 𝒮?𝒮:1 denotes that node 1 is the seed node, the middle part is the motif adjacency matrix transformed from the original network, the right part is the local community which contains seed node 1 and three instances of motif M𝟔).">
Fig. 2 Illustration of motif adjacency matrix and local community detection (the given motif type is M𝟔, 𝒮?𝒮:1 denotes that node 1 is the seed node, the middle part is the motif adjacency matrix transformed from the original network, the right part is the local community which contains seed node 1 and three instances of motif M𝟔).
ParameterDescriptionValue
nNumber of nodes2000
μMixing parameter0.1, 0.2, 0.3,  , 0.8
kAverage degree10
kmaxMaximum degree50
|C|minMinimum community size20
|C|maxMaximum community size50
Table 1 Parameters for the LFR datasets.
DomainDatasetnmCCGT
Food chainFloridaBay12821060.3346Y
BiologyE.Coli4185190.0865Y
CitationHepPh34 546421 5780.2848N
SocialSlashdot77 360905 4680.0555N
WWWWebStanford281 9032 312 4970.5976N
Table 2 Statistics for the networks. n: number of nodes; m: number of edges; CC: clustering coefficient of the network; GT: ground truth.
MethodLFR1LFR2LFR3LFR4LFR5LFR6LFR7LFR8
HeatKernel0.0400.0440.0600.0560.1310.1570.1670.183
LEMON0.9500.8730.7680.6320.4660.3450.2030.093
PPR0.9140.3910.2220.1260.1440.1610.1720.194
SSE0.3140.1960.1200.1210.1320.1430.1370.154
LinLog+Motif1.00.4250.2300.1720.1140.0370.0520.037
MAPPR1.00.4720.3080.1390.1230.0590.0740.013
TECTONIC0.9910.4740.3030.1330.1210.0430.1180.008
LCD-Motif0.9700.9540.9370.8870.8290.8320.4780.188
Table 3 Average F1 score on LFR1–LFR8 datasets corresponding mixing parameter from 0.1 to 0.8.
Fig. 3 Average F1 score on LFR datasets with varying mixing parameter from 0.1 to 0.8.
MethodF1MC
FloridaBayE.ColiHepPhStanfordWebSlashdot
HeatKernel0.10.290.9810.0860.993
LEMON0.6670.2730.9830.9600.567
PPR0.070.420.9840.1330.999
SSE0.30.140.5070.0310.561
LinLog+Motif0.1430.1940.172
MAPPR0.5780.90.2260.9160.606
TECTONIC0.2860.3640.8340.7780.407
LCD-Motif0.7350.9410.2590.3370.391
Table 4 Performance of methods on 5 public realworld networks (MC denotes motif conductance).
Fig. 4 Performance of methods on realworld networks.
Fig. 5 Motif conductance on realworld networks with varying dimension l and random walk step k, respectively.
y^.">
Fig. 6 Motif conductance of different motifs in HepPh network. Size of set is the size of the community truncated from the sorted vector y^.
[1]   Fortunato S., Community detection in graphs, Phys. Rep., vol. 486, nos. 3–5, pp. 75-174, 2010.
[2]   Li Y. X., He K., Bindel D., and Hopcroft J. E., Uncovering the small community structure in large networks: A local spectral approach, in Proc. 24th Int. Conf. World Wide Web, Florence, Italy, 2015, pp. 658-668.
[3]   Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., and Alon U., Network motifs: Simple building blocks of complex networks, Science, vol. 298, no. 5594, pp. 824-827, 2002.
[4]   Mangan S. and Alon U., Structure and function of the feed-forward loop network motif, Proc. Nat. Acad. Sci. USA, vol. 100, no. 21, pp. 11 980-11 985, 2003.
[5]   Holland P. W. and Leinhardt S., A method for detecting structure in sociometric data, Am. J. Sociol., vol. 76, no. 3, pp. 492-513, 1970.
[6]   Honey C. J., K?tter R., Breakspear M., and Sporns O., Network structure of cerebral cortex shapes functional connectivity on multiple time scales, Proc. Nat. Acad. Sci. USA, vol. 104, no. 24, pp. 10 240-10 245, 2007.
[7]   Rosvall M., Esquivel A. V., Lancichinetti A., West J. D., and Lambiotte R., Memory in network flows and its effects on spreading dynamics and community detection, Nat. Commun., vol. 5, p. 4630, 2014.
[8]   Benson A. R., Gleich D. F., and Leskovec J., Higher-order organization of complex networks, Science, vol. 353, no. 6295, pp. 163-166, 2016.
[9]   Fortunato S. and Hric D., Community detection in networks: A user guide, Phys. Rep., vol. 659, pp. 1-44, 2016.
[10]   Lim S. and Lee J. G., Motif-based embedding for graph clustering, J. Stat. Mech., vol. 2016, no. 12, p. 123 401, 2016.
[11]   Yin H., Benson A. R., Leskovec J., and Gleich D. F., Local higher-order graph clustering, in Proc. 23rd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Halifax, Canada, 2017, pp. 555-564.
[12]   Chen L., Zhang J., Cai L. J., and Deng Z. Y., Fast community detection based on distance dynamics, Tsinghua Sci. Technol., vol. 22, no. 6, pp. 564-585, 2017.
[13]   Wu B., Xiao Y., and Zhang Y. L., Parallel incremental dynamic community detection algorithm based on Spark, (in Chinese), J. Tsinghua Univ. (Sci. Technol.), vol. 57, no. 10, pp. 1030-1037, 2017.
[14]   Tsourakakis C. E., Pachocki J., and Mitzenmacher M., Scalable motif-aware graph clustering, in Proc. 26th Int. Conf. on World Wide Web, Perth, Australia, 2017, pp. 1451-1460.
[15]   Andersen R. and Lang K. J., Communities from seed sets, in Proc. 15th Int. Conf. on World Wide Web, Edinburgh, Scotland, 2006, pp. 223-232.
[16]   Whang J. J., Gleich D. F., and Dhillon I. S., Overlapping community detection using neighborhood-inflated seed expansion, IEEE Trans. Knowl. Data Eng., vol. 28, no. 5, pp. 1272-1284, 2016.
[17]   Kloumann I. M. and Kleinberg J. M., Community membership identification from small seed sets, in Proc. 20th ACM Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 1366-1375.
[18]   Kloster K. and Gleich D. F., Heat kernel based community detection, in Proc. 20th ACM Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 1386-1395.
[19]   Latapy M., Main-memory triangle computations for very large (sparse (power-law)) graphs, Theor. Comput. Sci., vol. 407, nos. 1–3, pp. 458-473, 2008.
[20]   Megiddo N., Linear programming in linear time when the dimension is fixed, J. ACM, vol. 31, no. 1, pp. 114-127, 1984.
[21]   Andrea L., Santo F., and Filippo R., Benchmark graphs for testing community detection algorithms, Phy. Rev. E Stat. Nonlinear Soft Matter Phys., vol. 78, no. 4, p. 046110, 2008.
[1] Wei Zhang, Feng Kong, Liming Yang, Yunfang Chen, Mengyuan Zhang. Hierarchical Community Detection Based on Partial Matrix Convergence Using Random Walks[J]. Tsinghua Science and Technology, 2018, 23(1): 35-46.
[2] Lei Chen,Jing Zhang,Lijun Cai,Ziyun Deng. Fast Community Detection Based on Distance Dynamics[J]. Tsinghua Science and Technology, 2017, 22(6): 564-585.
[3] Benika Hall,Andrew Quitadamo,Xinghua Shi. Identifying MicroRNA and Gene Expression Networks Using Graph Communities[J]. Tsinghua Science and Technology, 2016, 21(2): 176-195.
[4] . Locating Highly Connected Nodes in P2P Networks with Heterogeneous Structures[J]. Tsinghua Science and Technology, 2009, 14(4): 465-469.