Tsinghua Science and Technology  2021, Vol. 26 Issue (4): 536-547    doi: 10.26599/TST.2020.9010024
Incremental Face Clustering with Optimal Summary Learning Via Graph Convolutional Network
Xuan Zhao, Zhongdao Wang, Lei Gao, Yali Li, and Shengjin Wang*
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
First Research Institute of Ministry of Public Security, Beijing 100084, China.

Abstract  

In this study, we address the problems encountered in incremental face clustering. Without the benefit of observing the entire data distribution, incremental face clustering is more challenging than clustering a static dataset. Conventional methods rely on the statistical information of previous clusters to improve efficiency, so errors may accumulate. This study therefore proposes to predict the summaries of previous data directly from the data distribution via supervised learning. Moreover, an efficient framework for clustering previous summaries together with new data is explored. Although learning summaries from the original data costs more than estimating them from previous clusters, the entire framework consumes only slightly more time, because clustering the current data and generating summaries for the new data share most of the computation. Experiments show that the proposed approach significantly outperforms existing incremental face clustering methods, improving the average F-score from 0.644 to 0.762. Compared with state-of-the-art static face clustering methods, our method yields comparable accuracy while consuming much less time.



Key words: incremental face clustering; supervised learning; Graph Convolutional Network (GCN); optimal summary learning
Received: 13 July 2020      Published: 12 January 2021
Fund: National Natural Science Foundation of China (61701277); State Key Development Program in the 13th Five-Year Plan (2016YFB0801301)
Corresponding Authors: Shengjin Wang     E-mail: wgsgj@tsinghua.edu.cn
About author: Xuan Zhao is a PhD candidate at the Department of Electronic Engineering, Tsinghua University, advised by Prof. Shengjin Wang. Before that, he received the bachelor's and master's degrees from Tsinghua University in 2004 and 2007, respectively. His research interests include image processing, pattern recognition, computer vision, and video analysis.
Zhongdao Wang is a PhD candidate at the Department of Electronic Engineering, Tsinghua University, advised by Prof. Shengjin Wang. Before that, he received the bachelor's degree in mathematics and physics from Tsinghua University in 2017. His research interests are computer vision and machine learning.
Lei Gao received the PhD degree in computer science from Beihang University, Beijing, China in 2008. He is an associate professor at the First Research Institute of the Ministry of Public Security. His current research interests include computer vision and artificial intelligence.
Yali Li received the BE degree with an excellent graduates award from Nanjing University in 2007 and the PhD degree from Tsinghua University in 2013. Currently, she is a research assistant at the Department of Electronic Engineering, Tsinghua University. Her research interests include image processing, pattern recognition, computer vision, and video analysis.
Shengjin Wang received the BEng degree from Tsinghua University in 1985 and the PhD degree from the Tokyo Institute of Technology, Tokyo, Japan in 1997. From May 1997 to August 2003, he was a member of the research staff at Internet System Research Laboratories, NEC Corporation, Japan. Since September 2003, he has been a professor at the Department of Electronic Engineering, Tsinghua University, where he is currently also the director of the Research Institute of Image and Graphics (RIIG). He has published more than 50 papers on image processing and holds ten patents. His current research interests include image processing, computer vision, video surveillance, and pattern recognition.
Cite this article:

Xuan Zhao, Zhongdao Wang, Lei Gao, Yali Li, and Shengjin Wang. Incremental Face Clustering with Optimal Summary Learning Via Graph Convolutional Network. Tsinghua Science and Technology, 2021, 26(4): 536-547.

URL:

http://tst.tsinghuajournals.com/10.26599/TST.2020.9010024     OR     http://tst.tsinghuajournals.com/Y2021/V26/I4/536

Fig. 1 High-level idea of the proposed method. (a) We address the problem of incremental face clustering. (b) Estimating summaries is a fundamental approach to incremental clustering. Summaries are a small number of features that represent the existing data in feature space. In (b), n denotes a new face feature, s1 is the summary of cluster 1, and s2 is the summary of cluster 2. If l1 < l2, then n should be grouped into cluster 1; thus, we do not need to compare n with each existing feature. In (c) and (d), we analyze a situation in which the existing clusters contain errors, that is, features of the blue and red persons are grouped into one cluster. To estimate summaries, (c) previous methods mainly use the statistical information of existing clusters, such as cluster centers, whereas (d) our method directly learns summaries from the original data distribution using a graph convolutional network. In addition, our method generates improved summaries when previous clusters contain errors (best viewed in color).
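The summary-based assignment in Fig. 1b can be sketched as a nearest-summary lookup. The L2 distance and the function name below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def assign_to_summary(n, summaries):
    """Assign a new feature n to the cluster whose summary is nearest.

    n         : (d,) new face embedding
    summaries : (k, d) one summary vector per existing cluster
    Returns the index of the nearest summary (distances l1..lk in Fig. 1b).
    """
    dists = np.linalg.norm(summaries - n, axis=1)
    return int(np.argmin(dists))

# Toy example: two cluster summaries in a 2-D feature space.
s = np.array([[0.0, 0.0],   # s1, summary of cluster 1
              [5.0, 5.0]])  # s2, summary of cluster 2
n = np.array([1.0, 0.5])    # new feature, closer to s1 (l1 < l2)
assign_to_summary(n, s)     # → 0, i.e., grouped into cluster 1
```

Because only k summaries are compared rather than every stored feature, the assignment cost grows with the number of clusters instead of the number of faces seen so far.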
Fig. 2 Main framework of our method for incremental face clustering. When a new batch of face embedding features Fi arrives, we take four steps. In Step 1, we generate candidate graphs from summaries: we cluster Fi with the previous summaries Si-1 and then replace each summary in Si-1 with its corresponding features in Ei-1, thereby forming a set of clusters; for each cluster, we generate several graphs with multiple thresholds. In Step 2, we predict a confidence score for each graph with a learned GCN. In Step 3, because the candidate graphs overlap, we merge them into clustering results according to their confidence scores. In Step 4, we generate the current summaries Si and their corresponding features Ei and store them for new data.
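Step 1's multi-threshold graph generation can be sketched as taking the connected components of a similarity graph under several thresholds: tighter thresholds yield smaller, purer candidate graphs. Cosine similarity, union-find, and the function names here are our illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np
from itertools import combinations

def _components(sim, thr):
    """Connected components of the graph whose edges are pairs with sim > thr."""
    n = sim.shape[0]
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in combinations(range(n), 2):
        if sim[i, j] > thr:
            parent[find(i)] = find(j)
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), set()).add(i)
    return list(comps.values())

def candidate_graphs(feats, thresholds):
    """Sketch of Step 1 for one coarse cluster: emit the candidate graphs
    (node-index sets) produced by each similarity threshold."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T  # cosine similarity of L2-normalized embeddings
    graphs = []
    for t in thresholds:
        graphs.extend(_components(sim, t))
    return graphs
```

Since the same nodes appear under several thresholds, the resulting graphs overlap, which is why Step 3 resolves them by confidence.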
Fig. 3 Toy example to illustrate the effectiveness of the confidence metric: (a) perfect cluster, (b) an error node, (c) a missing node, and (d) an error node and a missing node. The proposed fusion metric is sensitive to error and missing nodes.
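At training time, each candidate graph needs a confidence target. The following is a hedged sketch of the three metrics compared in Table 4, assuming purity is the majority-identity fraction of the graph, coverage is the fraction of that identity the graph captures, and fusion is their harmonic mean (our guess at the fusion rule; the paper's exact definitions may differ):

```python
from collections import Counter

def graph_confidence(graph_nodes, labels):
    """Training-time confidence targets for one candidate graph, sketched.

    graph_nodes : iterable of node indices in the candidate graph
    labels      : dict mapping every node index to its ground-truth identity
    purity   penalizes error nodes (Fig. 3b);
    coverage penalizes missing nodes (Fig. 3c);
    fusion   combines both, so it is sensitive to either defect (Fig. 3d).
    """
    in_graph = [labels[n] for n in graph_nodes]
    majority, count = Counter(in_graph).most_common(1)[0]
    purity = count / len(in_graph)
    total = sum(1 for v in labels.values() if v == majority)
    coverage = count / total
    fusion = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, fusion
```

On the toy example of Fig. 3, a perfect graph scores 1.0 on all three metrics, while an error node lowers purity and a missing node lowers coverage, so only the fusion metric drops in both cases.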
Method          AF      ANMI    AT (s)
i-Kmeans [10]   0.375   0.856   849
BSAS [28]       0.462   0.859   342
MBSAS [29]      0.470   0.898   1383
i-DBSCAN [10]   0.644   0.925   109
Xmeans [30]     0.540   0.883   136 800
HAC [7]         0.637   0.934   20 877
ARO [3]         0.192   0.862   367
Affinity [1]    0.740   0.951   339
Linkage [2]     0.692   0.957   1099
Our method      0.762   0.955   257
Table 1 Results on the MS small testing dataset. Comparison with baseline and state-of-the-art methods in terms of the AF, ANMI, and AT of 10 batches. For all methods, we tune the corresponding hyperparameters and report the best result.
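For reference, the pairwise F-score is one common instantiation of the F-score averaged in the AF columns (the paper's exact evaluation protocol may differ):

```python
from itertools import combinations

def pairwise_f_score(pred, gt):
    """Pairwise F-score between a predicted clustering and the ground truth.

    pred, gt : lists assigning each sample index a cluster / identity id.
    A pair of samples counts as positive when both share a cluster.
    """
    same_pred = {(i, j) for i, j in combinations(range(len(pred)), 2)
                 if pred[i] == pred[j]}
    same_gt = {(i, j) for i, j in combinations(range(len(gt)), 2)
               if gt[i] == gt[j]}
    tp = len(same_pred & same_gt)
    precision = tp / len(same_pred) if same_pred else 0.0
    recall = tp / len(same_gt) if same_gt else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

The NMI averaged in the ANMI columns can be computed over the same label lists, e.g., with sklearn.metrics.normalized_mutual_info_score.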
Method        AF      ANMI    AT (min)
Affinity [1]  0.660   0.950   237
Linkage [2]   0.661   0.952   554
Our method    0.672   0.951   203
Table 2 Results on the MS big testing dataset. We compare the proposed method with state-of-the-art methods in terms of the AF, ANMI, and AT of 10 batches. For all methods, we tune the corresponding hyperparameters and report the best result.
Method        AF      ANMI    AT (s)
Affinity [1]  0.677   0.934   41
Linkage [2]   0.669   0.937   126
Our method    0.690   0.936   31
Table 3 Results on the IJB-B testing dataset. We compare our method with state-of-the-art methods in terms of the AF, ANMI, and AT of 10 batches. For all methods, we tune the corresponding hyperparameters and report the best result.
Fig. 4 AF of 10 batches vs. AT (s). The closer to the top left, the better. For easy display, we set a maximum AT of 500 s. AT for i-Kmeans, BSAS, MBSAS, Xmeans, and HAC is 849, 1876, 1383, 136 860, and 20 877 s, respectively.
Metric     Precision   Recall   AF      ANMI
Coverage   0.801       0.706    0.750   0.954
Purity     0.912       0.616    0.735   0.946
Fusion     0.839       0.699    0.762   0.955
Table 4 Comparison with different metrics for learning graph confidence.
η             AF      ANMI    AT (s)
(0.30, 0.35)  0.624   0.918   44
(0.35, 0.40)  0.670   0.929   35
(0.40, 0.45)  0.690   0.936   31
(0.45, 0.50)  0.672   0.934   30
(0.50, 0.55)  0.652   0.927   29
Table 5 AF, ANMI, and AT of 10 batches vs. η on the IJB-B testing set.
Fig. 5 AF, ANMI, and AT of 10 batches vs. τ on the IJB-B testing set. For better visibility, AT is divided by 50 s in the figure.
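Step 3's merging of overlapping candidate graphs can be sketched with a greedy rule: visit graphs in decreasing GCN confidence, discard those scoring below the threshold τ of Fig. 5, and give contested nodes to the higher-confidence graph. The exact merging rule below is our illustrative assumption:

```python
def merge_by_confidence(graphs, scores, tau):
    """Greedy resolution of overlapping candidate graphs (Step 3, sketched).

    graphs : list of node-index sets (candidate graphs)
    scores : GCN confidence score per graph
    tau    : minimum confidence to keep a graph (hyperparameter in Fig. 5)
    Returns disjoint clusters as sets of node indices.
    """
    order = sorted(range(len(graphs)), key=lambda i: scores[i], reverse=True)
    assigned, clusters = set(), []
    for i in order:
        if scores[i] < tau:
            break  # remaining graphs score even lower
        nodes = set(graphs[i]) - assigned  # drop nodes already claimed
        if nodes:
            clusters.append(nodes)
            assigned |= nodes
    return clusters
```

A small τ keeps many low-confidence graphs and over-merges, while a large τ discards useful graphs, which matches the single-peaked AF curve in Fig. 5.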
[1]   Yang L., Zhan X. H., Chen D. P., Yan J. J., Loy C. C., and Lin D. H., Learning to cluster faces on an affinity graph, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 2293-2301.
[2]   Wang Z. D., Zheng L., Li Y. L., and Wang S. J., Linkage based face clustering via graph convolution network, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 1117-1125.
[3]   Otto C., Wang D. Y., and Jain A. K., Clustering millions of faces by identity, IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 2, pp. 289-303, 2018.
[4]   Lloyd S., Least squares quantization in PCM, IEEE Trans. Inf. Theory, vol. 28, no. 2, pp. 129-137, 1982.
[5]   Shi J. and Malik J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888-905, 2000.
[6]   Ester M., Kriegel H. P., Sander J., and Xu X. W., A density-based algorithm for discovering clusters in large spatial databases with noise, in KDD-96 Proc., Menlo Park, CA, USA, 1996, pp. 226-231.
[7]   Sibson R., SLINK: An optimally efficient algorithm for the single-link cluster method, Comput. J., vol. 16, no. 1, pp. 30-34, 1973.
[8]   Ailon N., Jaiswal R., and Monteleoni C., Streaming k-means approximation, presented at Advances in Neural Information Processing Systems 22: 23rd Annu. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 10-18.
[9]   Shindler M., Wong A., and Meyerson A., Fast and accurate k-means for large datasets, in Proc. of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, 2011, pp. 2375-2383.
[10]   Bagirov A. M., Ugon J., and Webb D., Fast modified global k-means algorithm for incremental cluster construction, Pattern Recognit., vol. 44, no. 3, pp. 866-876, 2011.
[11]   Ester M., Kriegel H. P., Sander J., Wimmer M., and Xu X. W., Incremental clustering for mining in a data warehousing environment, in Proc. 24th Int. Conf. Very Large Data Bases, New York, NY, USA, 1998, pp. 323-333.
[12]   Mansfield P. A., Wang Q., Downey C., Wan L., and Moreno I. L., Links: A high-dimensional online clustering method, arXiv preprint arXiv: 1801.10123, 2018.
[13]   Kulshreshtha P. and Guha T., An online algorithm for constrained face clustering in videos, in Proc. 25th IEEE Int. Conf. Image Proc. (ICIP), Athens, Greece, 2018, pp. 2670-2674.
[14]   Arthur D. and Vassilvitskii S., k-means++: The advantages of careful seeding, in Proc. 18th Annu. ACM-SIAM Symp. on Discrete Algorithms, Philadelphia, PA, USA, 2007.
[15]   Sun Y. F. and Lu Y. S., A grid-based subspace clustering algorithm for high-dimensional data streams, in Proc. Int. Conf. Web Information Systems, Wuhan, China, 2006.
[16]   Zhou A. Y., Cao F., Qian W. N., and Jin C. Q., Tracking clusters in evolving data streams over sliding windows, Knowl. Inf. Syst., vol. 15, no. 2, pp. 181-214, 2008.
[17]   Namadchian A. and Esfandani G., DSCLU: A new data stream clustering algorithm for multi density environments, in Proc. 13th ACIS Int. Conf. Software Engineering, Artificial Intelligence, Networking and Parallel Distributed Computing (SNPD), Kyoto, Japan, 2012.
[18]   Hamilton W. L., Ying Z., and Leskovec J., Inductive representation learning on large graphs, arXiv preprint arXiv: 1706.02216, 2017.
[19]   Kipf T. N. and Welling M., Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv: 1609.02907, 2017.
[20]   Yan S. J., Xiong Y. J., and Lin D. H., Spatial temporal graph convolutional networks for skeleton-based action recognition, arXiv preprint arXiv: 1801.07455, 2018.
[21]   Chen Z. M., Wei X. S., Wang P., and Guo Y. W., Multi-label image recognition with graph convolutional networks, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 5172-5181.
[22]   Shi Y. C., Otto C., and Jain A. K., Face clustering: Representation and pairwise constraints, IEEE Trans. Inf. Forens. Secur., vol. 13, no. 7, pp. 1626-1640, 2018.
[23]   Li Y. L., Wang S. J., Tian Q., and Ding X. Q., A survey of recent advances in visual feature detection, Neurocomputing, vol. 149, pp. 736-751, 2015.
[24]   Li Y. L., Wang S. J., Tian Q., and Ding X. Q., Feature representation for statistical-learning-based object detection: A review, Pattern Recognit., vol. 48, no. 11, pp. 3542-3559, 2015.
[25]   Guo Y. D., Zhang L., Hu Y. X., He X. D., and Gao J. F., MS-celeb-1m: A dataset and benchmark for large-scale face recognition, in Proc. 14th European Conf. Computer Vision, Amsterdam, the Netherlands, 2016.
[26]   Yi D., Lei Z., Liao S. C., and Li S. Z., Learning face representation from scratch, arXiv preprint arXiv: 1411.7923, 2014.
[27]   Whitelam C., Taborsky E., Blanton A., Maze B., Adams J. C., Miller T., Kalka N. D., Jain A. K., Duncan J. A., Allen K., et al., IARPA Janus benchmark-B face dataset, in Proc. 2017 IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 2017, pp. 592-600.
[28]   Theodoridis S. and Koutroumbas K., Pattern Recognition. 4th ed. San Diego, CA, USA: Elsevier, 2009.
[29]   Duda R. O., Hart P. E., and Stork D. G., Pattern Classification. 2nd ed. New York, NY, USA: John Wiley & Sons, 2001.
[30]   Pelleg D. and Moore A., X-means: Extending k-means with efficient estimation of the number of clusters, in Proc. 17th Int. Conf. Machine Learning, Pittsburgh, PA, USA, 2000.