Tsinghua Science and Technology  2019, Vol. 24 Issue (06): 677-693    doi: 10.26599/TST.2018.9010103
REGULAR ARTICLES     
Deep Model Compression for Mobile Platforms: A Survey
Kaiming Nan, Sicong Liu, Junzhao Du, Hui Liu*
∙ Kaiming Nan and Sicong Liu are with School of Computer Science and Technology, Xidian University, Xi’an 710071, China. E-mail: nankaiming@stu.xidian.edu.cn; liusc@stu.xidian.edu.cn.
∙ Junzhao Du and Hui Liu are with School of Software and Institute of Software Engineering, Xidian University, Xi’an 710071, China. E-mail: dujz@xidian.edu.cn.

Abstract  

Despite the rapid development of mobile and embedded hardware, directly executing computation-expensive and storage-intensive deep learning algorithms locally on these devices for sensory data analysis remains constrained. In this paper, we first summarize layer compression techniques for state-of-the-art deep learning models in three categories: weight factorization and pruning, convolution decomposition, and special layer architecture design. For each category, we quantify the storage and computation that the techniques make tunable and discuss their practical challenges and possible improvements. Then, we implement Android projects using TensorFlow Mobile to test ten such compression methods and compare their practical performance in terms of accuracy, parameter size, intermediate feature size, computation, processing latency, and energy consumption. To further examine their advantages and bottlenecks, we evaluate them over four standard recognition tasks on six resource-constrained Android smartphones. Finally, we survey two types of run-time Neural Network (NN) compression techniques that are orthogonal to the layer compression techniques: run-time resource management and cost optimization with special NN architectures.



Key words: deep learning; model compression; run-time resource management; cost optimization
Received: 31 January 2018      Published: 20 June 2019
Corresponding Author: Hui Liu
About author:

Hui Liu received the BS, MS, and PhD degrees from Xidian University in 1998, 2003, and 2011, respectively. She is currently an associate professor in the School of Software at Xidian University. Her research interests include big data analysis, task scheduling, and mobile computing. She is a member of ACM, IEEE, and CCF.

Cite this article:

Kaiming Nan, Sicong Liu, Junzhao Du, Hui Liu. Deep Model Compression for Mobile Platforms: A Survey. Tsinghua Science and Technology, 2019, 24(06): 677-693.

URL:

http://tst.tsinghuajournals.com/10.26599/TST.2018.9010103     OR     http://tst.tsinghuajournals.com/Y2019/V24/I06/677

Type | Method | Layer | Weight size | Computation
None | N/A | FC | AB | AB
None | N/A | CONV | MNUV | Hout·Wout·UVNM
Weight compression | SVD based [15] (Tech1) | FC | (A+B)·k | (A+B)·k
Weight compression | Sparse coding [16] (Tech2) | FC | (A+B)·k | (A+B)·k
Weight compression | Pruning [17] (Tech3) | FC | AB - n_{|weight|<ϵ} | AB - n_{|weight|<ϵ}
Weight compression | Pruning [17] (Tech3) | CONV | MNUV - n_{|weight|<ϵ} | [MNUV - n_{|weight|<ϵ}]·Hout·Wout
Convolution decomposition | Sparse [18] (Tech4 and Tech5) | CONV | (1-θ)MNUV | (1-θ)·Hout·Wout·UVMN
Convolution decomposition | Depth-wise [19] (Tech6) | CONV | γMUV + γ^2·MN | γM·UV·Hin·Win + γ^2·MN·Hin·Win
Convolution decomposition | Sparse random [20] (Tech7) | CONV | (1-θ)MNUV | (1-θ)·Hout·Wout·UVMN
Special layer | Fire [21] (Tech8) | CONV | M1N1 + M2N2 + 3×3×M3N3 | Hout1·Wout1·N1M1 + Hout2·Wout2·N2M2 + 3×3×Hout3·Wout3·N3M3
Special layer | MlpConv [22] (Tech9) | CONV | M1N1 + M2N2 | Σ_{i=1}^{2} Houti·Wouti·Ui·Vi·Mi·Ni
Special layer | Global average pool [22] (Tech10) | FC | MN | Hout·Wout·MN
Table 1 Comparison and quantification of layer compression techniques for DNNs.
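To make Table 1's accounting concrete, the sketch below (ours, not the authors' code) evaluates a few of the weight-size formulas, reading the symbols in the usual way: an FC layer has A inputs and B outputs; a CONV layer has N input channels and M filters of size U×V; k is the factorization rank and γ the depth-wise width multiplier.

```python
# A minimal sketch (not the authors' code) of the weight-size formulas
# in Table 1. Symbol meanings follow the conventions described above.

def fc_weights(A, B):
    return A * B                           # uncompressed FC: A*B weights

def fc_factorized_weights(A, B, k):
    return (A + B) * k                     # Tech1/Tech2: two rank-k factors

def conv_weights(M, N, U, V):
    return M * N * U * V                   # uncompressed CONV: MNUV weights

def depthwise_weights(M, N, U, V, gamma=1.0):
    # Tech6: gamma*M depth-wise UxV filters, then gamma^2*M*N point-wise weights
    return gamma * M * U * V + gamma ** 2 * M * N

# Illustrative AlexNet-like FC layer (9216 x 4096) factorized at rank k = 128:
print(fc_weights(9216, 4096))                  # 37748736
print(fc_factorized_weights(9216, 4096, 128))  # 1703936, roughly 22x smaller
```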
Fig. 1 Weight factorization by using Tech1 or Tech2.
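As a concrete instance of the factorization in Fig. 1, the following sketch compresses an FC weight matrix with truncated SVD in NumPy; the shapes and rank are illustrative, not taken from the paper.

```python
import numpy as np

# A minimal sketch of Tech1: factorize an FC weight matrix W (A x B) with
# truncated SVD so that A*B weights become (A+B)*k, as in Fig. 1.
A, B, k = 1024, 512, 32
W = np.random.randn(A, B).astype(np.float32)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :k] * s[:k]          # A x k factor (singular values folded in)
W2 = Vt[:k, :]                 # k x B factor

x = np.random.randn(A).astype(np.float32)
y_full = x @ W                 # original layer: one A x B multiply
y_low = (x @ W1) @ W2          # compressed layer: two thin multiplies

print(W.size, W1.size + W2.size)     # 524288 vs 49152 weights
print(np.abs(y_full - y_low).max())  # low-rank approximation error
```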
Fig. 2 Example of direct sparse convolution computation (Tech5).
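The idea behind Fig. 2 can be sketched in a few lines: rather than densifying a pruned kernel, direct sparse convolution visits only the surviving weights. A single-channel, stride-1, no-padding toy version of ours, with illustrative dimensions:

```python
import numpy as np

def direct_sparse_conv2d(x, kernel):
    """x: (Hin, Win) input; kernel: (U, V) pruned filter with few nonzeros."""
    U, V = kernel.shape
    Hout, Wout = x.shape[0] - U + 1, x.shape[1] - V + 1
    y = np.zeros((Hout, Wout), dtype=x.dtype)
    # One shifted-and-scaled input slice per nonzero weight, so the work
    # scales with the number of surviving weights rather than with U*V.
    for u, v in zip(*np.nonzero(kernel)):
        y += kernel[u, v] * x[u:u + Hout, v:v + Wout]
    return y

x = np.random.randn(8, 8).astype(np.float32)
kernel = np.zeros((3, 3), dtype=np.float32)
kernel[0, 1], kernel[2, 2] = 0.5, -1.0     # 2 of 9 weights survive pruning

# Dense reference: the same cross-correlation with all nine taps visited.
dense = sum(kernel[u, v] * x[u:u + 6, v:v + 6]
            for u in range(3) for v in range(3))
print(np.allclose(direct_sparse_conv2d(x, kernel), dense))  # True
```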
Fig. 3 An example of Tech9.
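For readers unfamiliar with Tech9, an MlpConv block (Network in Network [22]) follows a convolution with 1×1 convolutions, i.e., a small MLP applied across channels at every pixel; Tech10 then replaces the final FC layers with a global average pool. A toy NumPy sketch with made-up shapes:

```python
import numpy as np

# A toy sketch of an MlpConv block (Tech9): 1x1 convolutions act as a tiny
# MLP applied across channels at every pixel. Shapes are illustrative.

def conv1x1(x, w):
    """x: (C_in, H, W) feature maps; w: (C_out, C_in) 1x1 filter bank."""
    C, H, W = x.shape
    return (w @ x.reshape(C, H * W)).reshape(w.shape[0], H, W)

x = np.random.randn(16, 12, 12).astype(np.float32)   # input feature maps
w1 = np.random.randn(32, 16).astype(np.float32)      # first 1x1 layer
w2 = np.random.randn(8, 32).astype(np.float32)       # second 1x1 layer

h = np.maximum(conv1x1(x, w1), 0.0)   # 1x1 conv + ReLU
y = np.maximum(conv1x1(h, w2), 0.0)   # second 1x1 conv + ReLU
print(y.shape)                        # (8, 12, 12)

# Tech10 then replaces the final FC layers with a global average pool:
print(y.mean(axis=(1, 2)).shape)      # (8,), one score per channel
```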
Fig. 4 Novel combination of compression techniques in LeNet.
Fig. 5 Novel combination of compression techniques in AlexNet.
No. | Task | Model | Dataset
T1 | Digit | LeNet | 60000 pieces of data from MNIST [34]
T2 | Image | LeNet | 60000 pieces of data from CIFAR-10 [35]
T3 | Image | AlexNet | 60000 pieces of data from CIFAR-10 [35]
T4 | Audio | LeNet | 7500 pieces of data from UbiSound [36]
Table 2 Recognition tasks, datasets, and models.
Type | No. | Device | Processor | DRAM | Cache | Battery | MAC process rate
Android phone | 1 | Xiaomi RedMi 3S | Qualcomm Snapdragon 430 | 3 GB | L1 (64 KB), L2 (2 MB) | 4100 mAh | 691.3 Mflops
Android phone | 2 | Xiaomi Mi 5S | Qualcomm Snapdragon 821 | 4 GB | L1 (32 KB), L2 (1 MB) | 3200 mAh | 3.87 Gflops
Android phone | 3 | Xiaomi Mi 6 | Qualcomm Snapdragon 835 | 6 GB | L1 (32 KB), L2 (2 MB) | 3350 mAh | 3.11 Gflops
Android phone | 4 | Huawei PRA-AL00 | HiSilicon Kirin 655 | 3 GB | L1 (64 KB), L2 (2 MB) | 3000 mAh | 1.21 Gflops
Android phone | 5 | Samsung Note5 | Samsung Exynos 7420 | 4 GB | L1 (38 KB), L2 (2 MB) | 3000 mAh | 2.51 Gflops
Android phone | 6 | Huawei P9 | HiSilicon Kirin 955 | 3 GB | L1 (48 KB), L2 (4 MB) | 3000 mAh | 2.29 Gflops
Table 3 Resource constraints study on six resource-constrained Android platforms.
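The MAC process rates in Table 3 allow a first-order latency estimate: divide a model's multiply-accumulate count by the device's measured rate. The sketch below uses a made-up MAC count purely for illustration; it is not a number from the paper.

```python
# First-order latency estimate from Table 3's measured MAC process rates.
# The MAC count below is hypothetical; real counts depend on the model.
macs = 5.0e8                              # assumed: 500 M MACs per inference

rates = {                                 # flops, taken from Table 3
    "Device1 (RedMi 3S)": 691.3e6,
    "Device2 (Mi 5S)": 3.87e9,
    "Device6 (Huawei P9)": 2.29e9,
}
for device, rate in rates.items():
    print(f"{device}: ~{macs / rate * 1e3:.0f} ms per inference")
```

Such estimates ignore memory traffic and cache behavior, which is one reason the measured latencies in Table 4 do not track MAC counts exactly.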
Fig. 6 Impact of various compression hyper-parameters on the metrics for different techniques. (a) Different k values for Tech1 at FC layers, (b) different k values for Tech1 at CONV layers, (c) different k values for Tech2 at FC layers, (d) different γ values for Tech6 at CONV layers, and (e) different θ values for Tech7 at CONV layers. In this figure, A denotes accuracy, Sp the parameter size, Sf the intermediate feature size, T the latency, and E and C the energy cost and MAC number, respectively. The x-axis shows the different parameter setups, and the y-axis is the accuracy/cost ratio of the compressed layer to the original layers.
Fig. 7 Comparison of the impact of different compression techniques on specific layers of (a) the LeNet model and (b) the AlexNet model, in terms of accuracy A, parameter size Sp, intermediate feature size Sf, latency T, energy cost E, and MAC computation C. Tech1 is used for FC1 and CONV2; Tech2 and Tech3 act on FC1; Tech4 and Tech5, Tech6, Tech7, Tech8, and Tech9 target CONV2; and Tech10 is used for all FC layers. The y-axis shows the cost ratio of the compressed layers to the original layers.
Fig. 8 Comparison of the impact of different compression techniques on the whole model: (a) LeNet model and (b) AlexNet model. Tech1 is used for FC1 and CONV2; Tech2 and Tech3 act on FC1; Tech4 and Tech5, Tech6, Tech7, Tech8, and Tech9 target CONV2; and Tech10 is used for all FC layers. The y-axis shows the overhead of the compressed model relative to the original model.
Compression technique | AlexNet + CIFAR-10: A (%) | Sp (MB) | T (ms) | E (mJ) | LeNet + MNIST: A (%) | Sp (MB) | T (ms) | E (mJ)
Initial model | 82.12 | 54.39 | 180 | 65.23 | 99.3 | 12.49 | 32 | 2.30
SVD-based compression for FC (Tech1) | 77.16 | 44.02 | 230 | 61.29 | 99.14 | 4.03 | 12 | 1.51
SVD-based compression for CONV (Tech1) | 60.32 | 52.68 | 128 | 43.25 | 92.19 | 12.32 | 31 | 1.97
Sparse-coding compression (Tech2) | 79.12 | 44.02 | 230 | 61.29 | 98.78 | 4.30 | 12 | 1.51
Weight pruning (Tech3) | 83.79 | 1.56 | 185 | 30.72 | 97.89 | 3.63 | 32 | 1.05
Sparse and direct sparse (Tech4 and Tech5) | 75.63 | 52.19 | 210 | 93.48 | 98.75 | 12.29 | 20.5 | 1.73
Depth-wise separable (Tech6) | 78.26 | 52.22 | 122 | 37.21 | 98.7 | 12.30 | 33 | 1.75
Sparse random (Tech7) | 57.31 | 52.63 | 195 | 47.34 | 98.9 | 12.44 | 25 | 2.16
Fire module (Tech8) | 80.24 | 52.28 | 120 | 37.21 | 99.1 | 12.34 | 31 | 2.07
MlpConv module (Tech9) | 83.07 | 58.55 | 330 | 112.81 | 98.9 | 12.32 | 31 | 2.08
Global average pool (Tech10) | 85.00 | 2.35 | 160 | 44.90 | 97.1 | 0.25 | 4 | 1.10
Table 4 Performance of compression techniques on LeNet + MNIST and AlexNet + CIFAR-10, as evaluated on RedMi 3S phone (Device1).
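The metric ratios used in Tables 5 and 6 can be computed directly from measurements like those in Table 4. The sketch below applies the definitions from the Table 5 header to the LeNet + MNIST rows for the initial model and weight pruning (Tech3):

```python
# Metric ratios as defined in the Table 5 header: r1 = A_origin/A_compressed,
# and r_i = cost_compressed/cost_origin for size, latency, and energy.
# Values below are the LeNet + MNIST rows of Table 4 (initial model vs.
# weight pruning, Tech3); MAC counts (r4) are not listed in Table 4.
origin = {"A": 99.3, "Sp": 12.49, "T": 32, "E": 2.30}
pruned = {"A": 97.89, "Sp": 3.63, "T": 32, "E": 1.05}

r1 = origin["A"] / pruned["A"]     # accuracy loss ratio   ~1.01
r2 = pruned["Sp"] / origin["Sp"]   # parameter ratio       ~0.29
r3 = pruned["T"] / origin["T"]     # latency ratio         ~1.00
r5 = pruned["E"] / origin["E"]     # energy ratio          ~0.46
print(f"r1={r1:.2f}  r2={r2:.2f}  r3={r3:.2f}  r5={r5:.2f}")
```

The per-task ratios reported in Table 5 come from the authors' task-specific runs, so they need not coincide with this single Device1 reading.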
Metric ratios: r1 = A_origin / A_compressed (accuracy loss); r2 = Sp_compressed / Sp_origin (parameter compression); r3 = T_compressed / T_origin (latency compression); r4 = C_compressed / C_origin (MAC amount compression); r5 = E_compressed / E_origin (energy cost compression).
Task (selected technique) | r1 | r2 | r3 | r4 | r5
T1 (Weight pruning, Tech3) | 1.01 | 0.21 | 0.44 | 0.30 | 0.45
T2 (Global average pool, Tech10) | 1.07 | 0.02 | 0.58 | 0.62 | 0.48
T3 (Depth-wise separable, Tech6) | 1.04 | 0.32 | 0.23 | 0.13 | 0.38
T4 (Sparse and direct sparse, Tech4 and Tech5) | 1.01 | 0.41 | 0.43 | 0.35 | 0.39
Table 5 Performance on the different recognition tasks, each using the best compression technique selected by Eq. (4).
Device (technique with min Eq. (4)) | Accuracy loss | Parameter compression | Latency compression | MAC amount compression | Energy cost compression
Device1 (Weight pruning, Tech3) | 1.01 (p1 = 0.5) | 0.21 (p2 = 0.2) | 0.44 (p3 = 0.1) | 0.30 (p4 = 0.1) | 0.45 (p5 = 0.1)
Device2 (Depth-wise separable, Tech6) | 1.04 (p1 = 0.3) | 0.32 (p2 = 0.3) | 0.23 (p3 = 0.1) | 0.13 (p4 = 0.1) | 0.38 (p5 = 0.2)
Device3 (Depth-wise separable, Tech6) | 1.04 (p1 = 0.4) | 0.32 (p2 = 0.2) | 0.23 (p3 = 0.1) | 0.13 (p4 = 0.1) | 0.38 (p5 = 0.2)
Device4 (Sparse and direct sparse, Tech4 and Tech5) | 1.01 (p1 = 0.2) | 0.41 (p2 = 0.2) | 0.43 (p3 = 0.1) | 0.35 (p4 = 0.2) | 0.39 (p5 = 0.3)
Device5 (Sparse coding compression, Tech2) | 1.01 (p1 = 0.3) | 0.18 (p2 = 0.2) | 0.22 (p3 = 0.1) | 0.35 (p4 = 0.2) | 0.25 (p5 = 0.2)
Device6 (Fire module, Tech8) | 0.94 (p1 = 0.4) | 0.82 (p2 = 0.1) | 0.23 (p3 = 0.1) | 0.12 (p4 = 0.2) | 0.42 (p5 = 0.2)
Table 6 Performance of NN compression techniques on different resource-constrained Android devices.
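Equation (4) itself appears in the paper's full text, not on this page; since Table 6 reports per-device weights p1-p5 that sum to 1, we sketch the selection step as a weighted sum of the five metric ratios. This is an assumption about the form of Eq. (4), not a quote of it.

```python
# Hypothetical reading of Eq. (4) (the actual equation is in the full
# text): pick the technique that minimizes a weighted sum of the five
# metric ratios, with per-device weights p1..p5 as reported in Table 6.
def score(r, p):
    return sum(p_i * r_i for p_i, r_i in zip(p, r))

p = [0.5, 0.2, 0.1, 0.1, 0.1]           # Device1 weights from Table 6
candidates = {                           # (r1, r2, r3, r4, r5) per technique
    "Tech3 (weight pruning)": [1.01, 0.21, 0.44, 0.30, 0.45],
    "Tech10 (global average pool)": [1.07, 0.02, 0.58, 0.62, 0.48],
}
best = min(candidates, key=lambda t: score(candidates[t], p))
print(best, round(score(candidates[best], p), 3))  # Tech3 wins, as in Table 6
```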
[1]   Lindholm E., Nickolls J., Oberman S., and Montrym J., NVIDIA Tesla: A unified graphics and computing architecture, IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008.
[2]   Jouppi N. P., Young C., Patil N., Patterson D., Agrawal G., Bajwa R., Bates S., Bhatia S., Boden N., Borchers A., et al., In-datacenter performance analysis of a tensor processing unit, in Proc. 44th Annu. Int. Symp. Computer Architecture, Toronto, Canada, 2017, pp. 1-12.
[3]   Krizhevsky A., Sutskever I., and Hinton G. E., ImageNet classification with deep convolutional neural networks, in Proc. 25th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012.
[4]   Simonyan K. and Zisserman A., Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2015.
[5]   He K. M., Zhang X. Y., Ren S. Q., and Sun J., Deep residual learning for image recognition, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770-778.
[6]   Szegedy C., Vanhoucke V., Ioffe S., Shlens J., and Wojna Z., Rethinking the inception architecture for computer vision, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 2818-2826.
[7]   Szegedy C., Ioffe S., Vanhoucke V., and Alemi A., Inception-v4, inception-ResNet and the impact of residual connections on learning, arXiv preprint arXiv:1602.07261, 2016.
[8]   Das A., Degeling M., Wang X. Y., Wang J. J., Sadeh N., and Satyanarayanan M., Assisting users in a world full of cameras: A privacy-aware infrastructure for computer vision applications, in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 2017, pp. 1387-1396.
[9]   Haridas A. V., Marimuthu R., and Sivakumar V. G., A critical review and analysis on techniques of speech recognition: The road ahead, International Journal of Knowledge-Based and Intelligent Engineering Systems, vol. 22, no. 1, pp. 39-57, 2018.
[10]   Li Z. J., Li M., Mohapatra P., Han J. S., and Chen S. Y., iType: Using eye gaze to enhance typing privacy, in Proc. IEEE INFOCOM 2017-IEEE Conf. Computer Communications, Atlanta, GA, USA, 2017, pp. 1-9.
[11]   Jusoh S., A study on NLP applications and ambiguity problems, Journal of Theoretical Applied Information Technology, vol. 96, no. 6, pp. 1486-1499, 2018.
[12]   Chervirala S., Mallya S., and Li W. C., Method and system to recommend applications from an application market place to a new device, US Patent No. 9881050, Jan. 30, 2018.
[13]   Deng J., Dong W., Socher R., Li L. J., Li K., and Li F. F., ImageNet: A large-scale hierarchical image database, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 248-255.
[14]   Jain L. C., Halici U., Hayashi I., Lee S. B., and Tsutsui S., Intelligent Biometric Techniques in Fingerprint and Face Recognition. Boca Raton, FL, USA: CRC Press, 1999.
[15]   Xue J., Li J. Y., and Gong Y. F., Restructuring of deep neural network acoustic models with singular value decomposition, in Proc. INTERSPEECH, Lyon, France, 2013, pp. 2365-2369.
[16]   Bhattacharya S. and Lane N. D., Sparsification and separation of deep learning layers for constrained resource inference on wearables, in Proc. 14th ACM Conf. Embedded Network Sensor Systems CD-ROM, Stanford, CA, USA, 2016, pp. 176-189.
[17]   Han S., Mao H. Z., and Dally W. J., Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding, arXiv preprint arXiv: 1510.00149, 2016.
[18]   Liu B. Y., Wang M., Foroosh H., Tappen M., and Pensky M., Sparse convolutional neural networks, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 806-814.
[19]   Howard A. G., Zhu M. L., Chen B., Kalenichenko D., Wang W. J., Weyand T., Andreetto M., and Adam H., Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv: 1704.04861, 2017.
[20]   Park J., Li S., Wen W., Tang P. T. P., Li H., Chen Y. R., and Dubey P., Faster CNNs with direct sparse convolutions and guided pruning, arXiv preprint arXiv: 1608.01409, 2017.
[21]   Iandola F. N., Han S., Moskewicz M. W., Ashraf K., Dally W. J., and Keutzer K., SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv preprint arXiv: 1602.07360, 2016.
[22]   Lin M., Chen Q., and Yan S. C., Network in network, arXiv preprint arXiv: 1312.4400, 2014.
[23]   Lane N. D., Bhattacharya S., Georgiev P., Forlivesi C., Lei J., Qendro L., and Kawsar F., DeepX: A software accelerator for low-power deep learning inference on mobile devices, in Proc. 2016 15th ACM/IEEE Int. Conf. Information Processing in Sensor Networks (IPSN), Vienna, Austria, 2016, pp. 1-12.
[24]   Han S., Pool J., Tran J., and Dally W. J., Learning both weights and connections for efficient neural network, in Proc. 28th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2015.
[25]   Park J., Li S. R., Wen W., Li H., Chen Y. R., and Dubey P., Holistic SparseCNN: Forging the trident of accuracy, speed, and size, arXiv preprint arXiv: 1608.01409, 2017.
[26]   Spring R. and Shrivastava A., Scalable and sustainable deep learning via randomized hashing, in Proc. 23rd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Halifax, Canada, 2017, pp. 445-454.
[27]   Mahendran A. and Vedaldi A., Understanding deep image representations by inverting them, in Proc. Conf. Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 5188-5196.
[28]   Raghu M., Gilmer J., Yosinski J., and Sohl-Dickstein J., SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability, in Proc. 31st Annu. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6078-6087.
[29]   Han S., Liu X. Y., Mao H. Z., Pu J., Pedram A., Horowitz M. A., and Dally W. J., EIE: Efficient inference engine on compressed deep neural network, arXiv preprint arXiv: 1602.01528, 2016.
[30]   TensorFlow Lite, 2019.
[31]   Changpinyo S., Sandler M., and Zhmoginov A., The power of sparsity in convolutional neural networks, arXiv preprint arXiv: 1702.06257, 2017.
[32]   TensorFlow, 2019.
[33]   LeCun Y., Bottou L., Bengio Y., and Haffner P., Gradient-based learning applied to document recognition, Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[34]   LeCun Y., Cortes C., and Burges C. J. C., The MNIST database of handwritten digits, 1998.
[35]   Krizhevsky A., Nair V., and Hinton G., The CIFAR-10 dataset, 2014.
[36]   Liu S. C., Zhou Z. M., Du J. Z., Shangguan L. F., Han J., and Wang X., UbiEar: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones, Proc. ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 2, p. 17, 2017.
[37]   Primate Labs, Geekbench 4, 2017.
[38]   Han S., Shen H. C., Philipose M., Agarwal S., Wolman A., and Krishnamurthy A., MCDNN: An approximation-based execution framework for deep stream processing under resource constraints, in Proc. 14th Annu. Int. Conf. Mobile Systems, Applications, and Services, Singapore, 2016, pp. 123-136.
[39]   Georgiev P., Lane N. D., Rachuri K. K., and Mascolo C., LEO: Scheduling sensor inference algorithms across heterogeneous mobile processors and network resources, in Proc. 22nd Annu. Int. Conf. Mobile Computing and Networking, New York, NY, USA, 2016, pp. 320-333.
[40]   Yao S. C., Zhao Y. R., Zhang A., Su L., and Abdelzaher T., DeepIoT: Compressing deep neural network structures for sensing systems with a compressor-critic framework, in Proc. 15th ACM Conf. Embedded Network Sensor Systems, Delft, Netherlands, 2017, p. 4.
[41]   Srivastava N., Hinton G., Krizhevsky A., Sutskever I., and Salakhutdinov R., Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[42]   Konda V. R. and Tsitsiklis J. N., Actor-critic algorithms, in Proc. 13th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 2000, pp. 1008-1014.
[43]   Teerapittayanon S., McDanel B., and Kung H. T., BranchyNet: Fast inference via early exiting from deep neural networks, in Proc. 23rd Int. Conf. Pattern Recognition, Cancun, Mexico, 2016, pp. 2464-2469.
[44]   Liu L. L. and Deng J., Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution, arXiv preprint arXiv: 1701.00299, 2018.