Tsinghua Science and Technology  2019, Vol. 24 Issue (2): 195-206    doi: 10.26599/TST.2018.9010074
Parallel ADR Detection Based on Spark and BCPNN
Li Sun, Shan Sun, Tianlei Wang, Jiyun Li*, Jingsheng Lin
∙ Li Sun, Shan Sun, Tianlei Wang, and Jiyun Li are with the School of Computer Science and Technology, Donghua University, Shanghai 201620, China.
∙ Jingsheng Lin is with the Ruijin Hospital Affiliated to Shanghai Jiao Tong University, Shanghai 200020, China.

Adverse Drug Reactions (ADRs) are one of the major challenges to the evaluation of drug safety in the medical field. The Bayesian Confidence Propagation Neural Network (BCPNN) algorithm is the main algorithm used by the World Health Organization to monitor ADRs. Currently, ADR reports are collected through spontaneous reporting systems. However, as the number of ADR reports and possible use scenarios continues to grow, the efficiency of stand-alone ADR detection algorithms faces considerable challenges. Moreover, the BCPNN algorithm requires a number of disk I/O operations, which consume considerable time. In this study, we propose a Spark-based parallel BCPNN algorithm, which speeds up data processing and reduces the number of disk I/O operations in BCPNN, together with two optimization strategies. ADR data collected from the FDA Adverse Event Reporting System are then used to verify the performance of the proposed algorithm and its optimization strategies. Experiments show that the parallel BCPNN significantly accelerates data processing, and that the optimized algorithm achieves a high speedup and effectively prevents memory overflow. Finally, we apply the proposed algorithm to a dataset provided by a real medical consortium; these experiments further demonstrate the performance and practical value of the proposed algorithm.

Key words: Adverse Drug Reaction (ADR); Bayesian Confidence Propagation Neural Network (BCPNN); parallel; Spark
Received: 16 August 2017      Published: 29 April 2019
Corresponding Authors: Jiyun Li   
About author:

Jingsheng Lin received the BEng degree from East China University of Science and Technology in 2009, and the Master of Engineering degree from Donghua University in 2011. He now works at Ruijin Hospital Affiliated to Shanghai Jiao Tong University. His main research interests include deep learning, artificial intelligence, and intelligent medicine.

Cite this article:

Li Sun, Shan Sun, Tianlei Wang, Jiyun Li, Jingsheng Lin. Parallel ADR Detection Based on Spark and BCPNN. Tsinghua Science and Technology, 2019, 24(2): 195-206.

Item            Objective ADR event   Other ADR event   Summation
Objective drug  a                     b                 a+b
Other drug      c                     d                 c+d
Table 1 Two-by-two contingency table.
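
As an illustration of how the counts in Table 1 feed a disproportionality measure, the following minimal Python sketch computes a point estimate of the information component, IC = log2(P(drug, ADR) / (P(drug) P(ADR))), from a, b, c, and d. This is a simplified observed-to-expected log ratio and an assumption made for illustration: it omits the Bayesian priors that BCPNN places on the probabilities, and it is not the paper's Spark implementation. The function name and the example counts are hypothetical.

```python
import math


def information_component(a, b, c, d):
    """Point estimate of IC from a two-by-two contingency table.

    a: reports with the objective drug and the objective ADR event
    b: reports with the objective drug and another ADR event
    c: reports with another drug and the objective ADR event
    d: reports with another drug and another ADR event

    Simplified sketch: no Bayesian priors are applied, unlike full BCPNN.
    """
    n = a + b + c + d       # total number of reports
    drug_total = a + b      # reports mentioning the objective drug
    adr_total = a + c       # reports mentioning the objective ADR event
    if a == 0:
        # The pair was never observed, so the log ratio is unbounded below.
        return float("-inf")
    return math.log2(a * n / (drug_total * adr_total))


# Hypothetical counts: the pair is reported four times more often than
# expected under independence.
ic = information_component(a=20, b=80, c=30, d=870)
print(ic)  # 2.0
```

A positive value means the drug-event pair is reported more often than independence would predict, and a value of zero means the pair occurs exactly as often as expected.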
Fig. 1 Spark calculation model.
Fig. 2 Spark-based BCPNN algorithm flowchart.
Fig. 3 RDD transformation in the second stage.
Fig. 4 RDD transformation in the third stage.
Fig. 5 Connection operation without using partitionBy().
Fig. 6 Connection operation using partitionBy().
Fig. 7 Task assignment without improving shuffling parallelism.
Fig. 8 Task assignment with improving shuffling parallelism.
Fig. 9 Time comparison of BCPNN on Spark and stand-alone.
Fig. 10 Time comparison before and after optimization.
Fig. 11 Speedup before and after optimization.
Memory (GB)   Before optimization   After optimization
1             Overflow              No overflow
2             Overflow              No overflow
4             No overflow           No overflow
Table 2 Memory overflow situation.
Fig. 12 Time consumption of the algorithms.