Application of Data Mining in the Prediction of COVID-19 Outcome
Abstract
In December 2019, COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, emerged in Wuhan, China, and spread rapidly across the globe, causing a marked increase in morbidity and mortality. Because the disease presented with diverse clinical manifestations, robust predictive models were needed to anticipate outcomes and to implement timely preventive and corrective measures. This study aimed to identify patterns in COVID-19 outcomes and to develop models that predict patient survival using data mining techniques. It was conducted at a tertiary care hospital, Bharati Vidyapeeth (Deemed to be University) Medical College and Hospital, Sangli, and analysed cases from June 2020 to December 2020. Data were collected retrospectively from the Record Department using a structured pro forma, and only cases with at least 80% of the required information completed were included. The data were analysed using Microsoft Office 2016, SPSS-22, and WEKA-3.8.6. Various simple and ensemble machine learning algorithms were applied to classify patient survival and COVID-19 test results. Using statistical and data mining approaches, the study identified patterns in the recorded parameters among survivors and non-survivors, and among COVID-19-positive and COVID-19-negative patients. The final model for predicting survival versus non-survival was functions.SMO (WEKA's support vector machine classifier trained with sequential minimal optimization), which correctly classified 71.64% (±0.83%) of instances; for distinguishing COVID-19-positive from COVID-19-negative cases, the best-performing model was trees.RandomForest, with an accuracy of 84.41% (±0.35%). These prediction models can help physicians diagnose and manage COVID-19, identify critical cases at an early stage, and improve patient care through timely interventions.
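Two bookkeeping steps in the abstract can be illustrated in code: the inclusion rule (cases with at least 80% of the required information completed) and the reporting of accuracy as the percentage of correctly classified instances with a ± spread. The Python sketch below is a minimal illustration under stated assumptions, not the authors' procedure. The field names, records, and run counts are hypothetical; a field is assumed to count as missing when it is absent, `None`, or an empty string; and the ± value is assumed to be the sample standard deviation of accuracy across repeated evaluation runs (for example, repeated cross-validation), which the abstract does not specify. The models themselves were built in WEKA-3.8.6 and are not reproduced here.

```python
from statistics import mean, stdev


def is_complete_enough(record, fields, threshold=0.8):
    """Return True if at least `threshold` of `fields` are filled in `record`.

    Assumption: a field is missing when it is absent, None, or "".
    """
    filled = sum(1 for f in fields if record.get(f) not in (None, ""))
    return filled / len(fields) >= threshold


def accuracy_summary(correct_per_run, total_per_run):
    """Summarise per-run accuracy (percent correctly classified).

    Returns (mean, sample standard deviation); assumes the "x% (±y%)"
    notation means mean ± SD over repeated evaluation runs.
    """
    accs = [100.0 * c / t for c, t in zip(correct_per_run, total_per_run)]
    return mean(accs), stdev(accs)


# Hypothetical pro forma fields and records, for illustration only.
fields = ["age", "sex", "spo2", "crp", "outcome"]
records = [
    {"age": 64, "sex": "M", "spo2": 91, "crp": 48.0, "outcome": "survived"},  # 5/5 filled
    {"age": 52, "sex": "F", "spo2": None, "crp": "", "outcome": "died"},      # 3/5, excluded
    {"age": 70, "sex": "M", "spo2": 88, "outcome": "died"},                   # 4/5, included
]
included = [r for r in records if is_complete_enough(r, fields)]

# Hypothetical counts of correctly classified instances in three runs.
acc_mean, acc_sd = accuracy_summary([143, 145, 144], [200, 200, 200])
print(f"{len(included)} cases included; accuracy {acc_mean:.2f}% (±{acc_sd:.2f}%)")
```

With these made-up inputs, two of the three records pass the 80% rule, and the per-run accuracies of 71.5%, 72.5%, and 72.0% summarise to 72.00% (±0.50%).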
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.