Predicting Ischemic Heart Disease and Determining Its Risk Factors: A Comparison of Various Classification Methods in Machine Learning
Keywords:
Ischemic heart disease, risk factors, prediction, machine learning algorithms
Abstract
This study was conducted to identify the important risk factors for ischemic heart disease (IHD) in the population of Balochistan and to determine which machine learning (ML) algorithms predict IHD most accurately. Data on common IHD risk factors were collected from 300 individuals (100 IHD cases and 200 controls). The risk factors included marital status, physical activity, socioeconomic position, type of cooking oil, diet, body mass index (BMI), blood pressure, random blood sugar, history of known disease, and cholesterol level. We employed linear discriminant analysis (LDA), artificial neural networks (ANN), naïve Bayes (NB), and random forest (RF) classification methods. The data were randomly partitioned into training (70%) and testing (30%) sets. The classification methods were evaluated on accuracy, sensitivity, specificity, positive and negative predictive values, and area under the receiver operating characteristic (ROC) curve. ANN was the most accurate classification method, with an accuracy of 88.89%, followed by NB (86.67%), LDA (85.56%), and RF (84.44%). Moreover, most classification methods identified blood pressure, cholesterol level, physical activity, diet, BMI, and family history as important risk factors for IHD. These results indicate that ML methods, especially ANN, can be used to predict IHD status accurately and to identify its important risk factors.
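The 70%/30% random partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `train_test_split`, the seed value, and the use of a simple (non-stratified) shuffle are assumptions. With 300 individuals, the split yields 210 training and 90 testing records.

```python
import random


def train_test_split(records, train_frac=0.7, seed=2022):
    """Randomly partition records into training and testing sets.

    Hypothetical helper: a simple, non-stratified shuffle-and-cut split.
    The seed is arbitrary and only makes the example reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Labels mirroring the study design: 100 IHD cases (1) and 200 controls (0).
labels = [1] * 100 + [0] * 200
train, test = train_test_split(labels)
print(len(train), len(test))  # 210 90
```

Because the split is not stratified, the case-to-control ratio in the test set may drift slightly from 1:2; a stratified split would fix that ratio, but the abstract does not state which was used.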
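The evaluation measures listed above all derive from a 2×2 confusion matrix. Assuming the test set holds 90 individuals (30% of 300), the reported accuracy of 88.89% corresponds to 80 correct classifications (80/90). The sketch below uses hypothetical counts (30 cases and 60 controls, with 5 errors in each group) chosen only to reproduce that accuracy; they are not the study's actual confusion matrix, and the class name `ConfusionMatrix` is illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # IHD cases predicted as IHD
    fn: int  # IHD cases predicted as control
    fp: int  # controls predicted as IHD
    tn: int  # controls predicted as control

    def metrics(self):
        """Return accuracy, sensitivity, specificity, PPV and NPV as fractions."""
        total = self.tp + self.fn + self.fp + self.tn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "sensitivity": self.tp / (self.tp + self.fn),  # true positive rate
            "specificity": self.tn / (self.tn + self.fp),  # true negative rate
            "ppv": self.tp / (self.tp + self.fp),          # positive predictive value
            "npv": self.tn / (self.tn + self.fn),          # negative predictive value
        }


# Hypothetical counts for a 90-person test set, NOT taken from the study.
cm = ConfusionMatrix(tp=25, fn=5, fp=5, tn=55)
for name, value in cm.metrics().items():
    print(f"{name}: {value:.2%}")
```

With these counts, accuracy is 88.89%, sensitivity and PPV are 83.33%, and specificity and NPV are 91.67%. Note that PPV and NPV depend on the case-to-control ratio in the test set, whereas sensitivity and specificity do not.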
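The area under the ROC curve mentioned above equals the probability that a randomly chosen IHD case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it directly from that definition; the scores and labels are made-up values for illustration, not outputs of the study's classifiers.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random case > score of a random control); ties count 0.5.

    scores: predicted probabilities of IHD; labels: 1 = IHD case, 0 = control.
    Uses an O(n_cases * n_controls) pairwise comparison, fine for small test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up scores: three cases followed by three controls.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))
```

Here 8 of the 9 case-control pairs are ordered correctly (only the case scored 0.35 falls below the control scored 0.6), so the AUC is 8/9, about 0.889.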
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
