Data mining model and application for stroke prediction: A combination of demographic and medical screening data approach


Sotarat Thammaboosadee
Teerapat Kansadub


This paper presents the data mining process used to build a stroke prediction model based on demographic information and medical screening data. The data, gathered from a physical therapy center in Thailand, comprised outpatients’ medical records, medical screening forms, and a target variable. A group of 147 stroke patients and 294 non-stroke individuals, described by six demographic predictors, was selected for the study. Three classification algorithms, Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN), were applied to the data, and their results were compared using 10-fold cross-validation. The primary model selection criteria were accuracy and the area under the ROC curve (AUC); the secondary criteria were the False-Positive Rate (FPR) and the False-Negative Rate (FNR). The best-performing model was ANN trained on the integrated data, with an overall accuracy of 0.84, an AUC of 0.90, an FPR of 0.12, and an FNR of 0.25. ANN with integrated demographic and medical screening data thus produced the best predictive performance among the models compared, under both the primary and the secondary selection criteria.
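The four selection criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, and none of the numbers are data from the study. AUC is computed here as the fraction of (stroke, non-stroke) pairs in which the stroke case receives the higher score, which is one standard way to obtain it.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke cases predicted as stroke
    tn: int  # non-stroke cases predicted as non-stroke
    fn: int  # stroke cases predicted as non-stroke


def confusion(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def accuracy(c):
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c):
    return c.fp / (c.fp + c.tn)


def false_negative_rate(c):
    return c.fn / (c.fn + c.tp)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (NOT study data): 3 stroke and 5 non-stroke cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

c = confusion(y_true, y_pred)
print("accuracy:", accuracy(c))
print("FPR:", false_positive_rate(c))
print("FNR:", false_negative_rate(c))
print("AUC:", auc(y_true, scores))
```

In a 10-fold cross-validation setting, such metrics would typically be computed on each held-out fold, or on the pooled held-out predictions, and then summarized. The abstract does not say which aggregation the study used.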


How to Cite
Thammaboosadee, S., & Kansadub, T. (2019). Data mining model and application for stroke prediction: A combination of demographic and medical screening data approach. Interdisciplinary Research Review, 14(4), 61–69. Retrieved from
Research Articles

