Temporal Data and Diabetes Classification in Thailand
Abstract
Diabetes mellitus is a chronic disease that reduces quality of life because it often leads to complications such as heart disease, high blood pressure, neuropathy and the loss of some organs. This work proposes a temporal feature extraction model that extracts features embedded in historical data, such as health examination records, for use in classification. The proposed model can be combined with any classification method, such as Naïve Bayes, Logistic Regression, C4.5 (J48), Bagging and SVMs. The method is evaluated on health examination data of factory employees in Thailand collected from 2004 to 2010 (7 years). The dataset covers 43,523 employees in total, of whom 28,808 have only one record and 14,715 were examined more than once. Resampling with replacement is applied to the dataset to balance the training instances among the classes before training. The features used for diabetes classification fall into three groups: Physical Examination, Urinalysis and Biochemistry. The experimental results show that data with temporal features yield higher classification performance than data without temporal features.
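The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not specify which temporal features are extracted or how the resampling is configured, so everything here is an assumption made for illustration: the function names `temporal_features` and `balance_by_resampling`, the choice of summary values (latest value, mean, and change between first and last examination), and the rule of drawing every class up to the size of the largest class.

```python
import random
from collections import defaultdict


def temporal_features(values):
    """Summarise one measurement across an employee's examinations.

    `values` is ordered oldest first. The three summaries below are
    assumed for illustration; they are not the paper's feature set.
    """
    return {
        "latest": values[-1],
        "mean": sum(values) / len(values),
        "change": values[-1] - values[0],  # last visit minus first visit
    }


def balance_by_resampling(instances, labels, seed=0):
    """Draw each class with replacement up to the largest class size.

    Assumed balancing rule: every class ends up with as many training
    instances as the most frequent class in the input.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.choices(xs, k=target))  # sampling with replacement
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical fasting glucose readings for one employee over three years.
features = temporal_features([90.0, 100.0, 110.0])

# Hypothetical imbalanced training set: four of class "a", one of class "b".
xs, ys = balance_by_resampling([1, 2, 3, 4, 5], ["a", "a", "a", "a", "b"])
```

An employee with a single record would get a `change` of zero under this sketch, which is one simple way to keep employees with one examination and employees with several in the same feature space.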
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.