Application of Text Mining for Data Clustering: A Case Study for Cancer


สุภาพร วีระพันธ์ยานนท์
พยุง มีสัจ

Abstract

This research applies text mining to data clustering, using cancer as a case study. The test data set was built by searching websites for definitions of cancer-related keywords, such as cancer, cancer treatment, cancer symptoms, diet for cancer patients, anti-cancer supplements, and herbs for cancer treatment. We propose a simple text-mining method that compares document indexing with the TFIDF, WTFIDF, and FTFIDF weighting formulas. The experiments used three hierarchical clustering algorithms: single link, average link, and complete link. The results showed that WTFIDF combined with the complete-link algorithm gave better accuracy for text classification than the other combinations tested.
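As a rough illustration of the pipeline the abstract describes, here is a minimal Python sketch. It is not the authors' implementation. It weights terms with plain TF-IDF (raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents that contain the term), then clusters documents bottom-up with single, average, or complete linkage over cosine distance. The WTFIDF and FTFIDF variants are not reproduced because the abstract does not define them. The function names `tfidf_vectors` and `agglomerative`, the choice of cosine distance, the rule of stopping at a fixed number of clusters `k`, and the toy documents are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per tokenized document.

    Assumption: plain TFIDF = raw term count * log(N / df). This is NOT the
    WTFIDF or FTFIDF variant, whose formulas are not given in the abstract.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors; all-zero vectors get distance 1."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def agglomerative(vectors, k, linkage="complete"):
    """Merge the closest pair of clusters until k remain (hypothetical helper).

    linkage: 'single' (closest members), 'complete' (farthest members),
    or 'average' (mean of all pairwise distances).
    """
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per document

    def cluster_dist(a, b):
        pair = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        if linkage == "average":
            return sum(pair) / len(pair)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy, hypothetical documents (not the paper's data set).
docs = [
    "cancer symptoms fatigue weight loss".split(),
    "early cancer symptoms lump pain".split(),
    "diet for cancer patients vegetables protein".split(),
    "healthy diet vegetables fruit protein".split(),
]
vectors = tfidf_vectors(docs)
for method in ("single", "average", "complete"):
    # each line prints [[0, 1], [2, 3]] for this toy input
    print(method, agglomerative(vectors, k=2, linkage=method))
```

Complete link scores a pair of clusters by their farthest members, so it tends to form compact groups, while single link uses the closest pair and can chain loosely related documents together. On real data the three linkages can disagree, which is what a comparison like the one in this study measures.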

Article Details

How to Cite
วีระพันธ์ยานนท์ ส, มีสัจ พ. Application of Text Mining for Data Clustering: A Case Study for Cancer. Prog Appl Sci Tech [Internet]. 2019 Jun. 27 [cited 2024 May 7];9(1):71-9. Available from: https://ph02.tci-thaijo.org/index.php/past/article/view/242976
Section
Information and Communications Technology
