Celebrity information is traditionally gathered and selected by hand, a process that is time-consuming and often fails to stay up to date because of the large number of celebrities in Thailand and the potential ambiguity and conflict between information sources on the Internet. This study therefore aims to develop a system that automatically extracts and selects celebrity information from websites. It proposes a novel method that uses pattern matching and association rules to extract the date of birth, height, and weight of celebrities from websites. In addition, a weight estimation system based on height and body mass index (BMI) is developed. The system is found to obtain more celebrity information than many Thai websites, such as MThai.com, and the weight estimation system is able to estimate celebrities' weights from height and BMI.
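As a rough illustration of the two ideas named above, the following Python sketch extracts height and weight from a line of profile text with regular expressions and estimates a weight from height and BMI by inverting the standard formula BMI = weight (kg) / height (m)^2. The patterns, function names, and sample text are assumptions made for this example; the abstract does not give the study's actual patterns, its handling of Thai-language text, or its association-rule step, none of which is reproduced here.

```python
import re

# Hypothetical patterns for English-language profile text; the study's actual
# patterns are not given in the abstract.
HEIGHT_RE = re.compile(r"height\s*[:\-]?\s*(\d{3})\s*cm", re.IGNORECASE)
WEIGHT_RE = re.compile(r"weight\s*[:\-]?\s*(\d{2,3})\s*kg", re.IGNORECASE)


def extract_height_weight(text):
    """Return (height_cm, weight_kg); a value is None if its pattern does not match."""
    h = HEIGHT_RE.search(text)
    w = WEIGHT_RE.search(text)
    return (int(h.group(1)) if h else None, int(w.group(1)) if w else None)


def estimate_weight_kg(height_cm, bmi):
    """Invert BMI = weight / height^2 (height in metres) to estimate weight in kg."""
    height_m = height_cm / 100.0
    return bmi * height_m ** 2


if __name__ == "__main__":
    # Made-up sample text, not data from the study.
    print(extract_height_weight("Height: 170 cm, Weight: 55 kg"))
    # Estimated weight for a 170 cm person at an assumed BMI of 20.
    print(round(estimate_weight_kg(170, 20), 1))
```

When a page lists a height but no weight, the second function provides a fallback estimate, provided a plausible BMI value is supplied. How the study chooses that BMI value is not stated in the abstract.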
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.