Pattern-based Extraction of Named Entities in Thai News Documents


Nattapong Tongtep
Thanaruk Theeramunkong

Abstract

Named entity extraction is a nontrivial and challenging information extraction task in the Thai language because Thai text has no explicit word, phrase, or sentence boundaries. This paper proposes a pattern-based method that extracts Thai named entities, such as person names, organization names, locations, dates, and times, as well as action phrases, from a text without the assistance of word segmentation or part-of-speech tagging. The experimental results show that the proposed method, using a large-scale Thai dictionary and a set of predefined pattern-matching templates, can detect named entities with correctness ranging from approximately 68% to 100%.
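To make the idea of template matching over unsegmented text concrete, the following is a minimal sketch and not the authors' implementation. The cue-word lists, the three regular-expression templates, and the function name `extract_entities` are all hypothetical; the abstract does not describe the actual dictionary or templates. The sketch also assumes that a person's given name and surname are separated by a space and that the surname is followed by a space or non-Thai character, which real news text will not always satisfy.

```python
import re

# Hypothetical mini-dictionary of cue words (an assumption for illustration;
# the paper's large-scale dictionary is not reproduced here).
PERSON_TITLES = ["นางสาว", "นาย", "นาง"]        # Miss, Mr., Mrs. (longest first)
THAI_MONTHS = ["มกราคม", "กุมภาพันธ์", "มีนาคม"]  # January, February, March

# A run of Thai letters, vowels, and tone marks (Unicode Thai block subset).
THAI_RUN = r"[\u0E01-\u0E4E]+"

# Hypothetical templates: each one is anchored on a cue word or a numeric
# shape, so no prior word segmentation or POS tagging is needed.
PATTERNS = {
    "PERSON": re.compile(
        "(?:" + "|".join(PERSON_TITLES) + ")" + THAI_RUN + r"\s+" + THAI_RUN
    ),
    "DATE": re.compile(
        r"\d{1,2}\s*(?:" + "|".join(THAI_MONTHS) + r")\s*\d{4}"
    ),
    "TIME": re.compile(r"\d{1,2}[.:]\d{2}\s*น\."),
}


def extract_entities(text):
    """Return (label, matched_text, start_offset) tuples in reading order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0), match.start()))
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    sample = "นายสมชาย ใจดี เดินทางวันที่ 5 มกราคม 2553 เวลา 10.30 น."
    for label, span, start in extract_entities(sample):
        print(label, span, start)
```

Listing the longer title before its prefix (`นางสาว` before `นาง`) keeps the alternation from stopping at the shorter cue word. A fuller system would need many more cue words and templates, plus rules for resolving overlapping matches.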

Article Details

How to Cite
Tongtep, N., & Theeramunkong, T. (2015). Pattern-based Extraction of Named Entities in Thai News Documents. Science & Technology Asia, 15(1), 70–81. Retrieved from https://ph02.tci-thaijo.org/index.php/SciTechAsia/article/view/41311