Pattern-based Extraction of Named Entities in Thai News Documents


Nattapong Tongtep
Thanaruk Theeramunkong

Abstract

Named entity extraction is a nontrivial and challenging task for information extraction in the Thai language, since Thai text has no explicit word, phrase, or sentence boundaries. This paper proposes a pattern-based method that extracts Thai named entities, such as person names, organization names, locations, dates, and times, as well as action phrases, from text without the assistance of word segmentation or part-of-speech tagging. Experimental results show that, using a large-scale Thai dictionary and a set of predefined pattern-matching templates, the proposed method detects named entities with approximately 68-100% correctness.
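As a rough illustration of the general idea (not the authors' actual templates or dictionary, which the abstract does not list), the following minimal sketch matches character-level patterns directly against unsegmented Thai text. The cue words, the `PATTERNS` table, and the `extract_entities` function are all hypothetical names chosen for this example, and the rule that an entity ends at the first non-Thai character is a simplifying assumption.

```python
import re

# Assumption: a tiny hand-made cue list stands in for the paper's
# large-scale dictionary; the real templates are not given in the abstract.
THAI = r"[\u0E00-\u0E7F]+"  # one or more characters from the Unicode Thai block
MONTHS = "มกราคม|กุมภาพันธ์|มีนาคม|เมษายน"  # partial list, for illustration only

# Hypothetical pattern templates: a cue string followed by a span of text.
# Matching is done on raw characters, so no word segmentation is required.
PATTERNS = {
    "PERSON": re.compile(r"(?:นางสาว|นาย|นาง)" + THAI),
    "LOCATION": re.compile(r"จังหวัด" + THAI),
    "DATE": re.compile(r"วันที่\s*\d{1,2}\s*(?:" + MONTHS + r")\s*\d{4}"),
}


def extract_entities(text):
    """Return (label, matched_text) pairs ordered by position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()  # order by start offset
    return [(label, span) for _, label, span in found]


if __name__ == "__main__":
    sample = "นายสมชาย เดินทางไปจังหวัดเชียงใหม่ เมื่อวันที่ 5 มกราคม 2550"
    for label, span in extract_entities(sample):
        print(label, span)
```

In this sketch an entity span simply runs until the next space or non-Thai character, which only works when the writer happens to put a space after the name. A realistic system would need dictionary lookups or additional right-context patterns to decide where an entity ends.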


Section
Articles