Enhancement of Character-Level Representation in Bi-LSTM model for Thai NER

  • Kitiya Suriyachay School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand
  • Thatsanee Charoenporn Asia AI Institute, Faculty of Data Science, Musashino University, Tokyo, 202-0023, Japan
  • Virach Sornlertlamvanich School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand; Asia AI Institute, Faculty of Data Science, Musashino University, Tokyo, 202-0023, Japan
  • Natsuda Kaothanthong School of Management Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand
Keywords: Named Entity Recognition, Recurrent Neural Network, Bidirectional LSTM, CNN, CRF, Thai language, Thai named entity, TCC

Abstract

Named Entity Recognition (NER) in Thai is a relatively challenging task because written Thai has no explicit word boundaries. This makes word segmentation difficult, which in turn reduces the effectiveness of downstream NLP tasks such as NER. Another important problem is the ambiguity that arises when common nouns are used to express named entities. In Thai, most named entities appear close to a verb or a preposition in a specific pattern, which means that part of speech (POS) can serve as an effective feature for determining the type of a named entity. For these reasons, in this paper we build a BiLSTM-CNN-CRF model to investigate the effectiveness of combining word, POS, and Thai character cluster (TCC) features. We use TCCs instead of characters to minimize word segmentation errors in the corpora and to increase the efficiency of building the model. Experimental results show that our proposed model outperforms other models. The TCC is a suitable unit for character embedding, providing better results than single-character embedding.
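
To illustrate what a TCC is, the following is a minimal sketch and not the segmentation rules used in the paper. It assumes a deliberately simplified rule set: a leading vowel attaches to the consonant that follows it, and dependent vowels and tone marks attach to the preceding cluster. The function name `simple_tcc` and both character sets are hypothetical and chosen for illustration only; real TCC rules cover more cases.

```python
# Simplified, illustrative approximation of Thai character clusters (TCCs).
# ASSUMPTION: these two rules are a reduced stand-in for the full TCC rule set.

# Leading vowels, written before the consonant they belong to (U+0E40..U+0E44).
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")

# Characters that cannot start a cluster: following vowels (U+0E30, U+0E32,
# U+0E33), upper/lower vowels (U+0E31, U+0E34..U+0E3A), and tone marks and
# other diacritics (U+0E47..U+0E4E).
DEPENDENT = set(
    "\u0e30\u0e31\u0e32\u0e33\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39\u0e3a"
    "\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c\u0e4d\u0e4e"
)


def is_thai_consonant(ch):
    """Return True for Thai consonants (U+0E01..U+0E2E)."""
    return "\u0e01" <= ch <= "\u0e2e"


def simple_tcc(text):
    """Split text into inseparable character groups using the simplified rules."""
    clusters = []
    for ch in text:
        if clusters and ch in DEPENDENT:
            # Dependent vowels and tone marks stay with the current cluster.
            clusters[-1] += ch
        elif clusters and clusters[-1] in LEADING_VOWELS and is_thai_consonant(ch):
            # A lone leading vowel is waiting for its consonant.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    print(simple_tcc("ประเทศไทย"))  # ['ป', 'ระ', 'เท', 'ศ', 'ไท', 'ย']
```

Because each cluster is a group of characters that cannot be split by a word boundary, embedding clusters rather than single characters gives the character-level encoder units that do not depend on where a word segmenter places its cuts.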

Published
2021-06-25
How to Cite
Suriyachay, K., Charoenporn, T., Sornlertlamvanich, V., & Kaothanthong, N. (2021). Enhancement of Character-Level Representation in Bi-LSTM model for Thai NER. Science & Technology Asia, 26(2), 61-78. Retrieved from https://ph02.tci-thaijo.org/index.php/SciTechAsia/article/view/230527
Section
Engineering