Enhancement of Character-Level Representation in Bi-LSTM model for Thai NER

Kitiya Suriyachay; Thatsanee Charoenporn; Virach Sornlertlamvanich; Natsuda Kaothanthong

Published: Jun 25, 2021

Keywords:

Named Entity Recognition, Recurrent Neural Network, Bidirectional LSTM, CNN, CRF, Thai language, Thai named entity, TCC

Kitiya Suriyachay

School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand

Thatsanee Charoenporn

Asia AI Institute, Faculty of Data Science, Musashino University, Tokyo, 202-0023, Japan

Virach Sornlertlamvanich

School of ICT, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand; Asia AI Institute, Faculty of Data Science, Musashino University, Tokyo, 202-0023, Japan

Natsuda Kaothanthong

School of Management Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, 12120, Thailand

Abstract

Named Entity Recognition (NER) in Thai is a relatively challenging task because written Thai has no explicit word boundaries. This makes word segmentation difficult, which in turn reduces the effectiveness of subsequent NLP processing such as NER. Another important problem is ambiguity: common nouns are often used to express named entities. In Thai, most named entities are placed close to a verb or a preposition in a specific pattern, so part of speech (POS) can be used effectively as a feature for determining the type of a named entity. For these reasons, in this paper we build a BiLSTM-CNN-CRF model to investigate the effectiveness of combining word, POS, and Thai character cluster (TCC) features. We use TCCs instead of single characters to minimize word segmentation errors in the corpora and to make model generation more efficient. Experimental results show that the proposed model outperforms the other models compared. The TCC is a suitable unit for character embedding and gives better results than single-character embedding.
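
As a rough illustration of the feature combination described above, the sketch below shows how each token might be turned into a word index, a POS index, and a sequence of sub-word unit indices, where the unit is either a single character or a cluster. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, and `cluster_units` is only a crude stand-in for TCC segmentation that attaches combining marks to the preceding character. The actual TCC rules are richer (they also cover leading and trailing vowels, for example) and are not modelled here.

```python
import unicodedata


def char_units(word):
    """Baseline: every code point is its own embedding unit."""
    return list(word)


def cluster_units(word):
    """Crude stand-in for TCC segmentation (an assumption, not the real rules).

    Combining marks (Unicode category Mn, which includes Thai upper/lower
    vowels and tone marks) are attached to the preceding character so that
    they are never embedded on their own.
    """
    units = []
    for ch in word:
        if units and unicodedata.category(ch) == "Mn":
            units[-1] += ch
        else:
            units.append(ch)
    return units


def build_vocab(items):
    """Map each distinct item to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


def encode_token(word, pos, segment, word_vocab, pos_vocab, unit_vocab):
    """Return the three index features for one token.

    `segment` is the unit function to use (char_units or cluster_units).
    Unseen words, tags, or units fall back to the <unk> id.
    """
    return {
        "word_id": word_vocab.get(word, 1),
        "pos_id": pos_vocab.get(pos, 1),
        "unit_ids": [unit_vocab.get(u, 1) for u in segment(word)],
    }
```

In a model of the kind the abstract names, the `unit_ids` sequence would typically feed a character-level encoder whose output is concatenated with the word and POS embeddings before the BiLSTM layer. Swapping `char_units` for a cluster-based segmenter changes only what counts as one unit, which shortens the unit sequence and keeps marks together with their base character.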

How to Cite

Suriyachay, K., Charoenporn, T., Sornlertlamvanich, V., & Kaothanthong, N. (2021). Enhancement of Character-Level Representation in Bi-LSTM model for Thai NER. Science & Technology Asia, 26(2), 61–78. Retrieved from https://ph02.tci-thaijo.org/index.php/SciTechAsia/article/view/230527

Issue

Vol. 26 No. 2 (April-June 2021)

Section

Engineering
