TGF-GRU: A Cyber-bullying Autonomous Detector of Lexical Thai across Social Media

PAKPOOM MOOKDARSANIT

Abstract

Cyber-bullying across online social networks (OSNs) remains one of the most persistent sources of social fragility in Thailand. Cyber-bullying is deliberately intended to offend other people, particularly in politics. This paper develops a novel linguistic model to detect Thai cyber-bullying labels on OSNs. Our model is based on a Gated Recurrent Unit (GRU) preceded by a dimensionality-reduction pre-processing algorithm called Lexical Thai Grammatical Filtering (TGF). The TGF-GRU model is built from 10,900 Thai texts taken from posts and comments on Facebook. The results show that TGF improves the accuracy of a plain GRU by 8.41% at a small additional time cost. Nevertheless, some synonyms, homographs, and insinuations in Thai jargon can easily confuse the detector. In short, the TGF-GRU model could be used as an additional AI feature to autonomously detect inappropriate Thai text before a user posts or comments on Facebook. Over the years, as cyber-bullying labels (e.g., pejoratives, sexual comments, hate speech, rumors, and slander) are autonomously detected and filtered out, the causes of social fragility will be gradually mitigated by the Thai-bullying detector.
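
The pipeline described in the abstract, a grammatical filter followed by a GRU, can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not state the TGF rules, so `filter_tokens` is a hypothetical stand-in that keeps tokens by part-of-speech tag, and the tags, English placeholder tokens, dimensions, and random untrained weights are all assumptions chosen for illustration. Only the GRU update itself follows the standard formulation (update gate, reset gate, candidate state).

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def filter_tokens(tagged_tokens, keep_tags):
    # Hypothetical stand-in for TGF: drop tokens whose tag is not kept,
    # which shortens the sequence the GRU has to process.
    return [tok for tok, tag in tagged_tokens if tag in keep_tags]


def init_params(input_dim, hidden_dim, seed=0):
    # Random, untrained weights; a real model would learn these.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for name in ("z", "r", "h"):
        params["W" + name] = mat(hidden_dim, input_dim)
        params["U" + name] = mat(hidden_dim, hidden_dim)
        params["b" + name] = [0.0] * hidden_dim
    return params


def gate(params, name, x, h, act):
    # act(W x + U h + b) for the gate or candidate called `name`.
    wx = matvec(params["W" + name], x)
    uh = matvec(params["U" + name], h)
    return [act(a + b + c) for a, b, c in zip(wx, uh, params["b" + name])]


def gru_step(x, h, params):
    z = gate(params, "z", x, h, sigmoid)            # update gate
    r = gate(params, "r", x, h, sigmoid)            # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = gate(params, "h", x, reset_h, math.tanh)  # candidate state
    # New state interpolates between the old state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(tokens, embeddings, params, hidden_dim):
    # Run the GRU over the token sequence, starting from a zero state.
    h = [0.0] * hidden_dim
    for tok in tokens:
        h = gru_step(embeddings[tok], h, params)
    return h


# Placeholder input: already segmented and tagged (tags are assumptions).
tagged = [("you", "PRON"), ("are", "VERB"), ("so", "ADV"), ("stupid", "ADJ")]
kept = filter_tokens(tagged, keep_tags={"VERB", "ADJ"})

rng = random.Random(1)
embeddings = {tok: [rng.uniform(-1.0, 1.0) for _ in range(3)] for tok, _ in tagged}
params = init_params(input_dim=3, hidden_dim=4)
state = encode(kept, embeddings, params, hidden_dim=4)
```

In a full classifier, the final `state` would feed an output layer trained on labeled posts; the sketch stops at the encoded state because the abstract gives no details of the output stage.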

Article Details

How to Cite
P. MOOKDARSANIT, “TGF-GRU: A Cyber-bullying Autonomous Detector of Lexical Thai across Social Media”, NKRAFA J SCI TECH, vol. 15, no. 1, pp. 50–58, Dec. 2019.
Section
Research Articles
