Development of Semantic Search Model Based on The Doc2Vec Technique for an Application for Vocal Music Search


พิศาล สุขขี

Abstract

This research aims to 1) improve the efficiency of a semantic search model based on the Doc2Vec technique and 2) develop a system that searches for songs by voice using the Doc2Vec-based semantic search model. The data used in the research were Thai lyrics. The research method consisted of collecting the data, processing the Thai lyrics with natural language processing methods, creating a Doc2Vec query model, tuning its parameters to improve performance, determining an appropriate size for the inference vector and its effect on the model's search accuracy, and using the optimized model to develop a prototype system for searching songs by voice.


The findings were as follows. 1) The study of model efficiency found that the query size that affected the model's search efficiency was 30 percent of the length of the original lyrics being searched. The suitable vector size for modeling the Thai folk lyrics dataset was 200. Retaining stop words in the lyrics gave higher information retrieval efficiency than removing them. 2) To develop the program for searching songs by voice, the researcher used the Web Speech API to receive vocal input from users and searched with the performance-optimized model. These results let developers and users know how long a lyric query should be in order to find the songs they want. The query does not need to match the original lyrics exactly.
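One way to picture the 30 percent finding is as a small evaluation loop: cut each lyric down to a fraction of its length, search with that fragment, and check whether the original song ranks first. The sketch below is an assumption about how such a test might be set up, not the author's procedure. It uses plain token-count cosine similarity as a stand-in for Doc2Vec inferred vectors, and the function names and toy songs are hypothetical.

```python
import math
from collections import Counter


def make_query(tokens, fraction=0.30):
    """Return the leading `fraction` of a tokenized lyric (at least one token)."""
    n = max(1, round(len(tokens) * fraction))
    return tokens[:n]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, songs):
    """Rank song titles by similarity to the query, best match first."""
    q = Counter(query)
    scores = {title: cosine(q, Counter(tokens)) for title, tokens in songs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top1_accuracy(songs, fraction):
    """Share of songs retrieved at rank 1 from a query cut to `fraction` of the lyric."""
    hits = sum(
        search(make_query(tokens, fraction), songs)[0] == title
        for title, tokens in songs.items()
    )
    return hits / len(songs)


# Hypothetical toy data; stop words are kept, in line with the finding above.
songs = {
    "song_a": "rain falls on the rice field and i miss you".split(),
    "song_b": "the river flows past my home in the evening".split(),
    "song_c": "dancing all night under the city lights".split(),
}

print(top1_accuracy(songs, 0.30))
```

Running the same loop over several fractions (for example 0.1 to 0.5) would show how accuracy changes with query length, which is the kind of comparison the 30 percent figure summarizes.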

Article Details

How to Cite
สุขขี พ. (2021). Development of Semantic Search Model Based on The Doc2Vec Technique for an Application for Vocal Music Search. Journal of Technology Management Rajabhat Maha Sarakham University, 8(2), 101–113. Retrieved from https://ph02.tci-thaijo.org/index.php/itm-journal/article/view/245330
Section
Research Article
