Natural Language Interface to Database for Data Retrieval and Processing

  • Chalermpol Tapsai Department of Information Technology Management, Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
  • Phayung Meesad Department of Information Technology Management, Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
  • Choochart Haruechaiyasak National Electronics and Computer Technology Center, Pathum Thani, Thailand
Keywords: Data retrieval, Natural language, Pattern parsing, Ranking Trie, Ontology, Fuzzy system

Abstract

Although natural language interfaces to databases have been studied for many years, existing approaches do not cover many use cases, such as negative sentences, processing functions, and the variety of sentence patterns with different types of query specification. To address these problems, a model called “Natural Language Processing for Data Retrieval and Processing (NLP-DRP)” is proposed. A new algorithm named ‘Ranking Trie’ was implemented in combination with Pattern Parsing, Ontology, and a Fuzzy system to improve the Lexical analysis, Semantic analysis, and Output transformation processes, allowing users to retrieve and process data using various sentence patterns and conditions. The model was incrementally tested and updated with a learning dataset of 3,868 Natural Language Query Sentences (NLQSs) collected from users, and then evaluated on a test dataset of 500 NLQSs. The results showed that NLP-DRP could retrieve and process data and generate outputs consistent with user requirements, with Accuracy, Precision, Recall, and F-measure all higher than 0.9.
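
For readers unfamiliar with the four evaluation measures named above, the following is a minimal sketch of their standard definitions computed from confusion-matrix counts. The abstract does not state how the authors counted correct and incorrect outputs, so the `Confusion` class, the `evaluation_scores` function, and the meaning assigned to each count are illustrative assumptions, not the paper's procedure.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a query-answering evaluation."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def evaluation_scores(c: Confusion) -> dict[str, float]:
    """Return accuracy, precision, recall, and F-measure (F1).

    Any ratio whose denominator is zero is reported as 0.0.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Because the F-measure is a harmonic mean, it can exceed 0.9 only when precision and recall are both reasonably high, which is why it is reported alongside them.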

Published
2021-07-13
Section
Research Articles