Comparison of Keyword Extraction Methods for Crowdfunding Projects Based on Web-Data

Wenting Hou
Jian Qu

Abstract

As technology develops, the number of crowdfunding projects keeps growing, yet it is hard for a reader to grasp quickly what each project is about. This study therefore aims to provide a better way of extracting keywords from each crowdfunding project so that readers can quickly understand its core. We compared the performance of four keyword extraction methods on crowdfunding projects. The experimental results show that BERT outperforms the NLTK, LIAAD, and Harvest algorithms in precision, recall, and F-measure. Moreover, we compared four pre-trained models based on BERT and found that the distiluse-base-multilingual-cased-v1 model worked better than the others, with 74.0% precision and 85.0% F-measure. In addition, we created a corpus of 106,869 pairs of texts and their keywords for keyword extraction on crowdfunding projects.
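
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state how an extracted keyword is matched against a reference keyword, so the exact match after lowercasing used here is an assumption, and the function name `keyword_scores` and the sample keywords are hypothetical.

```python
def keyword_scores(predicted, gold):
    """Return (precision, recall, f_measure) for one document.

    Assumption: a predicted keyword counts as correct only if it equals
    a gold keyword after trimming whitespace and lowercasing.
    """
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in gold}
    hits = len(pred & ref)  # keywords found in both sets

    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 3 extracted keywords, 4 reference keywords, 2 matches.
p, r, f = keyword_scores(
    ["smart watch", "battery", "solar"],
    ["Smart Watch", "solar", "fitness", "gps"],
)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Scores for a whole corpus would typically be obtained by averaging these per-document values or by pooling the counts; which of the two the study used is not stated in the abstract.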

Article Details

How to Cite
Hou, W., & Qu, J. (2022). Comparison of Keyword Extraction Methods for Crowdfunding Projects Based on Web-Data. INTERNATIONAL SCIENTIFIC JOURNAL OF ENGINEERING AND TECHNOLOGY (ISJET), 6(2), 1–12. Retrieved from https://ph02.tci-thaijo.org/index.php/isjet/article/view/245285
Section
Research Article
