A Hybrid Approach for Thai Documents Plagiarism Detection using N-gram and Semantic Role Labeling Technique


สรวัตร ประภานิติเสถียร
ไกรศักดิ์ เกษร

Abstract

Plagiarism detection methods are needed by many educational institutions because students use other people's work or ideas without crediting the original authors. However, existing methods perform poorly on Thai documents because Thai text is structured differently from English (for example, words are not separated by spaces). This research presents a hybrid approach to Thai document plagiarism detection that combines N-gram and Semantic Role Labeling (SRL) techniques. In addition, document keywords are weighted to make the similarity computation more effective. The proposed technique improves plagiarism detection performance compared with traditional SRL and other existing methods.
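The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of how the three ingredients it names could fit together. It assumes: character n-grams (trigrams by default) as the N-gram unit, which sidesteps Thai word segmentation; a fixed `boost` factor for n-grams that occur inside given keywords as a stand-in for keyword weighting; SRL frames supplied by some external tagger as `(predicate, {role: argument})` pairs; and a linear mix controlled by a hypothetical `alpha`. The names `char_ngrams`, `weighted_cosine`, `srl_similarity`, `hybrid_score`, `boost`, and `alpha` are illustrative and are not the authors' definitions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# An SRL frame: (predicate, {role label: argument text}). Assumed to come from an
# external SRL tagger; no Thai SRL tool is implemented here.
Frame = Tuple[str, Dict[str, str]]


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after removing whitespace.

    Working on characters avoids Thai word segmentation, since Thai does not
    put spaces between words.
    """
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def weighted_cosine(a: Counter, b: Counter, keywords: Iterable[str] = (),
                    boost: float = 2.0) -> float:
    """Cosine similarity of n-gram count vectors.

    Hypothetical keyword weighting: an n-gram that occurs inside any keyword
    has its count multiplied by `boost` in both vectors.
    """
    kws = list(keywords)

    def weight(g: str) -> float:
        return boost if any(g in k for k in kws) else 1.0

    dot = sum(a[g] * b[g] * weight(g) ** 2 for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum((c * weight(g)) ** 2 for g, c in a.items()))
    norm_b = math.sqrt(sum((c * weight(g)) ** 2 for g, c in b.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def srl_similarity(susp: List[Frame], src: List[Frame]) -> float:
    """For each suspicious frame, take the best source frame with the same
    predicate (fraction of role labels whose arguments agree), then average
    over the suspicious frames."""
    if not susp:
        return 0.0
    total = 0.0
    for pred, roles in susp:
        best = 0.0
        for src_pred, src_roles in src:
            if src_pred != pred:
                continue
            labels = roles.keys() | src_roles.keys()
            agree = sum(1 for r in labels if roles.get(r) == src_roles.get(r))
            best = max(best, agree / len(labels) if labels else 1.0)
        total += best
    return total / len(susp)


def hybrid_score(susp_text: str, src_text: str,
                 susp_frames: List[Frame], src_frames: List[Frame],
                 keywords: Iterable[str] = (), n: int = 3,
                 alpha: float = 0.5) -> float:
    """Linear mix of the n-gram score and the SRL score (alpha is hypothetical)."""
    ngram = weighted_cosine(char_ngrams(susp_text, n), char_ngrams(src_text, n),
                            keywords)
    return alpha * ngram + (1 - alpha) * srl_similarity(susp_frames, src_frames)


if __name__ == "__main__":
    src = "การตรวจจับการลอกเลียนวรรณกรรม"
    susp = "ระบบตรวจจับการลอกเลียนวรรณกรรมภาษาไทย"
    frames = [("ตรวจจับ", {"ARG1": "การลอกเลียน"})]  # hand-written example frame
    print(hybrid_score(susp, src, frames, frames, keywords=["ลอกเลียน"]))
```

Character n-grams make the comparison independent of any word segmenter, at the cost of many short, noisy features; the keyword boost and the SRL term are two ways to pull the score back toward meaning-bearing content. How the paper actually weights keywords and combines the two scores is not stated in the abstract.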


How to Cite
ประภานิติเสถียร ส. and เกษร ไ., “A Hybrid Approach for Thai Documents Plagiarism Detection using N-gram and Semantic Role Labeling Technique”, JIST, vol. 5, no. 1, pp. 42–50, Jun. 2015.
Section
Research Article: Soft Computing
