A Hybrid Approach for Thai Documents Plagiarism Detection using N-gram and Semantic Role Labeling Technique

สรวัตร ประภานิติเสถียร; ไกรศักดิ์ เกษร

doi:10.14456/jist.2015.5


Published: Jun. 30, 2015


Keywords:

Plagiarism, semantic comparison, Semantic Role Labeling, N-grams

สรวัตร ประภานิติเสถียร

Department of Computer Science and Information Technology, Faculty of Science, Naresuan University

ไกรศักดิ์ เกษร

Department of Computer Science and Information Technology, Faculty of Science, Naresuan University

Abstract

Plagiarism detection has attracted considerable attention, particularly in educational institutions, because students often commit misconduct by presenting other people's work or ideas as their own. However, the document plagiarism detection techniques in common use today perform poorly when applied to Thai documents, owing to problems that arise from the grammatical structure of the Thai language. This research therefore proposes using grammar rules and 5-gram and 3-gram (N-gram) probability tables, built from a tree-based predictive model, to restructure sentences, together with the Semantic Role Labeling technique combined with term weighting for semantic comparison. The experiments show that restructuring the sentences first yields better detection performance than using Semantic Role Labeling with term weighting alone.
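As a rough illustration of what an N-gram probability table can look like, the sketch below estimates conditional probabilities from plain counts over pre-segmented token sequences. It is not the authors' method: the abstract states that their tables are built from a tree-based predictive model, whereas this sketch substitutes simple maximum-likelihood counting. The function name `ngram_probabilities`, the part-of-speech style tags, and the tiny corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def ngram_probabilities(sentences, n):
    """Return {context: {next_token: P(next_token | context)}}.

    `sentences` is an iterable of token lists that are assumed to be
    already segmented (Thai text has no spaces between words, so a
    separate word segmentation step is assumed to have run first).
    Probabilities are plain relative frequencies, an assumption of
    this sketch rather than the paper's tree-based model.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])   # the first n-1 tokens
            counts[context][tokens[i + n - 1]] += 1  # the token that follows

    table = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        table[context] = {tok: c / total for tok, c in followers.items()}
    return table


# Hypothetical tag sequences standing in for segmented Thai sentences.
corpus = [
    ["NOUN", "VERB", "NOUN"],
    ["NOUN", "VERB", "ADV"],
    ["NOUN", "VERB", "NOUN"],
]
trigram_table = ngram_probabilities(corpus, 3)
```

Calling the same function with `n=5` would produce the 5-gram counterpart; how the two tables are combined with grammar rules to restructure a sentence is specific to the paper and is not reproduced here.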

How to Cite

[1]

ประภานิติเสถียร ส. and เกษร ไ., “Plagiarism Detection Using the N-gram Technique Combined with a Semantic Checking Technique for Thai Documents”, JIST, vol. 5, no. 1, pp. 42–50, Jun. 2015.

Issue

Vol. 5 No. 1 (2015): Journal of Information Science and Technology (JIST) [Jan. 2015 - Jun. 2015]

Section

Research Article: Soft Computing

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


