Comparison of Machine Learning Model Performance in Classifying the Geographical Origin of Emeralds
Abstract
This research has two aims: 1) to construct machine learning models that classify emeralds by geographical origin across four sources, namely Colombia, Zambia (Kafubu), Zambia (Musakashi), and Afghanistan; and 2) to compare the classification performance of these models. The study followed the six-phase Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The dataset was divided into training and testing sets in an 80:20 ratio, and the synthetic minority over-sampling technique (SMOTE) was applied to address class imbalance. Feature selection combined one-way ANOVA with the Pearson correlation coefficient and yielded 11 selected elemental features. Four supervised machine learning algorithms were used to build the classification models: Random Forests, Support Vector Machines, Naïve Bayes, and k-Nearest Neighbors. Model performance was evaluated using a confusion matrix. The Random Forests model achieved the highest classification performance, with 100% accuracy, followed by the Naïve Bayes and k-Nearest Neighbors models, both at 99.08%. The Support Vector Machines model had the lowest accuracy, at 97.25%. These findings indicate that machine learning techniques are effective tools for supporting the classification of emerald origins.
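As a rough illustration of the evaluation step, the sketch below builds a four-class confusion matrix and derives overall accuracy and per-class recall from it. Only the four class names come from the abstract. The per-class counts, the single misclassification, and the helper names (`confusion_matrix`, `accuracy`, `recall_per_class`) are assumptions made for this example and are not the study's data or code. A test set of 109 samples is assumed only because one error in 109 reproduces the reported 99.08% figure.

```python
from collections import Counter

# Class labels taken from the abstract; all counts below are invented.
CLASSES = ["Colombia", "Zambia (Kafubu)", "Zambia (Musakashi)", "Afghanistan"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Correctly classified samples (the diagonal) divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix, labels):
    """Diagonal entry divided by its row total, for each true class."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }


# Hypothetical test set of 109 samples (assumed class sizes).
y_true = (["Colombia"] * 40 + ["Zambia (Kafubu)"] * 35
          + ["Zambia (Musakashi)"] * 14 + ["Afghanistan"] * 20)

# Hypothetical predictions: all correct except one Kafubu sample
# that is labelled as Musakashi.
y_pred = list(y_true)
y_pred[40] = "Zambia (Musakashi)"

matrix = confusion_matrix(y_true, y_pred, CLASSES)
recalls = recall_per_class(matrix, CLASSES)

print(f"accuracy = {accuracy(matrix):.2%}")
for label in CLASSES:
    print(f"recall[{label}] = {recalls[label]:.2%}")
```

Reading the matrix row by row shows which origins are confused with each other, which a single accuracy figure hides. In this invented case, the one error sits in the Kafubu row and the Musakashi column.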
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
