Comparison of Machine Learning Model Performance in Classifying the Geographical Origin of Emeralds


Chawalit Chankhantha

Abstract

This research aims to 1) construct machine learning models that classify the geographical origin of emeralds from four sources: Colombia, Zambia (Kafubu), Zambia (Musakashi), and Afghanistan; and 2) compare the performance and accuracy of these models in origin classification. The study followed the six-phase CRISP-DM data mining methodology. The dataset was divided into training and testing sets in an 80:20 ratio, and the synthetic minority over-sampling technique (SMOTE) was applied to address class imbalance. Feature selection combined one-way ANOVA with the Pearson correlation coefficient and resulted in 11 optimal elemental features. Four supervised machine learning algorithms were used to build the classification models: Random Forests, Support Vector Machines, Naïve Bayes, and k-Nearest Neighbors. Model performance was evaluated using a confusion matrix. The Random Forests model achieved the highest classification performance with 100% accuracy, followed by the Naïve Bayes and k-Nearest Neighbors models, both at 99.08%. The Support Vector Machines model had the lowest accuracy at 97.25%. These findings indicate that machine learning techniques can effectively support the classification of emerald origins.
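The confusion-matrix evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It shows how accuracy and per-class recall are read off a confusion matrix for the four origin classes. The class labels come from the abstract. The true and predicted labels are hypothetical values invented for illustration, not the study's data, and the function names are illustrative rather than the author's implementation.

```python
from collections import Counter

# Class labels taken from the abstract; all sample data below is hypothetical.
CLASSES = ["Colombia", "Zambia (Kafubu)", "Zambia (Musakashi)", "Afghanistan"]


def confusion_matrix(y_true, y_pred, classes):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}


def accuracy(matrix):
    """Fraction of samples on the diagonal (true label equals predicted label)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total


def recall_per_class(matrix):
    """Per-class recall: correct predictions divided by true samples of that class."""
    result = {}
    for label, row in matrix.items():
        n_true = sum(row.values())
        if n_true:  # skip classes that do not occur in the test set
            result[label] = row[label] / n_true
    return result


# Hypothetical test split of 10 stones (assumption, not the paper's data):
# one Kafubu stone is mislabelled as Musakashi.
y_true = (["Colombia"] * 4 + ["Zambia (Kafubu)"] * 3
          + ["Zambia (Musakashi)"] * 2 + ["Afghanistan"])
y_pred = list(y_true)
y_pred[4] = "Zambia (Musakashi)"

matrix = confusion_matrix(y_true, y_pred, CLASSES)
acc = accuracy(matrix)
recalls = recall_per_class(matrix)

print(f"accuracy = {acc:.2%}")
for label, value in recalls.items():
    print(f"recall[{label}] = {value:.2%}")
```

Accuracy alone summarizes only the diagonal of the matrix. The off-diagonal cells show which pairs of origins are confused, which matters when two sources, such as the two Zambian deposits, come from the same country.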


 

Article Details

How to Cite
C. Chankhantha, “Comparison of Machine Learning Model Performance in Classifying the Geographical Origin of Emeralds”, JIST, vol. 16, no. 1, pp. 72–84, Jun. 2026.
Section
Research Article: Soft Computing
