An Automatic OMR System Using Digital Image Processing with Hybrid K-Means and K-Nearest Neighbor Classification
Keywords:
Digital Image Processing, Hybrid Classification, Optical Mark Recognition, Feature Extraction, Result Analysis, Confusion Matrix, K-Means, K-Nearest Neighbor
Abstract
This research developed an automated Optical Mark Recognition (OMR) system to overcome the delays and errors associated with manual checking. We used Digital Image Processing (DIP) techniques combined with a hybrid classification approach employing the K-Means and K-Nearest Neighbor (K-NN) algorithms to enhance accuracy and reduce data noise. The prototype processed 15,000 samples from 100 answer sheets (150 marks per sheet), divided into 80% training and 20% testing sets. The methodology comprised image preprocessing, segmentation, and feature extraction, focusing on Mean Intensity, Variance, and Pixel Density, to classify marks as "Filled" or "Empty." A comparison of four methods (K-NN, SVM, Decision Tree, and K-Means + K-NN) showed that the hybrid K-Means + K-NN method yielded the highest average accuracy over repeated trials, at 99.30% (SD = 0.23). Furthermore, evaluation with a confusion matrix on the 3,000 test samples gave an accuracy of 99.30%, a precision of 99.11%, a recall of 98.56%, and an F1-score of 98.83%. These findings indicate that integrating K-Means with K-NN significantly improved the system's correctness and stability, demonstrating its potential for real-world deployment without reliance on specialized paper or high-quality scanners.
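The abstract does not spell out how K-Means and K-NN are combined, so the following is only a minimal sketch of one plausible reading: the three named features are computed per mark region, K-Means groups the training samples, samples whose label disagrees with the majority label of their cluster are discarded as noise, and K-NN is then trained on what remains. Every function name, the synthetic 8×8 patches, the darkness threshold of 128, the definition of pixel density as the fraction of dark pixels, and the values of k are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from math import dist
from statistics import fmean, pvariance


def extract_features(patch, dark_threshold=128):
    """Return (mean intensity, variance, pixel density) for one mark region.

    `patch` is a 2-D list of grayscale values in 0..255. Pixel density is
    assumed here to be the fraction of pixels darker than `dark_threshold`.
    Mean and variance are rescaled so all three features lie in [0, 1].
    """
    pixels = [p for row in patch for p in row]
    density = sum(p < dark_threshold for p in pixels) / len(pixels)
    return (fmean(pixels) / 255.0, pvariance(pixels) / 255.0**2, density)


def kmeans(points, k=2, iterations=20):
    """Plain Lloyd iterations; returns (centroids, cluster index per point)."""
    # Deterministic start: spread the initial centroids over the sorted points.
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    assign = None
    for _ in range(iterations):
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_assign == assign:  # converged: assignments no longer change
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(fmean(col) for col in zip(*members))
    return centroids, assign


def drop_noisy_samples(features, labels, k=2):
    """Keep only samples whose label matches the majority label of their cluster."""
    _, assign = kmeans(features, k)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    kept = [(f, l) for f, l, a in zip(features, labels, assign) if l == majority[a]]
    return [f for f, _ in kept], [l for _, l in kept]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def make_patch(dark_pixels, size=8, dark=40, light=230):
    """Synthetic size x size patch whose first `dark_pixels` pixels are dark."""
    flat = [dark] * dark_pixels + [light] * (size * size - dark_pixels)
    return [flat[r * size:(r + 1) * size] for r in range(size)]


# Synthetic training set: five filled marks, five empty marks, and one
# filled-looking mark deliberately mislabeled "Empty" to act as label noise.
train_x = [extract_features(make_patch(n)) for n in (44, 48, 52, 56, 60, 0, 2, 4, 6, 8, 50)]
train_y = ["Filled"] * 5 + ["Empty"] * 5 + ["Empty"]

clean_x, clean_y = drop_noisy_samples(train_x, train_y)
print(len(train_x), "->", len(clean_x), "training samples after filtering")
print(knn_predict(clean_x, clean_y, extract_features(make_patch(54))))
print(knn_predict(clean_x, clean_y, extract_features(make_patch(3))))
```

In this toy setting the mislabeled sample falls into the cluster dominated by filled marks and is removed before K-NN sees it, which is one way a clustering step could reduce data noise. A practical system would more likely rely on library implementations such as scikit-learn's `KMeans` and `KNeighborsClassifier` and on features measured from real scanned sheets.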
License
Copyright (c) 2026 NKRAFA JOURNAL OF SCIENCE AND TECHNOLOGY

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
- The content and information in articles published in the NKRAFA Journal of Science and Technology are the opinions and sole responsibility of the articles' authors. The journal's editors need not agree with them or share any responsibility for them.
- The NKRAFA Journal of Science and Technology holds the copyright to the content, pictures, images, etc. published in it. Any person or agency wishing to reuse all or part of an article must obtain permission from the NKRAFA Journal of Science and Technology.