Computer Vision-based AI Models for Tracking Learners' Facial Expressions and Gaze Behavior in Online Education
Keywords:
Computer vision, Gaze behavior, Facial expression

Abstract
Online learning faces a significant challenge: high dropout rates, driven in part by instructors' limited ability to observe learners' behaviors. As a result, instructors often lack sufficient data to adapt their teaching methods and effectively motivate learners. This research aimed to 1) develop computer vision-based AI models for analyzing online learners' facial expressions, 2) develop computer vision-based AI models for analyzing online learners' gaze behavior, and 3) study the potential application of the developed models in online learning environments. To address the diversity of devices and the computational constraints of online learning environments, a transfer learning approach was adopted to train the models. Four pretrained models, YOLOv8n, YOLOv9t, YOLOv10n, and YOLOv11n, were selected because they are optimized for low computational resource usage, and their effectiveness was evaluated and compared. The results revealed that the Learner Emotion Detection model (LeEmo), which detects academic emotions from learners' facial expressions, performed best with YOLOv9t, achieving a mAP of 78.10% and an F1-score of 76.05%. The Learner Eye Tracking model (LeET), developed for learners' eye-tracking tasks, performed best with YOLOv11n, achieving a mAP of 94.51%, an F1-score of 85.23%, and a performance of 93.45% in monitoring blink and closed-eye activity. Lastly, the Learner Drowsiness Detection model (LeDro), which detects activeness or drowsiness from learners' facial expressions, also performed best with YOLOv11n, achieving a mAP of 85.96% and an F1-score of 82.46%. These findings demonstrate the significant potential of computer vision-based models for detecting and monitoring online learners' behaviors, providing valuable data for instructors to enhance online learning outcomes. Furthermore, these data could be analyzed with other artificial intelligence technologies to explore various learning states of online learners, such as sustained attention levels, flow states, engagement levels, and stress levels.
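To make the transfer-learning setup concrete, the Python sketch below fine-tunes the four pretrained nano/tiny YOLO variants with the Ultralytics API and derives an F1-score from mean precision and recall. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the dataset config academic_emotions.yaml, its class labels, and the 100-epoch, 640-pixel training settings are placeholders, and Ultralytics distributes the v11 nano weights under the name yolo11n.pt.

    # Minimal transfer-learning sketch for comparing the four lightweight
    # YOLO variants named in the abstract. Dataset config and hyperparameters
    # are hypothetical placeholders, not the study's reported settings.
    from ultralytics import YOLO

    WEIGHTS = ("yolov8n.pt", "yolov9t.pt", "yolov10n.pt", "yolo11n.pt")

    for weights in WEIGHTS:
        model = YOLO(weights)                  # COCO-pretrained starting point
        model.train(
            data="academic_emotions.yaml",     # hypothetical dataset definition
            epochs=100,                        # assumed; schedule not given in the abstract
            imgsz=640,
            batch=16,
        )
        metrics = model.val()                  # precision, recall, mAP on the val split
        p, r = metrics.box.mp, metrics.box.mr  # mean precision and mean recall
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0  # F1 = 2PR / (P + R)
        print(f"{weights}: mAP50={metrics.box.map50:.4f}, F1={f1:.4f}")

The nano/tiny variants trade some detection accuracy for small parameter counts and fast inference, which matches the study's rationale of supporting the heterogeneous, low-resource devices typical of online learners.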