Predicting E-Commerce Purchase Intention Using Machine Learning
Abstract
E-commerce platforms need accurate predictions of consumer purchase intention to personalize experiences and to improve conversion rates and user satisfaction. Traditional statistical models, combined with behavioral theories, have been used extensively in this field; however, these methods adapt poorly to high-dimensional, imbalanced session-level data. This study proposes a machine learning approach to predicting online purchase intention using the widely used Online Shoppers Purchasing Intention dataset. The methodology is a reproducible pipeline that integrates data preprocessing, the synthetic minority over-sampling technique (SMOTE) to address class imbalance, chi-square-based feature selection, and a comparative evaluation of multiple classification models. The pipeline was tested on a 70:30 train-test split, stratified to preserve the class distribution, and further validated with 10-fold cross-validation to improve robustness. The Support Vector Classifier (SVC) performed best on both metrics, achieving an ROC-AUC of 0.886 and an F1 score of 0.633, and thus discriminated effectively between purchase and non-purchase sessions. We also evaluate Random Forest, Decision Tree, and Ridge Classifier models to compare performance across different levels of model complexity and interpretability. The study further identifies key behavioral predictors, including product-page engagement and whether the visitor is a returning one, yielding interpretable insights that are consistent with real-world e-commerce practice. These results suggest that machine learning models could be deployed for real-time behavior forecasting in online retail, and that a data-driven pipeline can complement traditional behavioral modeling.
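The abstract names a stratified 70:30 split and SMOTE but does not describe how they were implemented. As a rough illustration only, the sketch below shows the two ideas using nothing but the Python standard library. The function names `stratified_split` and `smote`, the two-feature toy data, and the choice to oversample only the training portion are assumptions made for this example, not details taken from the study; a real pipeline would more likely rely on library routines such as scikit-learn's `train_test_split` with `stratify` and imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import defaultdict


def stratified_split(y, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        # Rank the other minority rows by distance to the chosen base row.
        others = [row for row in X_min if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data (not the real dataset): 80 non-purchase sessions (label 0)
# and 20 purchase sessions (label 1), each with two hypothetical features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(80)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

train_idx, test_idx = stratified_split(y, test_fraction=0.3, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Assumption: oversample the minority class in the training portion only,
# so the held-out sessions keep their original class distribution.
X_train_min = [row for row, label in zip(X_train, y_train) if label == 1]
n_majority = sum(1 for label in y_train if label == 0)
synthetic = smote(X_train_min, n_new=n_majority - len(X_train_min), k=5, seed=1)

X_bal = X_train + synthetic
y_bal = y_train + [1] * len(synthetic)
print(len(train_idx), len(test_idx), y_bal.count(0), y_bal.count(1))
```

Every synthetic row lies on a line segment between two existing minority rows, so SMOTE fills in the minority region rather than duplicating sessions. Keeping the test portion untouched means that metrics such as ROC-AUC and F1 are measured on the original class balance.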
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.