Enhancing linear regression through neighbor-based similarity analysis

Main Article Content

Chinnawat Chetcharungkit

Abstract

When working with real-world datasets characterized by complex and non-linear relationships, the limitations of non-complex machine learning models like linear regression become evident. In response to addressing this technical problem, we propose a novel algorithm to enhance linear regression without the necessity of complex mathematical or statistical expressions. Instead, the algorithm segments data into multiple subgroups or neighbors, each with its own best-fitting line. The primary objective of this approach is to enable more accurate predictions for unseen data points by utilizing the most similar neighbors and their corresponding linear regression lines, with the support of k-nearest neighbors. Empirical evidence from three publicly available housing price datasets demonstrates the algorithm’s effectiveness in improving traditional linear regression models.

Article Details

Section
Research Articles

References

Moore, A.; Bell, M. XGBoost, A novel explainable AI technique, in the prediction of myocardial infarction: A UK Biobank cohort study. Clinical Medicine Insights: Cardiology, 2022, 16, 1-6. https://doi.org/10.1177/11795468221133611

Khan, M. S.; Salsabil, N.; Alam, M. G. R., et al. CNN-XGBoost fusion-based affective state recognition using EEG spectrogram image analysis. Scientific Reports 2002, 12, 14122. https://doi.org/10.1038/s41598-022-18257-x

Moraru, L.; Sistaninejhad, B.; Rasi, H.; Nayeri, P. A Review Paper about Deep Learning for Medical Image Analysis. Computational and Mathematical Methods in Medicine, 2023, https://doi.org/10.1155/2023/7091301

Shorten, C.; Khoshgoftaar, T. M.; Furht, B. Text Data Augmentation for Deep Learning. Journal of Big Data, 2021, 8, 101-135. https://doi.org/10.1186/s40537-021-00492-0

Dodge, J.; Prewitt, T.; Combes, R. T. D., et al. Measuring the carbon intensity of AI in cloud instances. In 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, 1877-1894. https://doi.org/10.1145/3531146.3533234

Fama, E. F.; French, K. R. The Capital Asset Pricing Model: Theory and Evidence. Journal of Economic Perspectives, 2004, 18(3), 25-46. https://doi.org/10.1257/0895330042162430

Singh, T.; Kalra, R.; Mishra, S., et al. An efficient real-time stock prediction exploiting incremental learning and deep learning. Evolving Systems, 2022, 14, 919-937. https://doi.org/10.1007/s12530-022-09481-x

Walberg, H. J.; Rasher, S. P. Improving Regression Models. Journal of Educational Statistics, 1976, 1, 253-277. https://doi.org/10.2307/1164786

Uddin, M. F.; Lee, J.; Rizvi, S., et al. Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes. Applied Sciences, 2018, 8(4). https://doi.org/10.3390/app8040646

Qiao, Z.; Wang, C. H.; Liu, J. Y. Integrating Feature Engineering with Deep Learning to Conduct Diagnostic and Predictive Analytics for Turbofan Engines. Mathematical Problems in Engineering, 2022. https://doi.org/10.1155/2022/9930176

Deisenroth, M. P.; Faisal, A. A.; Ong, C. S. Mathematics for Machine Learning. Cambridge, UK: Cambridge University Press. 2020.

Arashi, M.; Roozbeh, M.; Hamzah, N. A., et al. Ridge regression and its applications in genetic studies. PLoS ONE, 2021, 16. https://doi.org/10.1371/journal.pone.0245376

Michel, V.; Gramfort, A.; Varoquaux, G., et al. Total Variation Regularization Enhances Regression-Based Brain Activity Prediction. In First Workshop on Brain Decoding: Pattern Recognition Challenges in Neuroimaging, 2021, 9-12. https://doi.org/10.1109/WBD.2010.13

Samavat, A.; Khalili, E.; Ayati, B., et al. Deep Learning Model with Adaptive Regularization for EEG-Based Emotion Recognition Using Temporal and Frequency Features. IEEE Access, 2022, 10, 24520-24527. https://doi.org/10.1109/ACCESS.2022.3155647

Marsland, S. Machine Learning: An Algorithmic Perspective, 2nd ed. New York: Taylor & Francis Group, 2014, 158-160.

Dunn, P. K.; Smyth, G. K. Generalized linear models with examples in R. New York: Springer, 2018. 211-233.

Scikit-learn. (n.d.). Scikit-learn: Machine Learning in Python. Available: https://scikit-learn.org/stable/ [Accessed: 15 Jul 2023].

Kaggle. (n.d.). House Prices: Advanced Regression Techniques. Available: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data [Accessed: 1 Jul 2023].

Kaggle. (n.d.). California Housing Prices. Available: https://www.kaggle.com/datasets/camnugent/california-housing-prices [Accessed: 1 Jul 2023].

Kaggle. (n.d.). Boston House Prices. Available: https://www.kaggle.com/datasets/vikrishnan/boston-house-prices [Accessed: 1 Jul 2023].

Wang, H. Q., & Liang, L. Q. (2022). How Do Housing Prices Affect Residents’ Health? New Evidence From China. Frontiers in Public Health, 9, 816372. https://doi.org/10.3389/fpubh.2021.816372

Mukhlishin, M. F.; Saputra, R.; Wibowo, A., et al. Predicting house sale price using fuzzy logic, Artificial Neural Network and K-Nearest Neighbour. In Proceedings of the 2017 1st International Conference on Informatics and Computational Sciences, 2017, 171-176. https://doi.org/10.1109/ICICOS.2017.8276357

Phan, T. D. Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia. In Proceedings of the 2018 International Conference on Machine Learning and Data Engineering, 2018. 35-42. https://doi.org/10.3390/land11112100

Truong, Q.; Nguyen, M.; Dang, H., et al. Housing Price Prediction via Improved Machine Learning Techniques. Procedia Computer Science, 2020. 433-442. https://doi.org/10.1016/j.procs.2020.06.111

Gupta, P.; Zhang, Q. Housing Price Prediction Based on Multiple Linear Regression. Scientific Programming, 2021. https://doi.org/10.1155/2021/7678931

Tanamal, R.; Minoque, N.; Wiradinata, T., et al. House Price Prediction Model Using Random Forest in Surabaya City. TEM Journal, 2023. 12, 126-132. https://doi.org/10.18421/TEM121-17

Goodfellow, I. J.; Bengio, Y.; Courville, A. Machine Learning Basics. In Deep Learning. Cambridge: MIT Press, 2016. 96-146.