Comparative Study on Outliers-Detection Procedures in Binary Logistic Regression Model

Authors

  • Ali H. Abuzaid, Department of Mathematics, Al Azhar University-Gaza, Palestine
  • Nae'l A. Alghalban, Department of Applied Statistics, Al Azhar University-Gaza, Palestine

Keywords:

Deletion raw, deviance, influential points, residuals

Abstract

Logistic regression analysis is sensitive to outliers, which reduce the accuracy of model predictions. This article compares the performance of six outlier-detection methods for the logistic regression model in a simulation study that varies the sample size, the number of covariates, and the contamination rate. The results show that detection power decreases as the contamination rate or the number of covariates increases. Moreover, performance is nearly stable for large sample sizes. The DFFIT and Cook’s distance methods outperform the other methods, while the hat value method is the weakest. For illustration, a real data set of 30 patients with leukemia was modeled by logistic regression, and the six detection methods were applied to detect possible outliers. The results agree with the findings of the simulation study.
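As a rough illustration of three of the diagnostics named above (hat values, Cook's distance, and a DFFIT-type measure), the following Python sketch fits a binary logistic model by Newton-Raphson and computes commonly used one-step approximations. The function name `logistic_diagnostics`, the toy data, and the particular formulas are assumptions made for this example; the abstract does not state the exact definitions or cut-off values used in the article, so this should not be read as the authors' procedure.

```python
import numpy as np

def logistic_diagnostics(X, y, n_iter=50, tol=1e-10):
    """Fit a binary logistic model by Newton-Raphson and return
    one-step case diagnostics. X must already contain an intercept column.
    (Hypothetical helper written for this illustration.)"""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                      # binomial variance weights
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break

    # Recompute fitted values and weights at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    info_inv = np.linalg.inv(X.T @ (w[:, None] * X))

    # Hat values: diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2).
    hat = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
    pearson = (y - p) / np.sqrt(w)
    std_pearson = pearson / np.sqrt(1.0 - hat)

    # One-step approximations (assumed forms, see lead-in).
    cook = std_pearson**2 * hat / (1.0 - hat) / k
    dffits_like = std_pearson * np.sqrt(hat / (1.0 - hat))
    return {"beta": beta, "hat": hat, "std_pearson": std_pearson,
            "cook": cook, "dffits_like": dffits_like}

# Toy data (not from the article): one covariate with overlapping classes,
# so the maximum likelihood estimate is finite.
x = np.linspace(-2.0, 2.0, 40)
y = (x > 0).astype(float)
y[::4] = 1.0 - y[::4]                          # flip every fourth label
X = np.column_stack([np.ones_like(x), x])

diag = logistic_diagnostics(X, y)
top = np.argsort(diag["cook"])[::-1][:3]       # three most influential cases
print("largest Cook's distances at indices:", top)
```

In practice each measure is compared against a cut-off to flag observations; the choice of cut-off is a separate decision that this sketch leaves open.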

Published

2023-12-28

How to Cite

Abuzaid, A. H., & Alghalban, N. A. (2023). Comparative Study on Outliers-Detection Procedures in Binary Logistic Regression Model. Thailand Statistician, 22(1), 180–191. Retrieved from https://ph02.tci-thaijo.org/index.php/thaistat/article/view/252231

Section

Articles