AN ALTERNATIVE DISTRIBUTION OF INTERNAL STUDENTIZED RESIDUAL AND IDENTIFICATION OF OUTLIERS THROUGH EVIDENCE PLOTS
Abstract
This paper proposes the exact distribution of the internal studentized residual, which is used to evaluate outliers in the X and Y spaces in multiple linear regression analysis. The authors express the internal studentized residual in terms of two independent t-ratios and F-ratios and derive its density function in terms of the Gauss hypergeometric function. The new form of the distribution is symmetric, and its first two moments are derived. The authors also compute critical points of the internal studentized residual at the 5% and 1% significance levels for different sample sizes and varying numbers of predictors. Evidence plots are proposed to evaluate the exact position and location of the outliers. Finally, a numerical example shows that the proposed approaches are more scientific and systematic in identifying outliers in both spaces (X and Y), and that their exactness gives more insight than the traditional Weisberg test.
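For orientation, the quantity discussed in the abstract can be computed as in the following minimal sketch. It assumes the standard textbook definition r_i = e_i / (s * sqrt(1 - h_ii)) for ordinary least squares with an intercept, where e_i is the i-th residual, s^2 the residual mean square and h_ii the i-th leverage. The function name `internal_studentized_residuals` and the synthetic data are illustrative assumptions, and the sketch does not reproduce the derived density or the tabulated critical points.

```python
import numpy as np

def internal_studentized_residuals(X, y):
    """Internally studentized residuals and leverages for OLS with an intercept.

    Hypothetical helper for illustration; uses the standard definition
    r_i = e_i / (s * sqrt(1 - h_ii)), not the paper's derived distribution.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Design matrix with an intercept column prepended
    Z = np.column_stack([np.ones(n), X])
    p = Z.shape[1]
    # Thin QR: the hat matrix is H = Q Q^T, so h_ii is the i-th row sum of Q**2
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)
    # Ordinary residuals e = (I - H) y
    e = y - Q @ (Q.T @ y)
    # Residual mean square on n - p degrees of freedom
    s2 = (e @ e) / (n - p)
    r = e / np.sqrt(s2 * (1.0 - h))
    return r, h

# Synthetic data (assumption): one predictor, one planted shift in Y at index 7
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2 + 0.5 * x + rng.normal(0, 0.3, size=20)
y[7] += 5.0

r, h = internal_studentized_residuals(x, y)
print("index of largest |r_i|:", int(np.argmax(np.abs(r))))
print("largest leverage h_ii:", float(h.max()))
```

A large |r_i| flags a candidate outlier in the Y space, while a large leverage h_ii flags a candidate in the X space. The cut-off against which |r_i| is judged would come from the critical points described above, which are not reproduced here.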
References
Hadi, A. S. (1992). A new measure of overall potential influence in linear regression. Computational Statistics & Data Analysis, 14(1), 1-27.
Behnken, D. W., and Draper, N. R. (1972), "Residuals and Their Variance Patterns," Technometrics, 14, 101-111.
Beckman, R.J., Trussell, H.J., 1974. “The distribution of an arbitrary studentized residual and effects of updating in multiple regression”. J. Amer. Statist. Assoc. 69, 199–201.
Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York.
Cook, R.D. (1977). Detection of influential observation in linear regression. Technometrics, 19, 15-18.
Chatterjee, S. and Hadi, A. S.(1988), Sensitivity Analysis in Linear Regression, New York: John Wiley and Sons.
Cook, R. D., & Weisberg, S. (1982) Residuals and influence in regression (Vol. 5). New York: Chapman and Hall.
Davies, R. B. and Hutton, B., (1975). The effects of errors in the independent variables in linear regression. Biometrika, 62, 383-391.
Díaz-García, J. A., & González-Farías, G. (2004). A note on the Cook's distance. Journal of Statistical Planning and Inference, 120(1), 119-136.
Ellenberg, J.H., 1973. “The joint distribution of the standardized least squares residual from general linear regression”. J. Amer. Statist. Assoc. 68, 941–943.
Eubank, R. L. (1985). Diagnostics for smoothing splines. J. Roy. Statist. Soc. Ser. B 47, 332–341.
Hoaglin, D. C., and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The Amer. Statist., 32, 17-22.
Huber, P. J., (1975). Robustness and Designs. A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam.
Díaz-García, J. A., and Gutiérrez-Jáimez, R. (2007). “The distribution of residuals from a general elliptical linear model”. Journal of Statistical Planning and Inference, 137(7), 2347-2354.
Kendall, M.G., Stuart, A. (1973) The Advanced Theory of Statistics. Volume 2: Inference and Relationship, Griffin
Kim, C. (1996). Cook’s distance in spline smoothing. Statist. Probab. Lett. 31, 139–144.
Kim, C., Kim, W. (1998). Some diagnostics results in nonparametric density estimation. Comm. Statist. Theory Methods 27, 291–303.
Kim, C., Lee, Y., Park, B. U. (2001). Cook’s distance in local polynomial regression. Statist. Probab. Lett. 54, 33–40.
Lund, R. E., (1975). Tables for an approximate test for outliers in linear models. Technometrics, 17, 473-476.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). J. Roy. Statist. Soc. Ser. B 47, 1–52.
Thomas, W. (1991). Influence diagnostics for the cross-validated smoothing parameter in spline smoothing. J. Amer. Statist. Assoc. 86, 693–698.
Weisberg, S. (1980). Applied Linear Regression. New York: Wiley.