Goodness-of-fit Tests for Logit Models Based on Probability Levels of Response Categories

Veeranun Pongsapukdee; Thanittha Kumsri

Authors

Veeranun Pongsapukdee, Department of Statistics, Faculty of Science, Silpakorn University, Nakhon Pathom 73000, Thailand.
Thanittha Kumsri, Siam Commercial Bank PCL, Head Office, Rutchadapisek Rd., Bangkok 10900, Thailand.

Keywords:

base rate levels, Bayes’ theorem, indexes of predictive efficiency, likelihood ratio statistic, logit models

Abstract

For the basic logit model, the response Y takes the value 1 with success probability P₁ and the value 0 with failure probability (1-P₁). Several statistics have been proposed for assessing the fit of such models, and it is often unclear which of them is preferable. In this article, 1,000 computer simulation experiments were generated for each condition, defined by the probability that Y=1 (P₁), the model parameters, and the distribution of X, in order to evaluate the performance of various statistics used for assessing the goodness-of-fit of logit models. Ten statistics were computed for each combination of base rate level and model condition: the likelihood ratio statistic G_M; the indexes of predictive efficiency, which consist of $\lambda_P$, $\tau_P$ and $\phi_P$; and the coefficients of determination or R² analogs, which consist of R²_C (the contingency coefficient R²), R²_L (the log likelihood ratio R²), R²_M (the geometric mean squared improvement per observation R²), R²_N (the adjusted geometric mean squared improvement R²), and R²_O (the ordinary least squares R²). Also computed were the correlation coefficients whose magnitudes (absolute values) measure the independence of each statistic from the base rate level, the percentages of correct classification of the model (%correct), and the type II error rates, corresponding to the percentages of power of the tests (%accept).
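As an illustration of the statistics named above, the following minimal Python sketch computes G_M, the five R² analogs, $\lambda_P$ and %correct from a vector of observed responses and fitted probabilities. It assumes the definitions commonly given in the logistic regression literature (for example, R²_L = G_M/D₀ and R²_M = 1 - (L₀/L_M)^(2/n)); the abstract does not state the formulas, so these may differ in detail from those used in the article. The function name `fit_measures`, the 0.5 classification cutoff, and the example data are hypothetical, and the example probabilities are illustrative values rather than the output of a fitted model.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary responses y under probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def fit_measures(y, p_hat, cutoff=0.5):
    """Goodness-of-fit summaries for a binary logit model (assumed definitions)."""
    n = len(y)
    base_rate = sum(y) / n                       # observed proportion of Y = 1
    ll_0 = log_likelihood(y, [base_rate] * n)    # intercept-only model
    ll_m = log_likelihood(y, p_hat)              # model with predictors

    g_m = -2.0 * (ll_0 - ll_m)                   # likelihood ratio statistic
    r2_l = g_m / (-2.0 * ll_0)                   # log likelihood ratio R^2
    r2_c = g_m / (g_m + n)                       # contingency coefficient R^2
    r2_m = 1.0 - math.exp(-g_m / n)              # geometric mean squared improvement
    r2_n = r2_m / (1.0 - math.exp(2.0 * ll_0 / n))  # adjusted version of r2_m

    # Ordinary least squares R^2: squared correlation of y with p_hat.
    mean_p = sum(p_hat) / n
    cov = sum((yi - base_rate) * (pi - mean_p) for yi, pi in zip(y, p_hat))
    var_y = sum((yi - base_rate) ** 2 for yi in y)
    var_p = sum((pi - mean_p) ** 2 for pi in p_hat)
    r2_o = cov ** 2 / (var_y * var_p)

    # Classification at the cutoff, compared with always predicting the mode.
    predicted = [1 if pi >= cutoff else 0 for pi in p_hat]
    errors_model = sum(1 for yi, ci in zip(y, predicted) if yi != ci)
    errors_null = n - max(sum(y), n - sum(y))
    lambda_p = 1.0 - errors_model / errors_null
    pct_correct = 100.0 * (n - errors_model) / n

    return {"G_M": g_m, "R2_L": r2_l, "R2_C": r2_c, "R2_M": r2_m,
            "R2_N": r2_n, "R2_O": r2_o, "lambda_p": lambda_p,
            "pct_correct": pct_correct}


# Hypothetical responses and fitted probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
result = fit_measures(y, p_hat)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions $\lambda_P$ is undefined when every observation falls in one category, because predicting the mode then makes no errors. A simulation over extreme base rates would need to handle that case explicitly.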

The results show that, for testing the goodness-of-fit of the models, both %correct and %accept are satisfactory. The average %correct is around 77% when X is exponential and approximately 99% when the X's are Bernoulli or multinomial distributed. The average %accept is approximately 95% in all cases. For X ~ Exponential, R²_C, R²_M, and R²_O are preferable; for X ~ Bernoulli, R²_C, R²_M, and R²_O are still preferable, with R²_O performing best. For (X₁, X₂) ~ Multinomial, the results are similar to, but slightly better than, those for X ~ Bernoulli. In the multinomial case, when the success probability P₁ is high, the indexes of predictive efficiency suggest that the $\lambda_P$ and $\tau_P$ statistics may be used as alternatives to R²_C, R²_M and R²_O. For logit models with an exponential explanatory variable, the statistics R²_C, R²_M, R²_O, $\lambda_P$ and $\phi_P$ are probably worth using. However, when P₁ is close to 0.5, the %correct is low and its range is wide. Further, more detailed studies of the exponential explanatory variable with increased sample sizes are therefore recommended. For logit models with Bernoulli and multinomial explanatory variables, the results are much improved, and the statistics R²_C, R²_M, R²_O, $\lambda_P$ and $\tau_P$ are probably appropriate, especially the R²_O statistic.
