Goodness-of-fit Tests for Logit Models Based on Probability Levels of Response Categories
Keywords:
base rate levels, Bayes’ theorem, indexes of predictive efficiency, likelihood ratio statistic, logit models
Abstract
For the basic logit models, the response Y takes the value 1 with the success probability P1, and the value 0 with the failure probability (1-P1). Several statistics have been proposed for assessing the fit of these models, and it is often unclear which of them is preferable. In this article, 1,000 computer simulation experiments were generated for each combination of the probability of Y=1 (P1), the calculated parameters, and the distribution of X, in order to evaluate the performance of various statistics used for assessing the goodness-of-fit of logit models. Ten statistics were computed for each combination of base rate levels and model conditions: the likelihood ratio statistics GM; the indexes of predictive efficiency, which consist of [...], [...], and [...]; and the coefficients of determination, or R2 analogs, which consist of R2C (the contingency coefficient R2), R2L (the log likelihood ratio R2), R2M (the geometric mean squared improvement per observation R2), R2N (the adjusted geometric mean squared improvement R2), and R2O (the ordinary least squares R2). Also computed were the correlation coefficients whose magnitudes (absolute values) measure the independence of each statistic from the base rate levels, the percentages of correct classification of the model (%correct), and the type II error rates, corresponding to the percentages of power of the tests (%accept).
The results show that, for testing the goodness-of-fit of the models, both %correct and %accept are satisfactory. The average %correct is around 77% when X is exponential and approximately 99% when the X’s are Bernoulli or multinomial distributed. Similarly, the average %accept is approximately 95% in all cases. For X ~ Exponential, R2C, R2M, and R2O are preferable; for X ~ Bernoulli, R2C, R2M, and R2O remain preferable, with R2O performing best. For (X1, X2) ~ Multinomial, the results are similar to, but slightly better than, those for X ~ Bernoulli. In the multinomial case, when the success probability P1 is high, the indexes of predictive efficiency [...], [...], and [...] may be used as alternatives to R2C, R2M, and R2O. Some recommendations are made. For logit models with an exponential explanatory variable, the statistics R2C, R2M, R2O, and [...] are probably worth using. However, when P1 is close to 0.5, %correct is low and its range is wide; more detailed studies of the exponential explanatory variable with larger sample sizes are therefore recommended. For logit models with Bernoulli and multinomial explanatory variables, the results are much improved, and the statistics R2C, R2M, R2O, and [...] are probably appropriate, especially R2O.
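The abstract names the likelihood ratio statistic and the R2 analogs without defining them. As a rough illustration only, the sketch below computes them from observed 0/1 responses and fitted probabilities, assuming the definitions commonly given in the logistic regression literature: GM as -2 times the difference between the null and model log likelihoods, R2L as GM divided by the null deviance, R2M as 1 - exp(-GM/n), R2N as R2M divided by its maximum attainable value, R2C as GM/(GM + n), and R2O as the squared correlation between Y and the fitted probability. The function name `logit_fit_statistics`, the 0.5 classification cutoff used for %correct, and the formulas themselves are assumptions and may differ from what the article actually used.

```python
import math


def logit_fit_statistics(y, p_hat):
    """Return G_M, five R2 analogs, and %correct for a fitted logit model.

    y     : list of observed responses coded 0/1 (both values must occur)
    p_hat : list of fitted probabilities P(Y=1), strictly between 0 and 1
            and not all equal (otherwise R2O is undefined)

    All formulas are assumed textbook definitions, not taken from the article.
    """
    n = len(y)
    p_bar = sum(y) / n  # base rate: observed proportion of Y = 1

    # Log likelihood of the intercept-only (null) model.
    ll_0 = sum(yi * math.log(p_bar) + (1 - yi) * math.log(1 - p_bar) for yi in y)
    # Log likelihood of the fitted model.
    ll_m = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p_hat)
    )

    g_m = -2.0 * (ll_0 - ll_m)                       # likelihood ratio statistic
    r2_l = g_m / (-2.0 * ll_0)                       # log likelihood ratio R2
    r2_m = 1.0 - math.exp(-g_m / n)                  # geometric mean squared improvement
    r2_n = r2_m / (1.0 - math.exp(2.0 * ll_0 / n))   # adjusted version of r2_m
    r2_c = g_m / (g_m + n)                           # contingency coefficient R2

    # Ordinary least squares R2: squared Pearson correlation of y and p_hat.
    mean_p = sum(p_hat) / n
    s_yp = sum((yi - p_bar) * (pi - mean_p) for yi, pi in zip(y, p_hat))
    s_yy = sum((yi - p_bar) ** 2 for yi in y)
    s_pp = sum((pi - mean_p) ** 2 for pi in p_hat)
    r2_o = s_yp ** 2 / (s_yy * s_pp)

    # Percentage correctly classified, assuming a 0.5 cutoff.
    hits = sum((pi >= 0.5) == (yi == 1) for yi, pi in zip(y, p_hat))
    pct_correct = 100.0 * hits / n

    return {
        "GM": g_m, "R2L": r2_l, "R2M": r2_m, "R2N": r2_n,
        "R2C": r2_c, "R2O": r2_o, "pct_correct": pct_correct,
    }
```

Calling `logit_fit_statistics(y, p_hat)` on each simulated data set and then correlating each statistic with the base rate across conditions would mirror, in outline, the comparison the abstract describes. The fitting step itself is left out here and could be done with any logistic regression routine.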