Goodness-of-fit Tests for Logit Models Based on Probability Levels of Response Categories

Authors

  • Veeranun Pongsapukdee, Department of Statistics, Faculty of Science, Silpakorn University, Nakhon Pathom 73000, Thailand.
  • Thanittha Kumsri, Siam Commercial Bank PCL, Head Office, Rutchadapisek Rd., Bangkok 10900, Thailand.

Keywords:

base rate levels, Bayes’ theorem, indexes of predictive efficiency, likelihood ratio statistic, logit models

Abstract

In the basic logit model, the response Y takes the value 1 with success probability P1 and the value 0 with failure probability (1-P1). Several statistics have been proposed for assessing the fit of such models, and it is often unclear which of them is preferable. In this article, 1,000 computer simulation experiments were generated under each condition, defined by the probability that Y=1 (P1), the calculated parameters, and the distribution of X, to evaluate the performance of various statistics used for assessing the goodness-of-fit of logit models. Ten statistics were computed for each combination of base rate level and model condition: the likelihood ratio statistic GM; the indexes of predictive efficiency λ_P, τ_P, and φ_P; and the coefficients of determination, or R2 analogs, R2C (the contingency coefficient R2), R2L (the log likelihood ratio R2), R2M (the geometric mean squared improvement per observation R2), R2N (the adjusted geometric mean squared improvement R2), and R2O (the ordinary least squares R2). Also computed were the correlation coefficients, whose magnitudes (absolute values) measure each statistic's independence from the base rate levels, the percentages of correct classification by the model (%correct), and the type II error rates, corresponding to the percentages of power of the tests (%accept).
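As a rough illustration of how the likelihood-based statistics named above can be obtained from a single simulated data set, the following Python sketch fits a one-predictor logit model and computes GM and the R2 analogs. The formulas are the forms commonly given in the literature for these names and are assumed here, since the abstract does not state them; the sample size, the coefficients (-1 and 1), the exponential rate, and all function names are hypothetical choices for this example, not the authors' simulation design.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(x, y, iters=25):
    """Fit logit(P(Y=1)) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_lik(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def r2_analogs(ll0, llm, n):
    """GM and likelihood-based R2 analogs (assumed conventional forms)."""
    gm = -2.0 * (ll0 - llm)            # likelihood ratio statistic
    r2m = 1.0 - math.exp(-gm / n)      # geometric mean squared improvement
    return {
        "GM": gm,
        "R2L": gm / (-2.0 * ll0),                       # log likelihood ratio R2
        "R2M": r2m,
        "R2N": r2m / (1.0 - math.exp(2.0 * ll0 / n)),   # adjusted R2M
        "R2C": gm / (gm + n),                           # contingency coefficient R2
    }


def r2_ols(y, p):
    """Squared Pearson correlation between observed y and fitted probabilities."""
    n = len(y)
    my, mp = sum(y) / n, sum(p) / n
    sxy = sum((yi - my) * (pi - mp) for yi, pi in zip(y, p))
    syy = sum((yi - my) ** 2 for yi in y)
    spp = sum((pi - mp) ** 2 for pi in p)
    return sxy * sxy / (syy * spp)


# One simulated data set with an exponential explanatory variable.
random.seed(0)
n = 500
x = [random.expovariate(1.0) for _ in range(n)]
y = [1 if random.random() < sigmoid(-1.0 + 1.0 * xi) else 0 for xi in x]

b0, b1 = fit_logit(x, y)
p_hat = [sigmoid(b0 + b1 * xi) for xi in x]
p_bar = sum(y) / n                      # observed base rate of Y=1

stats = r2_analogs(log_lik(y, [p_bar] * n), log_lik(y, p_hat), n)
stats["R2O"] = r2_ols(y, p_hat)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Repeating the simulation step many times while varying the intercept (and hence the base rate) would give one value of each statistic per replication, which is the kind of output a study like this one summarizes.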

The results show that, for testing the goodness-of-fit of the models, both the %correct and the %accept are satisfactory. The average %correct is around 77% when X is exponential and approximately 99% when X is Bernoulli or multinomial distributed. The average %accept is approximately 95% in all cases. For X ~ Exponential, R2C, R2M, and R2O are preferable; for X ~ Bernoulli, R2C, R2M, and R2O are still preferable, with R2O performing best. For (X1, X2) ~ Multinomial, the results are similar but slightly superior to those for X ~ Bernoulli. In the multinomial case, when the success probability P1 is high, the indexes of predictive efficiency suggest that the λ_P and τ_P statistics may be used as alternatives to R2C, R2M, and R2O. For logit models with an exponential explanatory variable, the statistics R2C, R2M, R2O, λ_P, and φ_P are probably worth using. However, when P1 is close to 0.5, the %correct is low and its range is wide; further, more detailed studies of the exponential explanatory variable with larger sample sizes are therefore recommended. For logit models with Bernoulli and multinomial explanatory variables, the results are much improved, and the statistics R2C, R2M, R2O, λ_P, and τ_P are probably appropriate, especially R2O.
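The %correct figure and two of the indexes of predictive efficiency can be read off a classification table of observed against predicted outcomes. The sketch below assumes the usual proportional-reduction-in-error definitions: λ_P compares the model's errors with those made by predicting the modal category for every case, and τ_P compares them with the errors expected when cases are assigned to categories at random while preserving the observed margins. These definitions are assumptions, as the abstract does not give them; φ_P is omitted, and the function name and toy data are hypothetical.

```python
def predictive_efficiency(y, y_pred):
    """Return %correct, lambda_p and tau_p for 0/1 outcomes and predictions."""
    n = len(y)
    errors_model = sum(1 for obs, pred in zip(y, y_pred) if obs != pred)
    n1 = sum(y)
    n0 = n - n1
    # Baseline errors for lambda_p: everyone is assigned the modal category.
    errors_modal = n - max(n0, n1)
    # Baseline errors for tau_p: random assignment that preserves the margins.
    errors_random = sum(f * (n - f) / n for f in (n0, n1))
    return {
        "pct_correct": 100.0 * (n - errors_model) / n,
        "lambda_p": 1.0 - errors_model / errors_modal,
        "tau_p": 1.0 - errors_model / errors_random,
    }


# Toy data: 6 observed successes, 4 observed failures, 2 misclassified cases.
observed = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
result = predictive_efficiency(observed, predicted)
print(result)
```

Note that both indexes are undefined when every observed outcome falls in one category, because the baseline error count is then zero; a fuller implementation would need to handle that case explicitly.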

How to Cite

Pongsapukdee, V., & Kumsri, T. (2015). Goodness-of-fit Tests for Logit Models Based on Probability Levels of Response Categories. Thailand Statistician, 4, 43–61. Retrieved from https://ph02.tci-thaijo.org/index.php/thaistat/article/view/34359

Section

Articles