Contingency-Table Sparseness under Cumulative Logit Models for Ordinal Response Categories and Nominal Explanatory Variables with Two-Factor Interaction
Keywords:
contingency table, goodness of fit, interaction effect, multinomial cumulative logit models, sparseness
Abstract
This article investigates sparseness and the assessment of goodness of fit for cumulative logit models with ordinal response categories and nominal explanatory variables with a two-factor interaction. Sparseness is measured as the number of simulations, out of 1,000, in which at least one empty cell occurs. The goodness-of-fit statistics calculated are the coefficients of determination (R² analogs), the likelihood ratio statistic GM, AIC (Akaike Information Criterion, [2]), and BIC (Bayesian Information Criterion, Schwarz, 1978). The simulations were conducted for multinomial logit models with K = 3 response categories and two random explanatory variables X1 and X2, whose joint distribution is assumed to be multinomial with probabilities π1, π2, π3, and π4 corresponding to (X1, X2) values of (0, 0), (0, 1), (1, 0), and (1, 1), respectively. Three sets of (π1, π2, π3, π4) are studied to represent different distributional shapes, namely (X1, X2) ~ multinomial(0.10, 0.35, 0.45, 0.10), (X1, X2) ~ multinomial(0.50, 0.30, 0.10, 0.10), and (X1, X2) ~ multinomial(0.25, 0.25, 0.25, 0.25), with parameters chosen to induce possibly strong effects: β1 = log 2, β2 = log 3, and β12 ranging from 0.0 to 4.5. Four sets of the three ordered response categories, corresponding to the (X1, X2) values, were then generated through the models under the proportions (p1, p2, p3), namely Y ~ multinomial(p1, p2, p3) with (0.05, 0.20, 0.75), (0.25, 0.50, 0.25), (0.5, 0.20, 0.25), and (0.33, 0.33, 0.33); the true model intercepts follow from these proportions of Y = 1, 2, 3, respectively. Four sample sizes of 600, 800, 1,000, and 1,500 units were used. Each condition was run for 1,000 repeated simulations using a macro program developed for Minitab Release 11 [17].
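The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' Minitab macro. It assumes the parameterization logit P(Y ≤ j | x) = αj + β1·x1 + β2·x2 + β12·x1·x2, with the intercepts αj taken as the logits of the cumulative proportions of (p1, p2, p3); neither the sign convention nor the intercept values are stated in the text. The function names are hypothetical, and sparseness is reported here as a proportion of replications rather than a count.

```python
import numpy as np

# Covariate patterns (X1, X2) in the order given in the text.
PATTERNS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def cumulative_logit_probs(p_y, beta1, beta2, beta12):
    """Return a 4 x 3 array of P(Y = j | X1, X2), one row per pattern.

    Assumed model (not confirmed by the source):
        logit P(Y <= j | x) = alpha_j + beta1*x1 + beta2*x2 + beta12*x1*x2
    where alpha_j is the logit of the cumulative proportions of p_y.
    """
    cum = np.cumsum(p_y)[:-1]                  # baseline P(Y<=1), P(Y<=2)
    alpha = np.log(cum / (1.0 - cum))          # assumed intercepts
    x1, x2 = PATTERNS[:, 0], PATTERNS[:, 1]
    eta = beta1 * x1 + beta2 * x2 + beta12 * x1 * x2
    cdf = 1.0 / (1.0 + np.exp(-(alpha[None, :] + eta[:, None])))
    # Pad with 0 and 1 so differencing yields the three category probabilities.
    cdf = np.hstack([np.zeros((4, 1)), cdf, np.ones((4, 1))])
    return np.diff(cdf, axis=1)


def simulate_table(rng, n, pi_x, probs):
    """Draw one 4 x 3 table: (X1, X2) pattern by response category."""
    row_totals = rng.multinomial(n, pi_x)
    return np.vstack([rng.multinomial(m, p) for m, p in zip(row_totals, probs)])


def sparseness(n, pi_x, p_y, beta12, reps=1000, seed=0):
    """Proportion of replications with at least one empty cell."""
    rng = np.random.default_rng(seed)
    probs = cumulative_logit_probs(p_y, np.log(2), np.log(3), beta12)
    empty = sum(
        bool((simulate_table(rng, n, pi_x, probs) == 0).any())
        for _ in range(reps)
    )
    return empty / reps
```

For example, `sparseness(600, [0.50, 0.30, 0.10, 0.10], [0.25, 0.50, 0.25], beta12=4.5)` estimates how often a table of 600 units contains an empty cell under one of the asymmetric designs; varying `n` and `beta12` reproduces the shape of the study grid, though not necessarily its numerical results.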
The results indicate that the minimum sparseness of contingency tables and the maximum values of the goodness-of-fit statistics, R² analogs and BIC, occur for Y ~ multinomial(0.05, 0.20, 0.75) with (X1, X2) ~ multinomial(0.25, 0.25, 0.25, 0.25), as well as when the distributions of Y and (X1, X2) both have equal, symmetric proportions. In contrast, the largest number of sparse cells occurs for Y ~ multinomial(0.25, 0.50, 0.25) with (X1, X2) ~ multinomial(0.50, 0.30, 0.10, 0.10). In addition, as the sample size becomes large, (X1, X2) ~ multinomial(0.25, 0.25, 0.25, 0.25) always shows less tendency toward sparseness than the asymmetric (X1, X2) distributions. Moreover, sparseness tends to increase as the interaction parameter β12 increases, although it decreases as the sample size increases. Hence, when the true model has a correlated structure, the sparseness of the contingency tables increases with the interaction parameter, and the rate of increase declines as the sample size grows. These results indicate and confirm some association patterns in the models and the contingency tables. Therefore, when the distribution of Y has either equal, symmetric proportions or increasing ordered proportions and the distribution of (X1, X2) is also symmetric, moderate to small sample sizes are feasible; however, when most distributions are asymmetric, we recommend only large sample sizes for suitable analysis of association in sparse contingency tables.