Imputation Methods for Multiple Regression with Missing Heteroscedastic Data
Keywords: missing data, imputation, equivalent weight, bias, mean squared error
The purpose of this research is to compare the efficiency of different imputation methods for multiple regression analysis of heteroscedastic data in which the dependent variable is missing at random. The imputation methods used in this study are mean imputation, hot deck imputation, k-nearest neighbors (KNN) imputation, and stochastic regression imputation, along with three proposed composite methods: hot deck and KNN imputation with equivalent weight (HKEW), hot deck and stochastic regression imputation with equivalent weight (HSEW), and mean and stochastic regression imputation with equivalent weight (MSEW). The seven methods were compared in a simulation study that varied the sample size and the percentage of missing data. The efficiency of the estimators was compared using bias and mean squared error (MSE). The results show that stochastic regression imputation performed well in terms of bias in all situations. In terms of MSE, mean imputation performed well when the sample size was small to medium, whereas MSEW imputation performed well when the sample size was large and the missing percentage was high (30-40%).
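The following is a minimal sketch of how a comparison of this kind could be set up. It shows mean imputation, stochastic regression imputation, and an MSEW-style composite. The sketch makes several assumptions. "Equivalent weight" is read as averaging the two imputed values with weight 0.5 each. The data-generating design is hypothetical: two uniform predictors, an error standard deviation that grows with `x1`, and a missingness probability that depends only on the observed `x1`. Bias and MSE are computed for the OLS slope of `x1`. All function names are illustrative, and hot deck and KNN imputation are omitted for brevity.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients [intercept, b1, b2, ...] for y on X."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def mean_impute(y, miss):
    """Replace missing y with the mean of the observed y."""
    y_imp = y.copy()
    y_imp[miss] = y[~miss].mean()
    return y_imp

def stochastic_regression_impute(X, y, miss, rng):
    """Fit OLS on complete cases, then predict missing y plus a normal residual draw."""
    beta = ols(X[~miss], y[~miss])
    Xd = np.column_stack([np.ones(len(y)), X])
    resid = y[~miss] - Xd[~miss] @ beta
    sigma = np.sqrt(resid @ resid / (len(resid) - Xd.shape[1]))
    y_imp = y.copy()
    y_imp[miss] = Xd[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return y_imp

def equal_weight_combine(y_a, y_b, miss):
    """Assumed 'equivalent weight': average two imputations with weight 0.5 each."""
    y_imp = y_a.copy()
    y_imp[miss] = 0.5 * y_a[miss] + 0.5 * y_b[miss]
    return y_imp

def bias_and_mse(estimates, true_value):
    """Bias and mean squared error of a set of simulated estimates."""
    est = np.asarray(estimates, dtype=float)
    return est.mean() - true_value, np.mean((est - true_value) ** 2)

def simulate(n=100, missing_rate=0.3, reps=500, seed=1):
    """Hypothetical simulation design; returns {method: (bias, mse)} for the x1 slope."""
    rng = np.random.default_rng(seed)
    true_b1 = 2.0
    est = {"mean": [], "stochastic": [], "MSEW": []}
    for _ in range(reps):
        x1 = rng.uniform(1, 5, n)
        x2 = rng.uniform(1, 5, n)
        X = np.column_stack([x1, x2])
        # Heteroscedastic errors: standard deviation grows with x1.
        y = 1.0 + true_b1 * x1 + 3.0 * x2 + rng.normal(0.0, 0.5 * x1)
        # MAR: P(y missing) depends only on observed x1; its average is missing_rate.
        miss = rng.random(n) < 2 * missing_rate * (x1 - 1) / 4
        y_obs = np.where(miss, np.nan, y)
        y_mean = mean_impute(y_obs, miss)
        y_sri = stochastic_regression_impute(X, y_obs, miss, rng)
        y_msew = equal_weight_combine(y_mean, y_sri, miss)
        for name, y_imp in [("mean", y_mean), ("stochastic", y_sri), ("MSEW", y_msew)]:
            est[name].append(ols(X, y_imp)[1])
    return {name: bias_and_mse(v, true_b1) for name, v in est.items()}

if __name__ == "__main__":
    for method, (bias, mse) in simulate().items():
        print(f"{method:>10}: bias={bias:+.4f}  MSE={mse:.4f}")
```

Varying `n` and `missing_rate` in `simulate` mirrors the two factors in the study design. The numbers this sketch prints come from its own hypothetical design and should not be taken as the paper's results.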