The Performance Comparison of Mean Estimation by Imputation with Auxiliary Variable under Various Missing Data Mechanisms
Abstract
This research compares the performance of mean estimators when data are missing. The estimators are derived from three imputation methods: the mean method, the factor method, and the exponential method. Three missing data mechanisms are considered: missing completely at random, missing at random, and missing not at random. Data on small dust particles from the Pollution Control Department are used to study the efficiency of mean estimation under each imputation method in each situation. Efficiency is compared using the mean square error. Sample sizes of 30, 100, and 500 were used, with missing data levels of 10%, 20%, and 30%. Under all three missing data mechanisms, the factor and exponential methods, which use information from an auxiliary variable together with the study variable, estimated the mean more efficiently than the mean method, which uses the study variable only. The sample size and the missing data level also affect performance: efficiency increases as the sample size increases and decreases as the missing data level increases.
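The kind of comparison summarized in the abstract can be illustrated with a minimal Monte Carlo sketch. Everything in it is an assumption made for illustration: the population is synthetic (not the Pollution Control Department data), the function names are hypothetical, a plain ratio-type estimator stands in for the factor method, and the exponential form is a generic exponential ratio-type estimator rather than the exact estimator studied in the article. Missingness is generated so that the chance of a value being missing is constant (MCAR), grows with the auxiliary variable x (MAR), or grows with the study variable y itself (MNAR).

```python
import math
import random
import statistics


def make_population(size, rng):
    """Hypothetical population: auxiliary x and a correlated study variable y."""
    xs = [rng.gauss(50.0, 10.0) for _ in range(size)]
    ys = [0.8 * x + rng.gauss(0.0, 5.0) for x in xs]
    return xs, ys


def observed_flags(xs, ys, rate, mechanism, rng):
    """Mark round(rate * n) units as missing; True means y is observed."""
    if mechanism == "MCAR":      # missingness ignores the data
        weights = [1.0] * len(ys)
    elif mechanism == "MAR":     # missingness depends on the always-observed x
        weights = xs
    elif mechanism == "MNAR":    # missingness depends on the unobserved y itself
        weights = ys
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    n_miss = round(rate * len(ys))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # units with larger weights are more likely to be dropped.
    keys = [rng.random() ** (1.0 / max(w, 1e-9)) for w in weights]
    order = sorted(range(len(ys)), key=keys.__getitem__, reverse=True)
    missing = set(order[:n_miss])
    return [i not in missing for i in range(len(ys))]


def estimators(xs, ys, flags):
    """Point estimates of the mean of y implied by three imputation rules."""
    y_r = statistics.fmean([y for y, f in zip(ys, flags) if f])  # respondents' mean of y
    x_r = statistics.fmean([x for x, f in zip(xs, flags) if f])  # respondents' mean of x
    x_n = statistics.fmean(xs)                                   # full-sample mean of x
    return {
        "mean": y_r,                                   # study variable only
        "ratio": y_r * x_n / x_r,                      # assumed stand-in for the factor method
        "exponential": y_r * math.exp((x_n - x_r) / (x_n + x_r)),  # assumed generic form
    }


def mse_table(n, rate, mechanism, reps=2000, seed=1):
    """Monte Carlo mean square error of each estimator around the population mean."""
    rng = random.Random(seed)
    pop_x, pop_y = make_population(5000, rng)
    target = statistics.fmean(pop_y)
    sq_err = {"mean": 0.0, "ratio": 0.0, "exponential": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(len(pop_y)), n)         # simple random sample
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        flags = observed_flags(xs, ys, rate, mechanism, rng)
        for name, est in estimators(xs, ys, flags).items():
            sq_err[name] += (est - target) ** 2
    return {name: total / reps for name, total in sq_err.items()}


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        print(mechanism, mse_table(n=100, rate=0.2, mechanism=mechanism))
```

Calling `mse_table` over the sample sizes 30, 100, and 500 and the missing levels 0.1, 0.2, and 0.3 would reproduce the layout of the comparison, though the numbers from this synthetic population say nothing about the article's actual results.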
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright of Rajamangala University of Technology Phra Nakhon.