Variance estimation in the presence of imputed data has been widely studied in the literature. It is well known that treating imputed values as if they were true values can lead to serious underestimation of the true variance, especially when response rates are low. In this paper, we consider the problem of variance estimation using a model, in the context of two-stage cluster sampling designs, which are widely used in social and household surveys. In such designs, units in the same neighborhood tend to have similar characteristics (e.g., income and education level). It is thus important to account for the intra-cluster correlation when formulating the model, and then to derive variance estimators under the appropriate model. We consider weighted random hot-deck imputation and derive consistent variance estimators under two distinct frameworks: (i) the two-phase framework and (ii) the reverse framework. For the two-phase framework, we use the variance estimation method proposed by Särndal (1992); for the reverse framework, we use the method developed by Fay (1991) and Shao and Steel (1999). Finally, we perform a simulation study to evaluate the performance of the proposed variance estimators in terms of relative bias. We conclude that the variance estimators obtained by the Shao-Steel method are more robust to model misspecification than those derived using Särndal's method.
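
For readers unfamiliar with weighted random hot-deck imputation, the following is a minimal sketch of the imputation step only, not of the variance estimators derived in the paper. It assumes a single imputation class in which each missing value is replaced by the value of a donor drawn from the respondents with probability proportional to the donor's survey weight. The function names `weighted_hot_deck` and `weighted_total`, the toy data, and the weights are all hypothetical and chosen purely for illustration.

```python
import random


def weighted_hot_deck(y, weights, rng):
    """Replace missing values (None) with values from randomly chosen donors.

    Donors are the responding units; each is selected with probability
    proportional to its survey weight (illustrative assumption: one
    imputation class covering all units).
    """
    donors = [i for i, value in enumerate(y) if value is not None]
    if not donors:
        raise ValueError("no respondents available to act as donors")
    donor_weights = [weights[i] for i in donors]

    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            # Draw one donor index with probability proportional to its weight.
            j = rng.choices(donors, weights=donor_weights, k=1)[0]
            completed[i] = y[j]
    return completed


def weighted_total(y, weights):
    """Weighted estimator of a population total from a completed data set."""
    return sum(w * value for w, value in zip(weights, y))


# Hypothetical toy sample: None marks a nonrespondent.
rng = random.Random(0)
y = [12.0, None, 15.0, None, 9.0, 11.0]
weights = [10.0, 10.0, 20.0, 20.0, 5.0, 5.0]

completed = weighted_hot_deck(y, weights, rng)
total = weighted_total(completed, weights)
```

Computing a standard variance formula on `completed` as if every value had been observed is the naive approach that the abstract warns can seriously underestimate the true variance, since it ignores the extra variability due to nonresponse and to the random donor selection.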

Additional Metadata
Keywords: Nonresponse, Random hot-deck imputation, Reverse framework, Two-phase framework, Two-stage cluster sampling, Variance estimation
Journal: Journal of Statistical Theory and Practice
Haziza, D. (David), & Rao, J.N.K. (2010). Variance estimation in two-stage cluster sampling under imputation for missing data. Journal of Statistical Theory and Practice, 4(4), 827–844. doi:10.1080/15598608.2010.10412021