TREATMENTS OF MISSING DATA
Keywords: missing data analysis, multiple imputation, categorical variables, numerical variables
Abstract: This paper presented a critical review of the most commonly used treatments of missing data: traditional treatments, such as case deletion and single imputation (mean imputation, regression imputation, hot deck imputation), and modern treatments, such as multiple imputation and maximum likelihood methods (the EM algorithm, the FIML method). We described their advantages and disadvantages and gave recommendations for selecting a treatment in relation to the missing data mechanism, the type and measurement level of the variables, sample size, and other factors. The paper also included an overview of how missing categorical and numerical data were treated in psychology articles published in top journals. We concluded that the most commonly used treatment is traditional listwise deletion, while multiple imputation is used somewhat less often. We therefore provided an example of implementing multiple imputation in SPSS.
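As a rough illustration of the treatments named in the abstract, the following minimal Python sketch contrasts listwise deletion with mean imputation on a toy variable and pools per-imputation estimates with Rubin's rules, the standard combining step in multiple imputation. It is not the SPSS procedure used in the paper: the data, the function names (`listwise_delete`, `mean_impute`, `pool_rubin`), and the example estimates are all hypothetical and chosen only for illustration.

```python
from statistics import mean, variance

# Hypothetical toy variable; None marks a missing value.
scores = [4.0, None, 6.0, 8.0, None, 10.0]


def listwise_delete(values):
    """Traditional treatment: drop every case with a missing value."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Single imputation: replace each missing value with the observed mean."""
    observed_mean = mean(listwise_delete(values))
    return [observed_mean if v is None else v for v in values]


def pool_rubin(estimates, within_variances):
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)   # average sampling variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


complete = listwise_delete(scores)
imputed = mean_impute(scores)

# Mean imputation keeps the mean but shrinks the variance,
# one of the disadvantages usually attributed to single imputation.
print(mean(complete), variance(complete))
print(mean(imputed), variance(imputed))

# Hypothetical estimates and squared standard errors from m = 3 imputed data sets.
pooled_estimate, total_variance = pool_rubin([7.0, 7.5, 6.5], [1.0, 1.5, 0.5])
print(pooled_estimate, total_variance)
```

In this toy case both treatments give the same mean, but the mean-imputed variable has a smaller variance than the complete cases, while the pooled variance from Rubin's rules adds a between-imputation component that reflects uncertainty about the missing values.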