TREATMENTS OF MISSING DATA
DOI:
https://doi.org/10.19090/pp.2015.3.289-309
Keywords:
missing data analysis, multiple imputation, categorical variables, numerical variables
Abstract
This paper presented a critical review of the most commonly used treatments of missing data: traditional treatments, such as case deletion and single imputation (mean imputation, regression imputation, hot deck imputation), and modern treatments, such as multiple imputation and maximum likelihood methods (EM algorithm, FIML). We described their advantages and disadvantages and gave recommendations for selecting a treatment in relation to the missing data mechanism, the type and measurement level of the variables, sample size, and other factors. The paper also included an overview of how missing categorical and numerical data were treated in psychology articles published in top journals. We concluded that the most commonly used treatment was traditional listwise deletion, while multiple imputation was used slightly less often. We therefore provided an example of implementing multiple imputation in SPSS.
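As a rough illustration of two of the traditional treatments named in the abstract, the following Python sketch contrasts listwise deletion with mean imputation on a single variable. It is not taken from the paper, which works in SPSS; the data values and the helper names `listwise_delete` and `mean_impute` are hypothetical. Multiple imputation is not implemented here.

```python
from statistics import mean, variance

# Hypothetical questionnaire scores; None marks a missing response.
scores = [4.0, None, 7.0, 5.0, None, 9.0, 6.0, 5.0]


def listwise_delete(values):
    """Listwise (case) deletion: drop every case with a missing value."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Mean imputation: replace each missing value with the observed mean."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


complete_cases = listwise_delete(scores)
imputed = mean_impute(scores)

# Sample size, mean, and sample variance under each treatment.
print("listwise deletion:", len(complete_cases), mean(complete_cases), variance(complete_cases))
print("mean imputation:  ", len(imputed), mean(imputed), variance(imputed))
```

Under these assumptions, mean imputation keeps the full sample size and leaves the mean unchanged, but the sample variance shrinks because the filled-in values add no spread. This is one commonly cited disadvantage of single imputation.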