TREATMENTS OF MISSING DATA
DOI:
https://doi.org/10.19090/pp.2015.3.289-309
Keywords:
missing data analysis, multiple imputation, categorical variables, numerical variables
Abstract
This paper gives a critical review of the most commonly used treatments of missing data: traditional ones, such as deletion of missing data and single imputation (mean substitution, regression imputation, random imputation), and modern ones, such as maximum likelihood-based treatments (e.g., the EM algorithm and the FIML method) and multiple imputation methods. It points out the advantages and drawbacks of each treatment and gives recommendations for choosing a treatment according to the missing data mechanism, the type and level of measurement of the variable, the sample size, and similar factors. It also reviews how missing categorical and numerical data are treated in practice in papers published in leading psychology journals. The review concludes that the traditional method of deleting cases with missing values is used most often, followed, somewhat less frequently, by multiple imputation. For that reason, the paper includes an example of carrying out multiple imputation in SPSS.
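The following is a minimal sketch, not taken from the paper, of three of the treatments named in the abstract applied to a single numerical variable: case deletion, mean substitution, and a simplified multiple imputation pooled with Rubin's rules. The paper's own worked example uses SPSS; Python is used here only for illustration. The function names, the toy data, and the choice of m = 20 imputations are assumptions. The imputation step draws values from a normal distribution fitted to the observed cases and ignores uncertainty in the fitted mean and standard deviation, so it is a simplification of proper multiple imputation.

```python
import random
import statistics


def listwise_delete(values):
    """Case deletion: keep only the observed (non-None) entries."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Single imputation: replace each missing entry with the observed mean."""
    fill = statistics.mean(listwise_delete(values))
    return [fill if v is None else v for v in values]


def multiple_impute_mean(values, m=20, seed=0):
    """Simplified multiple imputation for the mean of one variable.

    Assumption: missing entries are drawn from a normal distribution with the
    observed mean and standard deviation (parameter uncertainty is ignored).
    Returns the pooled estimate and its total variance under Rubin's rules.
    """
    rng = random.Random(seed)
    observed = listwise_delete(values)
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    n = len(values)

    estimates, within = [], []
    for _ in range(m):
        completed = [rng.gauss(mu, sd) if v is None else v for v in values]
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean in this completed data set.
        within.append(statistics.variance(completed) / n)

    pooled = statistics.mean(estimates)
    w = statistics.mean(within)          # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    total_var = w + (1 + 1 / m) * b      # Rubin's total variance
    return pooled, total_var


# Hypothetical toy data; None marks a missing value.
data = [2.0, 4.0, None, 6.0, None, 8.0]

print(listwise_delete(data))   # [2.0, 4.0, 6.0, 8.0]
print(mean_impute(data))       # [2.0, 4.0, 5.0, 6.0, 5.0, 8.0]
print(multiple_impute_mean(data))
```

On this toy data, mean substitution leaves the mean unchanged but shrinks the sample variance relative to the observed cases, which is one of the commonly cited drawbacks of single imputation. The multiple imputation variant instead adds between-imputation variance to the pooled estimate's uncertainty.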
References
Allison, P. D. (2002). Missing data. Sage University papers series on quantitative applications in the social sciences, series 07–136. Thousand Oaks, CA: Sage.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48, 5–37.
http://dx.doi.org/10.1016/j.jsp.2009.10.001
Bodner, T. E. (2008). What improves with increased missing data imputations? Structural Equation Modeling, 15, 651–675.
http://dx.doi.org/10.1080/10705510802339072
Brown, R. L. (1994). Efficacy of the indirect approach for estimating structural equation models with missing data: A comparison of five methods. Structural Equation Modeling, 1, 287–316.
http://dx.doi.org/10.1080/10705519409539983
Chan, D. (1998). The conceptualization and analysis of change over time: An integrated approach incorporating longitudinal mean and covariance structures analysis (LMACS) and multiple indicator latent growth modeling (MLGM). Organizational Research Methods, 1, 421–483.
http://dx.doi.org/10.1177/109442819814004
Chen, G., & Astebro, T. (2003). How to deal with missing categorical data: Test of a simple Bayesian method. Organizational Research Methods, 6, 309–327.
http://dx.doi.org/10.1177/1094428103254672
Demirtas, H., Freels, S. A., & Yucel, R. M. (2008). Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: A simulation assessment. Journal of Statistical Computation and Simulation, 78(1), 69–84.
http://dx.doi.org/10.1080/10629360600903866
Dong, Y., & Peng, C. Y. J. (2013). Principled missing data methods for researchers. SpringerPlus, 2(1), 1–17.
http://dx.doi.org/10.1186/2193-1801-2-222
Enders, C. K. (2010). Applied missing data analysis. New York: Guilford Press.
Fichman, M., & Cummings, J. N. (2003). Multiple imputation for missing data: Making the most of what you know. Organizational Research Methods, 6(3), 282–308.
http://dx.doi.org/10.1177/1094428103255532
Finch, W. H. (2010). Imputation methods for missing categorical questionnaire data: A comparison of approaches. Journal of Data Science, 8, 361–378.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576.
http://dx.doi.org/10.1146/annurev.psych.58.110405.085530
Graham, J. W., Cumsille, P. E., & Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology: Vol. 2. Research methods in psychology (pp. 87–114). New York: John Wiley & Sons.
http://dx.doi.org/10.1002/0471264385.wei0204
Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218.
http://dx.doi.org/10.1207/s15327906mbr3102_3