Investigators: Jungpil Hahn, Ke-Wei Huang.
In today’s big data environment, missing values remain a problem that harms data quality. The bias caused by missing values is the most serious concern, because it cannot be eliminated simply by increasing the sample size. Although the statistics literature has developed approaches to handling missing values and has formulated the assumptions under which these approaches yield valid statistical inferences, these prescriptions have yet to be widely adopted in many social science disciplines, including the information systems (IS) discipline. Our review of recently published empirical research in information systems shows that missing values are indeed an important and pervasive problem. We believe that a review of missing value theory is necessary and timely: it can help the IS community understand the nature of missing values and promote more rigorous research practice, given that missing values are often unavoidable. In addition, the missing not at random (MNAR) mechanism poses challenges for parameter estimation. We contribute to research practice by proposing a Monte Carlo likelihood approach and demonstrating its superior performance in correcting bias in parameter estimation. We conclude by suggesting that research validity can be enhanced through a reasoned choice of missing value handling method and through structured reporting of missing values.
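The claim that bias from missing values does not shrink with sample size can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions, not the Monte Carlo likelihood approach proposed in this work: the standard normal population, the logistic missingness rule, and the function name `complete_case_mean` are all hypothetical choices. Values are drawn from a population whose true mean is 0, and larger values are more likely to be missing (an MNAR mechanism), so the mean of the observed cases stays well below 0 however many observations are drawn.

```python
import math
import random


def complete_case_mean(n, seed=0):
    """Mean of the observed values when missingness depends on the value itself."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # true population mean is 0
        # Hypothetical MNAR rule: larger values are more likely to be missing.
        p_observed = 1.0 / (1.0 + math.exp(x))
        if rng.random() < p_observed:
            observed.append(x)
    return sum(observed) / len(observed)


if __name__ == "__main__":
    # Increasing n tightens the estimate around a biased value, not around 0.
    for n in (1_000, 100_000):
        print(n, round(complete_case_mean(n), 3))
```

Under these assumptions, a larger sample reduces only the sampling variability of the complete-case mean; the systematic gap from the true mean remains.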