Empirical research

The Challenges of Data Quality Evaluation in a Joint Data Warehouse

Authors:

Charles J. Bae, MD (Cleveland Clinic)
Sandra Griffith, PhD (Cleveland Clinic)
Youran Fan, PhD (Cleveland Clinic)
Cheryl Dunphy, RN (Cleveland Clinic)
Nicolas Thompson, MS (Cleveland Clinic)
John Urchek (Cleveland Clinic)
Alandra Parchman, MHA (Cleveland Clinic)
Irene L. Katzan, MD, MS (Cleveland Clinic)

Abstract

Introduction: The use of clinically derived data from electronic health records (EHRs) and other electronic clinical systems can greatly facilitate clinical research as well as operational and quality initiatives. One approach for making these data available is to incorporate data from different sources into a joint data warehouse. When using such a data warehouse, it is important to understand the quality of the data. The primary objective of this study was to determine the completeness and concordance of common types of clinical data available in the Knowledge Program (KP) joint data warehouse, which contains feeds from several electronic systems including the EHR.

Methods: Specific data elements for 250 patients were manually reviewed in an EHR and compared with the corresponding elements in the KP data warehouse. Completeness and concordance were calculated for five categories of data: demographics, vital signs, laboratory results, diagnoses, and medications.
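The completeness and concordance calculations described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes that completeness is the share of sampled patients for whom a source contains a value for an element, and that concordance is the share of patients with a value in both sources whose values match. `evaluate_element`, `ElementQuality`, and the sample records are hypothetical. A real comparison would usually normalize units, codes, and date formats before testing equality.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# One data element (e.g., date of birth) per patient ID; a None value or a
# missing key means the element is absent in that source. (Hypothetical layout.)
Values = Dict[str, Optional[str]]


@dataclass
class ElementQuality:
    ehr_completeness: float       # share of the cohort with a value in the EHR
    kp_completeness: float        # share of the cohort with a value in the warehouse
    concordance: Optional[float]  # share of matches where both sources have a value


def evaluate_element(cohort: List[str], ehr: Values, kp: Values) -> ElementQuality:
    """Hypothetical completeness/concordance calculation for one data element."""
    n = len(cohort)
    ehr_present = {p for p in cohort if ehr.get(p) is not None}
    kp_present = {p for p in cohort if kp.get(p) is not None}
    both = ehr_present & kp_present
    # Exact string match only; real data would need normalization first.
    agree = sum(1 for p in both if ehr[p] == kp[p])
    return ElementQuality(
        ehr_completeness=len(ehr_present) / n if n else 0.0,
        kp_completeness=len(kp_present) / n if n else 0.0,
        concordance=agree / len(both) if both else None,
    )


if __name__ == "__main__":
    cohort = ["p1", "p2", "p3", "p4"]
    ehr = {"p1": "1950-01-01", "p2": "1962-07-15", "p3": "1971-03-02"}
    kp = {"p1": "1950-01-01", "p2": "1962-07-16"}
    print(evaluate_element(cohort, ehr, kp))
    # ElementQuality(ehr_completeness=0.75, kp_completeness=0.5, concordance=0.5)
```

Computing concordance only over patients who have a value in both sources keeps missingness (completeness) separate from disagreement (concordance), which mirrors how the Results report them.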

Results: In general, data elements for demographics, vital signs, diagnoses, and laboratory results were present in more cases in the source EHR than in the KP. When data elements were available in both sources, concordance was high. In contrast, the KP data warehouse documented a higher prevalence of deaths and medications than the EHR.

Discussion: Several factors contributed to the discrepancies between data in the KP and the EHR, including the start date and frequency of data feed updates into the KP, the inability to transfer data stored in unstructured formats (e.g., free text or scanned documents), and incomplete or missing data variables in the source EHR.
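As a hypothetical illustration of how the feed-related factors above could be used to triage discrepancies, the sketch below labels an EHR record that is absent from the warehouse according to whether it is stored in an unstructured format, predates the feed start date, or postdates the most recent refresh. The function name, category labels, and dates are assumptions for illustration; the study does not describe such a procedure.

```python
from datetime import date


def classify_missing(recorded_on: date, is_structured: bool,
                     feed_start: date, last_refresh: date) -> str:
    """Suggest why an EHR record might be absent from the warehouse (hypothetical rules)."""
    if not is_structured:
        # Free text or scanned documents cannot be transferred as discrete fields.
        return "unstructured"
    if recorded_on < feed_start:
        # The feed began after this record was created.
        return "before feed start"
    if recorded_on > last_refresh:
        # The record was created after the most recent feed update.
        return "after last refresh"
    # Inside the feed window: possibly a mapping or source-data problem to review.
    return "unexplained"


if __name__ == "__main__":
    start, refresh = date(2010, 1, 1), date(2014, 6, 30)  # hypothetical feed window
    print(classify_missing(date(2009, 5, 1), True, start, refresh))
    print(classify_missing(date(2014, 7, 15), True, start, refresh))
```

Records that land in "unexplained" are the ones a manual review would most usefully examine, since timing and format do not account for them.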

Conclusion: When evaluating the quality of a data warehouse with multiple data sources, assessing completeness and concordance between the data set and the source data may be better than designating either one as the gold standard. This approach allows users to optimize the method and timing of data transfer so that data are captured more accurately.

How to Cite: Bae CJ, Griffith S, Fan Y, Dunphy C, Thompson N, Urchek J, et al. The Challenges of Data Quality Evaluation in a Joint Data Warehouse. eGEMs (Generating Evidence & Methods to improve patient outcomes). 2015;3(1):12. DOI: http://doi.org/10.13063/2327-9214.1125
Published on 22 May 2015.
Peer Reviewed
