
Data Quality

Description

Health care processes consume increasing volumes of digital data. However, creating and leveraging high-quality integrated health data is challenging because large-scale health data derives from systems in which data is captured through varying workflows, yielding varying data quality and potentially limiting the data's utility for many uses, including population health. To ensure accurate results, data quality must be assessed for the particular use at hand. Examples of sub-optimal health data quality abound: accuracy varies for medication and diagnostic data in hospital discharge and claims data; electronic laboratory data used to identify notifiable public health cases shows varying levels of completeness across data sources; and data timeliness has been found to vary across data sources. Given the increasing focus on large health data sources, the known data quality issues that hinder the utility of such data, and the paucity of medical literature describing approaches for evaluating these issues across integrated health data sources, we hypothesize that novel methods for ongoing monitoring of data quality in rapidly growing large health data sets, including surveillance data, will improve the accuracy and overall utility of these data.

 

Objective

We describe how entropy, a key information measure, can be used to monitor the characteristics of chief complaints in an operational surveillance system.
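As an illustration of the approach (a minimal sketch, not the operational system's implementation), Shannon entropy can be computed over the empirical distribution of chief-complaint categories in each batch of records; an abrupt change in entropy flags a change in the feed's character:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical daily batches of chief-complaint categories.
monday = ["fever", "cough", "fever", "rash", "cough", "fever"]
tuesday = ["fever"] * 6  # a sudden collapse in diversity

baseline = shannon_entropy(monday)
today = shannon_entropy(tuesday)

# A large entropy drop can signal a feed problem, e.g. a site suddenly
# sending a single default value for every record; a large rise can
# signal a switch from a pick list to free text.
if baseline - today > 0.5:
    print("entropy change: possible data-quality issue")
```

A threshold such as 0.5 bits is illustrative; in practice the alerting rule would be tuned against historical variation.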

Submitted by hparton
Description

The National Notifiable Diseases Surveillance System (NNDSS) comprises many activities, including collaborations, processes, standards, and systems, that support gathering data from US states and territories. As part of NNDSS, the National Electronic Disease Surveillance System (NEDSS) provides the standards, tools, and resources that support reporting public health jurisdictions (jurisdictions). The NEDSS Base System (NBS) is a CDC-developed software application available to jurisdictions to collect, manage, analyze, and report national notifiable disease (NND) data. An evaluation of NEDSS is underway with the objective of identifying the functionalities of NEDSS-compatible (NC) systems and the impact of these features on the user's culture.

 

Objective

The culture in which public health professionals work defines their organizational objectives, expectations, policies, and values. These aspects of culture are often intangible and difficult to quantify. The introduction of an information system could further complicate the culture of a jurisdiction if the intangibles of that culture are not clearly understood. This report describes how cultural modeling can be used to capture intangible elements or factors that may affect NEDSS-compatible (NC) system functionalities within the culture of public health jurisdictions.

Submitted by hparton
Description

Data consisting of counts or indicators aggregated from multiple sources pose particular problems for data quality monitoring when the users of the aggregate data are blind to the individual sources. This arises when agencies wish to share data but, for privacy or contractual reasons, are only able to share it at an aggregate level. If the aggregators of the data are unable to guarantee the quality of either the sources of the data or the aggregation process, then the quality of the aggregate data may be compromised. This situation arose in the Distribute surveillance system (1). Distribute was a national emergency department syndromic surveillance project for influenza-like illness (ILI), developed by the International Society for Disease Surveillance, that integrated data from existing state and local public health department surveillance systems and operated from 2006 until mid-2012. Distribute was designed to work solely with aggregated data, with sites providing data aggregated from sources within their jurisdiction, and detailed information on the un-aggregated 'raw' data was unavailable. Previous work (2) on Distribute data quality identified several issues caused in part by the nature of the system: transient problems due to inconsistent uploads, problems associated with transient or long-term changes in the source makeup of the reporting sites, and lack of data timeliness due to individual site data accruing over time rather than in batch. Data timeliness was addressed using prediction intervals to assess the reliability of partially accrued data (3). The types of data quality issues present in the Distribute data are likely to appear to some extent in any aggregate data surveillance system where direct control over the quality of the source data is not possible.

Objective

In this work we present methods for detecting both transient and long-term changes in the source data makeup.
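One simple way to flag a long-term change in source makeup from aggregate counts alone (a sketch of the general idea, not the specific methods presented here) is to compare a site's recent mean daily count against its earlier baseline; a sustained level shift suggests a source joining or dropping out of the aggregate:

```python
# A minimal sketch: flag level shifts in a site's aggregate daily visit
# counts, which can signal a change in the set of sources feeding the site.
def level_shift(counts, window=7, ratio=0.5):
    """Compare the mean of the last `window` days to the prior baseline.
    Returns True when the recent mean differs from the baseline mean by
    more than `ratio` (e.g. 0.5 = a 50% change). Thresholds are illustrative."""
    if len(counts) < 2 * window:
        return False
    recent = sum(counts[-window:]) / window
    baseline = sum(counts[:-window]) / (len(counts) - window)
    return abs(recent - baseline) / baseline > ratio

stable = [100, 98, 103, 99, 101, 102, 100] * 3
dropped = stable + [52, 50, 49, 51, 50, 48, 50]  # a source stops reporting

# level_shift(stable) -> False; level_shift(dropped) -> True
```

Transient problems (e.g. a single missed upload) would instead show as one-day outliers against the same baseline, so the two failure modes can be separated by the duration of the deviation.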

 

Submitted by uysz
Description

Analyses produced by epidemiologists and public health practitioners are susceptible to bias from a number of sources, including missing data, confounding variables, and statistical model selection. Understanding and applying the multitude of tests, corrections, and selection rules often requires a great deal of expertise, and these tasks can be time-consuming and burdensome. To address this challenge, Aptima began development of CARRECT, the Collaborative Automation Reliably Remediating Erroneous Conclusion Threats system. When complete, CARRECT will provide an expert system that can be embedded in an analyst's workflow. CARRECT will support statistical bias reduction and improved analyses and decision making by engaging the user in a collaborative process in which the technology is transparent to the analyst.

Objective

The objective of the CARRECT software is to make cutting-edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users.

 

Submitted by uysz
Description

Communicable disease surveillance is a core public health function. Many diseases must be reported to state and federal agencies (1). To manage and adjudicate such cases, public health stakeholders gather various data elements. Since cases are identified in various healthcare settings, not all information sought by public health is available (2), resulting in varied field completeness, which affects the measured and perceived data quality. To better understand this variation, we evaluated public health practitioners' perceived value of these fields in initiating or completing communicable disease reports.

Objective

To assess communicable disease report fields required by public health practitioners and evaluate the variation in the perceived utility of these fields.
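The field completeness discussed above can be quantified directly. The sketch below (with illustrative field names and toy records, not actual report data) computes the fraction of reports in which each field is populated:

```python
# A minimal sketch: per-field completeness across case reports, where each
# report is a dict and None or "" counts as missing. Field names are
# illustrative, not an actual communicable disease report schema.
def completeness(reports, fields):
    n = len(reports)
    return {f: sum(1 for r in reports if r.get(f) not in (None, "")) / n
            for f in fields}

reports = [
    {"name": "A", "dob": "1980-01-01", "race": ""},
    {"name": "B", "dob": "",           "race": "X"},
    {"name": "C", "dob": "1990-05-05", "race": "Y"},
]
result = completeness(reports, ["name", "dob", "race"])
# name is complete in all reports; dob and race in 2 of 3
```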

 

Submitted by Magou
Description

National efforts to improve quality in public health are closely tied to advancing capabilities in disease surveillance. Measures of public health quality provide data to demonstrate how public health programs, services, policies, and research achieve desired health outcomes and impact population health. They also reveal opportunities for innovations and improvements. Similar quality improvement efforts in the health care system are beginning to bear fruit. There has been a need, however, for a framework for assessing public health quality that provides a standard, yet is flexible and relevant to agencies at all levels.

The U.S. Department of Health and Human Services (HHS) Office of the Assistant Secretary for Health, working with stakeholders, recently developed and released a Consensus Statement on Quality in the Public Health System that introduces a novel evaluation framework. The statement identifies nine aims that are fundamental to public health quality improvement efforts and six cross-cutting priority areas for improvement, including population health metrics and information technology, workforce development, and evidence-based practices.

Applying the HHS framework to surveillance expands measures for surveillance quality beyond typical variables (e.g., data quality and analytic capabilities) to desired characteristics of a quality public health system. The question becomes: How can disease surveillance help public health services to be more population centered, equitable, proactive, health-promoting, risk-reducing, vigilant, transparent, effective, and efficient, the desired features of a quality public health system? Any agency with a public health mission, or even a partial public health mission (e.g., tax-exempt hospitals), can use these measures to develop strategies that improve both the quality of the surveillance enterprise and public health systems overall. At this time, input from stakeholders is needed to identify valid and feasible ways to measure how surveillance systems and practices advance public health quality. What exists now, and where are the gaps?

Objective

To examine disease surveillance in the context of a new national framework for public health quality and to solicit input from practitioners, researchers, and other stakeholders to identify potential metrics, pivotal research questions, and actions for achieving synergy between surveillance practice and public health quality.

Submitted by teresa.hamby@d…
Description

Uncertainty introduced by the selective identification of cases must be recognized and corrected for in order to accurately map the distribution of risk. Consider the problem of identifying geographic areas with increased risk of drug-resistant tuberculosis (DRTB). Most countries with a high TB burden offer drug sensitivity testing (DST) only to those cases at highest risk for drug resistance. As a result, the spatial distribution of confirmed DRTB cases under-represents the actual number of drug-resistant cases. Moreover, using the locations of confirmed DRTB cases to identify regions of increased risk of drug resistance may bias results toward areas of increased testing. Since testing is done neither on all incident cases nor on a representative sample of cases, current mapping methods do not allow standard inference from programmatic data about potential locations of DRTB transmission.

Objective

Uncertainty regarding the location of disease acquisition, as well as selective identification of cases, may bias maps of risk. We propose an extension to a distance-based mapping method (DBM) that incorporates weighted locations to adjust for these biases. We demonstrate this method by mapping potential drug-resistant tuberculosis (DRTB) transmission hotspots using programmatic data collected in Lima, Peru.
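To illustrate the weighting idea (a minimal sketch under simplifying assumptions, not the authors' exact DBM extension), confirmed cases can be weighted by the inverse of their area's assumed testing probability before a kernel density estimate, so that under-tested areas are not artificially suppressed:

```python
import numpy as np

# A minimal sketch: each confirmed case gets weight 1/p, where p is the
# (assumed known) probability that a case in its area received drug
# sensitivity testing. A weighted Gaussian kernel density then counteracts
# the bias toward heavily tested areas. Coordinates and probabilities
# below are invented for illustration.
def weighted_kde(case_xy, weights, grid_xy, bandwidth=1.0):
    diffs = grid_xy[:, None, :] - case_xy[None, :, :]   # (grid, case, 2)
    sq = (diffs ** 2).sum(axis=2)                       # squared distances
    k = np.exp(-sq / (2 * bandwidth ** 2))              # Gaussian kernel
    return (k * weights[None, :]).sum(axis=1) / weights.sum()

cases = np.array([[0.0, 0.0], [5.0, 5.0]])
testing_prob = np.array([0.9, 0.3])   # second area tests far fewer cases
w = 1.0 / testing_prob                # inverse-probability weights
grid = np.array([[0.0, 0.0], [5.0, 5.0]])
density = weighted_kde(cases, w, grid)
# density[1] > density[0]: the under-tested area shows higher estimated risk
```

With equal weights the two grid points would score identically; the inverse-probability weights are what shift apparent risk toward the under-tested area.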

Submitted by teresa.hamby@d…
Description

Understanding your data is a fundamental pillar of disease surveillance success. With the increase in automated, electronic surveillance tools, many public health users have begun to rely on those tools to produce reports containing processed results to perform their daily jobs. These tools can focus on the algorithms or visualizations needed to produce the report and can easily overlook the quality of the incoming data. The phrase “garbage in, garbage out” is often used to describe the value of reports when the incoming data is not of high quality. There is a need, then, for systems and tools that help users determine the quality of incoming data.

Objective

The objective of this project was to develop visualizations and tools for public health users to determine the quality of their surveillance data. Users should be able to determine, or be warned, when significant changes have occurred in their data streams, such as a hospital converting from a free-text chief complaint to a pick list. Other data quality factors, such as individual variable completeness and consistency in how values are mapped to standard system selections, should also be available to users. Once built, these new visualizations should also be evaluated to determine their usefulness in a production disease surveillance system.
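One concrete signal for the free-text-to-pick-list conversion mentioned above (a hypothetical check, not a feature of any particular system) is the ratio of distinct chief-complaint values per batch: free text is highly varied, while a pick list collapses records onto a few fixed strings.

```python
# A minimal sketch: the share of distinct chief-complaint strings in a
# batch of records. A sharp drop between batches suggests a facility has
# switched from free text to a pick list. Example complaints are invented.
def distinct_ratio(complaints):
    return len(set(complaints)) / len(complaints)

free_text = ["hurt my left arm", "cough x3 days", "feels dizzy",
             "chest pain since am", "n/v/d", "twisted ankle playing"]
pick_list = ["COUGH", "CHEST PAIN", "COUGH", "FALL", "COUGH", "CHEST PAIN"]

assert distinct_ratio(free_text) == 1.0   # every value unique
assert distinct_ratio(pick_list) == 0.5   # only a few fixed values
```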

Submitted by teresa.hamby@d…
Description

The Louisiana Office of Public Health (OPH) Infectious Disease Epidemiology Section (IDEpi) conducts syndromic surveillance of Emergency Department (ED) visits through the Louisiana Early Event Detection System (LEEDS) and submits the collected data to ESSENCE. There are currently 86 syndromes defined in LEEDS, including infectious disease, injury, and environmental exposure syndromes, among others. LEEDS uses the chief complaint, admit reason, and/or diagnosis fields to tag visits to relevant syndromes. Visits that do not have information in any of these fields, or that do not fit any syndrome definition, are tagged to Null syndrome. ESSENCE uses a different algorithm from LEEDS and looks only at the chief complaint for symptom information to bin visits to syndromes defined in ESSENCE. Visits that do not fit the defined syndromes or do not contain any symptom information are tagged to Other syndrome. Since the transition from BioSense to ESSENCE, IDEpi has identified various data quality issues and has been working to address them. The NSSP team recently notified IDEpi that a large number of records are binning to Other syndrome, which led to this investigation of the possible underlying data quality issues captured in Other syndrome.
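The binning behavior described above can be sketched as keyword matching with an "Other" fallback (syndrome names and keywords here are illustrative, not the actual LEEDS or ESSENCE definitions):

```python
# A minimal sketch of chief-complaint syndrome binning: a visit is tagged
# to the first syndrome whose keywords match; anything unmatched, including
# blank complaints, falls through to "Other" -- the bin worth auditing for
# data-quality problems. Syndromes and keywords are invented examples.
SYNDROMES = {
    "ILI": ["fever", "flu", "cough"],
    "Injury": ["fracture", "laceration", "fall"],
}

def bin_visit(chief_complaint):
    text = (chief_complaint or "").lower()
    for syndrome, keywords in SYNDROMES.items():
        if any(k in text for k in keywords):
            return syndrome
    return "Other"

assert bin_visit("Fever and cough x2 days") == "ILI"
assert bin_visit("") == "Other"                 # empty complaint
assert bin_visit("f/u med refill") == "Other"   # no symptom keywords
```

As the asserts suggest, "Other" mixes two distinct populations: visits with usable text that simply matches no definition, and visits whose complaint field is empty or uninformative, which is why the bin is a useful window on upstream data quality.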

Objective

This investigation takes a closer look at Other syndrome in ESSENCE and Null syndrome in LEEDS to understand what types of records are not tagged to a syndrome and to elucidate the underlying data quality issues.

Submitted by elamb
Description

Oregon Public Health Division (OPHD), in collaboration with The Johns Hopkins University Applied Physics Laboratory, implemented Oregon ESSENCE in 2011. ESSENCE is an automated, electronic syndromic surveillance system that captures emergency department data from hospitals across Oregon. While each hospital system sends HL7 2.5.1-formatted messages, each uses a uniquely configured interface to capture, extract, and send data. Consequently, ESSENCE receives messages that vary greatly in content and structure. Emergency department data are ingested using the Rhapsody Integration Engine 6.2.1 (Orion Health, Auckland, NZ), which standardizes messages before they enter ESSENCE. Mechanisms in the ingestion route (error-handling filters) identify messages that do not completely match accepted standards for submission. A subset of these previously identified messages with errors is corrected within the route as the messages emerge. The existence of errors does not preclude a message's insertion into ESSENCE; however, the quality and quantity of errors determine the quality of the data that ESSENCE uses. Unchecked, error accumulation can also strain the integration engine. Despite ad-hoc processes to address errors, backlogs accrue. With no metadata to assess the importance and source of backlogged errors, the ESSENCE team had no guide with which to mitigate errors. The team needed a way to determine which errors could be fixed by updating the Rhapsody Integration Engine and which required consultation with partner health systems and their data vendors. To formally address these issues, the ESSENCE team developed an error-capture module within Rhapsody to identify and quantify all errors found in syndromic messages and to serve as a guide for prioritizing the fixing of new errors.
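The error-capture idea can be sketched as a tally of validation failures keyed by error type and sending facility (a hypothetical illustration: the validator names, message fields, and facilities are invented, and this is not Rhapsody's actual API):

```python
from collections import Counter

# A minimal sketch: tally validation errors in incoming messages by
# (error type, sending facility), so a team can see which errors to fix
# in the integration engine and which to raise with a partner hospital.
def tally_errors(messages, validators):
    """`validators` maps an error name to a predicate over a message dict."""
    tally = Counter()
    for msg in messages:
        for name, is_error in validators.items():
            if is_error(msg):
                tally[(name, msg.get("facility", "UNKNOWN"))] += 1
    return tally

validators = {
    "missing_patient_class": lambda m: not m.get("patient_class"),
    "bad_timestamp": lambda m: len(m.get("timestamp", "")) != 14,
}
messages = [
    {"facility": "HOSP_A", "patient_class": "E", "timestamp": "20180101120000"},
    {"facility": "HOSP_B", "patient_class": "",  "timestamp": "2018-01-01"},
    {"facility": "HOSP_B", "patient_class": "E", "timestamp": "2018-01-01"},
]
top = tally_errors(messages, validators).most_common()
# HOSP_B's malformed timestamps (2 messages) top the priority list
```

Sorting the tally surfaces the highest-volume (error, facility) pairs first, which is exactly the meta-data the team lacked for prioritizing backlogged errors.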

Objective

To streamline emergency department data processing in Oregon ESSENCE (Oregon’s statewide syndromic surveillance) by systematically and efficiently addressing data quality issues among submitting hospital systems.

Submitted by elamb