Data Quality

Description

Timely and accurate syndromic surveillance depends on continuous data feeds from healthcare facilities. Typical outlier detection methodologies in syndromic surveillance compare predicted counts for an interval to observed event counts, either to detect increases in volume associated with public health incidents or decreases in volume associated with compromised data transmission. Accurate predictions of total facility volume must account for the substantial variance associated with time of day and day of week; at the extreme are facilities that are open only during limited hours on select days. Models that treat every hour-by-day combination independently must estimate parameters for the full cross-product of hours and days, creating a significant data burden. Timely detection of outages may require sub-hour aggregation, which further increases this burden by multiplying the number of intervals for which parameters must be estimated. Nonparametric models for the probability of message arrival offer an alternative approach to generating predictions. Assuming some time-dependent structure in the data, rather than treating each interval as independent of all others, reduces the data requirements and makes predictions at sub-hour intervals feasible.

Objective:

Characterize the behavior of nonparametric regression models for message arrival probability as outage detection tools.
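To make the approach concrete, the sketch below shows one way such a nonparametric model could work: a Nadaraya-Watson kernel regression of arrival probability on time-of-week, with an alarm raised when an all-silent window is improbable under the fitted curve. The Gaussian kernel, the 30-minute bandwidth, the naive independence assumption in the alarm rule, and all function names are illustrative assumptions, not the authors' implementation.

import numpy as np

def arrival_probability(minutes_of_week, arrived, grid, bandwidth=30.0):
    # Nadaraya-Watson kernel estimate of P(any message arrives in an
    # interval) as a smooth function of time-of-week (minutes 0..10079).
    # All arguments are NumPy arrays; `arrived` is a 0/1 indicator.
    week = 7 * 24 * 60
    d = np.abs(grid[:, None] - minutes_of_week[None, :])
    d = np.minimum(d, week - d)            # circular: the week wraps around
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    return w @ arrived / w.sum(axis=1)

def outage_alarm(p_hat, observed, alpha=0.01):
    # Alarm only when the recent window is completely silent AND an
    # all-silent window is improbable under the fitted probabilities,
    # treating intervals as (naively) independent.
    if observed.any():
        return False
    return np.prod(1.0 - p_hat) < alpha

Under this rule, a facility open only on weekday mornings would have fitted probabilities near zero overnight, so overnight silence would not trigger an alarm, while the same silence at mid-morning would.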

Description

The National Syndromic Surveillance Program (NSSP) is a community-focused collaboration among federal, state, and local public health agencies and partners for the timely exchange of syndromic data. These data, captured in near real time, are intended to improve the nation's situational awareness and responsiveness to hazardous events and disease outbreaks. During CDC’s previous implementation of a syndromic surveillance system (BioSense 2), a reported lack of transparency and sharing of information on the processing applied to data feeds encumbered the identification and resolution of data quality issues. The BioSense Governance Group Data Quality Workgroup paved the way to rethink surveillance data flow and quality, and its work and collaboration with state and local partners led NSSP to redesign the program’s data flow. The new data flow provided a ripe opportunity for NSSP analysts to study the data landscape (e.g., capture of HL7 messages and core data elements), assess end-to-end data flow, and make adjustments to ensure all data being reported were processed, stored, and made accessible to the user community. In addition, NSSP extensively documented the new data flow, providing the transparency the community needed to better understand the disposition of facility data. Even with a new and improved data flow, data quality problems that had existed in the past but gone unreported persisted in the new data; now, however, they were identified. Unlike previous versions, the newly designed data flow provided opportunities to report and act on issues found in the data. An important component of the NSSP data flow was therefore the implementation of regularly scheduled standard data quality checks and the release of standard data quality reports summarizing the findings.

Objective:

Review the impact of applying regular data quality checks to assess completeness of core data elements that support syndromic surveillance.
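As a rough illustration of what a scheduled completeness check might compute, the sketch below summarizes, per facility, the percentage of records carrying a non-blank value for each core element. The element list, column names, and function name are hypothetical assumptions, not NSSP's actual check definitions.

import pandas as pd

# Illustrative core HL7-derived elements; a real check would use the
# program's published core data element list.
CORE_ELEMENTS = ["facility_id", "visit_date", "chief_complaint",
                 "patient_age", "patient_sex", "diagnosis_code"]

def completeness_report(visits: pd.DataFrame) -> pd.DataFrame:
    # Percent of records with a non-null, non-blank value for each core
    # element, summarized by facility.
    vals = visits[CORE_ELEMENTS]
    present = vals.notna() & vals.astype(str).apply(lambda s: s.str.strip() != "")
    return (present.groupby(visits["facility_id"]).mean() * 100).round(1)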

Description

Effective clinical and public health practice in the twenty-first century requires access to data from an increasing array of information systems. However, the quality of data in these systems can be poor or “unfit for use.” Measuring and monitoring data quality is therefore an essential activity for clinical and public health professionals as well as researchers. Current methods for examining data quality rely largely on manual queries and processes conducted by epidemiologists. The surveillance community desires better, automated tools for examining data quality.

Objective:

To extend an open source analytics and visualization platform for measuring the quality of electronic health data transmitted to syndromic surveillance systems.
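One kind of check such a platform might automate, in place of ad hoc manual queries, is value-set conformance. The sketch below is a minimal, hypothetical example; the field names, value sets, and function name are assumptions and do not describe the platform referenced in this abstract.

import pandas as pd

# Illustrative expected value sets; a real platform would load these
# from configuration rather than hard-coding them.
VALUE_SETS = {
    "patient_sex": {"M", "F", "O", "U"},
    "patient_class": {"E", "I", "O"},
}

def validity_summary(records: pd.DataFrame) -> pd.Series:
    # Fraction of non-missing values that fall inside the expected
    # value set, one entry per checked field.
    out = {}
    for name, allowed in VALUE_SETS.items():
        values = records[name].dropna().astype(str).str.strip()
        out[name] = values.isin(allowed).mean() if len(values) else float("nan")
    return pd.Series(out, name="valid_fraction")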

Description

BioSense 2.0, a redesigned national syndromic surveillance system, provides users with timely regional and national data classified into disease syndromes, with views of health outcomes and trends for use in situational awareness. As of July 2014, 60 jurisdictions nationwide feed data into BioSense 2.0. In New Jersey, the state’s syndromic surveillance system, EpiCenter, receives registration data from 75 of the state’s 80 acute care and satellite emergency departments. EpiCenter, developed by Health Monitoring Systems, Inc. (HMS), incorporates statistical management and analytical techniques to process health-related data in real time. To participate in BioSense 2.0, New Jersey worked with HMS to connect its existing data to BioSense. In May 2013, HMS established a single data feed of New Jersey’s facility data to BioSense 2.0; the transfer from HMS servers occurs twice daily via SFTP, with an average daily volume of around 10,000 visit records. The New Jersey Department of Health (NJDOH) initiated this data validation project in 2013 to ensure that registration records are delivered successfully to BioSense 2.0.

Objective:

To assess and validate New Jersey’s ED registration data feed from EpiCenter to BioSense 2.0.
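Given the twice-daily SFTP transfer described above, a natural first validation step is reconciling per-day record counts on the sending and receiving sides. The sketch below is one hedged way to do that; the column name, the 1% tolerance, and the function name are illustrative assumptions, not NJDOH's actual validation procedure.

import pandas as pd

def reconcile_daily_counts(sent: pd.DataFrame, loaded: pd.DataFrame,
                           tolerance: float = 0.01) -> pd.DataFrame:
    # Per-day visit counts in the outgoing extract versus counts observed
    # on the receiving side; flag days where the loaded count falls short
    # by more than `tolerance` (1% by default).
    out_daily = sent.groupby("visit_date").size().rename("sent")
    in_daily = loaded.groupby("visit_date").size().rename("loaded")
    merged = pd.concat([out_daily, in_daily], axis=1).fillna(0)
    merged["shortfall"] = 1 - merged["loaded"] / merged["sent"].clip(lower=1)
    merged["flag"] = merged["shortfall"] > tolerance
    return merged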

Description

Timeliness of reports sent by laboratories and providers is a continuous challenge for disease surveillance and management. Public health organizations often receive communicable disease reports with varying degrees of timeliness, raising concern about delays in the patient information received. Timely reports are essential for accurately evaluating community health needs and investigating disease outbreaks. Indiana state law requires that chlamydia reports be sent to public health within 3 days of confirmation of a positive test result. Laboratories and providers must therefore be held accountable and comply with this regulation to ensure the data quality needed for accurate disease assessment.

Objective:

To analyze the time delay between a positive chlamydia test diagnosis and the time a laboratory and/or provider sends a report to a local public health department.
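The delay measure itself is straightforward to compute once result and receipt dates are available. A minimal sketch, assuming a tabular extract with hypothetical column names (result_date, received_date, reporter_id):

import pandas as pd

def reporting_delays(reports: pd.DataFrame) -> pd.DataFrame:
    # Days elapsed between a positive result and receipt of the report,
    # with a compliance flag against the 3-day statutory requirement.
    df = reports.copy()
    df["result_date"] = pd.to_datetime(df["result_date"])
    df["received_date"] = pd.to_datetime(df["received_date"])
    df["delay_days"] = (df["received_date"] - df["result_date"]).dt.days
    df["within_3_days"] = df["delay_days"] <= 3
    return df[["reporter_id", "delay_days", "within_3_days"]]

Grouping the compliance flag by reporter_id then yields, per laboratory or provider, the share of reports arriving within the statutory 3 days.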

Description

Effective use of data for disease surveillance depends critically on the ability to trust and quantify the quality of source data. The Scalable Data Integration for Disease Surveillance project is developing tools to integrate and present surveillance data from multiple sources, with an initial focus on malaria. Consideration of data quality is particularly important when integrating data from diverse clinical, population-based, and other sources. Several global initiatives to reduce the burden of malaria (the President’s Malaria Initiative, the Roll Back Malaria Initiative, and The Global Fund to Fight AIDS, Tuberculosis and Malaria) have published lists of recommended indicators. Values for these indicators can be obtained from different data sources, each with different data quality properties as a consequence of the type of data collected and the method used to collect it. Our goal is to develop a framework for organizing the data quality (DQ) properties of indicators used for disease surveillance in this setting.
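One way such a framework might be represented in software is a record that attaches DQ properties to each (indicator, source) pair. The sketch below is a hypothetical data structure, not the project's published framework; the specific DQ dimensions chosen are assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class IndicatorDQ:
    # One record per (indicator, data source) pair.
    indicator: str           # e.g., "confirmed malaria cases"
    source: str              # e.g., "facility register", "household survey"
    completeness: float      # fraction of expected reports actually received
    timeliness_days: float   # median lag from event to data availability
    representativeness: str  # note on population coverage of the source
    caveats: List[str] = field(default_factory=list)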


Problem Summary

Data collection across a growing stream of contributing facilities and variables requires automated, consistent, and efficient monitoring of quality. Epidemiologists tasked with analyzing syndromic data need to be confident in the overall quality of their data, and aware of the effects of poor data quality when interpreting data. Data quality is also increasingly important as data are shared across jurisdictions and combined for analysis.


These slides provide an overview of the onboarding process for jurisdictions in Kansas that supply data to BioSense. The presentation emphasizes the steps needed to improve data quality.
