Big Data

Description

Hepatitis C virus (HCV) infection is a leading cause of liver disease-related morbidity and mortality in the United States. Approximately 75% of people infected with chronic HCV were born between 1945 and 1965. Since 2012, the CDC has recommended one-time screening for chronic HCV infection for all persons in this birth cohort (baby boomers). The United States Preventive Services Task Force (USPSTF) subsequently made the same recommendation in June 2013. We estimated the rate of HCV testing between 2011 and 2017 among persons with commercial health insurance coverage and compared rates by birth cohort.

Objective: Using the two largest commercial laboratory data sources nationally, we estimated the annual rates of hepatitis C testing among individuals recommended for testing (i.e., the baby boomer cohort born between 1945 and 1965) by the CDC and United States Preventive Services Task Force. This panel will discuss the strengths and weaknesses of monitoring hepatitis C testing using alternative data sources, including self-reported data, insurance claims data, and laboratory testing data.
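As a minimal sketch of the kind of computation involved, the snippet below counts tested persons per year within and outside the 1945–1965 birth cohort from a hypothetical line list of laboratory records. The records, field layout, and cohort helper are illustrative assumptions, not the actual commercial laboratory data; a true rate would divide these counts by the enrolled population for each year and cohort.

```python
from collections import defaultdict

# Hypothetical lab records: (patient_id, birth_year, year_of_HCV_test)
records = [
    ("p1", 1950, 2014), ("p2", 1958, 2015), ("p3", 1980, 2015),
    ("p4", 1947, 2016), ("p5", 1962, 2016), ("p6", 1990, 2016),
]

def cohort(birth_year):
    """Assign each person to the birth cohort targeted by the recommendation."""
    return "1945-1965" if 1945 <= birth_year <= 1965 else "other"

# Collect unique tested persons per (test year, cohort).
tested = defaultdict(set)
for pid, birth_year, test_year in records:
    tested[(test_year, cohort(birth_year))].add(pid)

counts = {key: len(persons) for key, persons in tested.items()}
print(counts[(2016, "1945-1965")])  # 2
```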

Submitted by elamb on
Description

The 2012 National Strategy for Biosurveillance (BSV) recognizes that a well-integrated national BSV enterprise must provide essential information for better decision making at all levels. Submitting an electronic bill following health care (HC) services is the most mature and widely used form of eHealth. HIPAA-compliant electronic HC reimbursement claims (eHRCs) captured in e-commerce can be consolidated into big HC data centers and used for many purposes, including BSV. eHRCs are standardized, and each claim contains pertinent person, place, and time information that can be leveraged for BSV. IMS Health (IMS) is a global HC information company that maintains one of the world's largest eHealth data centers, which processed information including eHRCs on >260M unique U.S. patients in 2012.

Objective

This paper describes how high-volume electronic healthcare (HC) reimbursement claims (eHRCs) from providers' offices and retail pharmacies can be used to provide timely and accurate influenza-like illness (ILI) situational awareness at state and CBSA levels.
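Because each claim carries person, place, and time fields, ILI situational awareness reduces to aggregating claims by week and geography and computing the fraction that match an ILI case definition. The sketch below assumes a toy claim layout and an illustrative set of ICD-10 code prefixes; the actual eHRC schema and case definition are not specified in the abstract.

```python
from collections import Counter

# Hypothetical claim records: (ISO week, CBSA, list of ICD-10 codes on the claim)
claims = [
    ("2014-W01", "Boston", ["J10.1"]),   # influenza        -> ILI
    ("2014-W01", "Boston", ["E11.9"]),   # type 2 diabetes  -> not ILI
    ("2014-W01", "Boston", ["J06.9"]),   # acute URI        -> ILI
]

# Illustrative ILI code prefixes (acute URI and influenza chapters).
ILI_PREFIXES = ("J06", "J09", "J10", "J11")

ili_counts, total_counts = Counter(), Counter()
for week, cbsa, codes in claims:
    total_counts[(week, cbsa)] += 1
    if any(code.startswith(ILI_PREFIXES) for code in codes):
        ili_counts[(week, cbsa)] += 1

# Percent of claims that are ILI for one week/CBSA cell.
pct = 100 * ili_counts[("2014-W01", "Boston")] / total_counts[("2014-W01", "Boston")]
print(f"ILI% = {pct:.1f}")  # ILI% = 66.7
```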

Submitted by elamb on
Description

Multiple data sources are used in a variety of biosurveillance systems. With the advent of new technologies, globalization, high performance computing, and "big data" opportunities, there are seemingly unlimited potential data streams that could be useful in biosurveillance. Data streams have not been universally defined in either the literature or by specific biosurveillance systems. The definitions and framework that we have developed enable a characterization methodology that facilitates understanding of data streams and can be universally applicable for use in evaluating and understanding a wide range of biosurveillance activities, filling a gap recognized in both the public health and biosurveillance communities.

Objective

To develop a data stream-centric framework that can be used to systematically categorize data streams useful for biosurveillance systems, supporting comparative analysis.
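A data-stream characterization framework lends itself to a simple record structure, one record per stream, over which comparative analyses can be run. The attribute set below is a minimal illustrative sketch, not the authors' actual framework.

```python
from dataclasses import dataclass

@dataclass
class DataStream:
    """Illustrative characterization record for one biosurveillance data stream."""
    name: str
    source_type: str     # e.g., "clinical", "environmental", "internet"
    population: str      # population or sensor network covered
    latency_days: float  # typical delay from event to data availability
    spatial_unit: str    # finest geographic resolution

streams = [
    DataStream("ED chief complaints", "clinical", "ED visitors", 1.0, "facility"),
    DataStream("Google Flu Trends", "internet", "search users", 0.1, "city"),
]

# Comparative analysis example: rank streams by timeliness.
fastest = min(streams, key=lambda s: s.latency_days)
print(fastest.name)  # Google Flu Trends
```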

Submitted by knowledge_repo… on
Description

Health care processes consume increasing volumes of digital data. However, creating and leveraging high-quality integrated health data is challenging because large-scale health data derive from systems with varying workflows, yielding variable data quality and potentially limiting their utility for various uses, including population health. To ensure accurate results, data quality must be assessed for each particular use. Examples of sub-optimal health data quality abound: accuracy varies for medication and diagnostic data in hospital discharge and claims data; electronic laboratory data used to identify notifiable public-health cases show varying levels of completeness across data sources; and data timeliness has been found to vary across sources. Given the clear and increasing focus on large health data sources, the known data quality issues that hinder the utility of such data, and the paucity of medical literature describing approaches for evaluating these issues across integrated health data sources, we hypothesize that novel methods for ongoing monitoring of data quality in rapidly growing large health data sets, including surveillance data, will improve the accuracy and overall utility of these data.

 

Objective

We describe how entropy, a key information measure, can be used to monitor the characteristics of chief complaints in an operational surveillance system.
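Shannon entropy summarizes how evenly chief complaints are spread across distinct terms: a sudden drop can indicate complaints concentrating on a few terms (an outbreak, or a data-feed problem), while a spike can indicate new or garbled vocabulary. A minimal sketch, with made-up complaint lists:

```python
import math
from collections import Counter

def shannon_entropy(chief_complaints):
    """Shannon entropy (in bits) of the empirical chief-complaint distribution."""
    counts = Counter(chief_complaints)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform day (maximum spread) vs. a day concentrated on one complaint.
uniform = ["cough", "fever", "rash", "injury"]
concentrated = ["fever", "fever", "fever", "cough"]
print(round(shannon_entropy(uniform), 3))       # 2.0
print(round(shannon_entropy(concentrated), 3))  # 0.811
```

In an operational system, the entropy of each day's complaints would be tracked as a time series and alarmed on like any other surveillance indicator.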

Submitted by hparton on
Description

Hepatitis A virus (HAV) infections have persisted in the United States despite the availability of an effective vaccine. Recent outbreaks of HAV infection among unvaccinated adults, attributed to consumption of HAV-contaminated food or to person-to-person contact in certain populations (e.g., men who have sex with men) or settings (e.g., homeless shelters), have emphasized the importance of targeted vaccination of at-risk adults.

Objective:

To evaluate the use of commercial laboratory data for monitoring trends in HAV infections over time and identifying geographic and demographic characteristics of HAV case clusters for the purpose of targeting interventions.
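One simple way to surface geographic and demographic clusters in laboratory data is to count positive results by stratum and flag strata whose counts exceed a multiple of a baseline. The line list, baseline, and threshold below are purely illustrative assumptions, not the abstract's actual method or data.

```python
from collections import Counter

# Hypothetical line list of HAV-positive results: (county, age_group)
cases = [
    ("Franklin", "30-39"), ("Franklin", "40-49"), ("Franklin", "30-39"),
    ("Franklin", "30-39"), ("Summit", "20-29"),
]

by_county = Counter(county for county, _ in cases)
baseline = 1  # illustrative expected count per county per reporting period

# Flag counties with more than twice the expected count as candidate clusters.
clusters = {county: n for county, n in by_county.items() if n > 2 * baseline}
print(clusters)  # {'Franklin': 4}
```

The same counting can be repeated over demographic strata (age group, sex) to describe who a flagged cluster affects.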

Submitted by elamb on
Description

GFT is a surveillance tool that gathers data on local internet searches to estimate the emergence of influenza-like illness in a given geographic location in real time.3 Previously, GFT has been shown to correlate strongly with influenza incidence at the national and regional levels.2,3 GFT has shown promise as an easily accessed tool to enhance influenza surveillance and forecasting; however, further geographic validation of city-level data is needed.1,2,6

Objective

To test if Google Flu Trends (GFT) is predictive of the volume of influenza and pneumonia emergency department (ED) visits across multiple United States cities.

 

Submitted by Magou on
Description

The National Strategy for Biosurveillance promotes a national effort to improve early detection and enable ongoing situational awareness of all-hazards threats. Implicit in the Strategy’s implementation plan is the need to upgrade capabilities and integrate multiple disparate data sources, including more complete electronic health record (EHR) data, into future biosurveillance capabilities. Thus, new biosurveillance applications are clearly needed. Praedico™ is a next generation biosurveillance application that incorporates cloud computing technology, a Big Data platform utilizing MongoDB as a data management system, machine-learning algorithms, geospatial and advanced graphical tools, multiple EHR domains, and customizable social media streaming from public health-related sources, all within a user-friendly interface.

Objective

The purpose of our study was to conduct an initial assessment of the biosurveillance capabilities of a new software application called Praedico™ and compare results obtained from previous queries with the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE).

 

Submitted by Magou on
Description

A variety of big data analytics techniques and tools, including social media analytics, open-source visualizations, statistical anomaly detection, use of Application Programming Interfaces (APIs), and geospatial mapping, are used for infectious disease biosurveillance. Using these methodologies, policy makers and practitioners detect and monitor outbreaks across the world in near real time, in multiple languages, 24/7. The non-infectious disease community, namely critical care, injury, and trauma stakeholders, currently lacks this level of sophistication. During the response to most mass casualty incidents (MCIs), such as a terrorist bombing, validated, real-time information is typically available only via closed radio channels and limited to a specific set of emergency responders. Health care workers, policy makers, and citizens reach for news, radio, and Internet sources, and increasingly social media, to characterize casualties and hazards. During the Boston Marathon bombing, witnesses began posting tweets within seconds of the blasts and 15 seconds before CNN reported the incident. Current trauma data sets are unhelpful for real-time response: trauma registries are used to assess hospital performance after an incident, and disaster databases consist of secondary reporting used for academic research purposes.

Objective

Discuss how different big-data analytics, techniques, and tools, including open source platforms, cloud analytics, social media, crowdsourcing, and geospatial visualization, can be used to quickly achieve situational awareness within seconds of an MCI, for use by pre-hospital responders, healthcare workers, and policy makers.

 

Submitted by Magou on
Description

This year’s conference theme is “Harnessing Data to Advance Health Equity” – and Washington State researchers and practitioners at the university, state, and local levels are leading the way with especially novel approaches to visualizing health inequity and effectively translating evidence into surveillance practice.

Objective

Washington is leading the way with especially novel approaches. Our goal is to share some of these innovative methods and discuss how they are used in state and local health monitoring.

 

Submitted by Magou on
Description

Many methods to detect outbreaks currently exist, although most perform poorly on real data, resulting in high false-positive rates. More complicated methods have better precision but can be difficult to interpret and justify. Praedico™ is a next generation biosurveillance application built on top of a Hadoop High Performance Cluster that incorporates multiple syndromic surveillance alerting methods and a machine-learning (ML) model using a decision tree classifier that evaluates over 100 different signals simultaneously, all within a user-friendly interface.
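To make the comparison concrete, here is a minimal sketch of one common syndromic alerting method, an EWMA-style detector that flags days whose count exceeds a smoothed baseline by several standard deviations. This is only one of many methods a system like Praedico or ESSENCE might run, and the parameters and update rules are illustrative, not either system's actual algorithm.

```python
def ewma_alerts(counts, lam=0.4, threshold=3.0):
    """Flag indices where the standardized excess over an EWMA baseline
    exceeds `threshold` standard deviations (one-sided, upward only)."""
    mean = counts[0]  # initialize the baseline at the first observation
    var = 1.0         # crude initial variance to avoid division by zero
    alerts = []
    for t, x in enumerate(counts):
        resid = x - mean
        sigma = max(var, 1e-9) ** 0.5
        if t > 0 and resid / sigma > threshold:
            alerts.append(t)
        mean = lam * x + (1 - lam) * mean        # update smoothed baseline
        var = lam * resid ** 2 + (1 - lam) * var # track recent variability
    return alerts

daily = [10, 11, 9, 10, 12, 10, 30, 11, 10]
print(ewma_alerts(daily))  # [6]  (the spike to 30 on day 6)
```

An ML model such as a decision tree classifier would instead combine many such signals (counts, residuals, and covariates across syndromes) into a single alert decision.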

Objective

To compare syndromic surveillance alerting in VA using Praedico™ and ESSENCE.

Submitted by teresa.hamby@d… on