
Espino Jeremy

Description

We are developing a Bayesian surveillance system for real-time surveillance and characterization of outbreaks that incorporates a variety of data elements, including free-text clinical reports. An existing natural language processing (NLP) system called Topaz is being used to extract clinical data from the reports. Moving the NLP system from a research project to a real-time service has presented many challenges.

 

Objective

Adapt an existing NLP system to be a useful component in a system performing real-time surveillance.

Submitted by hparton on
Description

Current practices of automated case detection fall at the extremes of diagnostic accuracy and timeliness. With regard to diagnostic accuracy, electronic laboratory reporting (ELR) is at one extreme and syndromic surveillance is at the other. With regard to timeliness, syndromic surveillance can be immediate, whereas ELR is delayed 7 days from the initial patient visit. A plausible middle way between these extremes is an automated Bayesian diagnostic system that uses all available data types, for example, free-text ED reports, radiology reports, and laboratory reports. We have built such a system: Bayesian case detection (BCD). Because it is probabilistic, BCD operates across the spectrum of diagnostic accuracy; that is, it outputs a degree of certainty for every diagnosis. In addition, BCD incorporates multiple data types as they become available during a patient encounter or over a patient's lifetime, with no degradation in its ability to perform diagnosis.
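The idea of reporting a degree of certainty that sharpens as new data types arrive can be illustrated with sequential Bayesian updating. The sketch below is not the BCD architecture described in this paper: it assumes that findings are conditionally independent given the diagnosis, and the diagnosis names, priors, and likelihoods are hypothetical values chosen only to exercise the code.

```python
from typing import Dict, List, Tuple

Distribution = Dict[str, float]


def update_posterior(prior: Distribution, likelihood: Distribution) -> Distribution:
    """Apply Bayes' rule: multiply the prior by P(finding | diagnosis), then renormalize."""
    unnormalized = {dx: prior[dx] * likelihood[dx] for dx in prior}
    total = sum(unnormalized.values())
    return {dx: p / total for dx, p in unnormalized.items()}


# Hypothetical prior over diagnoses (illustrative values, not from the paper).
prior: Distribution = {"influenza": 0.05, "other": 0.95}

# Hypothetical findings, each arriving from a different data type during the
# encounter, with assumed likelihoods P(finding | diagnosis).
findings: List[Tuple[str, Distribution]] = [
    ("ED report: fever", {"influenza": 0.9, "other": 0.2}),
    ("ED report: cough", {"influenza": 0.8, "other": 0.3}),
    ("lab: positive influenza test", {"influenza": 0.7, "other": 0.02}),
]

posterior = prior
for source, likelihood in findings:
    # Assumes each finding is conditionally independent of earlier ones given the diagnosis.
    posterior = update_posterior(posterior, likelihood)
    print(f"after {source}: P(influenza) = {posterior['influenza']:.3f}")
```

Because each update needs only the current posterior and the new finding's likelihoods, data elements can be folded in whenever they appear; under the independence assumption, the final posterior does not depend on their order.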

 

Objective

This paper describes the architecture and evaluation of our recently developed automated BCD system.

Submitted by hparton on
Description

Our laboratory previously established the value of over-the-counter (OTC) sales data for the early detection of disease outbreaks. We found that thermometer sales (TS) increased significantly and early during influenza (flu) season. Recently, the 2009 H1N1 outbreak highlighted the need for methods that not only detect an outbreak but also estimate its incidence, so that public health decision makers can allocate appropriate resources in response. Although a few studies have tried to estimate H1N1 incidence during the 2009 outbreak, these estimates were made months afterward and relied on data that are either difficult to collect or not available in a timely fashion (for example, surveys or laboratory-confirmed cases).

Here, we explore the hypothesis that OTC sales data can also be used to predict disease activity. To that end, we developed a model to predict the number of emergency department (ED) flu cases in a region based on TS. We obtain sales information from the National Retail Data Monitor (NRDM) project, which collects daily sales data for 18 OTC categories across the US.

 

Objective

We developed a model that predicts the incidence of flu cases that present to EDs in a given region based on TS.

Submitted by hparton on
Description

The Keyhole Markup Language (KML) format has become a recognized standard for distributing geographic information system (GIS) data. In the most recent versions of the Real-time Outbreak and Disease Surveillance (RODS) system, we standardized on KML as our mapping solution. This decision obviates the need for commercial GIS servers and clients, and it permits users to easily overlay RODS map output with KML output from other websites and software, for example, those of EPA, NASA, and NOAA.

We quickly recognized that the mapping tools in RODS have broad applicability in public health and other domains that require displaying spatiotemporal data by state, county, and ZIP code. To meet these needs, we created the EpiScape map generation service for public use.

 

Objective

This paper describes EpiScape, our map generation service. It generates three-dimensional static or animated maps as KML files that can be used to display epidemiologic data over time and space using Google Earth or Google Maps software.
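A minimal sketch of producing KML of this kind with Python's standard `xml.etree.ElementTree` is shown below. It is not EpiScape's implementation: it emits one extruded, time-stamped point per area rather than shaded state, county, or ZIP code polygons, and the record layout and `meters_per_case` scaling are assumptions. The `TimeStamp` elements let Google Earth's time slider animate the counts, and the height of each extruded point encodes the count as the third dimension.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical record layout: (area name, ISO 8601 date, longitude, latitude, case count).
Record = Tuple[str, str, float, float, int]


def _sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Create a child element in the KML namespace, optionally with text."""
    child = ET.SubElement(parent, f"{{{KML_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def counts_to_kml(records: Iterable[Record], meters_per_case: float = 100.0) -> str:
    """Serialize daily area counts as a KML document of extruded, time-stamped points."""
    ET.register_namespace("", KML_NS)  # write KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = _sub(kml, "Document")
    for name, date, lon, lat, count in records:
        placemark = _sub(doc, "Placemark")
        _sub(placemark, "name", f"{name}: {count}")
        # A TimeStamp lets Google Earth's time slider step through the days.
        _sub(_sub(placemark, "TimeStamp"), "when", date)
        point = _sub(placemark, "Point")
        _sub(point, "extrude", "1")  # draw a line from the point down to the ground
        _sub(point, "altitudeMode", "relativeToGround")
        # KML coordinates are longitude,latitude[,altitude in meters].
        _sub(point, "coordinates", f"{lon},{lat},{count * meters_per_case}")
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    sample = [
        ("Area A", "2009-10-01", -79.99, 40.44, 12),
        ("Area A", "2009-10-02", -79.99, 40.44, 20),
    ]
    print(counts_to_kml(sample))
```

Saving the printed string with a `.kml` extension should make it openable in Google Earth; serving KML rather than rendered images is what allows other KML layers to be overlaid on the same map.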

Submitted by hparton on
Description

Most, if not all, disease surveillance systems are federated in the sense that hospitals, doctors’ offices, and pharmacies are the sources of most surveillance data. Although a health department may request or mandate that these organizations report data, we are not aware of any requirements regarding the method of data collection, audits, or other quality control measures.

Because of this heterogeneity and the lack of control over the processes that generate the data, data sources in a federated disease surveillance system are black boxes whose reliability, completeness, and accuracy are not fully understood by the recipient.

In this paper, we use the variance-to-mean ratio of daily counts of surveillance events as a metric of data quality. We use thermometer sales data as an example of data from a federated disease surveillance system. We test the hypothesis that removing stores with higher baseline variability from pooled surveillance data will improve the signal-to-noise ratio of thermometer sales for an influenza outbreak.
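The variance-to-mean ratio (VMR) of a store's daily counts is about 1 for Poisson-distributed counts and grows as the counts become more overdispersed, so it gives a simple per-store variability score. The sketch below computes the VMR over a baseline period and pools only stores at or below a threshold; the data layout, the function names, and the threshold value are hypothetical, and the paper's actual exclusion rule may differ.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Series = List[float]  # daily thermometer sales for one store (hypothetical layout)


def variance_to_mean(counts: Series) -> float:
    """Population variance divided by mean; infinite for an all-zero series."""
    m = mean(counts)
    if m <= 0:
        return float("inf")
    return pvariance(counts, mu=m) / m


def pool_low_variability_stores(
    baseline: Dict[str, Series],
    surveillance: Dict[str, Series],
    max_vmr: float,
) -> Tuple[List[str], Series]:
    """Drop stores whose baseline VMR exceeds max_vmr, then sum the rest day by day.

    Assumes every kept store appears in `surveillance` and all series have equal length.
    """
    kept = [store for store, counts in baseline.items() if variance_to_mean(counts) <= max_vmr]
    n_days = len(next(iter(surveillance.values())))
    pooled = [sum(surveillance[store][day] for store in kept) for day in range(n_days)]
    return kept, pooled
```

Comparing the signal-to-noise ratio of the pooled series with and without the exclusion, over a known influenza period, is one way to test the hypothesis stated above.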

 

Objective

We developed a novel method for monitoring the quality of data in a federated disease surveillance system, which we define as ‘a surveillance system in which a set of organizations that are not owned or controlled by public health provide data.’

Submitted by hparton on
Description

Current methods for influenza surveillance include laboratory-confirmed case reporting, sentinel physician reporting of influenza-like illness (ILI), and chief complaint monitoring from emergency departments (EDs).

The current methods for monitoring influenza have drawbacks. Testing for the presence of the influenza virus is costly and delayed. Sentinel physician reporting is subject to incomplete and delayed reporting. Chief complaint (CC)-based surveillance is limited because a patient’s chief complaint does not contain all of the patient’s signs and symptoms.

A possible solution to the cost, delays, incompleteness, and low specificity (for CC) of current methods of influenza surveillance is automated surveillance of ILI using clinician-provided free-text ED reports.
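To make the idea concrete, the sketch below flags a free-text report as ILI when it mentions fever together with cough or sore throat, a common ILI case definition. It is a deliberately crude keyword-and-negation stand-in, not the NLP system used in this work: the term lists are hypothetical, a finding counts as negated if any negation cue appears earlier in the same sentence, and sentences are split naively on periods.

```python
import re
from typing import Dict

# Hypothetical term lists; a real system would need far richer vocabularies.
FINDING_PATTERNS: Dict[str, str] = {
    "fever": r"\b(fever|febrile)\b",
    "cough": r"\bcough(s|ing)?\b",
    "sore_throat": r"\b(sore throat|pharyngitis)\b",
}
# Crude negation cues: any of these earlier in the same sentence negates a finding.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b")


def extract_findings(report: str) -> Dict[str, bool]:
    """Return whether each finding is mentioned, non-negated, anywhere in the report."""
    text = report.lower()
    found: Dict[str, bool] = {}
    for finding, pattern in FINDING_PATTERNS.items():
        found[finding] = False
        for match in re.finditer(pattern, text):
            # Text from the start of the current sentence up to the match.
            sentence_prefix = text[: match.start()].split(".")[-1]
            if not NEGATION.search(sentence_prefix):
                found[finding] = True
                break
    return found


def meets_ili_definition(report: str) -> bool:
    """Fever plus cough and/or sore throat, all non-negated."""
    f = extract_findings(report)
    return f["fever"] and (f["cough"] or f["sore_throat"])
```

Daily ILI counts for public health reporting would then be the number of reports per day for which `meets_ili_definition` returns `True`; the negation rule is the part most likely to need a more careful treatment in practice.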

 

Objective

This paper describes an automated ILI reporting system based on natural language processing of transcribed ED notes and its impact on public health practice at the Allegheny County Health Department.

Submitted by hparton on
Description

With the recent emphasis on public health preparedness, health departments are identifying new ways to prepare for emergencies. The number of operating syndromic surveillance systems has increased significantly in recent years. These systems are based on real-time information from hospital emergency departments that is transmitted and analyzed electronically for the early detection of public health emergencies. Like other states, Rhode Island sought to enhance its traditional surveillance activities by implementing such a system. Rhode Island implemented the Real-time Outbreak and Disease Surveillance (RODS) system, developed by the University of Pittsburgh’s Center for Biomedical Informatics. Data from three hospitals were collected as part of the pilot implementation of the Rhode Island RODS system. Personnel at the participating hospitals and the Department of Health who were trained in surveillance-related areas, such as infection control and epidemiology, received access to RI RODS. As part of the evaluation framework, Rhode Island sought to assess users’ attitudes toward and opinions of the new system.

 

Objective

This paper presents the results of a survey assessing initial user satisfaction with a syndromic surveillance system and users’ attitudes regarding syndromic surveillance.

Submitted by elamb on
Description

Scientists have used many chief complaint (CC) classification techniques in biosurveillance, including keyword search, weighted keyword search, and naïve Bayes. These techniques may use CC-to-syndrome or CC-to-symptom-to-syndrome classification approaches. In the former approach, we classify a CC directly into syndrome categories. In the latter approach, we first classify a CC into symptom categories. Then, we use a syndrome definition, a combination of one or more symptoms, to determine whether a chief complaint belongs in a particular syndrome category. One approach to CC-to-symptom-to-syndrome classification uses manually weighted keyword search and Boolean operations to build syndrome classifiers. This approach has two limitations: it does not address uncertainty in the data, and it must be parameterized manually. A CC-to-symptom-to-syndrome approach that is both probabilistic and based on machine learning addresses these limitations.
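A minimal sketch of a CC-to-symptom-to-syndrome pipeline that is both probabilistic and learned from data is shown below. It assumes one naïve Bayes classifier per symptom, trained on chief complaints labeled for that symptom, and a hypothetical syndrome definition that holds if any listed symptom is present, with symptoms treated as independent when combining probabilities. None of these choices is taken from the classifier described in this paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class SymptomClassifier:
    """Multinomial naive Bayes estimate of P(symptom present | chief complaint words)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, complaints: Sequence[str], labels: Sequence[bool]) -> "SymptomClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        for cc, present in zip(complaints, labels):
            self.word_counts[present].update(cc.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def predict_proba(self, complaint: str) -> float:
        n = sum(self.class_counts.values())
        log_scores: Dict[bool, float] = {}
        for present in (True, False):
            total = sum(self.word_counts[present].values())
            score = math.log((self.class_counts[present] + self.alpha) / (n + 2 * self.alpha))
            for word in complaint.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[present][word] + self.alpha)
                        / (total + self.alpha * len(self.vocab))
                    )
            log_scores[present] = score
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


def syndrome_probability(symptom_probs: List[float]) -> float:
    """P(at least one symptom present), assuming independent symptoms (hypothetical OR definition)."""
    return 1.0 - math.prod(1.0 - p for p in symptom_probs)


# Tiny made-up training set for a "cough" symptom classifier.
cough = SymptomClassifier().fit(
    ["cough", "coughing fever", "chest pain", "ankle pain"],
    [True, True, False, False],
)
p_cough = cough.predict_proba("cough")
```

Because every symptom classifier is trained from labeled examples and returns a probability, the syndrome layer can report a degree of certainty instead of a hand-tuned yes/no decision.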

 

Objective

Design, build and evaluate a symptom-based probabilistic chief complaint classifier for the Real-time Outbreak and Disease Surveillance System.

Submitted by elamb on
Description

Bio-surveillance systems monitor multiple data streams (e.g., over-the-counter (OTC) sales and emergency department visits) to detect both natural disease outbreaks (e.g., influenza) and bioterrorist attacks (e.g., anthrax release). Many detection algorithms show impressive results in simulated environments, but the complex behavior of real-world data and the high costs associated with processing false positives make it difficult to develop practical bio-surveillance systems. We believe that using expert knowledge from public health officials will help us better understand real-world data, improving our ability to distinguish actual disease outbreaks from non-outbreak patterns.

 

Objective

This paper describes the evolution of a bio-surveillance system that incorporates user feedback to improve system utility and usability. The system monitors national-level OTC pharmacy sales on a daily basis. We use fast spatio-temporal scan statistics to detect disease outbreaks.
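The scoring step of a space-time scan can be sketched as follows. This version uses the expectation-based Poisson statistic, which compares the observed count C in a region with its expected baseline count B and scores C log(C/B) + B - C when C > B (and 0 otherwise). It searches every axis-aligned rectangle of a small grid, over every time window ending on the most recent day, by brute force. Fast scan methods avoid this exhaustive search; the grid layout, baseline inputs, and function names here are assumptions rather than the system's implementation.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[List[float]]]  # indexed [day][row][col]
Region = Tuple[int, int, int, int, int]  # (window_days, row_lo, row_hi, col_lo, col_hi)


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio; 0 unless count exceeds baseline."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def _box_sum(grid: Grid, first_day: int, r_lo: int, r_hi: int, c_lo: int, c_hi: int) -> float:
    """Sum grid values from first_day through the last day over an inclusive rectangle."""
    return sum(
        grid[d][r][c]
        for d in range(first_day, len(grid))
        for r in range(r_lo, r_hi + 1)
        for c in range(c_lo, c_hi + 1)
    )


def brute_force_scan(counts: Grid, baselines: Grid, max_window: int) -> Tuple[float, Optional[Region]]:
    """Return the highest-scoring space-time region ending on the most recent day."""
    n_days, n_rows, n_cols = len(counts), len(counts[0]), len(counts[0][0])
    best_score, best_region = 0.0, None
    for window in range(1, min(max_window, n_days) + 1):
        first_day = n_days - window
        for r_lo in range(n_rows):
            for r_hi in range(r_lo, n_rows):
                for c_lo in range(n_cols):
                    for c_hi in range(c_lo, n_cols):
                        score = ebp_score(
                            _box_sum(counts, first_day, r_lo, r_hi, c_lo, c_hi),
                            _box_sum(baselines, first_day, r_lo, r_hi, c_lo, c_hi),
                        )
                        if score > best_score:
                            best_score = score
                            best_region = (window, r_lo, r_hi, c_lo, c_hi)
    return best_score, best_region
```

The brute-force loop costs on the order of rows² × cols² × windows box sums, which is exactly the cost that fast scan statistics are designed to cut down for national-scale daily OTC data.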

Submitted by elamb on