
Tsui Fu-Chiang

Description

We are developing a Bayesian system for real-time surveillance and characterization of outbreaks that incorporates a variety of data elements, including free-text clinical reports. An existing natural language processing (NLP) system called Topaz is being used to extract clinical data from the reports. Moving the NLP system from a research project to a real-time service has presented many challenges.

 

Objective

Adapt an existing NLP system to be a useful component in a system performing real-time surveillance.

Submitted by hparton on
Description

Current practices of automated case detection fall at the extremes of diagnostic accuracy and timeliness. With regard to diagnostic accuracy, electronic laboratory reporting (ELR) is at one extreme and syndromic surveillance is at the other. With regard to timeliness, syndromic surveillance can be immediate, whereas ELR is delayed 7 days from the initial patient visit. A plausible middle way between these extremes is an automated Bayesian diagnostic system that uses all available data types, for example, free-text emergency department (ED) reports, radiology reports, and laboratory reports. We have built such a solution: Bayesian case detection (BCD). As a probabilistic system, BCD operates across the spectrum of diagnostic accuracy; that is, it outputs a degree of certainty for every diagnosis. In addition, BCD incorporates multiple data types as they appear during the course of a patient encounter or lifetime, with no degradation in its ability to perform diagnosis.
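To illustrate how a probabilistic case detector can fold in evidence as each data type arrives, the following is a minimal sketch that assumes conditionally independent findings (a naive Bayes simplification, not necessarily BCD's actual network structure). The prior and the likelihood ratios for each finding are hypothetical values chosen for illustration.

```python
import math


def update_posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior disease probability with likelihood ratios of observed findings.

    Assumes findings are conditionally independent given disease status, so each
    finding simply adds log(LR) to the log-odds of disease.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical findings arriving over the course of one encounter.
evidence_stream = [
    ("ED note: fever", 3.0),
    ("ED note: cough", 2.0),
    ("Lab: rapid flu test positive", 10.0),
]

prior = 0.05
seen: list[float] = []
for source, lr in evidence_stream:
    seen.append(lr)
    # The posterior is recomputed each time a new data type becomes available.
    print(f"{source:30s} P(flu) = {update_posterior(prior, seen):.3f}")
```

Because the output is a probability rather than a yes/no label, a downstream consumer can pick whatever threshold trades off sensitivity and timeliness for its purpose.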

 

Objective

This paper describes the architecture and evaluation of our recently developed automated BCD system.

Submitted by hparton on
Description

Our laboratory previously established the value of over-the-counter (OTC) sales data for the early detection of disease outbreaks. We found that thermometer sales (TS) increased significantly and early during influenza (flu) season. Recently, the 2009 H1N1 outbreak highlighted the need for methods that not only detect an outbreak but also estimate its incidence, so that public health decision makers can allocate appropriate resources in response. Although a few studies have estimated H1N1 incidence in the 2009 outbreak, these were done months afterward and relied on data that are either difficult to collect or not available in a timely fashion (for example, surveys or laboratory-confirmed cases).

Here, we explore the hypothesis that OTC sales data can also be used to predict disease activity. To that end, we developed a model that predicts the number of emergency department (ED) flu cases in a region based on TS. We obtain sales information from the National Retail Data Monitor (NRDM) project, which collects daily sales data for 18 OTC categories across the US.
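The abstract does not state the model's form. As one hedged illustration of predicting ED flu counts from TS, the sketch below assumes a simple linear relationship between ED cases on day t and thermometer sales `lag` days earlier, fit by ordinary least squares. The function names and the single fixed lag are assumptions, not the authors' model.

```python
def fit_lagged_regression(sales, ed_cases, lag):
    """Least-squares fit of ed_cases[t] = a + b * sales[t - lag].

    sales, ed_cases: equal-length daily series for one region.
    """
    x = sales[: len(sales) - lag]  # sales that precede each ED count by `lag` days
    y = ed_cases[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(a, b, sales, lag):
    """Predict ED flu cases for the next `lag` days from the most recent sales."""
    return [a + b * s for s in sales[len(sales) - lag:]]


# Hypothetical daily thermometer sales and ED flu counts for one region.
sales = [10, 20, 30, 25, 40, 35, 50]
ed = [0, 0, 7, 12, 17, 14.5, 22]
a, b = fit_lagged_regression(sales, ed, lag=2)
print(a, b, forecast(a, b, sales, lag=2))
```

The lag is what makes the model predictive: if sales lead ED visits, today's sales already carry information about ED volume a few days ahead. In practice the lag would be chosen empirically, for example by comparing fit quality across candidate lags.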

 

Objective

We developed a model that predicts the incidence of flu cases that present to ED in a given region based on TS.

Submitted by hparton on
Description

Current methods for influenza surveillance include reporting of laboratory-confirmed cases, sentinel physician reporting of influenza-like illness (ILI), and chief-complaint monitoring in emergency departments (EDs).

The current methods for monitoring influenza have drawbacks. Testing for the presence of the influenza virus is costly and delayed. Sentinel physician reporting is subject to incomplete and delayed reporting. Chief complaint (CC) based surveillance is limited because a patient’s chief complaint does not contain all of the patient’s signs and symptoms.

A possible solution to the cost, delays, incompleteness, and low specificity (for CC) of current influenza surveillance methods is automated surveillance of ILI using free-text ED reports written by clinicians.
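To make the idea of ILI detection from free-text notes concrete, here is a deliberately crude sketch. It is not the NLP system used in this work. It assumes hypothetical keyword and negation trigger lists, applies a NegEx-style rule (a negation trigger earlier in the same sentence negates a finding, with scope ending at "but" or ";"), and classifies a note as ILI when it affirms fever plus cough and/or sore throat, the core of the CDC ILI case definition. Measured temperatures are ignored.

```python
import re

# Hypothetical trigger lists; a real clinical NLP system uses far richer lexicons.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative for)\b", re.IGNORECASE)
FINDINGS = {
    "fever": re.compile(r"\b(fever|febrile|pyrexia)\b", re.IGNORECASE),
    "cough": re.compile(r"\bcough(ing)?\b", re.IGNORECASE),
    "sore_throat": re.compile(r"\b(sore throat|pharyngitis)\b", re.IGNORECASE),
}


def is_negated(prefix: str) -> bool:
    """True if a negation trigger precedes the finding within its scope."""
    # Negation scope is cut off at "but" or ";" (a crude NegEx-style rule).
    scope = re.split(r"\bbut\b|;", prefix, flags=re.IGNORECASE)[-1]
    return NEGATION.search(scope) is not None


def extract_findings(note: str) -> set[str]:
    """Return the affirmed ILI-related findings mentioned in a free-text note."""
    found = set()
    # Sentence splitting on "." and newlines; note this also splits decimals like 101.2.
    for sentence in re.split(r"[.\n]", note):
        for name, pattern in FINDINGS.items():
            match = pattern.search(sentence)  # only the first mention per sentence
            if match and not is_negated(sentence[: match.start()]):
                found.add(name)
    return found


def is_ili(note: str) -> bool:
    """Fever plus cough and/or sore throat."""
    findings = extract_findings(note)
    return "fever" in findings and bool(findings & {"cough", "sore_throat"})


print(is_ili("Pt reports fever and cough for 3 days."))
print(is_ili("Denies fever. Has sore throat."))
```

Even this toy version shows why notes can beat chief complaints: the note records negated findings and secondary symptoms that a one-line chief complaint omits.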

 

Objective

This paper describes an automated ILI reporting system based on natural language processing of transcribed ED notes and its impact on public health practice at the Allegheny County Health Department.

Submitted by hparton on
Description

In disease surveillance, an outbreak is often present in more than one data type. If each data type is analyzed separately rather than in combination, the statistical power to detect an outbreak may suffer, because no single data source captures all the individuals in the outbreak. Researchers have therefore begun taking multivariate approaches to syndromic surveillance. The data sources often analyzed include emergency department data categorized by chief complaint, over-the-counter pharmaceutical sales data collected by the National Retail Data Monitor (NRDM), and other syndromic data.

 

Objective

This study proposes a simulation model that generates daily counts of over-the-counter medication sales (for example, thermometer sales) for every ZIP code area in a study region, including areas without retail stores, based on the daily sales that the NRDM collects from ZIP codes with retail stores. This simulation allows us to use NRDM data alongside other data sources in a multivariate analysis to rapidly detect outbreaks.
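The abstract does not say how sales are assigned to ZIP codes without stores. One simple way to picture the problem is to treat each unit sold at a store as having been bought by a resident of some ZIP code in the region and to allocate units at random in proportion to ZIP population. The sketch below implements that population-proportional multinomial allocation; the allocation rule, function name, and example ZIP codes and populations are all assumptions for illustration, not the authors' model.

```python
import random
from collections import Counter


def allocate_sales_by_population(store_sales, population, seed=None):
    """Simulate residence-based daily sales counts for every ZIP code in a region.

    store_sales: ZIP -> units sold today at stores in that ZIP (store ZIPs only)
    population:  ZIP -> resident population (all ZIPs, with or without stores)
    Each sold unit is assigned to a ZIP with probability proportional to population,
    so the regional total is preserved and store-less ZIPs receive counts too.
    """
    total_units = sum(store_sales.values())
    zips = list(population)
    weights = [population[z] for z in zips]
    rng = random.Random(seed)
    draws = Counter(rng.choices(zips, weights=weights, k=total_units))
    return {z: draws[z] for z in zips}


# Hypothetical region: two ZIPs report sales; two have no retail stores.
store_sales = {"15213": 40, "15217": 60}
population = {"15213": 30000, "15217": 25000, "15207": 10000, "15218": 0}
print(allocate_sales_by_population(store_sales, population, seed=42))
```

Running the allocation once per day yields a complete ZIP-level series that can be aligned with other ZIP-level data streams in a multivariate detector.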

Submitted by hparton on
Description

Bio-surveillance systems monitor multiple data streams (over-the-counter (OTC) sales, emergency department visits, etc.) to detect both natural disease outbreaks (e.g., influenza) and bio-terrorist attacks (e.g., anthrax release). Many detection algorithms show impressive results in simulated environments, but the complex behavior of real-world data and the high costs of processing false positives make it difficult to develop practical bio-surveillance systems. We believe that using expert knowledge from public health officials will help us better understand real-world data, improving our ability to distinguish actual disease outbreaks from non-outbreak patterns.

 

Objective

This paper describes the evolution of a bio-surveillance system that incorporates user feedback to improve system utility and usability. The system monitors national-level OTC pharmacy sales on a daily basis. We use fast spatio-temporal scan statistics to detect disease outbreaks.
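For readers unfamiliar with spatial scan statistics, the sketch below shows the scoring idea behind one common variant, the expectation-based Poisson scan: every candidate region is scored by a log-likelihood ratio comparing its observed count C with its expected baseline B, F = C log(C/B) + B - C when C > B. It searches all rectangles of a small grid by brute force, which is far slower than the fast scan methods the system uses; the grid layout, function names, and example numbers are assumptions, and significance testing (usually by randomization) is omitted.

```python
import math


def ebp_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for one region (0 if not elevated)."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_rectangle(counts, baselines):
    """Exhaustively score every axis-aligned rectangle of grid cells.

    counts, baselines: equal-sized 2-D lists; baselines must be positive.
    Returns (best_score, (row1, col1, row2, col2)), or (0.0, None) if nothing is elevated.
    """
    rows, cols = len(counts), len(counts[0])
    best_score, best_region = 0.0, None
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
                    count = sum(counts[r][c] for r, c in cells)
                    baseline = sum(baselines[r][c] for r, c in cells)
                    score = ebp_score(count, baseline)
                    if score > best_score:
                        best_score, best_region = score, (r1, c1, r2, c2)
    return best_score, best_region


# Hypothetical 4x4 grid of OTC sales with two adjacent elevated cells.
baselines = [[10.0] * 4 for _ in range(4)]
counts = [[10] * 4 for _ in range(4)]
counts[1][1] = counts[1][2] = 30
print(best_rectangle(counts, baselines))
```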

Submitted by elamb on
Description

Many cities in the US and the Centers for Disease Control and Prevention have deployed biosurveillance systems to monitor regional health status. Biosurveillance systems rely on algorithms that analyze data in the temporal domain (e.g., CuSUM) and/or the spatial domain (e.g., SaTScan). Spatial algorithms often require population information to normalize the counts (e.g., emergency department visits) within a geographic region. This paper presents a new algorithm, Ellipse-based Clustering Analysis (ECA), that analyzes data in both the temporal and spatial domains: it uses time series analysis to identify ZIP codes with abnormal counts and pattern recognition methods to identify spatial clusters.

 

Objective

This paper describes a new clustering algorithm, ECA, which uses a time series algorithm to identify ZIP codes with abnormal counts and a pattern recognition method to identify elliptical spatial clusters. Using ellipses could help detect elongated clusters resulting from wind dispersion of bio-agents. We applied ECA to over-the-counter medication sales. The pilot study demonstrated the potential of the algorithm to detect clustered outbreak regions that could be associated with an aerosol release of bio-agents.
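The two ECA stages can be pictured with the hedged sketch below. It is not the published algorithm: stage one stands in a simple z-score test against each ZIP code's recent history for the unspecified time series method, and stage two summarizes the flagged ZIP centroids (assumed to be in planar, projected coordinates) by their covariance ellipse, whose axis ratio and orientation indicate how elongated the cluster is. All names and thresholds are assumptions.

```python
import math
from statistics import mean, stdev


def flag_abnormal(history, today, threshold=3.0, min_sd=1.0):
    """Return ZIP codes whose count today exceeds the historical mean by > threshold SDs.

    history: ZIP -> list of past daily counts (at least two values each)
    today:   ZIP -> today's count
    """
    flagged = []
    for zip_code, past in history.items():
        sd = max(stdev(past), min_sd)  # floor the SD so flat histories don't blow up
        if (today[zip_code] - mean(past)) / sd > threshold:
            flagged.append(zip_code)
    return flagged


def fit_ellipse(points):
    """Fit a 1-SD covariance ellipse to planar (x, y) points.

    Returns (center, (major, minor) semi-axis lengths, major-axis angle in radians).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    major = math.sqrt(half_trace + disc)
    minor = math.sqrt(max(half_trace - disc, 0.0))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (major, minor), angle


# Hypothetical flagged ZIP centroids lying along a downwind plume.
center, axes, angle = fit_ellipse([(0, 0), (1, 1.2), (2, 1.9), (3, 3.1)])
print(center, axes, math.degrees(angle))
```

A large major-to-minor axis ratio is the signature the abstract associates with wind-driven dispersion, which a circular scan window would capture poorly.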

Submitted by elamb on
Description

Population surges or large events may cause shifts in the data collected by biosurveillance systems [1]. For example, the Cherry Blossom Festival brings hundreds of thousands of people to Washington, DC every year, which results in simultaneous elevations in multiple data streams (Fig. 1). In this paper, we propose an MGD model to handle such baseline shifts.

Objective

Outbreak detection algorithms monitoring only disease-relevant data streams may be prone to false alarms due to baseline shifts. In this paper, we propose a Multinomial-Generalized-Dirichlet (MGD) model to adjust for baseline shifts.
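The intuition behind modeling several streams jointly can be shown with a simplified stand-in for the MGD model. It uses an ordinary Dirichlet estimate of each stream's share of the total rather than a generalized Dirichlet, then scores today's counts with a multinomial likelihood-ratio (G) statistic. A festival-style surge that raises every stream by the same factor leaves the proportions unchanged and scores near zero, whereas a disproportionate rise in one stream scores high. Function names, the stream categories, and the counts are hypothetical.

```python
import math


def baseline_proportions(history, prior=1.0):
    """Posterior-mean stream proportions under a symmetric Dirichlet(prior) prior.

    history: list of days, each a list of counts per (mutually exclusive) stream.
    """
    k = len(history[0])
    totals = [sum(day[i] for day in history) for i in range(k)]
    denom = sum(totals) + prior * k
    return [(t + prior) / denom for t in totals]


def g_statistic(today, proportions):
    """Multinomial likelihood-ratio (G) statistic of today's counts vs baseline proportions."""
    n = sum(today)
    g = 0.0
    for observed, p in zip(today, proportions):
        if observed > 0:  # 0 * log(0) is taken as 0
            g += 2.0 * observed * math.log(observed / (n * p))
    return g


# Hypothetical ED visit categories: [other, respiratory, gastrointestinal].
history = [[100, 50, 20], [110, 55, 22], [90, 45, 18]]
props = baseline_proportions(history, prior=0.0)
print(g_statistic([300, 150, 60], props))  # uniform surge: proportions unchanged
print(g_statistic([100, 50, 80], props))   # disproportionate rise in one category
```

With k streams, G can be compared against a chi-square distribution with k - 1 degrees of freedom as a rough guide. A full MGD model additionally captures day-to-day variability in the proportions themselves, which this sketch ignores.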

 

Submitted by Magou on
Description

Early detection of influenza outbreaks is critical to public health officials. Case detection is the foundation of outbreak detection. A previous study by Elkin et al. demonstrated that individual emergency department (ED) reports detect influenza cases better than chief complaints do. Our recent study using ED reports processed by Bayesian networks (with an expert-constructed network structure) showed high accuracy in detecting influenza cases.

Objective

Compare 7 machine learning algorithms with an expert-constructed Bayesian network for detecting patients with influenza syndrome.
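The abstract does not list the 7 algorithms. As a hedged illustration of what such a comparison involves, the sketch below trains naive Bayes, a commonly used learner chosen here only as an example, on binary findings extracted from ED reports, and computes the area under the ROC curve (AUC), a common metric for comparing case detectors. The feature encoding, Laplace smoothing parameter, and toy data are assumptions.

```python
import math


def train_naive_bayes(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes with Laplace smoothing.

    X: list of binary feature vectors (e.g., [fever, cough]); y: list of 0/1 labels.
    """
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict_proba(model, x):
    """Return P(label = 1 | x) under the naive Bayes model."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        s = math.log(prior)
        for xj, pj in zip(x, probs):
            s += math.log(pj if xj else 1.0 - pj)
        log_scores[label] = s
    m = max(log_scores.values())  # subtract the max for numerical stability
    e0, e1 = math.exp(log_scores[0] - m), math.exp(log_scores[1] - m)
    return e1 / (e0 + e1)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical training data: [fever, cough] findings and influenza labels.
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
model = train_naive_bayes(X, y)
scores = [predict_proba(model, x) for x in X]
print(scores, auc(scores, y))
```

In a real comparison each algorithm would be trained and scored on held-out reports, and the AUCs compared against that of the expert-constructed network.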

Submitted by teresa.hamby@d… on