
Classification

Description

Real-world public health data often pose numerous challenges: limited background data, data dropouts, noise, and human error. The data from an emergency department (ED) in Urbana, IL include a diagnosis field with multiple terms and notes separated by semicolons; excluding the notes, there are over 7,000 distinct terms. Because the data series begins in April 2009, there is not yet adequate background data to use some of the regression-based alerting algorithms. Values for some days are missing, so we also needed an algorithm that tolerates data dropouts.

INDICATOR is a workflow-based biosurveillance system developed at the National Center for Supercomputing Applications. One of the fundamental concepts of INDICATOR is that the burden of cleaning and processing incoming data should be on the software, rather than on the health care providers.
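As a rough illustration of the kind of cleaning this implies, the sketch below parses a semicolon-delimited diagnosis field and separates coded terms from free-text notes. The record format and the note-detection heuristic are assumptions for illustration, not INDICATOR's actual processing rules.

```python
# A minimal sketch, assuming a hypothetical record whose diagnosis field mixes
# short coded terms and longer free-text notes separated by semicolons.
def extract_diagnosis_terms(diagnosis_field):
    """Split a semicolon-delimited diagnosis field and drop free-text notes."""
    parts = [p.strip() for p in diagnosis_field.split(";") if p.strip()]
    # Treat long fragments as notes rather than coded terms (an assumed heuristic).
    return [p.upper() for p in parts if len(p.split()) <= 4]

record = "CHEST PAIN; FEVER; patient reports symptoms began two days ago; COUGH"
print(extract_diagnosis_terms(record))  # ['CHEST PAIN', 'FEVER', 'COUGH']
```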


Objective

This paper compares different approaches to classification and anomaly detection of data from an ED.

Description

Event-based biosurveillance is a practice of monitoring diverse information sources for the detection of events pertaining to human health. Online documents, such as news articles on the Internet, have commonly been the primary information sources in event-based biosurveillance. With the large number of online publications as well as with the language diversity, thorough monitoring of online documents is challenging. Automated document classification is an important step toward efficient event-based biosurveillance. In Project Argus, a biosurveillance program hosted at Georgetown University Medical Center, supervised and unsupervised approaches to document classification are considered for event-based biosurveillance.
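To make the supervised setting concrete, the sketch below trains a simple relevance classifier on example article snippets. It is not Project Argus's actual pipeline; the texts, labels, and choice of TF-IDF features with Naive Bayes are illustrative assumptions.

```python
# A minimal supervised document-classification sketch using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "officials report a cluster of avian influenza cases in poultry workers",
    "local festival draws record crowds over the holiday weekend",
    "hospital notes a sharp rise in severe respiratory illness admissions",
    "city council approves new budget for road repairs",
]
train_labels = ["relevant", "irrelevant", "relevant", "irrelevant"]

# TF-IDF features feed a Naive Bayes classifier; new articles can then be
# routed to analysts only when flagged as relevant.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)
print(model.predict(["unexplained pneumonia cases reported in two provinces"]))
```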


Objective

This paper describes ongoing efforts in enhancing automated document classification toward efficient event-based biosurveillance. 

Description

The Centers for Disease Control and Prevention's (CDC) Emerging Infections Program (EIP) monitors and studies many infectious diseases, including influenza. In 10 states in the US, information is collected for hospitalized patients with laboratory-confirmed influenza. Data are extracted manually by EIP personnel at each site, stripped of personal identifiers and sent to the CDC. The anonymized data are received and reviewed for consistency at the CDC before they are incorporated into further analyses. This review includes identifying errors, which are used for the classification described here.


Objective

Introducing data quality checks can be used to generate feedback that remediates and/or reduces error generation at the source. In this report, we introduce a classification of errors generated as part of the data collection process for the EIP's Influenza Hospitalization Surveillance Project at the CDC. We also describe a set of mechanisms intended to minimize and correct these errors via feedback to the collection sites.
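The sketch below shows the general shape of such automated checks, assigning each record a list of error-category labels that could be fed back to a collection site. The field names and error categories are assumptions for illustration, not the EIP's actual classification scheme.

```python
# A minimal data-quality sketch over one anonymized case record.
import datetime

def classify_errors(record):
    """Return a list of error-category labels for one case record."""
    errors = []
    if not record.get("lab_confirmed"):
        errors.append("missing_lab_confirmation")
    admit = record.get("admission_date")
    if admit is None:
        errors.append("missing_admission_date")
    elif admit > datetime.date.today():
        errors.append("future_admission_date")
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        errors.append("implausible_age")
    return errors

record = {"lab_confirmed": True, "admission_date": datetime.date(2030, 1, 1), "age": 150}
print(classify_errors(record))  # ['future_admission_date', 'implausible_age']
```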

Description

The goal of disease and syndromic surveillance is to monitor and detect aberrations in disease prevalence across space and time. Disease surveillance typically refers to the monitoring of confirmed cases of disease, whereas syndromic surveillance uses syndromes associated with disease to detect aberrations. In either situation, any proper surveillance system should be able to (i) detect, as early as possible, potentially harmful deviations from baseline levels of disease while maintaining low false positive detection rates, (ii) incorporate the spatial and temporal dynamics of a disease system, (iii) be widely applicable to multiple diseases or syndromes, (iv) incorporate covariate information and (v) produce results that are readily interpretable by policy decision makers.

Early approaches to surveillance were primarily computational algorithms. For example, the CUSUM technique and its variants (see, for example, Fricker et al.) monitor the cumulative deviation (over time) of disease counts from some baseline rate. A second line of work uses spatial scan statistics, originally proposed by Kulldorff with later extensions given in Walther and Neill et al.
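For concreteness, a minimal upper-CUSUM sketch over daily counts is shown below. The baseline mean, reference value k, and threshold h are illustrative and would need to be tuned to the data; this is not a reconstruction of any specific published variant.

```python
# A minimal upper CUSUM sketch for daily syndromic counts with a known baseline mean.
def cusum_alerts(counts, baseline_mean, k=0.5, h=5.0):
    """Return the indices of days on which the upper CUSUM statistic exceeds h."""
    s = 0.0
    alerts = []
    for day, count in enumerate(counts):
        # Accumulate deviations above baseline minus the reference value, floored at zero.
        s = max(0.0, s + (count - baseline_mean) - k)
        if s > h:
            alerts.append(day)
            s = 0.0  # reset after signalling, one common convention
    return alerts

# A run of elevated counts near the end of the series triggers alerts.
daily_counts = [10, 12, 9, 11, 10, 18, 21, 25]
print(cusum_alerts(daily_counts, baseline_mean=10.0))  # [5, 6, 7]
```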


Objective

Syndromic surveillance for new disease outbreaks is an important problem in public health. Many statistical techniques have been devised to address the problem, but none are able to simultaneously achieve important practical goals (good sensitivity and specificity, proper use of domain information, and transparent support to decision-makers). The objective here is to improve model-based surveillance methods by (i) detailing the structure of a hierarchical hidden Markov model for the surveillance of disease across space and time and (ii) proposing a new, non-separable spatio-temporal autoregressive model.

Description

In early May 2013, two chemical spills occurred within high schools in Atlantic County. These incidents, occurring within a week of each other, highlighted the need to strengthen statewide syndromic surveillance of illnesses caused by such exposures. In response to these spills, a new 'chemical exposure' classifier was created in EpiCenter, New Jersey's syndromic surveillance system, to track future events by monitoring registration chief complaint data taken from emergency department visits. The primary objective behind creation of the new classifier is to provide local epidemiologists with prompt notification once EpiCenter detects an abnormal number of chemical exposure cases.
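The sketch below shows one simple way a chief-complaint classifier of this kind can be expressed, as keyword matching over free-text complaints. The keyword list and matching rule are illustrative assumptions, not EpiCenter's actual classifier definition.

```python
# A minimal keyword-based sketch for flagging chemical exposure chief complaints.
CHEMICAL_KEYWORDS = (
    "chemical", "fumes", "inhalation", "bleach", "ammonia",
    "pepper spray", "carbon monoxide", "toxic exposure",
)

def is_chemical_exposure(chief_complaint):
    """Flag a registration chief complaint that mentions a chemical exposure term."""
    text = chief_complaint.lower()
    return any(keyword in text for keyword in CHEMICAL_KEYWORDS)

visits = ["SOB after inhalation of fumes at school", "ankle sprain playing soccer"]
print([is_chemical_exposure(v) for v in visits])  # [True, False]
```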

Objective

To describe the development of a new chemical exposure classifier in New Jersey's syndromic surveillance system (EpiCenter).
