Anomaly Detection

Description

After the 2009 H1N1 pandemic, the Assistant Secretary of Defense for Nuclear, Chemical and Biological Defense indicated that biodefense would include emerging infectious disease. In response, DTRA launched an initiative for an innovative, rapidly emerging capability to enable real-time biosurveillance for early warning and course of action analysis. Through competitive prototyping, DTRA selected Digital Infuzion to develop the platform and next-generation analytics. This work was extended to enhance collaboration capabilities and to harness data science and advanced analytics for multi-disciplinary surveillance including climate, crop, and animal as well as human data. New analysis tools ensure the BSVE supports a One Health paradigm to best inform public health action. Digital Infuzion and DTRA first introduced the BSVE to the ISDS community at the 2013 annual conference SWAP Meet. Digital Infuzion is pleased to present the mature platform to this community again, as it is now a fully developed capability undergoing FedRAMP certification with the Department of Homeland Security's National Biosurveillance Integration Center and is the basis for Digital Infuzion's HARBINGER ecosystem for biosurveillance.

Objective: While there is a growing torrent of data that disease surveillance could leverage, few effective tools exist that help public health professionals make sense of these data or that provide secure work-sharing and communication. Meanwhile, our ever more connected world provides an increasingly receptive environment for diseases to emerge and spread rapidly, making early warning and collaborative decision-making essential to saving lives and reducing the impact of outbreaks. Digital Infuzion's previous work on the Defense Threat Reduction Agency (DTRA)'s Biosurveillance Ecosystem (BSVE) built a cloud-based platform to ingest big data with analytics to provide users a robust surveillance environment. We next enhanced the BSVE data sources and analytics to support an integrated One Health paradigm. The resulting BSVE and Digital Infuzion's HARBINGER platform include: 1) identification and ingestion of data sources that span global human, animal and crop health; 2) inclusion of non-health data such as travel, weather, and infrastructure; 3) the data science tools, analytics and visualizations to make these data useful; and 4) a fully featured Collaboration Center for secure work-sharing and communication across agencies.

Submitted by elamb on
Description

Real-world public health data often present numerous challenges. There may be a limited amount of background data, data dropouts, noise, and human error. The data from an emergency department (ED) in Urbana, IL include a diagnosis field with multiple terms and notes separated by semicolons. There are over 7000 distinct terms, excluding the notes. Because the data set begins in April 2009, there is not yet adequate background data to use some of the regression-based alerting algorithms. Values for some days are missing, so we also needed an algorithm that would tolerate data dropouts.

INDICATOR is a workflow-based biosurveillance system developed at the National Center for Supercomputing Applications. One of the fundamental concepts of INDICATOR is that the burden of cleaning and processing incoming data should be on the software, rather than on the health care providers.

 

Objective

This paper compares different approaches to classification and anomaly detection of data from an ED.

Submitted by hparton on
Description

Syndromic surveillance typically involves collecting time-stamped transactional data, such as patient triage or examination records or pharmacy sales. Such records usually span multiple categorical features, such as location, age group, gender, symptoms, chief complaints, drug category and so on. The key analytic objective is to identify potential disease clusters in recently observed data (for example, during the last week) as compared with a baseline (for example, derived from data observed over the previous few months). In real-world scenarios, a disease outbreak can impact any subset of categorical dimensions and any subset of values along each categorical dimension. As evaluating all possible outbreak hypotheses can be computationally challenging, popular state-of-the-art algorithms either limit the scope of search to exclusively conjunctive definitions or focus only on detecting spatially co-located clusters for disease outbreak detection. Further, it is also common to see multiple disease outbreaks happening simultaneously and affecting overlapping subsets of dimensions and values. Most such algorithms focus on finding just the single most significant anomalous cluster corresponding to a possible disease outbreak, and ignore the possibility of a concurrent emergence of additional clusters.

 

Objective

We present Disjunctive Anomaly Detection (DAD), a novel algorithm to detect multiple overlapping anomalous clusters in large sets of categorical time series data. We compare the performance of DAD and What’s Strange About Recent Events on disease surveillance data from the Sri Lanka Ministry of Health.

Submitted by hparton on
Description

A number of syndromic surveillance systems include tools that quickly identify potentially large disease outbreak events. However, the high false-positive rate continues to be a problem in all of these systems. Our earlier work has shown that multi-source information fusion can improve the specificity of syndromic surveillance systems. However, an anomalous health event that presents as only a few cases may remain undetected because the chief complaint data do not contain enough detail. New linked data sources need to be used to enhance detection capabilities. This project examines the incorporation of laboratory, prescription medication, and radiology data linked to the patient encounter within syndromic surveillance systems. Linking these data sources may enhance the sensitivity of syndromic surveillance.

Submitted by elamb on
Description

Objective

To enable the early detection of pandemic influenza, we have designed a system to differentiate between severe and mild influenza outbreaks. Historic information about previous pandemics suggested the evaluation of two specific discriminants: (1) the rapid development of disease to pneumonia within 1-2 days and (2) patient age distribution, as the virus usually targets specific age groups. The system is based on the hypothesis that an increased number of diagnosed pneumonia cases offers an early indication of severe influenza outbreaks. This approach is based on the fact that pneumonia cases will appear promptly in a severe influenza outbreak and can be diagnosed immediately in a physician office visit, while a confirmed influenza diagnosis requires a laboratory test. Furthermore, laboratory tests are unlikely to be ordered outside of the expected influenza season.

Submitted by elamb on
Description

The success of syndromic surveillance depends on the ability of the surveillance community to quickly and accurately recognize anomalous data. Current methods of anomaly detection focus on sets of syndromic categories and rely on a priori knowledge to map chief complaints to these general syndromic categories. As a result, the mapping scheme may miss key terms and phrases that have not previously been used. Furthermore, analysts do not have a good way of being alerted to these new terms in order to determine if they should be added to the syndromic mapping schema. We use a dynamic dictionary of terms to side-step the downfalls of a priori knowledge in this rapidly evolving field by alerting the analyst to rare and brand new words used in the chief complaint field.

Objective

To automate the detection of very unusual emergency department chief complaints based on a comparison between a trained dictionary of terms and the unstructured chief complaint field.
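The dictionary comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names, and the `rare_threshold` parameter are all assumptions made for illustration. A term is reported as "new" if the trained dictionary has never seen it and as "rare" if it has been seen fewer than `rare_threshold` times.

```python
import re
from collections import Counter


def tokenize(chief_complaint):
    """Lowercase a free-text chief complaint and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", chief_complaint.lower())


def train_dictionary(complaints):
    """Count how often each term appears in historical chief complaints."""
    counts = Counter()
    for text in complaints:
        counts.update(tokenize(text))
    return counts


def flag_unusual_terms(complaint, dictionary, rare_threshold=2):
    """Return {term: "new" | "rare"} for terms the dictionary has seen never or seldom.

    rare_threshold is a hypothetical tuning parameter, not a value from the abstract.
    """
    flagged = {}
    for term in tokenize(complaint):
        seen = dictionary.get(term, 0)
        if seen == 0:
            flagged[term] = "new"
        elif seen < rare_threshold:
            flagged[term] = "rare"
    return flagged


# Hypothetical training data, for illustration only.
history = ["fever and cough", "cough x3 days", "chest pain",
           "fever vomiting", "abdominal pain"]
dictionary = train_dictionary(history)
new_terms = flag_unusual_terms("fever cough hemoptysis", dictionary)
rare_terms = flag_unusual_terms("chest pain", dictionary)
```

To keep the dictionary dynamic in this sketch, a reviewed complaint could be folded back in with `dictionary.update(tokenize(complaint))`, so that a term stops alerting once it becomes common.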

Submitted by knowledge_repo… on
Description

San Diego County Public Health has been conducting syndromic surveillance for the past few years. The system is now largely automated and processes and analyzes data from a variety of disparate sources including hospital emergency departments, 911 call centers, prehospital transports, and over-the-counter drug sales. What has remained constant since the system’s initial conceptualization is the local opinion that the data should be analyzed and interpreted in a variety of ways, in anticipation of the variety of contexts in which events of public health interest may unfold. Relatively small increases in volume that are sustained over time will likely be detected by methods designed to detect “small process shifts”, such as the CUSUM and EWMA methods. Larger increases in volume that are not sustained over time will likely be detected by the other methods employed (P-Chart in the event of a non-proportional increase in volume, U-Chart in the event of a proportional increase in volume). A retrospective analysis was conducted on historical data from various data sources to determine the frequency of signals and detected events as well as the context within which the alert occurred (i.e., the “shape” of the data). Findings regarding several actual public health events will also be discussed.
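To make the "small process shift" idea concrete, here is a minimal sketch of one-sided CUSUM and EWMA checks in their textbook control-chart forms. It is not the San Diego system's implementation: the parameter values (`k`, `h`, `lam`, `width`), the fixed baseline window, and the sample counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def cusum_alerts(counts, baseline, k=0.5, h=4.0):
    """One-sided CUSUM on standardized counts; returns indices where the sum exceeds h.

    k (allowance) and h (decision threshold) are illustrative values, in units
    of baseline standard deviations.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    s, alerts = 0.0, []
    for i, x in enumerate(counts):
        z = (x - mu) / sigma
        s = max(0.0, s + z - k)  # accumulate only excess above the allowance
        if s > h:
            alerts.append(i)
            s = 0.0  # restart accumulation after signalling
    return alerts


def ewma_alerts(counts, baseline, lam=0.3, width=3.0):
    """EWMA chart with the asymptotic upper control limit; returns alert indices."""
    mu, sigma = mean(baseline), stdev(baseline)
    upper = mu + width * sigma * (lam / (2 - lam)) ** 0.5
    smoothed, alerts = mu, []
    for i, x in enumerate(counts):
        smoothed = lam * x + (1 - lam) * smoothed
        if smoothed > upper:
            alerts.append(i)
    return alerts


# Hypothetical daily counts: a baseline period, then a small sustained increase.
baseline = [10, 12, 9, 11, 10, 8, 10, 12, 9, 9]
sustained = [12] * 8
quiet = [10, 9, 11, 10]

sustained_cusum = cusum_alerts(sustained, baseline)
sustained_ewma = ewma_alerts(sustained, baseline)
```

Both statistics carry memory of earlier days, which is why a modest but persistent rise eventually crosses the limit even though no single day is extreme on its own.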

 

Objective

This paper describes the frequency, various “shapes” and magnitudes of data anomalies, and varying ways actual public health events may present themselves in syndromic data.

Submitted by elamb on
Description

As the Georgia Division of Public Health began constructing a systems interface for its syndromic surveillance program, the nature and intended use of these data inspired new approaches to interface design. With the temporal and spatial components of these data serving as fundamental determinants within common aberration detection methods (e.g., Early Aberration Reporting System, SaTScan™), it became apparent that an interface technique that could present a synthesis of the two might better facilitate the visualization, interpretation and analysis of these data.

Typical presentations of data spatially oriented at the zip code level use a color gradient applied to a zip code polygon to represent the differences in magnitude of events within a given region across a particular time span. Typical presentations of temporally oriented data use time series graphs and tabular formats. Visualizations that present both aspects of spatially and temporally rich datasets within a single visualization are noticeably absent.

 

Objective

This paper describes an approach to the visualization of disease surveillance data through the use of animation techniques applied to datasets with both temporal and geospatial components.

Submitted by elamb on
Description

Ideal anomaly detection algorithms should detect both sudden and gradual changes, while keeping the background false positive alert rate at a tolerable level. Further, the algorithm needs to perform well when the goal is to detect small outbreaks of low-incidence diseases. For example, when surveillance is based on the specific ICD-9 diagnosis of flu rather than a larger syndromic grouping, the baseline counts will generally be low, in the range of 0 or 1 per day even in a large sample of EDs.

 

Objective

Our goal was to determine the sensitivity of detection of various inserted outbreak sizes and shapes using a modified Holt-Winters detection algorithm applied to daily flu count data before the flu season and after its peak. We compare our results to C3 of EARS.
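For orientation, the C3 comparator can be sketched as it is commonly described in the EARS literature: C2 standardizes today's count against a 7-day baseline that ends two days earlier, and C3 sums the amount by which C2 exceeded 1 over the last three days. This is a sketch under those assumptions, not the authors' code and not the modified Holt-Winters method. The `min_sd` floor is an added assumption to avoid division by zero when baseline counts are nearly all 0 or 1, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def c2_scores(counts, baseline_len=7, lag=2, min_sd=0.5):
    """C2-style score: (today - baseline mean) / baseline sd, baseline lagged by `lag` days.

    min_sd is an assumed floor, not part of the commonly cited definition.
    Days without a full baseline get None.
    """
    scores = [None] * len(counts)
    for t in range(baseline_len + lag, len(counts)):
        window = counts[t - lag - baseline_len : t - lag]
        sd = max(stdev(window), min_sd)
        scores[t] = (counts[t] - mean(window)) / sd
    return scores


def c3_scores(counts, **kwargs):
    """C3-style score: sum over the last three days of max(0, C2 - 1)."""
    c2 = c2_scores(counts, **kwargs)
    c3 = [None] * len(counts)
    for t in range(2, len(counts)):
        recent = c2[t - 2 : t + 1]
        if any(v is None for v in recent):
            continue  # not enough history for three consecutive C2 values
        c3[t] = sum(max(0.0, v - 1.0) for v in recent)
    return c3


# Hypothetical low-incidence daily flu counts with a small spike on the last day.
with_spike = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 4]
no_spike = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]

spike_c3 = c3_scores(with_spike)
flat_c3 = c3_scores(no_spike)
```

An alert threshold of 2 is often cited for C3. With baselines this sparse, the score is dominated by the standard-deviation floor, which illustrates why low-count series are hard for baseline-standardized detectors.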

Submitted by elamb on