Outbreak Detection

Description

Developing and evaluating outbreak detection is challenging for many reasons.  A central difficulty is that the data on which detection algorithms are “trained” are often relatively short historical samples and thus do not represent the full range of possible background scenarios.  Once an algorithm is developed, the same dearth of historical data complicates its evaluation.  In systems where only a count of cases is provided, plausible synthetic data are relatively easy to generate.  When precise location data are available, generating hypothetical cases with simple approaches is more difficult.
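As a rough illustration of why count-only synthetic data are easy to produce, the sketch below draws daily counts from a Poisson distribution with a day-of-week mean. The function names, the weekday means, and the choice of a Poisson model are all assumptions made for illustration; the abstract does not specify a generator.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method.

    Suitable for the small means typical of daily syndrome counts.
    """
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def synthetic_counts(n_days, weekday_means, seed=0):
    """Generate n_days of background counts with a weekly pattern.

    weekday_means is a hypothetical list of 7 expected counts, one per weekday.
    """
    if len(weekday_means) != 7:
        raise ValueError("weekday_means must have 7 entries")
    rng = random.Random(seed)  # seeded so a scenario can be reproduced
    return [poisson_sample(rng, weekday_means[d % 7]) for d in range(n_days)]
```

Nothing comparable works once each case needs a plausible location, which is the gap the agent-based approach is meant to fill.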

Advances in epidemiological modeling have allowed for increasingly realistic simulations of infectious disease spread in highly detailed synthetic populations. These agent-based simulations are better able to represent real-world stochastic disease transmission processes and thus show highly variable results even under identical initial conditions. Because they can mimic a wide range of outcomes and more fully represent the unknowns in a system, models of this class are increasingly used to inform public policy decisions about hypothetical situations (e.g., pandemic influenza [1]).  This characteristic also makes them a powerful tool for representing the processes that create surveillance information.

Objective

Developing and evaluating detection algorithms in noisy surveillance data is complicated by a lack of realistic noise, meaning the surveillance data stream when nothing of public health interest is happening. These tasks are even more complex when data on the precise location of cases are available. This paper describes a methodology for generating plausible noise of this kind using agent-based models of infectious disease transmission based on highly resolved dynamic social networks.

Submitted by elamb on
Description

The performance of even the most advanced syndromic surveillance systems can be undermined if the monitored data are delayed before arriving in the system.  In such cases, an outbreak may be detected only after it is too late for an appropriate public health response. Surveillance systems can experience delays in data availability for a number of reasons. The process of transmitting data from data sources to the surveillance system can involve delays, especially in large systems where data are first aggregated across a national network of data sources before being transmitted to the surveillance system. Delays can also arise in the course of care, where, for example, a diagnosis is not available for a few days after the healthcare encounter.  It is important to minimize delays in data availability in order to maintain timeliness of detection [1].  When this is not possible, it is desirable to compensate for these data delays to minimize their effects.

Objective

This paper describes an approach to improving the detection timeliness of real-time health surveillance systems by modeling and correcting for delays in data availability.
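One simple way to picture delay correction is to estimate, from historical records, what fraction of a day's data has usually arrived by each lag, and then scale recent incomplete counts by that fraction. The sketch below follows that idea; it is an assumption for illustration, not the model used in the paper, and the function names are hypothetical.

```python
def completeness_by_lag(historical_arrivals):
    """Estimate the cumulative fraction of records received by each lag.

    historical_arrivals[k] is the number of historical records that arrived
    exactly k days after the healthcare encounter.
    """
    total = sum(historical_arrivals)
    if total == 0:
        raise ValueError("no historical records")
    fractions, running = [], 0
    for n in historical_arrivals:
        running += n
        fractions.append(running / total)
    return fractions


def correct_for_delay(observed_counts, fractions):
    """Scale up recent, still-incomplete daily counts.

    observed_counts is ordered oldest to newest: the last element is today
    (lag 0), the one before it is lag 1, and so on.
    """
    corrected = []
    last = len(observed_counts) - 1
    for day, count in enumerate(observed_counts):
        lag = last - day
        # Days older than the longest observed delay are treated as complete.
        frac = fractions[lag] if lag < len(fractions) else 1.0
        corrected.append(count / frac if frac > 0 else float("nan"))
    return corrected
```

For example, if half of all records historically arrive on the day of the encounter, a same-day count of 50 is scaled to an estimated 100. A detector run on the corrected series sees recent days at roughly their eventual level rather than as an artificial dip.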

Submitted by elamb on
Description

A significant research topic in biosurveillance is how to group individual events, such as single emergency department (ED) visits and sales of over-the-counter (OTC) healthcare products, into counts of “similar” events. For OTC products, the goal is to find categories of individual products that have superior outbreak detection performance relative to categories that biosurveillance systems currently monitor. We have described a method to identify OTC categories that correlate more highly with disease activity than existing categories [1]. However, it is an open question whether a category that correlates more highly (or, according to some other model, has a higher ‘association’) with disease activity than an existing category necessarily has superior detection performance. Here, we evaluate whether a linear regression procedure that clusters OTC products based on how well they ‘explain’ ED visits for influenza-like illness (ILI) can find categories with superior outbreak-detection performance for influenza.

Objective

To develop a procedure that identifies product categories with superior outbreak detection performance.

Submitted by elamb on
Description

Space-time detection of disease clusters can be a computationally intensive task that conflicts with the real-time constraint of disease surveillance. At the same time, it has been shown that using exact patient locations, instead of their representative administrative regions, results in higher detection rates and accuracy while improving detection timeliness. Using such higher spatial resolution data, however, further exacerbates the computational burden on real-time surveillance. The critical need for real-time processing and interpretation of data dictates highly responsive models that may be best achieved using high performance computing platforms.

Objective

Space-time detection techniques often require computationally intense searching in both the time and space domains. We introduce a high performance computing technique for parallelizing a variation of the space-time permutation scan statistic, apply it to real data of varying spatial resolutions, and demonstrate its efficiency by comparing parallelized performance with serial computation at each spatial resolution.

Submitted by elamb on
Description

Emergency Department surveillance methods currently rely on identification of acute illness by tracking chief complaint or ICD9 discharge codes. Newer-generation electronic medical records now capture additional information such as vital signs. These data have the potential to identify disease syndromes earlier than the traditional methods.

Objective

This paper describes the temporal relationship between the number of fever cases, recorded as discrete vital sign data in an electronic medical record, and the number of ICD9-coded influenza-like illness cases in the Emergency Department at the University of Wisconsin Hospital.
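A temporal relationship of this kind is often explored with a lagged correlation: correlate the fever series on day t with the ILI series on day t + lag and see which lag fits best. The sketch below is one such approach under that assumption; the abstract does not state which method the authors used, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lead(fever, ili, max_lag):
    """Return (lag, r) maximizing the correlation of fever[t] with ili[t + lag].

    A positive best lag suggests that fever counts lead ILI counts by that
    many days. Both series must cover the same days in the same order.
    """
    if len(fever) != len(ili):
        raise ValueError("series must have equal length")
    results = {}
    for lag in range(max_lag + 1):
        x = fever[:len(fever) - lag]
        y = ili[lag:]
        results[lag] = pearson(x, y)
    best = max(results, key=results.get)
    return best, results[best]
```

A high correlation at a positive lag is only suggestive; seasonal trends shared by both series can inflate it.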

Submitted by elamb on
Description

The spatial scan statistic [1] detects significant spatial clusters of disease by maximizing a likelihood ratio statistic over a large set of spatial regions. Typical spatial scan approaches either constrain the search regions to a given shape, reducing power to detect patterns that do not correspond to this shape, or perform a heuristic search over a larger set of irregular regions, in which case they may not find the most relevant clusters. In either case, computation time is a serious issue when searching over complex region shapes or when analyzing a large amount of data. An alternative approach might be to search over all possible subsets of the data to find the most relevant patterns, but since there are exponentially many subsets, an exhaustive search is computationally infeasible.

Objective

We present a new method of "linear-time subset scanning" and apply this technique to various spatial outbreak detection scenarios, making it computationally feasible (and very fast) to perform spatial scans over huge numbers of search regions.
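The idea behind subset scanning can be sketched as follows: for score functions with the appropriate property, the highest-scoring subset is always one of the N "top-k" subsets obtained by sorting locations by a priority value, so only N subsets need scoring instead of 2^N. The example below assumes an expectation-based Poisson score and uses the count-to-baseline ratio as the priority. Both choices are assumptions for illustration, since the abstract does not name a score function, and the brute-force routine is included only to check small inputs.

```python
import math
from itertools import combinations


def poisson_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for aggregated totals."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def subset_scan(counts, baselines):
    """Return (best_score, best_subset), scoring only the N top-k subsets."""
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        # Add the next-highest-priority location and rescore the prefix.
        c_sum += counts[i]
        b_sum += baselines[i]
        score = poisson_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def brute_force_scan(counts, baselines):
    """Exhaustive search over all non-empty subsets (exponential; checks only)."""
    best_score, best_subset = 0.0, []
    n = len(counts)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            score = poisson_score(sum(counts[i] for i in subset),
                                  sum(baselines[i] for i in subset))
            if score > best_score:
                best_score, best_subset = score, list(subset)
    return best_score, best_subset
```

Baselines must be positive. The sort makes this sketch O(N log N) overall, but only N subsets are ever scored.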

Submitted by elamb on
Description

Efforts have been made to standardize and prioritize the description and evaluation of syndromic surveillance systems. Systematic information on the performance of existing systems can be used to assess and compare the value of these systems, and inform decisions regarding their use. 

Michigan’s Emergency Department Syndromic Surveillance System (MSSS) is an implementation of an early version of the Real-time Outbreak and Disease Surveillance system developed by the University of Pittsburgh, which collects patient chief complaint data from emergent care facilities in real time. The system has been in use at the Michigan Department of Community Health since 2003. Alterations to the system and recruitment of data contributors have been ongoing. The primary stated purpose of the MSSS is earlier detection of outbreaks of severe illness, enabling a more rapid public health response and intervention to reduce the impact of public health threats.

Objective

This work describes key characteristics of MSSS and reports on its evaluation.

Submitted by elamb on
Description

Animals continue to be recognized as a potential source of surveillance data for detection of emerging infectious diseases, bioterrorism preparedness, pandemic influenza preparedness, and detection of other zoonotic diseases. Detection of disease outbreaks in animals remains mostly dependent upon systems that are disease specific and not very timely. Most zoonotic disease outbreaks are detected only after they have spread to humans. The use of syndromic surveillance methods (outbreak surveillance using pre-diagnostic data) in animals is a possible solution to these limitations. The authors examine microbiology orders from a veterinary diagnostics laboratory (VDL) as a possible data source for early outbreak detection. They establish the species representation in the data, quantify the potential gain in timeliness, and use a CuSum method to study counts of microorganisms, animal species, and specimen collection sites as potential early indicators of disease outbreaks. The results indicate that VDL microbiology orders might be a useful source of data for a surveillance system designed to detect outbreaks of disease in animals earlier than traditional reporting systems.
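For readers unfamiliar with CuSum, the sketch below shows a standard one-sided cumulative sum detector applied to daily counts. It is a generic textbook form, not necessarily the variant the authors used; the reference value k = 0.5 and decision threshold h = 4.0 are illustrative assumptions, as is the reset after each alarm.

```python
def cusum_alarms(counts, mean, std, k=0.5, h=4.0):
    """Return the indices of days on which a one-sided CUSUM signals.

    counts: daily counts (for example, microbiology orders per day)
    mean, std: baseline mean and standard deviation of the counts
    k: reference value, in standard deviations, subtracted each day
    h: decision threshold, in standard deviations
    """
    if std <= 0:
        raise ValueError("std must be positive")
    s = 0.0
    alarms = []
    for t, x in enumerate(counts):
        z = (x - mean) / std
        # Accumulate only excesses above the reference value; never go below 0.
        s = max(0.0, s + z - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart accumulation after signalling
    return alarms
```

Because the statistic accumulates small daily excesses, it can signal on a sustained modest rise that no single day would flag on its own.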

Submitted by elamb on
Description

Research evaluating the use of spatial data for surveillance purposes is ongoing and evolving. As spatial methods evolve, it is important to characterize their effectiveness in real-world settings. Assessing the performance of surveillance systems has been difficult because there has been a paucity of data from real bioterrorism events. Recent efforts to assess surveillance system performance have focused on injecting synthetic outbreak data (signal) into actual background visit data. These studies focused on either temporal data, a single syndrome category, or a single bioterrorism agent. We are unaware of prior studies evaluating the performance of spatial outbreak detection for multiple syndrome categories in an operational surveillance system.

Objective

To characterize the performance of a spatial scan statistic, we used SaTScan to measure the sensitivity and positive predictive value for detecting simulated outbreaks having varying size, case density, and syndrome type.

Submitted by elamb on
Description

There are many proposed methods of identifying outbreaks of disease in surveillance data. However, there is little agreement about appropriate ways to choose amongst them. One common basis for comparison is simulating outbreaks and adding the simulated cases to real data streams (‘injected outbreaks’); competing statistical methods then attempt to detect the outbreak. The receiver operating characteristic (ROC) curve and the area beneath it are well-known approaches to evaluation. The ROC curve plots the sensitivity against 1 minus the specificity for a range of decision thresholds. Unfortunately, defining ROC curves in this context is not straightforward. In the usual setting of screening, ROC curves are constructed based on individuals, not populations, and it is unclear how to extend the concept to surveillance. In addition, the sensitivity and specificity need to be supplemented by the timeliness: a method with perfect sensitivity and specificity that detects outbreaks too late is useless.
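To make the ambiguity concrete, the sketch below adopts one possible convention: sensitivity is the fraction of injected outbreaks with at least one alarm during the outbreak window, the false positive rate is the fraction of non-outbreak days with an alarm, and timeliness is the mean delay to the first alarm. These definitions are assumptions chosen for illustration, not the metrics developed by the authors, and the function names are hypothetical.

```python
def evaluate_threshold(scores, outbreaks, threshold):
    """Return (sensitivity, false_positive_rate, mean_delay) for one threshold.

    scores: daily detector outputs; an alarm is raised when score > threshold
    outbreaks: list of (start, end) day indices, inclusive, for injected outbreaks
    mean_delay is None when no outbreak is detected.
    """
    alarm = [s > threshold for s in scores]
    outbreak_days = set()
    detected, delays = 0, []
    for start, end in outbreaks:
        days = range(start, end + 1)
        outbreak_days.update(days)
        first = next((d for d in days if alarm[d]), None)
        if first is not None:
            detected += 1
            delays.append(first - start)
    quiet = [d for d in range(len(scores)) if d not in outbreak_days]
    if not outbreaks or not quiet:
        raise ValueError("need at least one outbreak and one non-outbreak day")
    false_alarms = sum(alarm[d] for d in quiet)
    mean_delay = sum(delays) / len(delays) if delays else None
    return detected / len(outbreaks), false_alarms / len(quiet), mean_delay


def roc_points(scores, outbreaks, thresholds):
    """Return (false_positive_rate, sensitivity) pairs, one per threshold."""
    points = []
    for t in thresholds:
        sensitivity, fpr, _ = evaluate_threshold(scores, outbreaks, t)
        points.append((fpr, sensitivity))
    return points
```

Note that the two axes use different units here (outbreaks versus days), which is exactly why a ROC curve in surveillance needs an explicit definition, and why the delay is reported alongside it.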

Objective

We developed metrics for evaluating tools used for outbreak detection, assuming simulated outbreaks.

Submitted by elamb on