Data Analytics

Description

Temporal anomaly detection is a key component of real-time surveillance. Today, despite the abundance of temporal information on multiple syndromes, multivariate investigation of temporal anomalies remains under-explored. Traditionally, an outbreak is thought of as disease localization in time. That is, for an event to qualify as an outbreak, a significant deviation from the observed distribution of the disease must occur. However, the underlying processes that govern the health-seeking behavior of a population with respect to one disease can potentially impact multiple syndromes, leading to observable correlation patterns in the daily rates of those syndromes. Thus, a deviation from the observed correlation pattern between different syndromes can be an early indicator of potential anomalies when the rise in the daily rates of one or more syndromes is not sufficiently discernible to be identified by standard univariate techniques.
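The idea of watching the correlation between two syndromes, rather than each daily rate on its own, can be illustrated with a minimal sketch. This is not the framework developed in the study; the function names, the 7-day window, the 14-day baseline, and the 0.5 deviation threshold are all assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_alerts(series_a, series_b, baseline_days=14, window=7, max_deviation=0.5):
    """Flag days whose trailing-window correlation departs from the baseline correlation.

    All parameter values are illustrative assumptions, not values from the study.
    """
    baseline_r = pearson(series_a[:baseline_days], series_b[:baseline_days])
    alerts = []
    for day in range(baseline_days, len(series_a)):
        lo = day - window + 1
        r = pearson(series_a[lo:day + 1], series_b[lo:day + 1])
        if abs(r - baseline_r) > max_deviation:
            alerts.append(day)
    return alerts
```

A day is flagged when the two syndromes stop moving together the way they did during the baseline period, even if neither count is unusually high by itself.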

Objective

The objectives of this study are to develop a mathematical multi-syndrome framework for early detection of temporal anomalies, to demonstrate improvement in detection sensitivity and timeliness of the multivariate technique compared with those of standard uni-syndrome analysis, and to put forward a new practical concept for timely outbreak investigation.

Submitted by elamb on
Description

Existing statistical methods can perform well in detecting simulated bioterrorism events. However, these methods have not been well-evaluated for detection of the type of respiratory and gastrointestinal events of greatest interest for routine public health practice. To assess whether a syndromic surveillance system can detect these outbreaks, we constructed simulated outbreaks based on public health interest and experience. We then inserted these outbreaks into real data. We assessed whether the simulated outbreaks could be detected using a battery of detection methods, including model-adjusted scan statistics and space-time permutation scan statistics.
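The evaluation design described above (insert a simulated outbreak into real data, then check whether a detector finds it) can be sketched as follows. The detector here is a simple mean-plus-k-standard-deviations threshold used only as a stand-in; it is not one of the scan statistics evaluated in the study, and all function names and parameter values are hypothetical.

```python
import statistics


def inject_outbreak(baseline, start, extra_cases):
    """Return a copy of the baseline counts with extra cases added from day `start`."""
    spiked = list(baseline)
    for offset, extra in enumerate(extra_cases):
        day = start + offset
        if 0 <= day < len(spiked):
            spiked[day] += extra
    return spiked


def threshold_alerts(counts, training_days, k=3.0):
    """Stand-in detector: flag days above mean + k * sd of the training period."""
    training = counts[:training_days]
    limit = statistics.mean(training) + k * statistics.stdev(training)
    return [day for day in range(training_days, len(counts)) if counts[day] > limit]


def detection_delay(alert_days, start, duration):
    """Days from outbreak start to the first alert inside the outbreak, or None if missed."""
    hits = [day - start for day in alert_days if start <= day < start + duration]
    return min(hits) if hits else None
```

Running the same detector on the untouched series gives the false-alarm behavior, and running it on the spiked series gives sensitivity and timeliness for that outbreak shape.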

 

Objective

We used simulation methods to assess the performance of two distinct anomaly-detection approaches, each under a variety of parameter settings, with respect to their ability to detect outbreaks of commonly occurring events of public health importance.

Submitted by elamb on
Description

Monitoring sales of over-the-counter products is becoming increasingly common for purposes of public health surveillance. Sales data for anti-diarrheal medications have been used to monitor waterborne Cryptosporidium outbreaks. An attractive feature of is its focus upon coupling predictions of sales for a given day (based upon time series methods) with control chart methods from the field of statistical process control.
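Coupling a daily sales forecast with a control chart can be sketched as below. The abstract does not say which forecasting model or chart is used, so this example assumes simple exponential smoothing for the one-day-ahead prediction and a one-sided CUSUM on the standardized forecast errors; `alpha`, `k`, `h`, and `sigma` are illustrative assumptions.

```python
def smoothed_forecasts(sales, alpha=0.3):
    """One-day-ahead forecasts from simple exponential smoothing (assumed model)."""
    forecasts = []
    level = sales[0]
    for value in sales:
        forecasts.append(level)  # forecast for today, made before seeing today's sales
        level = alpha * value + (1 - alpha) * level
    return forecasts


def cusum_alerts(sales, forecasts, sigma, k=0.5, h=4.0):
    """One-sided CUSUM on standardized forecast errors; resets after each alert."""
    cusum = 0.0
    alerts = []
    for day, (observed, predicted) in enumerate(zip(sales, forecasts)):
        z = (observed - predicted) / sigma
        cusum = max(0.0, cusum + z - k)
        if cusum > h:
            alerts.append(day)
            cusum = 0.0
    return alerts
```

The forecast absorbs the expected level of sales, so the chart responds to departures from prediction rather than to raw sales volume.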

 

Objective

This paper suggests and illustrates several approaches to surveillance when data are available for several regions.

Submitted by elamb on
Description

Syndromic surveillance systems have long been an important part of the public health arena. The long-standing goal of early detection of disease outbreaks has gained new urgency and requires a broader spectrum in the era of potential bioterrorism. A number of programs have used syndromic surveillance to broadly monitor community health. Outpatient chief complaints as well as positive laboratory tests have been used to monitor the occurrence of natural diseases.

Limitations of the systems currently attempted include overbroad syndromic categories, labor-intensive syndrome recognition training, and time-intensive manual data entry. Optimal use of laboratory data has been impeded by some of the same issues, as well as by an often too-narrow focus and significant limitations on real-time reporting. Given the likelihood of blunt and/or penetrating trauma being a manifestation of terrorist activity, the continuous inclusion of common traumatic and medical emergency conditions is a valuable tool for surveillance.

 

Objective

This paper describes the use of a multiple collective community health care database to monitor the occurrence of natural and manmade illness and injuries.

Submitted by elamb on
Description

Estimation of representative spatial probabilities and expected counts from baseline data can cause problems in applying spatial scan statistics when observed events are sparse in a large percentage of the spatial zones (e.g., zip codes or census tracts) found in the data records. In applications of scan statistics to datasets with fine spatial resolution, such as census tracts or block groups, such highly skewed data distributions are likely to occur. If the spatial distribution estimation process does not handle the zones with low counts correctly, bias in the determination of statistically significant clusters will occur.

In any 8-week baseline period, some of the sparse-data zones have no counts at all. If ignored, the zero-count spatial zones will result in division by zero in the log-likelihood ratio evaluation. The traditional method of setting a floor on the expected counts in each spatial zone leads to a loss of sensitivity when the number of zero-count zones is a significant fraction of all the zones. One alternative method for estimating spatial probabilities is to add one count to the sum of baseline counts in each spatial zone. This method has been used in a study of spatial cluster detection using medical 911 call data from San Diego County with good results. However, when this method was applied to data with a more highly skewed spatial distribution, issues were uncovered which led to this investigation of alternatives.
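The add-one estimate and the division-by-zero problem it avoids can be made concrete with a short sketch. The log-likelihood ratio below is the standard Poisson scan form for elevated counts, conditional on the total number of observed cases; the function names and zone labels are hypothetical, and this is not the study's own implementation.

```python
import math


def add_one_probabilities(baseline_counts):
    """Spatial probabilities after adding one count to every zone's baseline sum."""
    smoothed = {zone: count + 1 for zone, count in baseline_counts.items()}
    total = sum(smoothed.values())
    return {zone: count / total for zone, count in smoothed.items()}


def poisson_llr(observed_in, expected_in, total):
    """Poisson scan log-likelihood ratio for a candidate cluster (excess cases only).

    expected_in must be > 0, which the add-one estimate guarantees; a zero-count
    zone with no smoothing would make log(observed_in / expected_in) undefined.
    """
    if observed_in <= expected_in:
        return 0.0
    llr = observed_in * math.log(observed_in / expected_in)
    outside = total - observed_in
    if outside > 0:
        llr += outside * math.log(outside / (total - expected_in))
    return llr
```

For example, a zone with zero baseline counts among zones with baseline sums 0, 3, and 5 receives probability 1/11, so its expected count for any day is small but never zero.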

 

Objective

Modifications to spatial scan statistics are investigated for prospective cluster detection at fine resolution with highly skewed spatial distributions having many spatial zones with very few cases. Several alternative methods for the estimation of spatial probabilities and expected counts from counts in a baseline data window are evaluated with the Poisson spatial scan statistic and the space-time permutation scan statistic, using goodness-of-fit statistics and cluster rates to compare performance.

Submitted by elamb on
Description

We developed a probabilistic model of how clinicians are expected to detect a disease outbreak due to an outdoor release of anthrax spores, when the clinicians only have access to traditional clinical information (e.g., no computer-based alerts). We used this model to estimate an upper bound on the amount of time expected for clinicians to detect such an outbreak. Such estimates may be useful in planning for outbreaks and in assessing the usefulness of various computer-based outbreak detection algorithms.

Submitted by elamb on
Description

The primary objective of this study is to assess the capability of an advanced text analytics tool that uses natural language processing techniques to extract important medical information collected as part of routine emergency room care (history, symptoms, vital signs, test results, initial diagnosis, etc.). This information will be automatically, accurately, and efficiently converted from unstructured text into usable information, which can then be used to identify cases that are the result of a naturally occurring outbreak or bioterrorism event. This information would then be available to (1) communicate to the treating physician, and (2) message back to organizations aggregating data at a higher level, such as the Centers for Disease Control and Prevention (CDC) and the Department of Homeland Security (DHS).

Submitted by elamb on
Description

We propose a novel technique for building generative models of real-valued multivariate time series data streams. Such models are of considerable utility as baseline simulators in anomaly detection systems. The proposed algorithm, based on Linear Dynamical Systems (LDS) [1], learns stable parameters efficiently while yielding more accurate results than previously known methods. The resulting model can be used to generate infinitely long sequences of realistic baselines using small samples of training data.
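To show why stability matters when an LDS is used as a baseline simulator, here is a minimal sketch that generates a sequence from given parameters. It does not implement the proposed learning algorithm; the matrices `A` and `C`, the noise level, and the row-sum check are assumptions for illustration. The row-sum test is only a sufficient condition for stability (the general requirement is that the spectral radius of the dynamics matrix is below 1).

```python
import random


def is_stable_sufficient(A):
    """Sufficient (not necessary) stability check: max absolute row sum below 1."""
    return max(sum(abs(v) for v in row) for row in A) < 1.0


def simulate_lds(A, C, x0, steps, noise_sd=0.0, seed=0):
    """Generate observations y_t = C x_t + v_t with state update x_{t+1} = A x_t + w_t.

    A, C, x0, and noise_sd are illustrative inputs, not learned parameters.
    """
    rng = random.Random(seed)
    state = list(x0)
    outputs = []
    for _ in range(steps):
        y = [sum(c * s for c, s in zip(row, state)) + rng.gauss(0.0, noise_sd) for row in C]
        outputs.append(y)
        state = [sum(a * s for a, s in zip(row, state)) + rng.gauss(0.0, noise_sd) for row in A]
    return outputs
```

With a stable dynamics matrix the simulated state stays bounded for any number of steps, which is what allows arbitrarily long baseline sequences to be generated; an unstable matrix would make the generated values grow without limit.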

Submitted by elamb on