
Spatio-Temporal Scan

Description

There has been much recent interest in using disease signatures to better recognize disease outbreaks. Conversely, the metrics used to describe these signatures can also be used to better characterize the outbreaks. Recent work at the New York City Department of Health has shown the ability to identify characteristic age-specific patterns during influenza outbreaks. One issue that remains is how to implement a search for such patterns using prospective outbreak detection tools such as SaTScan.

A potential approach to this problem arises from another currently active research area: the simultaneous use of multiple data streams. One form of this is to disaggregate a data stream with respect to a third variable such as age. Two drawbacks to this approach are that the categories used to make the streams have to be defined a priori and that relationships between the streams cannot be exploited. Furthermore, the resulting description is less rich, as it describes outbreaks in a few non-overlapping age-specific streams. It would be desirable to look for age-specific patterns with the age groupings implicitly defined.

 

Objective

This paper presents an implementation of a citywide SaTScan analysis that uses age as a one-dimensional spatial variable. The analysis identifies age-specific clusters of respiratory and fever/flu syndromes in the New York City Emergency Department data.

Submitted by elamb on
Description

SaTScan is a program often used for space-time cluster detection. In order to run SaTScan, the data must be in a pre-specified text format. Once the input files are in the correct format, the typical user opens SaTScan, chooses the appropriate options, and runs SaTScan. The output from SaTScan consists of one or more text files with statistical and geographical information about the clusters. Errors in SaTScan often require re-extraction of the data into the specified text format.
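As a rough illustration of the kind of input preparation described above, the sketch below aggregates individual visit records into case-file lines. It assumes a whitespace-delimited layout of location ID, case count, and date; the function name `case_file_lines` and the sample ZIP codes are hypothetical, and the exact layout should be checked against the SaTScan user guide before use.

```python
from collections import Counter

def case_file_lines(visits):
    """Aggregate (location_id, 'YYYY-MM-DD') visit records into text lines.

    Each output line is assumed to be '<location id> <#cases> <date>',
    with the date written as YYYY/MM/DD.
    """
    counts = Counter(visits)  # one counter entry per (location, day) pair
    lines = []
    for (location, day), n in sorted(counts.items()):
        lines.append(f"{location} {n} {day.replace('-', '/')}")
    return lines

# Hypothetical emergency department visits keyed by ZIP code.
example_visits = [
    ("10001", "2005-01-03"),
    ("10001", "2005-01-03"),
    ("10002", "2005-01-03"),
]
example_lines = case_file_lines(example_visits)
```

Writing `"\n".join(example_lines)` to a text file would give one candidate case file; a coordinates file would need to be built separately.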

When running SaTScan many times per day, as is commonly done in surveillance, it can be cumbersome to create all of the necessary data sets and run SaTScan. This is also true for any kind of evaluation of systems that rely on SaTScan for surveillance. In addition, the lack of graphical output, such as a map of the areas identified in the cluster, detracts from the utility of otherwise excellent software.

 

Objective

The purpose of this project was to create a SAS (SAS Institute, Cary, NC) interface for SaTScan that can create the necessary input files, run SaTScan directly from SAS (without using SaTScan’s GUI), and combine the output with geographic boundary files to produce a single-page output containing a map and statistics describing the clusters found by SaTScan.

Submitted by elamb on
Description

T-Cube is especially useful for rapidly retrieving responses to ad-hoc queries against large datasets of additive time series labeled using a set of categorical attributes. It can be used as a general tool to support any task requiring access to such data. From the application’s perspective it is transparent: it acts just like the database itself, but responds far more quickly. The authors had a chance to put T-Cubes into practical use as an enabling technology in applications requiring massive screening of multidimensional temporal data. These applications include two systems to support monitoring of food and agriculture safety and predictive analytics developed at the US Department of Agriculture and the Food and Drug Administration, as well as a system to monitor and forecast the health of a fleet of aircraft operated by the US Air Force.
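The idea of answering ad-hoc queries from cached aggregates of additive time series can be illustrated with a much-simplified stand-in. The sketch below is not the T-Cube structure itself: it simply precomputes the summed series for every combination of attribute values and wildcards, which only scales to a handful of attributes. All names (`build_cube`, `WILDCARD`, the sample records) are hypothetical.

```python
from itertools import product

WILDCARD = "*"  # hypothetical marker meaning "any value of this attribute"

def build_cube(records, n_attrs):
    """Precompute summed time series for every attribute/wildcard combination.

    records: list of (attribute_tuple, series), where every series has the
    same length. Returns a dict mapping query keys to summed series.
    """
    cube = {}
    for attrs, series in records:
        # A record contributes to each key in which every attribute is
        # either its own value or the wildcard (2**n_attrs keys per record).
        for mask in product((False, True), repeat=n_attrs):
            key = tuple(WILDCARD if m else a for a, m in zip(attrs, mask))
            acc = cube.setdefault(key, [0] * len(series))
            for t, value in enumerate(series):
                acc[t] += value
    return cube

# Hypothetical daily counts labeled by (syndrome, region).
example_records = [
    (("respiratory", "north"), [1, 2, 3]),
    (("respiratory", "south"), [0, 1, 1]),
    (("fever", "north"), [2, 0, 1]),
]
example_cube = build_cube(example_records, n_attrs=2)
respiratory_all_regions = example_cube[("respiratory", WILDCARD)]
```

Once built, a query such as "all respiratory visits, any region" is a single dictionary lookup rather than a scan over the raw records.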

 

Objective

T-Cube, a data structure designed to efficiently represent large collections of temporal data, has been shown to benefit surveillance applications involving monitoring sales of over-the-counter medications and emergency department visits. In this paper we present efficiencies that can be realized in practical applications of T-Cube beyond its original areas of deployment, and we advocate its widespread use as a technology that makes manual ad-hoc lookups, as well as many kinds of complex automated analyses, feasible.

Submitted by elamb on
Description

Tuberculosis (TB) has reemerged as a global public health epidemic in recent years. TB remains a serious public health problem among certain patient populations, and is prevalent in many urban areas. The World Health Organization estimates that approximately nine million individuals will develop active TB disease and more than two million will die from TB. The global burden of TB remains enormous, and will likely rank high among public health problems in the coming decades. Although evaluating local disease clusters leads to effective prevention and control of TB, there are few, if any, spatiotemporal comparisons for epidemic diseases. In this study, we used the space-time scan statistic to identify where and when the prevalence of TB is high in Fukuoka Prefecture. The ability to detect disease outbreaks is important for local and national health departments to minimize morbidity and mortality through timely implementation of disease prevention and control measures. Because the statistic meets these needs completely, results that are effective and practical for public health officials are expected from this study.

Submitted by elamb on
Description

Los Angeles County Department of Health Services is currently testing SaTScan’s space-time permutation model to assist in identifying aberrant illness activity in the community and to determine its ability to detect outbreaks by analyzing real-time syndromic data. SaTScan could be especially useful because it can provide the geographic locations of outbreaks in the community.

 

Objective

To determine the usefulness of SaTScan as an outbreak and illness cluster detection tool in syndromic surveillance and to compare it with a purely temporal CUSUM algorithm.
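For readers unfamiliar with the temporal comparator, a one-sided CUSUM can be sketched as follows. This is a generic textbook form, not necessarily the variant used in the study: the baseline mean and standard deviation are assumed known, and the reference value `k`, the threshold `h`, and the sample counts are illustrative assumptions.

```python
def cusum_alerts(counts, mean, sd, k=0.5, h=4.0):
    """Return the indices of days on which a one-sided CUSUM exceeds h.

    The statistic accumulates standardized excesses over the baseline:
    S_t = max(0, S_{t-1} + (x_t - mean) / sd - k).
    """
    s = 0.0
    alerts = []
    for day, x in enumerate(counts):
        z = (x - mean) / sd       # standardized daily count
        s = max(0.0, s + z - k)   # accumulate excess, never below zero
        if s > h:
            alerts.append(day)
    return alerts

# Hypothetical daily visit counts with a rise over the last three days.
example_counts = [10, 11, 9, 10, 16, 17, 18]
example_alerts = cusum_alerts(example_counts, mean=10.0, sd=2.0)
```

Unlike a space-time scan, this statistic says when counts rose but nothing about where.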

Submitted by elamb on
Description

Irregularly shaped cluster finders frequently end up with a solution consisting of a large zone z spreading through the map, which is merely a collection of the highest valued regions, but not a geographically sound cluster. One way to ameliorate this problem is to introduce penalty functions that limit the excessive freedom of shape of z. The compactness penalty K(z) is a function used to reduce the scan value of irregularly shaped clusters, based on their geometric shape. Another penalty is the cohesion function C(z), a measure of the absence of weak links, that is, underpopulated regions within the cluster which disconnect it when removed. It has been noted elsewhere that such weak links could be responsible for a diminished power of detection in cluster finder algorithms. Methods using these penalty functions perform better. The geometric compactness penalty is not entirely satisfactory, however, because it tends to exclude potentially interesting irregularly shaped clusters, acting as a low-pass filter. The cohesion function penalty method, on the other hand, has slightly less specificity.
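The text does not give a formula for K(z). One formulation found in the penalized scan literature, assumed here for illustration, is the area of the zone divided by the area of a circle whose circumference equals the perimeter of the zone's convex hull, that is K(z) = 4πA(z)/H(z)². The sketch below computes it for a zone given as a simple polygon; all function names and the multiplicative penalty in `penalized_score` are assumptions rather than the authors' code.

```python
import math

def polygon_area(vertices):
    """Shoelace area of a simple polygon given as [(x, y), ...]."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def perimeter(vertices):
    """Length of the closed path through the vertices."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

def compactness(zone_vertices):
    """K(z) = 4*pi*A(z) / H(z)**2: near 1 for round zones, near 0 for thin ones."""
    hull_perimeter = perimeter(convex_hull(zone_vertices))
    return 4.0 * math.pi * polygon_area(zone_vertices) / hull_perimeter ** 2

def penalized_score(scan_value, zone_vertices, exponent=1.0):
    """Scale a scan value by the compactness penalty (assumed multiplicative form)."""
    return scan_value * compactness(zone_vertices) ** exponent

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
strip = [(0.0, 0.0), (10.0, 0.0), (10.0, 0.1), (0.0, 0.1)]
```

Both example zones have the same area, but the thin strip receives a far smaller K(z) than the square. This is the behavior that makes the penalty act against elongated clusters.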

 

Objective

We introduce a novel spatial scan algorithm for finding irregularly shaped disease clusters maximizing simultaneously two objectives: the regularity of shape and the internal cohesion of the cluster.

Submitted by elamb on
Description

Bio-surveillance systems monitor multiple data streams (over-the-counter (OTC) sales, Emergency Department visits, etc.) to detect both natural disease outbreaks (e.g. influenza) and bio-terrorist attacks (e.g. anthrax release). Many detection algorithms show impressive results under simulated environments, but the complex behavior of real-world data and high costs associated with processing false positives make it difficult to develop practical bio-surveillance systems. We believe that using expert knowledge from public health officials will help us to better understand the real-world data, improving our ability to distinguish actual disease outbreaks from non-outbreak patterns.

 

Objective

This paper describes the evolution of a bio-surveillance system that incorporates user feedback to improve system utility and usability. The system monitors national-level OTC pharmacy sales on a daily basis. We use fast spatio-temporal scan statistics to detect disease outbreaks.

Submitted by elamb on
Description

Estimation of representative spatial probabilities and expected counts from baseline data can cause problems in applying spatial scan statistics when observed events are sparse in a large percentage of the spatial zones (e.g., zip codes or census tracts) found in the data records. In applications of scan statistics to datasets with fine spatial resolution, such as census tracts or block groups, such highly skewed data distributions are likely to occur. If the spatial distribution estimation process does not handle the zones with low counts correctly, bias in the determination of statistically significant clusters will occur.

In any 8-week baseline period, some of the sparse-data zones have no counts at all. If ignored, the zero-count spatial zones will result in division by zero in the log-likelihood ratio evaluation. The traditional method of setting a floor on the expected counts in each spatial zone leads to a loss of sensitivity when the number of zero-count zones is a significant fraction of all the zones. One alternative method for estimating spatial probabilities is to add one count to the sum of baseline counts in each spatial zone. This method has been used in a study of spatial cluster detection using medical 911 call data from San Diego County with good results. However, when this method was applied to data with a more highly skewed spatial distribution, issues were uncovered which led to this investigation of alternatives.
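The add-one adjustment and the division-by-zero problem it avoids can be illustrated with a small sketch. The log-likelihood ratio below is the standard Poisson form for a single candidate cluster, assumed here for illustration; the zone names, counts, and function names are hypothetical and do not come from the study.

```python
import math

def spatial_probabilities(baseline_counts, add_one=True):
    """Estimate each zone's share of events from its baseline count.

    With add_one=True, one count is added to every zone so that zones
    with an empty baseline still receive a nonzero probability.
    """
    bump = 1 if add_one else 0
    adjusted = {zone: c + bump for zone, c in baseline_counts.items()}
    total = sum(adjusted.values())
    return {zone: c / total for zone, c in adjusted.items()}

def expected_counts(probabilities, total_observed):
    """Spread the observed total across zones in proportion to probability."""
    return {zone: p * total_observed for zone, p in probabilities.items()}

def poisson_llr(observed, expected, total):
    """Poisson log-likelihood ratio for one cluster; 0 when there is no excess."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    outside_obs = total - observed
    outside = 0.0
    if outside_obs > 0:
        outside = outside_obs * math.log(outside_obs / (total - expected))
    return inside + outside

# Hypothetical baseline: zone "C" has no events in the baseline window.
baseline = {"A": 40, "B": 10, "C": 0}
smoothed = spatial_probabilities(baseline)
expected = expected_counts(smoothed, total_observed=10)
llr_c = poisson_llr(observed=3, expected=expected["C"], total=10)
```

Without the adjustment, zone "C" has an expected count of zero and the ratio `observed / expected` is undefined. With it, three cases in "C" produce a finite, positive score.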

 

Objective

Modifications to spatial scan statistics are investigated for prospective cluster detection at fine resolution with highly skewed spatial distributions having many spatial zones with very few cases. Several alternative methods for estimating spatial probabilities and expected counts from counts in a baseline data window are evaluated with the Poisson spatial scan statistic and the space-time permutation scan statistic, using goodness-of-fit statistics and cluster rates to compare performance.

Submitted by elamb on
Description

The spatial scan statistic [1] detects significant spatial clusters of disease by maximizing a likelihood ratio statistic over a large set of spatial regions. Typical spatial scan approaches either constrain the search regions to a given shape, reducing power to detect patterns that do not correspond to this shape, or perform a heuristic search over a larger set of irregular regions, in which case they may not find the most relevant clusters. In either case, computation time is a serious issue when searching over complex region shapes or when analyzing a large amount of data. An alternative approach might be to search over all possible subsets of the data to find the most relevant patterns, but since there are exponentially many subsets, an exhaustive search is computationally infeasible.

Objective

We present a new method of "linear-time subset scanning" and apply this technique to various spatial outbreak detection scenarios, making it computationally feasible (and very fast) to perform spatial scans over huge numbers of search regions.
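The abstract does not spell out the method. As usually described, linear-time subset scanning relies on a property of certain scoring functions: if records are sorted by a priority, the highest-scoring subset is always one of the "top k" prefixes of that ordering, so only N subsets need scoring instead of 2^N. The sketch below assumes an expectation-based Poisson score with priority count/baseline; the function names and example data are hypothetical.

```python
import math

def ebp_score(count, baseline):
    """Expectation-based Poisson score: C*log(C/B) + B - C if C > B, else 0."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count

def ltss_best_subset(counts, baselines):
    """Find the highest-scoring subset by scoring only priority-ordered prefixes.

    Sorting costs O(N log N); the scan over prefixes is linear in N.
    Returns (best_score, sorted list of record indices in the best subset).
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]   # running aggregate count of the top-k records
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])

# Hypothetical observed counts and expected baselines for four locations.
example_counts = [12, 3, 9, 5]
example_baselines = [5.0, 4.0, 6.0, 5.0]
example_score, example_subset = ltss_best_subset(example_counts, example_baselines)
```

For this small example the result can be checked against a brute-force search over all 15 non-empty subsets, which is exactly the search that becomes infeasible at realistic sizes.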

Submitted by elamb on
Description

The spatial scan statistic is the usual measure of strength of a cluster [1]. Another important measure is its geometric regularity [2]. A genetic multiobjective algorithm was developed elsewhere to identify irregularly shaped clusters [3]. A search is executed aiming to maximize two objectives, namely the scan statistic and the regularity of shape (using the compactness concept). The solution presented is a Pareto-set, consisting of all the clusters found which are not simultaneously worse in both objectives. A significance evaluation is conducted in parallel for all clusters in the Pareto-set through Monte Carlo simulation, then finding the most likely cluster.
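The Pareto-set selection described above can be sketched in a few lines. The filter follows the wording of the text: a candidate is dropped only if some other candidate is strictly better in both objectives. The cluster names and values are hypothetical, and the genetic search that would generate the candidates is not shown.

```python
def pareto_set(clusters):
    """Return names of clusters not worse than another cluster in both objectives.

    clusters: list of (name, scan_value, compactness); both objectives are
    maximized.
    """
    kept = []
    for name, scan_value, compactness in clusters:
        dominated = any(
            other_scan > scan_value and other_comp > compactness
            for _, other_scan, other_comp in clusters
        )
        if not dominated:
            kept.append(name)
    return kept

# Hypothetical candidates: (name, scan statistic, compactness).
example_clusters = [
    ("A", 12.0, 0.30),
    ("B", 10.0, 0.60),
    ("C", 9.0, 0.50),
    ("D", 6.0, 0.90),
]
example_front = pareto_set(example_clusters)
```

Here cluster C is removed because B beats it on both objectives, while A, B, and D each represent a different trade-off between strength and regularity.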

Objective

Situations where a disease cluster does not have a regular shape are fairly common. Moreover, maps with multiple clustering, when there is not a clearly dominating primary cluster, also occur frequently. We would like to develop a method to analyze more thoroughly the several levels of clustering that arise naturally in a disease map divided into m regions.

Submitted by elamb on