
Cluster Detection

Description

Tuberculosis (TB) has reemerged as a global health epidemic in recent years. Although several researchers have examined the use of space-time surveillance to detect TB clusters, they have not used genetic information to verify that detected clusters are due to person-to-person transmission. Using genetic fingerprinting data for TB cases, we sought to determine whether detected clusters were due to recent transmission.

 

Objective

This paper describes the utility of prospective space-time surveillance to detect genetic clusters of TB due to person-to-person spread.

Submitted by elamb on
Description

In a classical surveillance system one looks for disturbances in the number of cases, but a spatio-temporal system reports not only the number of cases observed but also where they are located. Which location is reported, and to what degree of accuracy, are both important. At one extreme lies near-perfect information about each case, as with contact tracing; at the other extreme we have no information about location, i.e., only that the patient exists, which amounts to a purely temporal system. Moving from maximum spatial precision to no spatial precision, one gains in speed of reporting and privacy but loses power to detect outbreaks. For example, Ozonoff et al. show that more than one address is better than just a single one. This general point is intuitively appealing, and can be demonstrated.

 

Objective

This paper quantifies the effect of not providing full information about the location of patients when dealing with spatio-temporal systems in syndromic surveillance. The study investigates the loss of power to detect clusters when aggregation takes place. 

Description

CDC is building a public health information grid to enable controlled distribution of data, services, and applications for researchers, Federal authorities, and local and state health departments nationwide, allowing efficient, controlled sharing of data and analytical tools. Federated aggregate analysis of distributed data sources may detect clusters that might be invisible to smaller, isolated systems. Success of the public health grid is contingent upon the number of participating agencies and the quantity, quality, and utility of data and applications available for sharing. Grid protocols allow data owners to control data access, but they require a model to control the level of identifiability of the data depending upon the user’s permissions. Here, we describe work currently in progress on the design and implementation of an ambulatory syndromic surveillance data stream generator for the public health grid. The project is intended to broadly disseminate aggregate syndrome counts for general use by the public health community, to develop a model for sharing varying levels of identifiable data on cases depending upon the user, and to facilitate ongoing development of the grid.

 

Objective

To implement a syndromic surveillance system on CDC’s public health information grid, capable of securely distributing syndromic data streams ranging from aggregate case counts to individual case details, to appropriate personnel.

Description

The interpretation of aberrations detected by syndromic surveillance is critical for success, but poses challenges for local health departments, which must conduct appropriate follow-up and confirm outbreaks. This paper describes the response of the Boston Public Health Commission (BPHC) to a cluster of emergency department (ED) visits in children detected by syndromic surveillance.

Description

With the widespread deployment of near-real-time population health monitoring, there is increasing focus on spatial cluster detection for identifying disease outbreaks. These spatial epidemiologic methods rely on knowledge of patient location to detect unusual clusters. In hospital administrative data, patient location is collected as home address, but use of this precise location raises privacy concerns. Regional locations, such as center points of zip codes, have been used instead in many existing systems. However, this practice could distort the geographic properties of the raw data and affect subsequent spatial analyses. The impact of location error due to centroid assignment on the statistical analyses underlying these systems requires study.

 

Objective

To investigate the impact of address precision (exact latitude and longitude versus the center points of zip codes) on spatial cluster detection.
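As a rough illustration of the location error that centroid assignment introduces, the sketch below measures the great-circle distance between an exact address and the center point it would be replaced by. The function name `haversine_km`, the Earth radius constant, and the sample coordinates are assumptions made for illustration; none of them come from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical patient address and hypothetical zip code center point.
exact = (42.3400, -71.1000)
zip_center = (42.3500, -71.0800)

# Displacement introduced by reporting the center point instead of the address.
displacement_km = haversine_km(*exact, *zip_center)
```

Summarizing such displacements over all cases gives one simple way to describe how far the aggregated locations drift from the raw data before any cluster detection is run.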

Description

Los Angeles County Department of Health Services is currently testing SaTScan’s space-time permutation model to assist in identifying aberrant illness activity in the community and to determine its ability to detect outbreaks by analyzing real-time syndromic data. SaTScan could be especially useful because it can provide the geographic locations of outbreaks in the community.

 

Objective

To determine the usefulness of SaTScan as an outbreak and illness cluster detection tool in syndromic surveillance and to compare it with a purely temporal CUSUM algorithm.
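For readers unfamiliar with the purely temporal comparator, the following is a minimal sketch of a one-sided CUSUM on standardized daily counts. It is not the algorithm configuration used in the study: the function name `cusum_alarms`, the reference value `k`, the threshold `h`, the reset-after-alarm rule, and the sample counts are all illustrative assumptions.

```python
def cusum_alarms(counts, mean, sd, k=0.5, h=4.0):
    """Return the indices of days on which a one-sided CUSUM exceeds h.

    counts: daily case counts
    mean, sd: baseline mean and standard deviation (assumed known here)
    k: reference value subtracted each day (assumed)
    h: alarm threshold (assumed)
    """
    s = 0.0
    alarms = []
    for day, x in enumerate(counts):
        z = (x - mean) / sd          # standardize the daily count
        s = max(0.0, s + z - k)      # accumulate only upward deviations
        if s > h:
            alarms.append(day)
            s = 0.0                  # reset after signalling (one common convention)
    return alarms


daily_counts = [10, 11, 9, 10, 16, 17, 18, 10]
alarm_days = cusum_alarms(daily_counts, mean=10.0, sd=2.0)
```

A CUSUM of this kind reports only when an excess occurs, whereas a space-time method such as SaTScan also reports where it occurs.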

Description

Irregularly shaped cluster finders frequently end up with a solution consisting of a large zone z spreading through the map, which is merely a collection of the highest-valued regions rather than a geographically sound cluster. One way to ameliorate this problem is to introduce penalty functions that limit the excessive freedom of shape of z. The compactness penalty K(z) is a function that reduces the scan value of irregularly shaped clusters, based on their geometric shape. Another penalty is the cohesion function C(z), a measure of the absence of weak links, that is, underpopulated regions within the cluster that disconnect it when removed. It has been suggested that such weak links could be responsible for diminished power of detection in cluster-finding algorithms. Methods using these penalty functions perform better. The geometric compactness penalty is not entirely satisfactory, however, because it tends to exclude potentially interesting irregularly shaped clusters, acting as a low-pass filter. The cohesion function penalty method, for its part, has slightly less specificity.
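The abstract does not define K(z). One common way to express geometric compactness, assumed here purely for illustration, is the ratio of a zone's area to the area of a circle with the same perimeter, 4πA/P², which equals 1 for a circle and shrinks toward 0 for elongated or ragged shapes. The sketch below computes that ratio for a simple polygon; the function names and the use of the polygon's own perimeter (rather than, say, that of its convex hull) are assumptions.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon given as (x, y) pairs."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(vertices):
    """Assumed compactness measure: 4*pi*area / perimeter**2, in (0, 1]."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
thin_strip = [(0, 0), (10, 0), (10, 1), (0, 1)]

k_square = compactness(square)      # about pi / 4
k_strip = compactness(thin_strip)   # much smaller: long, thin zones are penalized
```

Under this assumed definition, a long thin zone scores far lower than a square of comparable width, which is how a compactness penalty can also suppress genuinely irregular clusters.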

 

Objective

We introduce a novel spatial scan algorithm for finding irregularly shaped disease clusters maximizing simultaneously two objectives: the regularity of shape and the internal cohesion of the cluster.

Description

We hypothesize that epidemics, around their onset, tend to affect primarily a well-defined subgroup of the overall population that is for some reason particularly susceptible. While the vulnerable cohort is well described for many human diseases, this is not the case, for instance, when we wish to detect a novel computer virus. Clustering may be used to define the subgroups that will be tested for over-density of symptom occurrence. The clustering changes slowly in response to changes in the population.

 

Objective

This paper describes a method of detecting a slowly growing signal in a large population, based on clustering the population into subgroups that are more homogeneous in their susceptibility to the infectious agent.

Description

Bio-surveillance systems monitor multiple data streams (over-the-counter (OTC) sales, Emergency Department visits, etc.) to detect both natural disease outbreaks (e.g., influenza) and bio-terrorist attacks (e.g., anthrax release). Many detection algorithms show impressive results in simulated environments, but the complex behavior of real-world data and the high costs associated with processing false positives make it difficult to develop practical bio-surveillance systems. We believe that using expert knowledge from public health officials will help us to better understand the real-world data, improving our ability to distinguish actual disease outbreaks from non-outbreak patterns.

 

Objective

This paper describes the evolution of a bio-surveillance system that incorporates user feedback to improve system utility and usability. The system monitors national-level OTC pharmacy sales on a daily basis. We use fast spatio-temporal scan statistics to detect disease outbreaks.

Description

Estimation of representative spatial probabilities and expected counts from baseline data can cause problems in applying spatial scan statistics when observed events are sparse in a large percentage of the spatial zones (e.g., zip codes or census tracts) found in the data records. In applications of scan statistics to datasets with fine spatial resolution, such as census tracts or block groups, such highly skewed data distributions are likely to occur. If the spatial distribution estimation process does not handle the zones with low counts correctly, bias in the determination of statistically significant clusters will occur.

In any 8-week baseline period, some of the sparse-data zones have no counts at all. If ignored, the zero-count spatial zones will result in division by zero in the log-likelihood ratio evaluation. The traditional method of setting a floor on the expected counts in each spatial zone leads to a loss of sensitivity when the zero-count zones make up a significant fraction of all the zones. One alternative method for estimating spatial probabilities is to add one count to the sum of baseline counts in each spatial zone. This method has been used with good results in a study of spatial cluster detection using medical 911 call data from San Diego County. However, when this method was applied to data with a more highly skewed spatial distribution, issues were uncovered that led to this investigation of alternatives.
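The zero-count problem and the add-one alternative can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the single-zone form of the Poisson log-likelihood ratio (the usual Kulldorff-style expression, counted only when observed exceeds expected), and the toy counts are assumptions.

```python
import math


def expected_counts(baseline, total_current, add_one=True):
    """Spread the current total across zones in proportion to baseline counts.

    With add_one=True, one count is added to each zone's baseline sum so that
    no zone ends up with an expected count of zero.
    """
    bump = 1 if add_one else 0
    smoothed = [b + bump for b in baseline]
    total = sum(smoothed)
    return [total_current * s / total for s in smoothed]


def poisson_llr(observed, expected, total):
    """Poisson log-likelihood ratio for one zone, counting excesses only."""
    if expected <= 0:
        raise ValueError("expected count must be positive (zero-count baseline zone)")
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    outside = 0.0
    if total > observed:
        outside = (total - observed) * math.log((total - observed) / (total - expected))
    return inside + outside


baseline = [0, 3, 7]   # hypothetical baseline counts; the first zone is empty
current = [2, 1, 7]    # hypothetical counts in the current window
total = sum(current)

raw = expected_counts(baseline, total, add_one=False)       # first zone expects 0
smoothed = expected_counts(baseline, total, add_one=True)   # every zone expects > 0
llr_zone0 = poisson_llr(current[0], smoothed[0], total)
```

With the unsmoothed estimates, the first zone cannot be evaluated at all; with add-one smoothing it receives a small positive expected count and a finite score. The sketch does not show the bias that the abstract reports for more highly skewed distributions.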

 

Objective

Modifications to spatial scan statistics are investigated for prospective cluster detection at fine resolution with highly skewed spatial distributions in which many spatial zones have very few cases. Several alternative methods for estimating spatial probabilities and expected counts from counts in a baseline data window are evaluated with the Poisson spatial scan statistic and the space-time permutation scan statistic, using goodness-of-fit statistics and cluster rates to compare performance.
