Cluster Detection

Attached is a Word document with multiple syndrome queries that we have found useful during the coagulopathy cluster situation. The most frequently used queries are highlighted in yellow.

These queries were created in response to marijuana, particularly synthetic marijuana, tainted with anticoagulants often found in rodent poisons.

 

 

Submitted by Anonymous on
Description

Heuristics to detect irregularly shaped spatial clusters were reviewed recently. The spatial scan statistic is a widely used measure of the strength of clusters. However, other measures may also be useful, such as the geometric compactness penalty, the non-connectivity penalty and other measures based on graph topology and weak links.5,6 These penalties $p(z)$ are often coupled with the spatial scan statistic $T(z)$, either by maximizing the product, $\max_z T(z) \cdot p(z)$, or through a multiobjective optimization procedure, $\max_z (T(z), p(z))$,3,6 or even through a combination of both approaches. The geometric penalty of a cluster $z$ is defined as the quotient of the area of $z$ by the area of the circle with the same perimeter as the convex hull of $z$, so that less rounded clusters are penalized more heavily. Now, let $V$ and $E$ be the vertex and edge sets, respectively, of the graph $G_z(V, E)$ associated with the potential cluster $z$. The non-connectivity penalty $y(z)$ is a function of the number of edges $e(z)$ and the number of nodes $n(z)$ of $G_z(V, E)$, defined as $y(z) = e(z) / (3[n(z) - 2])$. The less interconnected, tree-shaped clusters are penalized the most. However, neither of these two measures accounts for the populations of the individual regions.
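The two penalties defined above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the cluster's area, convex hull perimeter, edge count and node count are assumed to be computed elsewhere; this is not the authors' implementation.

```python
import math


def geometric_compactness(area, hull_perimeter):
    """Area of the cluster divided by the area of the circle whose
    circumference equals the perimeter of the cluster's convex hull."""
    radius = hull_perimeter / (2.0 * math.pi)
    return area / (math.pi * radius ** 2)


def non_connectivity(num_edges, num_nodes):
    """y(z) = e(z) / (3 [n(z) - 2]), as defined in the text.

    Assumption: the formula is only applied to clusters with at
    least three regions, so the denominator is positive.
    """
    if num_nodes < 3:
        raise ValueError("penalty is only defined here for n(z) >= 3")
    return num_edges / (3.0 * (num_nodes - 2))


def penalized_statistic(scan_value, penalty):
    """Multiplicative coupling T(z) * p(z) of statistic and penalty."""
    return scan_value * penalty


# A unit square (area 1, hull perimeter 4) is less compact than a circle.
square = geometric_compactness(1.0, 4.0)
# A tree-shaped cluster with 5 regions has only 4 edges.
tree = non_connectivity(4, 5)
```

Both penalties equal 1 for the least penalized shapes (a circle, or a fully triangulated planar graph) and shrink as the cluster becomes more elongated or more tree-like.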

Objective

Irregularly shaped clusters in maps divided into regions are very common in disease surveillance. However, they are difficult to delineate, and usually we notice a loss of power of detection. Several penalty measures for the excessive freedom of shape have been proposed to attack this problem, involving the geometry and graph topology of clusters. We present a novel topological measure that displays better performance in numerical tests.

Submitted by uysz on
Description

The H1N1 outbreak in the spring of 2009 in NYC originated in a school in Queens before spreading to others nearby. Active surveillance established epidemiological links between students at the school and new cases at other schools through household connections. Such findings suggest that spatial cluster detection methods should be useful for identifying new influenza outbreaks in school-aged children. As school-to-school transmission should occur between those with high levels of interaction, existing cluster detection methods can be improved by accurately characterizing these links. We establish a prospective surveillance system that detects outbreaks in NYC schools using a flexible spatial scan statistic (FlexScan), with clusters identified on a network constructed from student interactions.
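As a rough illustration of scanning for clusters on a network, the sketch below scores every connected set of schools up to a small size with Kulldorff's Poisson log-likelihood ratio. It is a brute-force simplification, not FlexScan itself and not the authors' system; the school labels, the adjacency structure and all function names are assumptions made for the example, and the expected counts are assumed to sum to the total number of cases.

```python
import math
from itertools import combinations


def poisson_llr(observed, expected, total):
    """Poisson log-likelihood ratio for a zone with an excess of cases."""
    if observed <= expected:
        return 0.0
    rest = total - observed
    inside = observed * math.log(observed / expected)
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


def is_connected(zone, adjacency):
    """Check that the zone forms one connected piece of the network."""
    zone = set(zone)
    start = next(iter(zone))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor in zone and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == zone


def best_connected_zone(cases, expected, adjacency, max_size=3):
    """Return (llr, zone) for the highest-scoring connected zone."""
    total = sum(cases.values())
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for zone in combinations(sorted(cases), size):
            if not is_connected(zone, adjacency):
                continue
            c = sum(cases[s] for s in zone)
            e = sum(expected[s] for s in zone)
            llr = poisson_llr(c, e, total)
            if llr > best[0]:
                best = (llr, zone)
    return best


# Hypothetical network: four schools linked in a chain A-B-C-D.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
cases = {"A": 2, "B": 9, "C": 8, "D": 1}
expected = {"A": 5.0, "B": 5.0, "C": 5.0, "D": 5.0}
llr, zone = best_connected_zone(cases, expected, adjacency)
```

In practice, statistical significance would be assessed by Monte Carlo replication, and the exhaustive enumeration used here only scales to very small networks.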

Objective

To improve cluster detection of influenza-like illness within New York City (NYC) public schools using school health and absenteeism data by characterizing the degree to which schools interact.

Submitted by Magou on
Description

Prior work demonstrates the extent to which sampling strategies reduce the power to detect clusters.1 Additionally, the power to detect clusters can vary across space.2 A third, unexplored, effect is how much the sample size impacts the power of spatial cluster detection methods. This research examines this effect.

Objective

In syndromic surveillance settings, the use of samples may be unavoidable, as when only a part of the population reports flu-like symptoms to their physician. Taking samples from a complete population weakens the power of spatial cluster detection methods.1 This research examines the effect of different sampling strategies and sample sizes on the power of cluster detection methods.

Submitted by Magou on
Description

Quantifying the spatial-temporal diffusion of diseases such as seasonal influenza is difficult at the urban scale for a variety of reasons, including the low specificity of the extant data, the heterogeneous nature of healthcare-seeking behavior and the speed with which diseases spread throughout the city. Nevertheless, the New York City Department of Health and Mental Hygiene’s syndromic surveillance system attempts to detect spatial clusters resulting from outbreaks of influenza. The success of such systems depends on there being a discernible spatial-temporal pattern of disease at the neighborhood (sub-urban) scale.

We explore ways to extend global methods such as Serfling regression, which estimate excess burdens during outbreak periods, so that they characterize these patterns. Traditionally, these methods are applied to data aggregated at the national or regional scale and are used only to estimate the total burden of a disease outbreak period. Our extension characterizes the spatial-temporal pattern at the neighborhood scale by day. We then compare our characterizations to the prospective spatial cluster detection efforts of our syndromic surveillance system and to demographic covariates.
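For readers unfamiliar with the baseline method, a classic Serfling-type regression fits a linear trend plus one seasonal harmonic to non-epidemic periods and reports observed minus predicted counts as the excess. The sketch below shows only that standard single-series form, not the neighborhood-level extension described here; the function names, the weekly period of 52 and the baseline mask are assumptions for illustration.

```python
import math


def design_row(t, period):
    """Regressors: intercept, linear trend, one sine/cosine harmonic."""
    w = 2.0 * math.pi * t / period
    return [1.0, float(t), math.sin(w), math.cos(w)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_serfling(counts, baseline_mask, period=52.0):
    """Least-squares fit using only time points flagged as baseline."""
    rows = [design_row(t, period)
            for t, keep in enumerate(baseline_mask) if keep]
    ys = [y for y, keep in zip(counts, baseline_mask) if keep]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def excess(counts, beta, period=52.0):
    """Observed minus expected count at every time point."""
    return [y - sum(b * x for b, x in zip(beta, design_row(t, period)))
            for t, y in enumerate(counts)]
```

Fitting one such model per neighborhood on daily counts would be one possible way to move toward the finer scale discussed above, although sparse counts at that scale may call for a different error model than ordinary least squares.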

 

Objective

To develop a novel method to characterize the spatial-temporal pattern of seasonal influenza and then use this characterization to: (1) inform the spatial cluster detection efforts of syndromic surveillance, (2) explore the relationship of spatial-temporal patterns and covariates and (3) inform conclusions made about the burden of seasonal and pandemic influenza. 

Submitted by hparton on
Description

Traditionally, surveillance systems for dengue and other infectious diseases locate each individual case by home address, aggregate these locations to small areas, and monitor the number of cases in each area over time. However, human mobility plays a key role in dengue transmission, especially because of the mosquito's day-biting habit, and relying solely on an individual's residential address as a proxy for the place of infection ignores the many exposures that individuals are subjected to during their daily routines. Residence locations may be a poor indicator of the actual regions where humans and infected vectors tend to interact more, and hence provide little information for dengue prevention. The increasing availability of geolocated data in online platforms such as Twitter offers a unique opportunity: in addition to identifying diseased individuals based on textual content, we can also follow them in time and space as they move on the map and model their movement patterns. Comparing the observed mobility patterns of case and control individuals can provide relevant information for detecting localized regions with a higher risk of dengue infection. Incorporating the mobility of individuals into risk modeling requires new spatial models that can cope with this type of data in a principled way, as well as efficient algorithms to deal with the ever-growing amount of data. We propose new spatial scan models and exploit geolocated data from Twitter to detect geographic clusters of dengue infection risk.

Objective

We develop new spatial scan models that use individuals' movement data, rather than a single location per individual, in order to identify areas with a high relative risk of dengue infection.

Submitted by elamb on
Description

Consider the most likely disease cluster produced by any given method, such as SaTScan, for the detection and inference of spatial clusters in a map divided into areas. If this cluster is found to be statistically significant, what can be said of the external areas adjacent to the cluster? Do we have enough information to exclude them from a preventive health program? Do all the areas inside the cluster have the same importance from a practitioner's perspective? How can we quantitatively assess the risk of those regions, given that the information we have (case counts) is also subject to variation in our statistical modeling? A few papers have tackled these questions recently; one produces confidence intervals for the risk in every area, which are compared with the risks inside the most likely cluster. There is a growing demand for interactive software for the visualization of spatial clusters. A technique was developed to visualize relative risk and statistical significance simultaneously.

Objective

Given an aggregated-area map with disease case data, we propose a criterion to measure how plausible it is that each area in the map is part of a possible localized anomaly.

Submitted by uysz on
Description

Event-based biosurveillance is a practice of monitoring diverse information sources for the detection of events pertaining to human health. Online documents, such as news articles on the Internet, have commonly been the primary information sources in event-based biosurveillance. With the large number of online publications as well as with the language diversity, thorough monitoring of online documents is challenging. Automated document classification is an important step toward efficient event-based biosurveillance. In Project Argus, a biosurveillance program hosted at Georgetown University Medical Center, supervised and unsupervised approaches to document classification are considered for event-based biosurveillance.

 

Objective

This paper describes ongoing efforts in enhancing automated document classification toward efficient event-based biosurveillance. 

Submitted by hparton on
Description

Syndromic surveillance typically involves collecting time-stamped transactional data, such as patient triage or examination records or pharmacy sales. Such records usually span multiple categorical features, such as location, age group, gender, symptoms, chief complaints, drug category and so on. The key analytic objective is to identify potential disease clusters in recently observed data (for example, during the last week) as compared with a baseline (for example, derived from data observed over the previous few months). In real-world scenarios, a disease outbreak can affect any subset of categorical dimensions and any subset of values along each categorical dimension. As evaluating all possible outbreak hypotheses can be computationally challenging, popular state-of-the-art algorithms either limit the scope of the search to exclusively conjunctive definitions or focus only on detecting spatially co-located clusters. Further, it is also common to see multiple disease outbreaks happening simultaneously and affecting overlapping subsets of dimensions and values. Most such algorithms focus on finding just the single most significant anomalous cluster corresponding to a possible disease outbreak, and ignore the possibility that additional clusters emerge concurrently.

Objective

We present Disjunctive Anomaly Detection (DAD), a novel algorithm to detect multiple overlapping anomalous clusters in large sets of categorical time series data. We compare the performance of DAD and What’s Strange About Recent Events on disease surveillance data from the Sri Lanka Ministry of Health.

Submitted by hparton on
Description

The Centers for Disease Control and Prevention's (CDC) Emerging Infections Program (EIP) monitors and studies many infectious diseases, including influenza. In 10 states in the US, information is collected for hospitalized patients with laboratory-confirmed influenza. Data are extracted manually by EIP personnel at each site, stripped of personal identifiers and sent to the CDC. The anonymized data are received and reviewed for consistency at the CDC before they are incorporated into further analyses. This review includes identifying errors, which are then classified.

Objective

Data quality checks can be used to generate feedback that remediates and/or reduces errors at the source. In this report, we introduce a classification of errors generated as part of the data collection process for the EIP’s Influenza Hospitalization Surveillance Project at the CDC. We also describe a set of mechanisms intended to minimize and correct these errors via feedback to the collection sites.

Submitted by hparton on