Data Analytics

Syndromic surveillance monitors syndrome data (a syndrome being a specific collection of clinical symptoms) as indicators of a potential disease outbreak. Advanced surveillance systems have been implemented globally for early detection of infectious disease outbreaks and bioterrorist attacks. However, such systems often face challenges such as (i) incorporating situation-specific characteristics, such as covariate information for certain diseases; (ii) accommodating the spatial and temporal dynamics of the disease; and (iii) providing analysis and visualization tools to help detect unexpected patterns. New methods that improve the overall detection capabilities of these systems while also minimizing the number of false positives can have a broad social impact.

Referenced File

A_Spatio_Temporal_Bayesian_Model_For_Syndromic_Surveillance_Properties_And_Model_Performance.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

The Veterans Health Administration (VHA) uses the Electronic Surveillance System for the Early Notification of Community-based Epidemics to detect disease outbreaks and other health-related events earlier than other forms of surveillance. Although Veterans may use any VHA facility in the world, the strongest predictor of which health care facility is accessed is geographic proximity to the patient's residence. A number of outbreaks have occurred in the Veteran population when geographically separate groups convened in a single location for professional or social events. One classic example was the initial Legionnaires' disease outbreak, identified among participants at the American Legion convention in Philadelphia in 1976. Numerous events involving travel by large Veteran (and employee) populations are scheduled each year.

Objective

To develop an algorithm to identify disease outbreaks by detecting aberrantly large proportions of patient residential ZIP codes outside a health care facility catchment area.
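One simple way to frame "aberrantly large proportions" is a one-sided binomial test against a historical baseline. The sketch below is an illustration under that assumption, not the algorithm developed in the referenced work; the function names, the `baseline_rate` parameter, and the `alpha` threshold are all hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def out_of_catchment_alert(visit_zips, catchment_zips, baseline_rate, alpha=0.01):
    """Flag a day whose share of out-of-catchment residential ZIP codes is unusually high.

    visit_zips     -- residential ZIP code of each patient visit on the day
    catchment_zips -- set of ZIP codes assumed to form the facility catchment area
    baseline_rate  -- assumed historical proportion of out-of-catchment visits
    """
    n = len(visit_zips)
    k = sum(1 for z in visit_zips if z not in catchment_zips)
    # With no visits there is nothing to test, so report a p-value of 1.
    p_value = binomial_upper_tail(k, n, baseline_rate) if n else 1.0
    return k, n, p_value, p_value < alpha
```

A production detector would also need to handle day-of-week effects and known scheduled events, which this sketch ignores.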

Referenced File

Another_Type_Of_Cluster_Monitoring_Detection_Of_Groups_Of_Anomalous_Patient_Residence_Locations.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

An expanded ambulatory health record, the Comprehensive Ambulatory Patient Encounter Record (CAPER), will provide multiple types of data for use in DoD ESSENCE. A new type of data not previously available is the Reason for Visit (ROV), a free-text field analogous to the Chief Complaint (CC). Intake personnel ask patients why they have come to the clinic and record their responses. Traditionally, the text should reflect the patient's actual statement. In reality, staff often "translate" the statement and add jargon. Text parsing maps key words or phrases to specific syndromes. Challenges exist given the vagaries of the English language and local idiomatic usage. Still, CC analysis by text parsing has been successful in civilian settings [1]. However, it was necessary to modify the parsing to reflect the characteristics of CAPER data and of the covered population. For example, consider the Shock/Coma syndrome. Loss of consciousness is relatively common in military settings due to prolonged standing, exertion in hot weather with dehydration, etc., whereas the main concern is shock/coma due to infectious causes. To reduce false positive mappings, the parser now excludes terms such as syncope, fainting, electric shock, road march, parade formation, immunization, blood draw, diabetes, hypoglycemic, etc.
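The exclusion logic described above can be sketched as a keyword matcher in which any exclusion term vetoes a syndrome match. This is a minimal illustration, not the DoD ESSENCE parser: the exclusion list is taken from the text, while the inclusion terms, the syndrome key `shock_coma`, and the function names are assumptions made for the example.

```python
import re

# Inclusion terms are hypothetical; the real parser's term list is not given in the text.
SYNDROME_TERMS = {
    "shock_coma": ["shock", "coma", "unresponsive", "loss of consciousness"],
}

# Exclusion terms as listed in the abstract.
EXCLUSION_TERMS = {
    "shock_coma": [
        "syncope", "fainting", "electric shock", "road march", "parade formation",
        "immunization", "blood draw", "diabetes", "hypoglycemic",
    ],
}

def _contains(text, term):
    """Whole-word (or whole-phrase) match, so 'coma' does not match 'comatose'."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

def map_to_syndromes(free_text):
    """Return the syndromes whose terms appear in the text and are not vetoed by an exclusion."""
    text = free_text.lower()
    hits = []
    for syndrome, terms in SYNDROME_TERMS.items():
        if any(_contains(text, t) for t in EXCLUSION_TERMS.get(syndrome, [])):
            continue  # an exclusion term vetoes the mapping for this syndrome
        if any(_contains(text, t) for t in terms):
            hits.append(syndrome)
    return hits
```

For instance, `map_to_syndromes("electric shock to hand")` returns an empty list because the exclusion phrase is checked before the inclusion term "shock".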

Objective

Rather than rely on diagnostic codes as the core data source for alert detection, this project sought to develop a Chief Complaint (CC) text parser to use in the U.S. Department of Defense (DoD) version of the Electronic Surveillance System for Early Notification of Community-Based Epidemics (ESSENCE), thereby providing an alternate evidence source. A secondary objective was to compare the diagnostic and CC data sources for complementarity.

Referenced File

Tuning_A_Chief_Complaint_Text_Parser_For_Use_In_Dod_Essence.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Parallel surveillance, the separate monitoring of each continuous series, has been widely used for multivariate surveillance; however, it has severe limitations. First, it faces the multiplicity problem that arises from multiple testing. Second, ignoring correlation between series (CBS) reduces the performance of outbreak detection if the data are truly correlated. Finally, since health data are normally dependent over time, correlation within series (CWS) should also be taken into account. Sufficient reduction methods reduce the dimensionality of a simple multivariate series to a univariate series that has been proved to be sufficient for monitoring a mean shift in multivariate surveillance (1 and 2). Having considered the sufficiency property and the nature of health data, we propose a sufficient reduction method for detecting a mean shift in multivariate series where CWS and CBS are taken into account.

Objective

To reduce the dimensionality of p-dimensional multivariate series to a univariate series by deriving sufficient statistics which take into account all the information in the original data, correlation within series (CWS) and correlation between series (CBS).

Referenced File

Sufficient_Reduction_Methods_For_Multivariate_Surveillance.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Spatial cluster analysis is considered an important technique for the elucidation of disease causes and epidemiological surveillance. Kulldorff's spatial scan statistic, defined as a likelihood ratio, is the usual measure of the strength of geographic clusters. The circular scan, a particular case of the spatial scan statistic, is currently the most used tool for the detection and inference of spatial clusters of disease.

Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. We propose a modification to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found.
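For orientation, the likelihood ratio underlying the circular scan can be sketched as follows. This illustrates the standard Poisson form of the statistic only, not the modified inference test proposed in this work. The input layout `(x, y, population, cases)`, the function names, and the 50% population cap are assumptions made for the example; populations are assumed to be positive.

```python
from math import hypot, log

def poisson_llr(c, mu, total):
    """Log-likelihood ratio for a zone with c observed and mu expected cases.

    Only zones with more cases than expected (c > mu) score above zero.
    """
    if mu <= 0 or c <= mu:
        return 0.0
    llr = c * log(c / mu)
    if total > c:  # skip the outside term when every case lies inside the zone
        llr += (total - c) * log((total - c) / (total - mu))
    return llr

def circular_scan(areas, max_pop_fraction=0.5):
    """Return (best_llr, best_zone) over circular zones centred on each area.

    areas -- list of (x, y, population, cases) tuples, one per area centroid
    """
    total_cases = sum(a[3] for a in areas)
    total_pop = sum(a[2] for a in areas)
    best_llr, best_zone = 0.0, []
    for cx, cy, _, _ in areas:
        # Grow the circle by adding areas in order of distance from the centre.
        order = sorted(range(len(areas)),
                       key=lambda i: hypot(areas[i][0] - cx, areas[i][1] - cy))
        pop = cases = 0
        zone = []
        for i in order:
            pop += areas[i][2]
            if pop > max_pop_fraction * total_pop:
                break
            cases += areas[i][3]
            zone.append(i)
            mu = total_cases * pop / total_pop  # expected cases under the null
            llr = poisson_llr(cases, mu, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, sorted(zone)
    return best_llr, best_zone
```

The sketch stops at finding the most likely cluster. In the usual procedure its significance is then assessed by Monte Carlo replication under the null hypothesis, which is the step where the adjustment for multiple testing discussed above takes place.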

Objective

We propose a modification to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found.

Referenced File

A_New_Interpretation_Of_The_Inflrence_Test_For_The_Spatial_Scan_Statistic.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

The spatial scan statistic proposed by Kulldorff has been widely used in spatial disease surveillance and other spatial cluster detection applications. One version of the scan statistic was developed for an inhomogeneous Poisson process. However, the underlying Poisson process may not model the data properly. In particular, for diseases with very low prevalence, the number of cases may be very low, and an excess of zeros may bias the inferences.

Lambert introduced the zero-inflated Poisson (ZIP) regression model to account for excess zeros in counts of manufacturing defects. The model has since been applied in numerous situations. Count data, like contingency tables, often contain cells having zero counts. If a given cell has a positive probability associated with it, a zero count is called a sampling zero. However, a zero for a cell in which it is theoretically impossible to have observations is called a structural zero.
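The distinction between sampling and structural zeros shows up directly in the ZIP probability mass function, which mixes a point mass at zero with an ordinary Poisson distribution. The sketch below shows that mixture only, without the regression part of Lambert's model; the parameter names `lam` (Poisson mean) and `p_zero` (probability of a structural zero) are chosen for the example.

```python
from math import exp, factorial

def zip_pmf(y, lam, p_zero):
    """P(Y = y) under a zero-inflated Poisson distribution.

    With probability p_zero the observation is a structural zero;
    otherwise it is drawn from Poisson(lam), which can also yield a
    (sampling) zero.
    """
    poisson = exp(-lam) * lam**y / factorial(y)
    if y == 0:
        return p_zero + (1 - p_zero) * poisson
    return (1 - p_zero) * poisson

def zip_mean(lam, p_zero):
    """Mean of the ZIP distribution, (1 - p_zero) * lam."""
    return (1 - p_zero) * lam
```

With `lam = 2.0` and `p_zero = 0.3`, the probability of a zero rises from about 0.135 under a plain Poisson to about 0.395, which is the kind of excess that an unadjusted Poisson model would misread.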

Objective

The scan statistic is widely used in spatial cluster detection applications of inhomogeneous Poisson processes. However, real data may depart substantially from the underlying Poisson process. One possible departure is an excess of zeros. Some studies point out that the spatial scan statistic may produce biased inferences when applied to zero-inflated data. In particular, Gómez-Rubio and López-Quílez argue that Kulldorff's scan statistic may not be suitable for problems involving very rare diseases. In this work we develop a closed-form scan statistic for cluster detection of spatial count data with zero excess.

Referenced File

A_Zero_Inflated_Poisson_Based_Spatial_Scan_Statistic.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Ordering-based approaches [1,2] and quadtrees [3] have been introduced recently to detect multiple spatial clusters in point event datasets. The Autonomous Leaves Graph (ALG) [4] is an efficient graph-based data structure for handling the communication of cells in discrete domains. This adaptive data structure has compared favorably with common tree-based data structures (quadtrees). An additional feature of the ALG data structure is the total ordering of the component cells through a modified adaptive Hilbert curve, which sequentially links the cells (the orange curve in the example of Figure 1).

Objective

To detect multiple significant spatial clusters of disease in case-control point event data using the Autonomous Leaves Graph and the spatial scan statistic.

Referenced File

Spatial_Cluster_Detection_In_Case_Control_Datasets_With_The_Autonomous_Leaves_Graph.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Data obtained through public health surveillance systems are used to detect and locate clusters of cases of diseases in space-time, which may indicate the occurrence of an outbreak or an epidemic. We present a methodology based on adaptive likelihood ratios to compare the null hypothesis (no outbreaks) against the alternative hypothesis (presence of an emerging disease cluster).

Objective

Disease surveillance is based on methodologies to detect outbreaks as soon as possible, given an acceptable false alarm rate. We present an adaptive likelihood ratio method based on the properties of the martingale structure which allows the determination of an upper limit for the false alarm rate.

Referenced File

Adaptive_Liklihood_Ratio_For_The_Detection_Of_Space_Time_Disease_Clusters.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

The ability to rapidly detect any substantial change in disease incidence is of critical importance to facilitate timely public health response and, consequently, to reduce undue morbidity and mortality. Unlike testing methods (1, 2), modeling for spatio-temporal disease surveillance is relatively recent, and this is a very active area of statistical research (3). Models describing the behavior of diseases in space and time allow covariate effects to be estimated and provide better insight into etiology, spread, prediction and control. Most spatio-temporal models have been developed for retrospective analyses of complete data sets (4). However, data in public health registries accumulate over time, and sequential analysis of all the data collected so far is key to early detection of disease outbreaks. When the analysis of spatially aggregated data on multiple diseases is of interest, the use of multivariate models accounting for correlations across both diseases and locations may provide a better description of the data and enhance the comprehension of disease dynamics.

Objective

This study deals with the development of statistical methodology for on-line surveillance of small area disease data in the form of counts. As surveillance systems are often focused on more than one disease within a predefined area, we extend the surveillance procedure to the analysis of multiple diseases. The multivariate approach allows for inclusion of correlation across diseases and, consequently, increases the outbreak detection capability of the methodology.

Referenced File

Online_Surveillance_Of_Multivariate_Small_Area_Disease_Data_A_Bayesian_Approach.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

The spatial scan statistic [1] detects significant spatial clusters of disease by maximizing a likelihood ratio statistic F(S) over a large set of spatial regions, typically constrained by shape. The fast localized scan [2] enables scalable detection of irregular clusters by searching over proximity-constrained subsets of locations, using the linear-time subset scanning (LTSS) property to efficiently search over all subsets of each location and its k - 1 nearest neighbors. However, for a fixed neighborhood size k, each of the 2^k subsets is considered equally likely, and thus the fast localized scan does not take into account the spatial attributes of a subset. Hence, we wish to extend the fast localized scan by incorporating soft constraints which give preference to spatially compact clusters while still considering all subsets within a given neighborhood.
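The LTSS property mentioned above is what makes the search over all 2^k subsets tractable: for score functions that satisfy it, the best subset is always one of the k "top-j" prefixes of the locations sorted by a priority function. The sketch below shows this for a single neighborhood using the expectation-based Poisson score with priority count/baseline. It covers only the unconstrained LTSS step, not the soft compactness constraints proposed here; the function names and the choice of score are assumptions made for the example.

```python
from itertools import combinations
from math import log

def ebp_score(c, b):
    """Expectation-based Poisson score F(S) for total count c and total baseline b."""
    return c * log(c / b) + b - c if c > b > 0 else 0.0

def ltss_best_subset(counts, baselines):
    """Best-scoring subset of one neighborhood, examining only k prefixes.

    Locations are sorted by priority counts[i] / baselines[i]; under the
    LTSS property the optimum is one of the top-j prefixes of that order.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_subset = 0.0, []
    c = b = 0.0
    for j, i in enumerate(order, start=1):
        c += counts[i]
        b += baselines[i]
        score = ebp_score(c, b)
        if score > best_score:
            best_score, best_subset = score, sorted(order[:j])
    return best_score, best_subset

def brute_force_best_score(counts, baselines):
    """Exhaustive maximum over all non-empty subsets, for checking small cases only."""
    n = len(counts)
    return max(
        ebp_score(sum(counts[i] for i in s), sum(baselines[i] for i in s))
        for r in range(1, n + 1)
        for s in combinations(range(n), r)
    )
```

On small inputs the prefix search and the exhaustive search return the same maximum score, while the prefix search costs a sort plus a single pass over the k locations.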

Objective

We present a new method for efficiently and accurately detecting irregularly-shaped outbreaks by incorporating "soft" constraints, rewarding spatial compactness and penalizing sparse regions.

Referenced File

Scalable_Detection_Of_Irregular_Disease_Clusters_Using_Soft_Compactness_Constraints.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52
