
Data Analytics

Description

Spatial Scan Statistics [1] usually assume Poisson- or binomial-distributed data, an assumption that is inadequate in many disease surveillance scenarios. For example, small areas distant from hospitals may exhibit fewer cases than those simple models predict. Underreporting may also occur in underdeveloped regions because of inefficient data collection or the difficulty of accessing remote sites. These factors generate excess zero case counts or overdispersion, which violates the statistical model and increases the type I error rate (false alarms). Overdispersion occurs when the variance of the data is greater than the variance predicted by the model in use. The Poisson model constrains the variance to equal the mean, so accommodating overdispersion requires an extra parameter.
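As a minimal illustration of why excess zeros break the Poisson assumption, the sketch below computes the moments of a zero-inflated Poisson (ZIP) count. The ZIP form, the function name `zip_moments`, and the parameter values are assumptions chosen for illustration; the abstract does not state which zero-inflated or overdispersed model the authors use. With no zero inflation the variance-to-mean ratio is exactly 1, and it rises above 1 as soon as structural zeros are mixed in.

```python
import math


def zip_moments(lam, p_zero):
    """Mean, variance, and P(X = 0) of a zero-inflated Poisson count.

    lam    : rate of the underlying Poisson component
    p_zero : probability of a structural (excess) zero
    """
    mean = (1.0 - p_zero) * lam
    # Var = (1 - p) * lam * (1 + p * lam); reduces to lam when p = 0
    variance = (1.0 - p_zero) * lam * (1.0 + p_zero * lam)
    prob_zero = p_zero + (1.0 - p_zero) * math.exp(-lam)
    return mean, variance, prob_zero


# Compare a plain Poisson (p_zero = 0) with a zero-inflated one.
for p_zero in (0.0, 0.3):
    mean, variance, prob_zero = zip_moments(4.0, p_zero)
    print(
        f"p_zero={p_zero}: mean={mean:.2f} variance={variance:.2f} "
        f"variance/mean={variance / mean:.2f} P(0)={prob_zero:.3f}"
    )
```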

Objective

To propose a more realistic model for disease cluster detection, through a modification of the spatial scan statistic to account simultaneously for inflated zeros and overdispersion.

Submitted by uysz on
Description

There has been much research on statistical methods of prospective outbreak detection that are aimed at identifying unusual clusters of one syndrome or disease, and some work on multivariate surveillance methods. In England and Wales, automated laboratory surveillance of infectious diseases has been undertaken since the early 1990s. The statistical methodology of this automated system is described in. However, there has been little research on outbreak detection methods that are suited to large, multiple surveillance systems involving thousands of different organisms.

 

Objective

To look at the diversity of the patterns displayed by a range of organisms, and to seek a simple family of models that adequately describes all organisms, rather than a well-fitting model for any particular organism.

Submitted by hparton on
Description

The National Collaborative for Bio-Preparedness (NCB-Prepared) was established in 2010 to create a biosurveillance resource to enhance situational awareness and emergency preparedness. This joint institutional effort has drawn on expertise from the University of North Carolina at Chapel Hill, North Carolina State University, and SAS Institute, leveraging North Carolina’s role as a leader in syndromic surveillance, technology development, and health data standards. As an unprecedented public/private alliance, it brings the flexibility of the private sector to support the public sector. The project has developed a functioning prototype system for multiple states that will be scaled and made more robust for national adoption.

Objective:

Demonstrate the functionality of the National Collaborative for Bio-Preparedness system.

 

Submitted by Magou on
Description

Multiple data sources are essential for obtaining reliable information about emerging health threats, and they are more informative than single-source methods [1,2]. Spatial Scan Statistics have been adapted to analyze multivariate data sources [1]. In this context, only ad hoc procedures have been devised to address the problem of selecting the most likely cluster and computing its significance. A multi-objective scan was proposed to detect clusters for a single data source [3].

Objective:

To incorporate information from multiple data streams of disease surveillance to achieve more coherent spatial cluster detection using statistical tools from multi-criteria analysis.

Submitted by Magou on
Description

Los Angeles County’s (LAC) early event detection system captures over 60% of total emergency department (ED) visits, as well as 800 to 1,000 emergency dispatch calls (EDC) from Los Angeles City Fire (LACF) daily. Both ED visits and emergency dispatch calls are classified into syndrome categories and then analyzed for aberrations in count and spatial distribution. During periods of high temperatures, a heat report is generated and sent to stakeholders upon request. We describe how syndromic surveillance serves as an important near real-time, population-based instrument for measuring the impact of heat waves on emergency service utilization in LAC.

Objective: 

To assess current indicators for situational awareness during heat waves derived from electronic emergency department (ED) and 911 emergency dispatch call (EDC) center data.

 

Submitted by Magou on
Description

Previous studies have documented significant lags in official reporting of outbreaks compared to unofficial reporting (1,2). MoH+ provides an additional tool to analyze this issue, with the unique advantage of actively gathering a wide range of streamlined official communication, including formal publications, online press releases, and social media updates.

Objective:

To introduce MoH+, HealthMap’s (HM) real-time feed of official government sources, and demonstrate its utility in comparing the timeliness of outbreak reporting between official and unofficial sources.

 

Submitted by Magou on
Description

Population surges or large events may cause shifts in the data collected by biosurveillance systems [1]. For example, the Cherry Blossom Festival brings hundreds of thousands of people to DC every year, which results in simultaneous elevations in multiple data streams (Fig. 1). In this paper, we propose a Multinomial-Generalized-Dirichlet (MGD) model to account for such baseline shifts.

Objective:

Outbreak detection algorithms monitoring only disease-relevant data streams may be prone to false alarms due to baseline shifts. In this paper, we propose a Multinomial-Generalized-Dirichlet (MGD) model to adjust for baseline shifts.
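The intuition behind adjusting for baseline shifts can be sketched in a few lines. This is not the MGD model itself, only an illustration of why conditioning on the total helps. The stream names and counts are hypothetical. When a population surge scales every stream by the same factor, the raw count of a disease-relevant stream rises, but its share of the total is unchanged.

```python
def shares(counts):
    """Return each stream's proportion of the total count."""
    total = sum(counts.values())
    return {stream: value / total for stream, value in counts.items()}


# Hypothetical daily counts for three data streams on a normal day.
baseline = {"respiratory": 40, "gastrointestinal": 25, "injury": 35}

# A population surge (e.g. a large event) doubles every stream.
surge = {stream: 2 * value for stream, value in baseline.items()}

for stream in baseline:
    print(
        f"{stream}: count {baseline[stream]} -> {surge[stream]}, "
        f"share {shares(baseline)[stream]:.2f} -> {shares(surge)[stream]:.2f}"
    )
```

A detector that watches only the raw respiratory count would flag the surge day, whereas one that watches the composition of the streams would not.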

 

Submitted by Magou on
Description

Detection of and response to seasonal outbreaks of endemic diseases provide an excellent testbed for quantitative biosurveillance. As a case study we focus on annual influenza outbreaks. To incorporate observed year-over-year variation in flu incidence and in the timing of outbreaks, we analyze a stochastic compartmental SIS model that includes seasonal forcing by a latent Markovian factor. Epidemic detection then consists of identifying the presence of the environmental factor (“high” flu season) and estimating the epidemic parameters, such as the contact and recovery rates.
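A rough simulation sketch of such a model follows. It assumes a discrete-time binomial-chain approximation of the SIS dynamics and a two-state Markov chain for the latent seasonal factor. The function name `simulate_sis` and all parameter values are hypothetical, and the abstract does not say how the authors discretize or parameterize their model. The Bayesian inference step is not shown; the sketch only generates data of the kind such a method would have to explain.

```python
import math
import random


def simulate_sis(n_pop=1000, i0=10, steps=200, beta=(0.15, 0.6),
                 gamma=0.3, switch=(0.02, 0.05), seed=0):
    """Simulate an SIS epidemic whose contact rate depends on a hidden season.

    beta   : contact rates in the (low, high) season
    gamma  : recovery rate
    switch : per-step probability of leaving the (low, high) season
    Returns a list of (season, infected) pairs, one per time step.
    """
    rng = random.Random(seed)
    infected, season = i0, 0
    path = []
    for _ in range(steps):
        # Latent two-state Markov factor: 0 = low season, 1 = high season.
        if rng.random() < switch[season]:
            season = 1 - season
        # Per-individual infection and recovery probabilities for this step.
        p_inf = 1.0 - math.exp(-beta[season] * infected / n_pop)
        p_rec = 1.0 - math.exp(-gamma)
        # Binomial draws: susceptibles who get infected, infected who recover.
        new_inf = sum(rng.random() < p_inf for _ in range(n_pop - infected))
        new_rec = sum(rng.random() < p_rec for _ in range(infected))
        infected += new_inf - new_rec
        path.append((season, infected))
    return path


path = simulate_sis()
high_steps = sum(season for season, _ in path)
print(f"steps in high season: {high_steps} of {len(path)}")
print(f"peak infected: {max(infected for _, infected in path)}")
```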

Objective

Development of a sequential Bayesian methodology for inference and detection of seasonal infectious disease epidemics.

Submitted by ynwang@ufl.edu on
Description

Modern biosurveillance data contains thousands of unique time series defined across various categorical dimensions (zipcode, age groups, hospitals). Many algorithms are either overly specific (tracking each time series independently would often miss early signs of outbreaks) or too general (detections at the state level may lack specificity reflective of the actual process at hand). Disease outbreaks often impact multiple values (disjunctive sets of zipcodes, hospitals, multiple age groups) along subsets of multiple dimensions of data. It is not uncommon to see outbreaks of different diseases occurring simultaneously (e.g. food poisoning and flu), making it hard to detect and characterize the individual events. We proposed the Disjunctive Anomaly Detection (DAD) algorithm to efficiently search across millions of potential clusters defined as conjunctions over dimensions and disjunctions over values along each dimension. An example anomalous cluster detectable by DAD may identify zipcode = {z1 or z2 or z3 or z5} and age_group = {child or senior} to show unusual activity in the aggregate. Such a conjunctive-disjunctive language of cluster definitions enables finding real-world outbreaks that are often missed by other state-of-the-art algorithms like What’s Strange About Recent Events (WSARE) or Large Average Submatrix (LAS). DAD is able to identify multiple interesting clusters simultaneously and to explain complex anomalies in data better than those alternatives.
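The conjunctive-disjunctive cluster language can be made concrete with a small sketch. It covers only cluster membership and aggregation, not the DAD search procedure or its scoring, which the abstract does not describe. The record layout, the helper names `matches` and `cluster_count`, and the sample counts are hypothetical.

```python
def matches(record, cluster):
    """True if the record satisfies every dimension of the cluster.

    A cluster maps each dimension to a set of allowed values: the sets are
    disjunctions (any listed value), the dimensions are a conjunction (all).
    """
    return all(record.get(dim) in allowed for dim, allowed in cluster.items())


def cluster_count(records, cluster):
    """Aggregate the case counts of all records that fall inside the cluster."""
    return sum(r["count"] for r in records if matches(r, cluster))


# The example cluster from the text above.
cluster = {
    "zipcode": {"z1", "z2", "z3", "z5"},
    "age_group": {"child", "senior"},
}

# Hypothetical daily counts per (zipcode, age_group) cell.
records = [
    {"zipcode": "z1", "age_group": "child", "count": 7},
    {"zipcode": "z4", "age_group": "child", "count": 3},   # zipcode not in cluster
    {"zipcode": "z5", "age_group": "adult", "count": 2},   # age group not in cluster
    {"zipcode": "z3", "age_group": "senior", "count": 5},
]

print(cluster_count(records, cluster))
```

An anomaly detector built on this representation would compare the aggregated count against an expected value for the same cluster; how DAD does that is outside this sketch.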

Objective

The Disjunctive Anomaly Detection (DAD) algorithm can efficiently search across multidimensional biosurveillance data to find multiple simultaneously occurring (in time) and overlapping (across different data dimensions) anomalous clusters. We introduce extensions of DAD to handle rich cluster interactions and diverse data distributions.

Submitted by ynwang@ufl.edu on

Presented May 24, 2018.

Mauricio Santillana, MS, PhD, describes machine learning methodologies that leverage Internet-based information from search engines, Twitter microblogs, crowd-sourced disease surveillance systems, electronic medical records, and historical synchronicities in disease activity across spatial regions to monitor and forecast disease outbreaks in multiple locations around the globe in near real-time.

Presenter