Multivariate

Informatics advances in recent years have increased the availability of health-monitoring information sources to agencies responsible for disease surveillance. These sources differ in clinical relevance and reliability, and range from streaming statistical indicator evidence to outbreak reports. Information-gathering advances have outpaced the capability to combine the disparate evidence for routine decision support. In view of the need for analytical tools to manage an increasingly complex data environment, a fusion module based on Bayesian networks (BNs) was developed in 2011 for the Department of Defense (DoD) Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE). In 2012 this module was expanded with syndromic queries, data-sensitive algorithm selection, and hierarchical fusion network training [1]. Subsequent efforts have produced a full fusion-enabled version of ESSENCE for beta testing, further upgrades, and a software specification for live DoD integration. Beta test reviewers cited the reduced alert burden and the detailed evidence underlying each alert. However, only 39 reported historical events were available for training and calibrating the three networks designed to fuse evidence for the influenza-like-illness, gastrointestinal, and fever syndrome categories. The current presentation describes advances to formalize the network training, calibrate the component alerting algorithms and decision nodes together for each BN, and implement a validation strategy aimed at both the ESSENCE public health user and machine learning communities.

Objective

This presentation aims to reduce the gap between multivariate analytic surveillance tools and public health acceptance and utility. We developed procedures to verify, calibrate, and validate an evidence fusion capability based on a combination of clinical and syndromic indicators and limited knowledge of historical outbreak events.
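The idea of fusing several alerting algorithms into one decision can be illustrated with a deliberately simplified sketch. It is not the ESSENCE fusion module: it assumes a naive-Bayes structure in which each algorithm's alert is conditionally independent given the outbreak state, and the function name `fuse_alerts`, the prior, and the sensitivity and false-alarm values are all hypothetical.

```python
def fuse_alerts(prior, evidence):
    """Posterior P(outbreak | evidence) under a naive-Bayes assumption.

    prior    -- assumed prior probability of an outbreak on a given day
    evidence -- list of (alerted, sensitivity, false_alarm_rate) tuples,
                one per alerting algorithm (all values are illustrative)
    """
    like_outbreak = 1.0   # P(evidence | outbreak)
    like_baseline = 1.0   # P(evidence | no outbreak)
    for alerted, sensitivity, false_alarm_rate in evidence:
        like_outbreak *= sensitivity if alerted else 1.0 - sensitivity
        like_baseline *= false_alarm_rate if alerted else 1.0 - false_alarm_rate
    numerator = prior * like_outbreak
    return numerator / (numerator + (1.0 - prior) * like_baseline)


# Two algorithms alert, one stays silent (made-up operating characteristics).
posterior = fuse_alerts(
    prior=0.01,
    evidence=[(True, 0.80, 0.05), (True, 0.70, 0.10), (False, 0.60, 0.02)],
)
print(f"posterior outbreak probability: {posterior:.3f}")
```

A real Bayesian network would also encode dependencies among the evidence nodes, which is exactly what the naive-Bayes assumption leaves out.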

Submitted by elamb on Thu, 05/02/2019 - 20:43

Parallel surveillance, the separate monitoring of each continuous series, has been widely used for multivariate surveillance; however, it has severe limitations. First, it faces the problem of multiplicity from multiple testing. Second, ignoring correlation between series (CBS) reduces outbreak detection performance if the data are truly correlated. Finally, since health data are normally dependent over time, correlation within series (CWS) should also be taken into account. Sufficient reduction methods reduce a simple multivariate series to a univariate series that has been proved sufficient for monitoring a mean shift in multivariate surveillance (1, 2). Having considered the sufficiency property and the nature of health data, we propose a sufficient reduction method for detecting a mean shift in multivariate series that takes CWS and CBS into account.
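The multiplicity problem can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that p series are monitored independently with the same per-series false alarm probability; the function names and the numbers are not from the abstract.

```python
def family_wise_false_alarm(alpha, p):
    """Probability of at least one false alarm among p independent series,
    each tested at per-series false alarm probability alpha."""
    return 1.0 - (1.0 - alpha) ** p


def bonferroni_alpha(alpha, p):
    """Per-series threshold that keeps the family-wise rate at or below alpha."""
    return alpha / p


p = 10          # number of series monitored in parallel (illustrative)
alpha = 0.05    # per-series false alarm probability (illustrative)

uncorrected = family_wise_false_alarm(alpha, p)
corrected = family_wise_false_alarm(bonferroni_alpha(alpha, p), p)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

With correlated series the independence assumption no longer holds, which is one reason a joint treatment of the series is attractive.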

Objective

To reduce the dimensionality of p-dimensional multivariate series to a univariate series by deriving sufficient statistics which take into account all the information in the original data, correlation within series (CWS) and correlation between series (CBS).
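As a worked illustration of sufficient reduction, consider a simplified setting that handles CBS but ignores CWS. This is an assumption made here for clarity, not the method proposed in the abstract. Suppose the p-dimensional observations are independent over time with a known common covariance matrix and known means before and after an unknown change time; all symbols below are introduced for this sketch.

```latex
% Assumed model: x_t ~ N_p(mu_0, Sigma) before the change time tau,
%                x_t ~ N_p(mu_1, Sigma) from tau onward, independent over t.
\[
\log\frac{f_1(x_t)}{f_0(x_t)}
  = (\mu_1-\mu_0)^{\top}\Sigma^{-1}x_t
  - \tfrac{1}{2}\left(\mu_1^{\top}\Sigma^{-1}\mu_1-\mu_0^{\top}\Sigma^{-1}\mu_0\right)
\]
% The data enter the likelihood ratio only through the scalar
\[
y_t = (\mu_1-\mu_0)^{\top}\Sigma^{-1}x_t ,
\]
% so monitoring the univariate series y_t loses no information about the shift.
% With delta = mu_1 - mu_0, the variance of y_t is delta' Sigma^{-1} delta,
% and its mean increases by the same amount at the change:
\[
\operatorname{Var}(y_t)=\delta^{\top}\Sigma^{-1}\delta,
\qquad
\mathrm{E}_1[y_t]-\mathrm{E}_0[y_t]=\delta^{\top}\Sigma^{-1}\delta .
\]
```

The off-diagonal entries of the covariance matrix are where CBS enters the weights; accounting for CWS requires a model for the dependence over time, which this sketch leaves out.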

Referenced File

Sufficient_Reduction_Methods_For_Multivariate_Surveillance.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Noroviruses (NoVs) are the single most common cause of epidemic, non-bacterial gastroenteritis worldwide. They cause an estimated 68-80% of gastroenteritis outbreaks in industrialized countries and possibly more in developing countries.

Objective

The purpose of this study was to identify global epidemiologic trends in human norovirus (NoV) outbreaks by transmission route and setting, and describe relationships between these characteristics, attack rates and the occurrence of genogroup I (GI) or genogroup II (GII) strains in outbreaks.

Referenced File

Risk_Factors_For_Norovirus_Outbreaks_Associated_With_Attack_Rate_And_Genogroup.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Much progress has been made in developing novel systems for influenza surveillance and in exploring the choice of algorithms for detecting the start of a peak season. The use of multiple streams of surveillance data has been shown to improve performance, but few studies have explored its use in situational awareness to quantify the level or trend of disease activity. In this study we propose a multivariate statistical approach that describes overall influenza activity and handles interrupted or drop-in surveillance systems.

Objective

This paper describes the use of multiple streams of influenza surveillance data for situational awareness of influenza activity.

Referenced File

Multistream_Influenza_Surveillance_For_Situational_Awareness.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

INDICATOR is a multi-stream, open-source platform for biosurveillance and outbreak detection, currently focused on Champaign County in Illinois. It has been in production since 2008 and currently receives data from emergency department, patient advisory nurse, outpatient convenient care clinic, school absenteeism, animal control, and weather sources. Historical data from some of these sources go back to 2006.

Objective

To examine the correlation between different types of surveillance signals and climate information obtained from a well-defined geographic area.
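One simple way to examine such a relationship is a lagged Pearson correlation between a weather series and a surveillance series. The sketch below is only an assumption about how this could be done, not the analysis used in the study; the function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


def lagged_correlations(weather, signal, max_lag):
    """Correlate weather on day t with the surveillance signal on day t + lag."""
    n = len(weather)
    return {lag: pearson(weather[:n - lag], signal[lag:])
            for lag in range(max_lag + 1)}


# Toy data: the surveillance signal echoes the weather series two days later.
weather = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
signal = [0, 0] + weather[:-2]
correlations = lagged_correlations(weather, signal, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

Real surveillance and weather series are autocorrelated and seasonal, so a raw lagged correlation like this one should be read with caution.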

Referenced File

Analysis_Of_Five_Years_Of_Multistream_Surveillance_And_Weather_Data_In_Champaign_County.pdf

Submitted by elamb on Thu, 05/02/2019 - 08:52

Time series analysis is widely used in syndromic surveillance. Public health officials typically track on the order of hundreds of disease models, or univariate time series, each day, looking for signals of disease outbreaks. These time series can be aggregated counts of various syndromes, possibly broken down by gender and age group. More recently, spatial scan algorithms have been used to find anomalous regions by aggregating zipcode-level counts [1]. Public health officials usually have a set of disease models (e.g., fever or headache symptoms in male adults being indicative of a particular disease). Based on past experience, they track these disease models daily to find anomalies that might indicate disease outbreaks. A typical syndromic surveillance system today tracks on the order of 100-200 time series on a daily basis using univariate algorithms such as CUSUM, moving average, and EWMA.
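For readers unfamiliar with these univariate detectors, here is a minimal sketch of two of them, EWMA smoothing and a one-sided CUSUM on standardized counts. The parameter values (`lam`, `k`, `h`), the baseline mean and standard deviation, and the toy counts are illustrative assumptions, not settings from any particular surveillance system.

```python
def ewma(series, lam=0.3):
    """Exponentially weighted moving average, started at the first observation."""
    level = series[0]
    smoothed = []
    for x in series:
        level = lam * x + (1.0 - lam) * level
        smoothed.append(level)
    return smoothed


def cusum_alerts(series, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """One-sided CUSUM on standardized counts.

    Accumulates excesses above k standard deviations and alerts when the
    running sum exceeds h; the sum is reset to zero after each alert.
    Returns the list of day indices that alerted.
    """
    running_sum = 0.0
    alerts = []
    for day, x in enumerate(series):
        z = (x - baseline_mean) / baseline_sd
        running_sum = max(0.0, running_sum + z - k)
        if running_sum > h:
            alerts.append(day)
            running_sum = 0.0
    return alerts


# Ten ordinary days followed by three days of elevated counts (toy data).
daily_counts = [10] * 10 + [20] * 3
print(cusum_alerts(daily_counts, baseline_mean=10, baseline_sd=2))
```

In practice the baseline mean and standard deviation would be estimated from a sliding window of recent history rather than supplied as constants.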

Let us consider a representative dataset for a state with 100 zipcodes, in which emergency rooms monitor 10 syndromes among 3 age groups and 2 genders. This gives a total of 6,000 (100 x 10 x 3 x 2) distinct time series, one for each combination of zipcode, syndrome, age group, and gender. This number already seems too high to monitor daily. Hence most syndromic systems monitor only state-level aggregates for all syndromes or for a few combinations of syndrome, gender, and age group.
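The count can be checked by enumerating the combinations directly. The sketch below uses integer placeholders for the attribute values. The second figure goes beyond the text: it is an added illustration that assumes each attribute may also be rolled up to a single "all values" level.

```python
from itertools import product

# Placeholder attribute values for the representative dataset.
zipcodes = range(100)
syndromes = range(10)
age_groups = range(3)
genders = range(2)

# One time series per (zipcode, syndrome, age group, gender) combination.
series_keys = list(product(zipcodes, syndromes, age_groups, genders))
print(len(series_keys))  # 6000

# Assumption for illustration: each attribute may also take an "all" value,
# which adds one level per attribute and includes the state-level aggregates.
with_rollup = (100 + 1) * (10 + 1) * (3 + 1) * (2 + 1)
print(with_rollup)  # 13332
```

Allowing arbitrary subsets of values per attribute, rather than a single value or "all", makes the number of possible aggregate series grow far faster still.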

But most real-world disease models are more complex and affect multiple syndromes or multiple age groups. We need to analyze more complex streams that aggregate multiple attribute values in order to mine interesting patterns that would otherwise go unseen. As an example, a massive search could reveal a recent increase in senior female patients with fever and nausea in the northeastern part of the state.

Objective

This paper shows how T-Cubes, a data structure that makes it feasible to track millions of disease models simultaneously, can be used to perform multivariate time series analysis using primitive univariate algorithms. Using T-Cubes in a brute-force search helps identify stronger disease outbreak signals that current surveillance systems miss.
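The brute-force search idea can be sketched with a plain dictionary standing in for the data structure. This is not an implementation of T-Cubes and does not reproduce their caching. It only shows the search pattern: aggregate the series matching each query, score each aggregate with a primitive univariate statistic, and keep the strongest signal. All names and counts are hypothetical, and each attribute is restricted to a single value or a wildcard.

```python
from itertools import product
from statistics import mean, pstdev


def aggregate(cells, query):
    """Sum the daily series of every cell matching `query`; None matches any value."""
    matching = [series for key, series in cells.items()
                if all(q is None or q == k for q, k in zip(query, key))]
    return [sum(day) for day in zip(*matching)]


def score(series):
    """z-score of the most recent day against the earlier days."""
    history, today = series[:-1], series[-1]
    sd = pstdev(history)
    return (today - mean(history)) / sd if sd > 0 else 0.0


def brute_force_search(cells):
    """Score every single-value-or-wildcard query and return (best score, query)."""
    n_attrs = len(next(iter(cells)))
    values = [sorted({key[i] for key in cells}) + [None] for i in range(n_attrs)]
    scored = []
    for query in product(*values):
        series = aggregate(cells, query)
        if series:  # skip attribute combinations with no data
            scored.append((score(series), query))
    return max(scored, key=lambda pair: pair[0])


# Toy daily counts keyed by (region, syndrome, age group).
cells = {
    ("north", "fever", "senior"): [5, 6, 5, 6, 20],
    ("south", "fever", "senior"): [5, 6, 5, 6, 5],
    ("north", "nausea", "senior"): [2, 3, 2, 3, 2],
}
best_score, best_query = brute_force_search(cells)
print(best_score, best_query)
```

Here every query rescans all cells, which is the cost that a purpose-built structure is meant to avoid when millions of queries are evaluated.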

Referenced File

Multivariate_Time_Series_Analyses_Using_Primitive_Univariate_Algorithms.pdf