Multivariate Time Series Analyses Using Primitive Univariate Algorithms

Description

Time series analysis is widely used in syndromic surveillance. Public health officials typically track on the order of hundreds of disease models, each a univariate time series, every day, looking for signals of disease outbreaks. These time series can be aggregated counts of various syndromes, possibly broken down by gender and age group. More recently, spatial scan algorithms have been used to find anomalous regions by aggregating zipcode-level counts [1]. Officials usually maintain a set of disease models (e.g., fever or headache symptoms in adult males may indicate a particular disease) chosen from past experience, and monitor them daily for anomalies that might indicate an outbreak. A typical syndromic surveillance system today tracks on the order of 100-200 time series daily using univariate algorithms such as CUSUM, moving average, and EWMA.
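To make the notion of a univariate detector concrete, the following is a minimal sketch of a one-sided CUSUM applied to a single daily count series. It is only an illustration: the function name cusum_alerts, the 7-day baseline, the reference value k, the threshold h, and the sample counts are all assumptions chosen for the example, not settings taken from any particular surveillance system.

```python
from statistics import mean, stdev


def cusum_alerts(counts, baseline_days=7, k=0.5, h=4.0):
    """Return the day indices at which a one-sided upper CUSUM exceeds h.

    Assumptions (illustrative only): the first `baseline_days` values
    define the expected mean and standard deviation, and the statistic
    is reset to zero after each alert.
    """
    baseline = counts[:baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline) or 1.0  # guard against a perfectly flat baseline
    s = 0.0
    alerts = []
    for day in range(baseline_days, len(counts)):
        z = (counts[day] - mu) / sigma  # standardized daily count
        s = max(0.0, s + z - k)         # accumulate only upward drift
        if s > h:
            alerts.append(day)
            s = 0.0                     # restart accumulation after an alert
    return alerts


# Hypothetical daily counts: a quiet week followed by a sudden rise.
daily_counts = [10, 12, 11, 9, 10, 11, 10, 10, 11, 25, 30]
print(cusum_alerts(daily_counts))
```

A detector of this kind sees one series at a time, which is why the number of series a system can afford to monitor matters.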

Consider a representative dataset for a state with 100 zipcodes whose emergency rooms monitor 10 syndromes across 3 age groups and 2 genders. This yields 6,000 (100 x 10 x 3 x 2) distinct time series, one for each combination of zipcode, syndrome, age group, and gender. That is already too many to review daily, so most syndromic surveillance systems monitor only state-level aggregates for all syndromes, or a few combinations of syndrome, gender, and age group.
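The count above can be checked with a short calculation. The sketch below also shows, under two assumed counting conventions that are not stated in the text, how quickly the number of candidate series grows once aggregates are allowed: either each attribute is fixed to one value or left as "all", or each attribute may be restricted to any non-empty subset of its values. The function names are hypothetical.

```python
from math import prod

# Attribute cardinalities from the example dataset.
cardinalities = {"zipcode": 100, "syndrome": 10, "age_group": 3, "gender": 2}


def finest_series(cards):
    """One series per exact combination of attribute values."""
    return prod(cards.values())


def value_or_all_queries(cards):
    """Assumed convention: each attribute is one value or 'all'."""
    return prod(n + 1 for n in cards.values())


def subset_queries(cards):
    """Assumed convention: each attribute is any non-empty subset of values."""
    return prod(2**n - 1 for n in cards.values())


print(finest_series(cardinalities))         # 6000
print(value_or_all_queries(cardinalities))  # 13332

# Even ignoring zipcode, subsets of the other three attributes give 21483 series.
without_zip = {a: n for a, n in cardinalities.items() if a != "zipcode"}
print(subset_queries(without_zip))          # 21483
```

Under the subset convention, adding the 100 zipcodes multiplies the last figure by 2^100 - 1, so exhaustive enumeration is only practical over restricted families of zipcode groupings.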

But most real-world disease models are more complex and affect multiple syndromes or multiple age groups. To find interesting patterns that would otherwise go unseen, we need to analyze more complex streams that aggregate multiple values of each attribute. For example, a massive search could reveal a recent increase in senior female patients with fever and nausea in the northeastern part of the state.
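An aggregate stream of this kind can be sketched as a query that accepts several values per attribute and sums the matching fine-grained series day by day. The code below is a naive linear scan written only to illustrate the idea; it is not the T-Cube data structure, and the zipcodes, counts, and the aggregate function are hypothetical.

```python
ATTRS = ("zipcode", "syndrome", "age_group", "gender")


def aggregate(series_by_cell, query):
    """Sum daily counts over every cell that matches the query.

    `query` maps an attribute name to the set of accepted values;
    an attribute that is absent from `query` matches any value.
    """
    total = None
    for cell, counts in series_by_cell.items():
        values = dict(zip(ATTRS, cell))
        if all(values[attr] in allowed for attr, allowed in query.items()):
            if total is None:
                total = list(counts)
            else:
                total = [t + c for t, c in zip(total, counts)]
    return total or []


# Hypothetical fine-grained series: four days of counts per cell.
cells = {
    ("15213", "fever", "senior", "F"): [1, 0, 1, 4],
    ("15213", "nausea", "senior", "F"): [0, 1, 0, 3],
    ("15217", "fever", "senior", "F"): [1, 1, 0, 5],
    ("15217", "fever", "adult", "M"): [2, 2, 2, 2],
}

# Senior female patients with fever or nausea, in any zipcode.
query = {"syndrome": {"fever", "nausea"}, "age_group": {"senior"}, "gender": {"F"}}
combined = aggregate(cells, query)
print(combined)  # [2, 2, 1, 12]
```

The combined series is itself univariate, so any univariate detector can be run on it unchanged. The cost of the scan grows with the number of cells for every query, which is the expense a precomputed structure is meant to avoid.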

Objective

This paper shows how T-Cube, a data structure that makes it feasible to track millions of disease models simultaneously, can be used to perform multivariate time series analysis with primitive univariate algorithms. Using T-Cube in a brute-force search helps identify stronger disease outbreak signals that current surveillance systems miss.

Submitted by elamb on