
Algorithm

Description

Our purpose was to develop a receiver operating characteristic (ROC)-like curve for public health surveillance, similar to those used in diagnostic testing. We developed syndromic surveillance algorithms with differing sensitivity and specificity for detecting the seasonal influenza-like illness (ILI) outbreak. For each algorithm we plotted the number of days to detect the event against the number of false-positive alarms during the non-ILI season.
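As a rough illustration of the plotting step, the sketch below charts each algorithm's false-positive alarms against its days to detection using matplotlib. The algorithm names and numbers are hypothetical placeholders, not results from this study.

# Hypothetical ROC-like plot: false-positive alarms during the non-ILI season
# (x-axis) versus days to detect the ILI outbreak (y-axis), one point per algorithm.
import matplotlib.pyplot as plt

algorithms = {
    # name: (false-positive alarms in non-ILI season, days to detect)
    "high-specificity algorithm": (2, 21),
    "balanced algorithm": (8, 12),
    "high-sensitivity algorithm": (30, 5),
}

for name, (false_pos, days) in algorithms.items():
    plt.scatter(false_pos, days, label=name)

plt.xlabel("False-positive alarms (non-ILI season)")
plt.ylabel("Days to detect ILI outbreak")
plt.legend()
plt.show()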

Description

Measures aimed at controlling epidemics of infectious diseases critically benefit from early outbreak recognition [1]. Syndromic surveillance systems (SSS) seek early detection by focusing on pre-diagnostic symptoms that by themselves may not alarm clinicians. We have previously determined the performance of various Case Detector (CD) algorithms at finding cases of influenza-like illness (ILI) recorded in the electronic medical record of the Veterans Administration (VA) health system. In this work, we measure the impact of using CDs of increasing sensitivity but decreasing specificity on the time it takes a VA-based SSS to identify a modeled community-wide influenza outbreak.

Objective

This work uses a mathematical model of a plausible influenza epidemic to test the influence of different case-detection algorithms on the performance of a real-world syndromic surveillance system.
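For intuition about this tradeoff, the following is a minimal sketch, under entirely assumed parameters (baseline visit volume, outbreak growth rate, alarm threshold), of how a case detector's sensitivity and false-positive rate could shift the day a simple threshold alarm first fires during a modeled outbreak. It is not the epidemic model or detector used in this work.

# Toy simulation: daily detected counts = true cases caught by the CD plus
# false positives among non-cases; alarm when a day exceeds the pre-outbreak
# mean by 3 standard deviations. All parameters are illustrative assumptions.
import random

random.seed(0)
BASELINE = 50          # true ILI cases per day before the outbreak
OUTBREAK_START = 30    # day the modeled epidemic begins
GROWTH = 1.15          # daily multiplicative growth in true cases

def observed_count(day, sensitivity, fp_rate, non_cases=1000):
    true_cases = BASELINE
    if day >= OUTBREAK_START:
        true_cases = int(BASELINE * GROWTH ** (day - OUTBREAK_START))
    hits = sum(random.random() < sensitivity for _ in range(true_cases))
    false_hits = sum(random.random() < fp_rate for _ in range(non_cases))
    return hits + false_hits

def days_to_detection(sensitivity, fp_rate, threshold_sd=3.0):
    history = [observed_count(d, sensitivity, fp_rate) for d in range(OUTBREAK_START)]
    mean = sum(history) / len(history)
    sd = (sum((x - mean) ** 2 for x in history) / len(history)) ** 0.5
    for day in range(OUTBREAK_START, OUTBREAK_START + 60):
        if observed_count(day, sensitivity, fp_rate) > mean + threshold_sd * sd:
            return day - OUTBREAK_START
    return None

for sens, fp in [(0.5, 0.001), (0.7, 0.01), (0.9, 0.05)]:
    print(f"sensitivity={sens}, fp_rate={fp}: detected after "
          f"{days_to_detection(sens, fp)} days")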

Description

The Ontario Telehealth Telephone Helpline (henceforth referred to as “Telehealth”) was implemented in Ontario in 2001. It is administered by Clinidata, a private contractor hired by the Ontario Ministry of Health and Long-Term Care, and operates 24 hours a day, 7 days a week, including holidays, at no cost to the caller. Calls are answered by registered nurses in both official languages from four call centres that use identical decision rules (algorithms) and store all call information in one centralized data repository. Calls usually last approximately 10 minutes, are patient-based, and are directed by a nurse-operated electronic clinical support system.


Objective

Following the lead established by the UK’s NHS Direct syndromic surveillance system, as well as the SARS Report’s stated desire to “broaden the information collection capacity of Telehealth as a syndromic surveillance tool,” we are retrospectively evaluating the value of Ontario’s Telehealth helpline as a syndromic surveillance system. To date, there have been no published descriptions of Telehealth. This article endeavours to address this lacuna.

Description

Time series analysis is very popular in syndromic surveillance. Typically, public health officials track on the order of hundreds of disease models, or univariate time series, daily, looking for signals of disease outbreaks. These time series can be aggregated counts of various syndromes, possibly broken out by gender and age group. More recently, spatial scan algorithms have found anomalous regions by aggregating zipcode-level counts [1]. Usually, public health officials have a set of disease models (e.g., fever or headache symptoms in male adults indicative of a particular disease). Based on past experience, they track these disease models daily to find anomalies that might indicate disease outbreaks. A typical syndromic surveillance system today tracks on the order of 100-200 time series daily using univariate algorithms such as CUSUM, moving average, and EWMA.
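As one concrete example of the univariate detectors listed above, here is a minimal one-sided CUSUM over a daily count series; the reference mean and the parameters k (slack) and h (decision threshold) are illustrative choices, not values from any deployed system.

# One-sided CUSUM: accumulate upward deviations from the expected mean and
# alarm when the cumulative statistic exceeds the threshold h.
def cusum_alarms(counts, mean, k=0.5, h=5.0):
    s, alarms = 0.0, []
    for day, x in enumerate(counts):
        s = max(0.0, s + (x - mean) - k)
        if s > h:
            alarms.append(day)
            s = 0.0  # reset after signaling
    return alarms

daily_counts = [12, 10, 11, 13, 12, 11, 14, 18, 22, 27, 31]  # hypothetical series
print(cusum_alarms(daily_counts, mean=12))  # -> alarm days near the ramp-up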

Let us consider a representative dataset for a state with 100 zipcodes that monitors 10 syndromes among 3 age groups and 2 genders in emergency rooms. There are a total of 6,000 (100 x 10 x 3 x 2) distinct time series, one for each combination of zipcode, syndrome, age group, and gender. This number is already too high to monitor daily. Hence most syndromic systems monitor only state-level aggregates for all syndromes, or a few combinations of syndrome, gender, and age group.

But most real-world disease models are more complex, affecting multiple syndromes or multiple age groups. We need to analyze more complex streams that aggregate multiple values of the attributes in order to mine interesting patterns not seen otherwise. As an example, a massive search could reveal that senior female patients with fever and nausea have recently increased in the northeastern part of the state.

Objective

This paper shows how the T-Cube, a data structure that makes it feasible to track millions of disease models simultaneously, can be used to perform multivariate time series analysis using primitive univariate algorithms. The use of the T-Cube in brute-force search thus helps identify stronger disease outbreak signals currently missed by surveillance systems.
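The T-Cube itself caches aggregates in a datacube-like structure so these queries are fast; the sketch below is only a naive stand-in illustrating the brute-force search it accelerates: enumerate attribute combinations (with "any" wildcards), aggregate each into a univariate series, and score it with a simple detector. Field names and the scoring rule are assumptions for illustration.

# Naive brute-force search over attribute combinations; a T-Cube would answer
# each aggregate query from cached counts instead of rescanning the records.
from itertools import product

def aggregate(records, syndrome, age, gender, n_days):
    series = [0] * n_days
    for r in records:  # r = {"day", "syndrome", "age", "gender", "count"}
        if (syndrome == "any" or r["syndrome"] == syndrome) and \
           (age == "any" or r["age"] == age) and \
           (gender == "any" or r["gender"] == gender):
            series[r["day"]] += r["count"]
    return series

def last_day_spike(series):
    # Crude anomaly score: today's count relative to the trailing mean.
    base = sum(series[:-1]) / max(len(series) - 1, 1)
    return series[-1] / (base + 1.0)

def brute_force(records, syndromes, ages, genders, n_days, top=5):
    scored = []
    for s, a, g in product(syndromes + ["any"], ages + ["any"], genders + ["any"]):
        score = last_day_spike(aggregate(records, s, a, g, n_days))
        scored.append((score, (s, a, g)))
    return sorted(scored, reverse=True)[:top]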

Description

Clinicians can pursue the clinical findings for specific patients in real time until reaching a diagnosis. When using electronic ED chief complaints, one relies on symptoms volunteered by patients in the triage setting. Patients seek emergency care at different stages of disease, and there is scant information detailing how they respond when allowed only 2-3 complaints. Our emergency department (ED) clinical data warehouse includes date, demographics, complaints, diagnosis, laboratory results, and disposition. We used a process similar to reverse engineering to augment our ability to detect chief complaints and test results consistent with meningoencephalitis (MEE): we started with the diagnosis of MEE and examined the chief complaints and diagnostic findings of patients diagnosed with MEE to develop expanded algorithms.
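A minimal sketch of that reverse-engineering step, with assumed record fields: tabulate chief-complaint terms among visits that ended in an MEE diagnosis, and surface frequent terms that an expanded syndrome definition should also capture.

# Tally chief-complaint tokens among MEE-diagnosed visits; frequent tokens are
# candidates for an expanded syndrome definition. Fields are illustrative.
from collections import Counter

def candidate_terms(visits, min_count=5):
    terms = Counter()
    for v in visits:  # v = {"complaint": str, "diagnosis": str}
        if "meningoencephalitis" in v["diagnosis"].lower():
            for token in v["complaint"].lower().split():
                terms[token] += 1
    return [t for t, n in terms.most_common() if n >= min_count]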

Objective

Our research questions were:

1. Could we use existing data to empirically improve our syndrome surveillance algorithms?

2. Is it feasible to combine disparate data sources to detect the same event?

We studied these questions using the MEE syndrome and the 2002 West Nile virus outbreak in Chicago.

Description

Current state-of-the-art outbreak detection methods [1-3] combine spatial, temporal, and other covariate information from multiple data streams to detect emerging clusters of disease. However, these approaches use fixed methods and models for analysis, and cannot improve their performance over time. Here we consider two methods for overcoming this limitation, learning a prior over outbreak regions and learning outbreak models from user feedback, using the recently proposed multivariate Bayesian scan statistic (MBSS) framework [1]. Given a set of outbreak types {Ok}, a set of space-time regions S, and the multivariate dataset D, MBSS computes the posterior probability Pr(H1(S, Ok) | D) of each outbreak type in each region, using Bayes’ Theorem to combine the prior probabilities Pr(H1(S, Ok)) and the data likelihoods Pr(D | H1(S, Ok)). Each outbreak type can have a different prior distribution over regions, as well as a different model for its effects on the multiple streams. The set of outbreak types, as well as the region priors and outbreak models for each type, can be learned incrementally from labeled data or user feedback.
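The Bayes’ Theorem step can be made concrete with a small sketch: compute the joint probability of each (region, outbreak type) hypothesis and of the null, then normalize. The prior and likelihood functions here are caller-supplied placeholders; MBSS specifies its own region priors and multi-stream likelihood models.

# Posterior over outbreak hypotheses: Pr(H1(S, Ok) | D) is proportional to
# Pr(D | H1(S, Ok)) * Pr(H1(S, Ok)), normalized together with the null.
def mbss_posteriors(regions, outbreak_types, prior, likelihood,
                    null_prior, null_likelihood):
    joint = {("null", None): null_prior * null_likelihood}
    for S in regions:
        for Ok in outbreak_types:
            joint[(S, Ok)] = prior(S, Ok) * likelihood(S, Ok)
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}

# Toy usage with a uniform prior and made-up likelihoods:
post = mbss_posteriors(["R1", "R2"], ["flu"],
                       prior=lambda S, Ok: 0.005,
                       likelihood=lambda S, Ok: 0.2,
                       null_prior=0.99, null_likelihood=1.0)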

Objective

We argue that the incorporation of machine learning algorithms is a natural next step in the evolution and improvement of disease surveillance systems. We consider how learning can be incorporated into one recently proposed multivariate detection method, and demonstrate that learning can enable systems to substantially improve detection performance over time.

Description

Developing and evaluating outbreak detection is challenging for many reasons. A central difficulty is that the data the detection algorithms are “trained” on are often relatively short historical samples and thus do not represent the full range of possible background scenarios. Once developed, the same dearth of historical data complicates evaluation. In systems where only a count of cases is provided, plausible synthetic data are relatively easy to generate. When precise location data are available, even simple approaches to generating hypothetical cases become more difficult.

Advances in epidemiological modeling have allowed for increasingly realistic simulations of infectious disease spread in highly detailed synthetic populations. These agent-based simulations can better represent the stochastic real-world disease transmission process, and thus show highly variable results even under identical initial conditions. Because of their ability to mimic a wide range of outcomes and more fully represent the unknowns in a system, models of this class are increasingly used to help inform public policy decisions about hypothetical situations (e.g., pandemic influenza [1]). This characteristic also makes them a powerful tool for representing the processes that create surveillance information.
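To make the idea concrete, here is a toy agent-based transmission sketch, far simpler than the detailed models cited above: agents on a random contact network, with each new infection emitting a dated, located case event that could serve as one realization of a synthetic surveillance stream. The population size, contact counts, and probabilities are arbitrary assumptions.

# Toy agent-based SIR on a random contact network; each new infection is
# recorded as a (day, x, y) case event.
import random

random.seed(1)
N, CONTACTS, P_TRANSMIT, P_RECOVER, DAYS = 500, 8, 0.03, 0.2, 60

agents = [{"state": "S", "loc": (random.random(), random.random())} for _ in range(N)]
agents[0]["state"] = "I"  # index case
network = {i: random.sample(range(N), CONTACTS) for i in range(N)}

cases = []  # the synthetic surveillance stream
for day in range(DAYS):
    newly_infected = []
    for i, a in enumerate(agents):
        if a["state"] != "I":
            continue
        for j in network[i]:
            if agents[j]["state"] == "S" and random.random() < P_TRANSMIT:
                newly_infected.append(j)
        if random.random() < P_RECOVER:
            a["state"] = "R"
    for j in set(newly_infected):
        agents[j]["state"] = "I"
        cases.append((day, *agents[j]["loc"]))

print(len(cases), "synthetic located cases over", DAYS, "days")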

Objective

Developing and evaluating detection algorithms in noisy surveillance data is complicated by a lack of realistic noise, meaning the surveillance data stream when nothing of public health interest is happening. The task is even more complex when data on the precise location of cases are available. This paper describes a methodology for plausibly generating such noise using agent-based models of infectious disease transmission based on highly resolved dynamic social networks.

Description

The existing New York State Department of Health emergency department syndromic surveillance system uses the patient’s chief complaint (CC) to assign visits to six syndrome categories (Respiratory, Fever, Gastrointestinal, Neurological, Rash, Asthma). The sensitivity and specificity of the computer algorithms that assign CCs to syndrome categories are determined using chart review as the criterion standard. These analyses are used to refine the algorithms and to evaluate the effect of changes in the syndrome definitions. However, the chart review (CR) method is labor-intensive and expensive. Using an automated ICD9 code-based assignment as a surrogate for chart review could offer a significant cost reduction in this process and allow us to survey a much larger sample of visits.
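The sensitivity/specificity computation itself is straightforward; the sketch below scores the CC assignment against a criterion standard, which could be chart review or the proposed ICD9-based surrogate. The visit fields are assumptions for illustration.

# Per-syndrome sensitivity and specificity of CC assignment versus a
# criterion standard (chart review or ICD9-based surrogate).
def sens_spec(visits, syndrome):
    tp = fp = fn = tn = 0
    for v in visits:  # v = {"cc_syndromes": set, "standard_syndromes": set}
        cc = syndrome in v["cc_syndromes"]
        std = syndrome in v["standard_syndromes"]
        if cc and std: tp += 1
        elif cc: fp += 1
        elif std: fn += 1
        else: tn += 1
    return tp / max(tp + fn, 1), tn / max(tn + fp, 1)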

Description

Space-time detection of disease clusters can be a computationally intensive task that defies the real-time constraints of disease surveillance. At the same time, it has been shown that using exact patient locations, instead of their representative administrative regions, results in higher detection rates and accuracy while improving detection timeliness. Using such higher-resolution spatial data, however, further exacerbates the computational burden on real-time surveillance. The critical need for real-time processing and interpretation of data dictates highly responsive models that may be best achieved using high-performance computing platforms.

Objective

Space-time detection techniques often require computationally intensive searching in both the time and space domains. We introduce a high-performance computing technique for parallelizing a variation of the space-time permutation scan statistic, apply it to real data of varying spatial resolutions, and demonstrate its efficiency by comparing parallelized performance under different spatial resolutions with that of serial computation.
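Since candidate space-time cylinders can be scored independently, the search parallelizes naturally over worker processes. The sketch below distributes region scoring with Python's multiprocessing; the scoring function is a crude stand-in for the space-time permutation statistic, and all data and parameters are illustrative.

# Parallel scoring of candidate cylinders (center, radius, time window).
import math
import random
from multiprocessing import Pool

random.seed(2)
# Synthetic cases: (x, y, day) in the unit square over a 30-day window.
CASES = [(random.random(), random.random(), random.randrange(30)) for _ in range(2000)]

def score_region(region):
    # Stand-in score: observed count in the cylinder versus its expectation
    # under spatial and temporal uniformity (not the permutation statistic).
    (cx, cy), radius, (t0, t1) = region
    inside = sum(1 for x, y, t in CASES
                 if t0 <= t <= t1 and math.hypot(x - cx, y - cy) <= radius)
    expected = len(CASES) * math.pi * radius ** 2 * (t1 - t0 + 1) / 30
    return inside / (expected + 1e-9)

def parallel_scan(regions, workers=4):
    with Pool(workers) as pool:
        scores = pool.map(score_region, regions)
    return max(zip(scores, regions))

if __name__ == "__main__":
    grid = [((x / 10, y / 10), 0.1, (t, t + 6))
            for x in range(10) for y in range(10) for t in range(0, 24, 6)]
    print(parallel_scan(grid))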
