Data Analytics

Presented December 13, 2018.

For public health surveillance, is machine learning worth the effort? What methods are relevant? Do you need special hardware? This talk was motivated by these and other questions asked by ISDS members. It will focus on providing practical—and slightly opinionated—advice about how to determine whether machine learning could be a useful tool for your problem.

Presented November 27, 2018.

The space-time scan statistic is a powerful statistical tool for prospective disease surveillance. It searches over a set of spatio-temporal regions (each representing some spatial area S for the last k days), finding the most significant regions (S, k) by maximizing a likelihood ratio statistic, and computing p-values of these potential clusters by randomization.

The standard "population-based" method assumes that, for each spatial location s_i on each day t, we have a population p_i^t and a count (observed number of cases) c_i^t. Under the null hypothesis of no clusters, we expect each count c_i^t to be proportional to its population p_i^t. We then search for regions (S, k) in which the disease rate (cases per unit population) is significantly higher inside the region than outside. In the original space-time scan statistic, the populations are assumed to be given; in other work, populations are estimated assuming independence of space and time.

Here we propose an alternative, "expectation-based" method, in which we infer the expected number of cases b_i^t in each spatial location from the time series of previous counts. In this case, under the null hypothesis of no clusters, we expect each count c_i^t to be equal to b_i^t, rather than proportional to population. We then search for regions (S, k) with counts that are significantly higher than expected.
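As a rough illustration of the expectation-based search described above, the sketch below scores each candidate region (S, k) by comparing its aggregate count C with its aggregate expected count B. It assumes a Poisson log-likelihood ratio of the form C log(C/B) + B - C when C > B (and 0 otherwise), which is one common choice and not necessarily the statistic used in the paper. The function names, the list-of-lists data layout, and the explicit list of candidate regions are all hypothetical, and the randomization step for p-values is omitted.

```python
import math


def ebp_score(count, baseline):
    """Poisson log-likelihood ratio for one region (assumed form, see lead-in).

    Returns 0 when the count does not exceed the expected count.
    """
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_region(counts, baselines, regions, max_window):
    """Search all (S, k) pairs and return (score, S, k) for the highest score.

    counts[i][t] and baselines[i][t] hold the observed and expected cases for
    location i on day t, with the last index being the current day.
    regions is a list of tuples of location indices (hypothetical layout).
    """
    best = (0.0, None, None)
    n_days = len(counts[0])
    for region in regions:
        for k in range(1, min(max_window, n_days) + 1):
            # Aggregate the last k days over every location in the region.
            total_count = sum(sum(counts[i][-k:]) for i in region)
            total_baseline = sum(sum(baselines[i][-k:]) for i in region)
            score = ebp_score(total_count, total_baseline)
            if score > best[0]:
                best = (score, region, k)
    return best


# Toy data: locations 1 and 2 rise above baseline over the last two days.
counts = [[5, 5, 5, 5], [5, 5, 9, 12], [5, 5, 8, 11]]
baselines = [[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
regions = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]
score, region, window = best_region(counts, baselines, regions, max_window=4)
```

On this toy input the search settles on the two affected locations over the last two days, because widening the region or the window adds baseline without adding excess cases.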

Objective

This paper describes a new class of space-time scan statistics designed for rapid detection of emerging disease clusters. We evaluate these methods on the task of prospective disease surveillance, and show that our methods consistently outperform the standard space-time scan statistic approach.

Submitted by Sandra.Gonzale… on Thu, 09/20/2018 - 18:57

There is limited closed-form statistical theory to indicate how well the prospective space-time permutation scan statistic will perform in the detection of localized excess illness activity. Instead, detection methods can be applied to simulated data to gain insight about detection performance. Such results are dependent on the way outbreaks are simulated and the nature of the background data. As an alternative, we explore an empirical approach in which the membership of a large health plan is used to represent a community and detection performance is assessed in samples from the larger group.

Objective

Our goal was to assess the impact of sentinel sample size and signal criteria on the performance of daily prospective space-time permutation detection by comparing results in random samples of varying size from a large health plan to results found in the full membership.

Submitted by Sandra.Gonzale… on Thu, 09/20/2018 - 13:44

Expectation-based scan statistics extend the traditional spatial scan statistic approach by using historical data to infer the expected counts for each spatial location, then detecting regions with higher than expected counts. Here we consider five recently proposed expectation-based statistics: the expectation-based Poisson (EBP), expectation-based Gaussian (EBG), population-based Poisson (PBP), population-based Gaussian (PBG), and robust Bernoulli-Poisson (RBP) methods. We also consider five different time series analysis methods used to predict the expected counts (including the Holt-Winters method and moving averages optionally adjusted for day of week and seasonality), giving a total of 25 methods to compare. All of these methods are detailed in the full paper.
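To make the idea of a moving average adjusted for day of week concrete, here is a minimal sketch of one possible form. It is not the specific adjustment evaluated in the paper. It assumes the expected count for the next day is a recent weekly level multiplied by that weekday's share of cases over the last few weeks. The function name and the parameters level_days and share_weeks are hypothetical.

```python
def expected_count(history, level_days=7, share_weeks=4):
    """Estimate the next day's expected count from a list of daily counts.

    history is ordered oldest first; the day being predicted has index
    len(history). Weekday slots are taken as index modulo 7 (an assumption
    that avoids needing calendar dates).
    """
    n = len(history)
    share_days = 7 * share_weeks
    if n < share_days:
        raise ValueError("need at least share_weeks full weeks of history")

    target_slot = n % 7           # weekday slot of the day being predicted
    start = n - share_days        # index of the first day in the share window
    window = history[start:]
    total = sum(window)
    if total == 0:
        return 0.0

    # Fraction of recent cases that fell on the target weekday slot.
    same_slot = sum(c for offset, c in enumerate(window)
                    if (start + offset) % 7 == target_slot)
    share = same_slot / total

    # Recent level, scaled to a 7-day total.
    weekly_level = sum(history[-level_days:]) * 7 / level_days
    return weekly_level * share


# Four weeks with five busy days (10 cases) and two quiet days (4 cases).
pattern = [10, 10, 10, 10, 10, 4, 4]
busy_day_estimate = expected_count(pattern * 4)
quiet_day_estimate = expected_count(pattern * 4 + [10] * 5)
```

With a perfectly periodic history the estimate reproduces the weekly pattern; on real data the short level window lets the baseline follow recent trends while the longer share window keeps the weekday proportions stable.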

Objective

We present a systematic empirical comparison of five recently proposed expectation-based scan statistics, in order to determine which methods are most successful for which spatial disease surveillance tasks.

Submitted by Sandra.Gonzale… on Thu, 09/20/2018 - 13:16

ARIMA models use past values (autoregressive terms) and past forecasting errors (moving average terms) to generate future forecasts, making them potential candidates for modeling citywide time series of syndromic data [1]. While past research supports the use of ARIMA modeling as a detection algorithm in syndromic surveillance [2], there has been little evaluation of an ARIMA model's prospective outbreak detection capabilities. We built an ARIMA model to prospectively detect simulated outbreaks in ED syndromic data. This method is one of eight being formally evaluated as part of a grant from the Alfred P. Sloan Foundation.
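The way autoregressive and moving average terms combine can be sketched with the simplest case, an ARMA(1,1) recursion in which each forecast is a constant plus a multiple of the previous value plus a multiple of the previous forecast error. This is an illustration only. The coefficients const, phi, and theta are hypothetical values supplied by hand and are not fitted, and the differencing and seasonal terms of a full seasonal ARIMA model are left out.

```python
def arma11_forecasts(series, const, phi, theta):
    """One-step-ahead forecasts from an ARMA(1,1) recursion.

    forecast_t = const + phi * value_{t-1} + theta * error_{t-1}

    Returns the in-sample forecasts for series[1:] and the forecast for the
    next, not yet observed, point.
    """
    forecasts = []
    prev_value = series[0]
    prev_error = 0.0  # assumption: no known error before the first point
    for value in series[1:]:
        forecast = const + phi * prev_value + theta * prev_error
        forecasts.append(forecast)
        prev_error = value - forecast  # forecasting error feeds the next step
        prev_value = value
    next_forecast = const + phi * prev_value + theta * prev_error
    return forecasts, next_forecast


# Hypothetical daily visit counts and hand-picked coefficients.
in_sample, next_day = arma11_forecasts([10, 12, 11], const=2.0, phi=0.8, theta=0.5)
```

In a prospective setting, the difference between each day's observed count and its one-step-ahead forecast is the quantity a detection rule would monitor for unusually large positive values.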

Objective

To evaluate seasonal autoregressive integrated moving average (ARIMA) models for prospective analysis of New York City (NYC) emergency department (ED) syndromic data.