
An Algorithm for Early Outbreak Detection in Multiple Data Streams

Description

Current biosurveillance systems run multiple univariate statistical process control (SPC) charts to detect increases in multiple data streams. This approach is easy to implement and easy to interpret: by examining the alarms from each control chart, it is easy to identify which data stream caused an alarm. However, testing multiple data streams simultaneously creates a multiple testing problem that inflates the combined false alarm probability. Methods such as the Bonferroni correction address this problem by lowering the false alarm probability of each individual chart, but they can be extremely conservative.

Biosurveillance systems often use variations of popular univariate SPC charts such as the Shewhart chart, the cumulative sum (CUSUM) chart, and the exponentially weighted moving average (EWMA) chart. In these charts, an alarm is signaled when the charting statistic exceeds a predefined control limit, and the false alarm rate is specified through the in-control average run length (ARL0). When multiple charts are used, the resulting multiple testing problem is often handled with methods that control the family-wise error rate (FWER), which are known to be conservative.

A new temporal method is proposed for early event detection in multiple data streams. Instead of the control limits used with standard SPC charts, the proposed method uses p-values. In addition, it uses the false discovery rate (FDR) for error control rather than the in-control ARL0 used with conventional SPC charts. Because it controls the FDR, the method can draw on more powerful, more recently developed procedures for handling the multiple testing problem than FWER-based methods.
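The abstract does not say how the per-stream p-values are computed or which FDR procedure is used, so the following is only a rough sketch under stated assumptions. It assumes a one-sided normal-approximation z-test of each stream's latest count against that stream's own baseline history (a hypothetical stand-in, not the authors' charting statistic). It then contrasts a Bonferroni correction, which controls the FWER, with the Benjamini-Hochberg step-up procedure, a standard FDR-controlling procedure. The function names `upper_tail_p_value`, `bonferroni`, and `benjamini_hochberg` are illustrative.

```python
import math
from statistics import mean, stdev


def upper_tail_p_value(history, today):
    """Hypothetical per-stream p-value (an assumption, not the authors' method):
    one-sided z-test of today's count against a normal approximation
    fitted to the stream's baseline history."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    z = (today - mu) / sigma
    # P(Z >= z) for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2))


def bonferroni(p_values, alpha):
    """FWER control: flag stream i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha):
    """FDR control (Benjamini-Hochberg step-up): find the largest rank k
    with p_(k) <= k * alpha / m and flag the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    flags = [False] * m
    for i in order[:k_max]:
        flags[i] = True
    return flags


if __name__ == "__main__":
    # Illustrative p-values for five hypothetical data streams on one day
    p = [0.001, 0.008, 0.012, 0.035, 0.3]
    print("Bonferroni:", bonferroni(p, alpha=0.05))          # flags streams 0 and 1
    print("BH (FDR):  ", benjamini_hochberg(p, alpha=0.05))  # flags streams 0 to 3

    # Per-stream p-value for a hypothetical stream whose count jumps to 20
    baseline = [10, 12, 11, 9, 13, 10, 11, 12]
    print("p-value:", upper_tail_p_value(baseline, today=20))
```

In this toy example Benjamini-Hochberg flags four of the five streams while Bonferroni flags two, which illustrates the difference in power between the two error criteria. The trade-off is that Benjamini-Hochberg bounds the expected proportion of false alarms among the flagged streams, not the probability of raising any false alarm at all.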

Objective: To propose a computationally simple, fast, and reliable temporal method for early event detection in multiple data streams.

Submitted by elamb on