
Burkom Howard

Description

Every public health monitoring operation faces important decisions in its design phase. These include information sources to be used, the aggregation of data in space and time, the filtering of data records for required sensitivity, and the design of content delivery for users. Some of these decisions are dictated by available data limitations, others by objectives and resources of the organization doing the surveillance. Most such decisions involve three characteristic tradeoffs: how much to monitor for exceptional vs customary health threats, the level of aggregation of the monitoring, and the degree of automation to be used.

The first tradeoff results from heightened concern for bioterrorism and pandemics, while everyday threats involve endemic disease events such as seasonal outbreaks. A system focused on bioterrorist attacks is scenario-based, concerned with unusual diagnoses or patient distributions, and likely to include attack hypothesis testing and tracking tools. A system at the other end of this continuum has broader syndrome groupings and is more concerned with general anomalous levels at manageable alert rates. 

Major aggregation tradeoffs are temporal, spatial, and syndromic. Bioterrorism fears have shortened the time scale of health monitoring from monthly or weekly to near-real-time. The spatial scale of monitoring is a function of the spatial resolution of data recorded and allowable for use as well as the monitoring institution’s purview and its capacity to collect, analyze and investigate localized outbreaks.

Automation tradeoffs involve the use of data processing to collect information, analyze it for anomalies, and make investigation and response decisions. The first of these uses has widespread acceptance, while in the latter two the degree of automation is a subject of ongoing controversy and research. To what degree can human judgment in alerting/response decisions be automated? What are the level and frequency of human inspection and adjustment? Should monitoring frequency change during elevated threat conditions?

All of these decisions affect monitoring tools and practices as well as funding for related research.

 

Objective

The purpose of this effort is to show how the goals and capabilities of health monitoring institutions can shape the selection, design, and usage of tools for automated disease surveillance systems.

Submitted by elamb on
Description

On 27 April 2005, a simulated bioterrorist event—the aerosolized release of Francisella tularensis in the men’s room of luxury box seats at a sports stadium—was used to exercise the disease surveillance capability of the National Capital Region (NCR). The objective of this exercise was to permit all of the health departments in the NCR to exercise inter-jurisdictional epidemiological investigations using an advanced disease surveillance system. Actual system data could not be used for the exercise as it both is proprietary and contains protected, though de-identified, health information about real people; nor is there much historical data describing how such an outbreak would manifest itself in normal syndromic data. Thus, it was essential to develop methods to generate virtual health care records that met specific requirements and represented both ‘normal’ endemic visits (the background) as well as outbreak-specific records (the injects).

 

Objective

This paper describes a flexible modeling and simulation process that can create realistic, virtual syndromic data for exercising electronic biosurveillance systems.
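The split between background records and injects can be illustrated with a minimal sketch. It is not the process used for the exercise: the day-of-week means, the inject curve, and the function names are all hypothetical, and it generates daily counts rather than individual records.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n_days, inject_start, inject_curve, seed=0):
    """Return (background, injects, total) daily visit counts.

    Day 0 is taken to be a Monday. The means below are made-up values
    standing in for an endemic baseline with a weekend dip.
    """
    rng = random.Random(seed)
    dow_mean = [30, 28, 27, 27, 29, 20, 18]  # hypothetical Mon..Sun means
    background = [poisson(rng, dow_mean[d % 7]) for d in range(n_days)]

    # Outbreak-specific cases are laid on top of the background,
    # one value of the inject curve per day from inject_start onward.
    injects = [0] * n_days
    for offset, cases in enumerate(inject_curve):
        day = inject_start + offset
        if day < n_days:
            injects[day] = cases

    total = [b + i for b, i in zip(background, injects)]
    return background, injects, total


background, injects, total = simulate_counts(28, 14, [2, 5, 9, 5, 2])
```

Keeping the two layers separate until the final sum makes it possible to score a detection method against the known inject days.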

Description

The eleven syndrome classifications for clinical data records monitored by BioSense include rare events such as death or lymphadenitis and also common occurrences such as respiratory infections. BioSense currently uses two statistical methods for prediction and alerting with respect to the eleven syndromes: a modified CUSUM, and small area regression and testing (SMART), described by Ken Kleinman. At the inception of BioSense, these prediction methods were implemented as one-model-fits-all, and they remain largely unmodified. An evaluation of the predictive value of these methods is required. The SMART method, as used in BioSense, uses long-term data. Its covariate predictors are day of week, a holiday indicator, a day-after-holiday indicator, and sine/cosine seasonality variables. Lengthy, stable historical data are not always available in BioSense data sources, and this obstacle is expected to grow as data sources are added. We wish to test regression methods of surveillance that use shorter time periods and different sets of predictors.

 

Objective

This paper compares the prediction accuracy of regression models with different covariates and baseline periods, using a subset of data from CDC’s BioSense initiative. Accurate predictions are needed to achieve sensitivity at practical false alarm rates in anomaly detection for biosurveillance.
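As a rough illustration of the covariate set named above (day of week, holiday, day after holiday, and sine/cosine seasonality), the sketch below builds a design row per day and fits ordinary least squares through the normal equations. It is an assumption-laden stand-in, not the SMART implementation or the models compared in the paper: the function names, the annual period of 365.25 days, and the choice of Monday as reference level are all illustrative.

```python
import math


def covariates(t, holidays):
    """Design row for day index t (day 0 is assumed to be a Monday)."""
    dow = [1.0 if t % 7 == d else 0.0 for d in range(1, 7)]  # Monday = reference
    angle = 2.0 * math.pi * t / 365.25  # assumed annual cycle
    return ([1.0] + dow
            + [1.0 if t in holidays else 0.0,        # holiday indicator
               1.0 if (t - 1) in holidays else 0.0,  # day after holiday
               math.sin(angle), math.cos(angle)])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Fails with ZeroDivisionError if a is singular, e.g. when the baseline
    contains no holiday so that a covariate column is all zeros.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(days, counts, holidays):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    rows = [covariates(t, holidays) for t in days]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, counts)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, t, holidays):
    """Predicted count for day t from fitted coefficients."""
    return sum(b * x for b, x in zip(beta, covariates(t, holidays)))
```

Changing the length of `days` passed to `fit` corresponds to changing the baseline period, and dropping entries from `covariates` corresponds to changing the predictor set. A short baseline with no holiday in it leaves the holiday columns empty, which is one concrete way the long-history covariate set breaks down.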

Description

To date, most syndromic surveillance systems rely heavily on complicated statistical algorithms to identify aberrations. The assumption is that when the statistics identify something unusual, follow-up should occur. However, with multiple strata analyzed, small numbers for some strata, and wide variances in daily counts, the statistical algorithms will generate flags too often. Experience has shown that these flags usually have little or no public health significance. As a result, syndromic surveillance systems suffer from the ‘boy who cried wolf’ syndrome. It is clear that the analyst’s ability to use professional judgment to sift through multitudes of flags is very important to the success of the system, which suggests that statistics alone cannot identify issues of public health importance from ED data.

Objective

This study's aim was to refine an automated biosurveillance system in order to better suit the daily monitoring capabilities and resources of a health department.
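The claim that many strata produce flags too often can be made concrete with a small calculation. The sketch assumes the monitored streams are statistically independent and share one per-test false alarm rate, which real syndromic streams do not satisfy exactly; the numbers are hypothetical.

```python
def expected_daily_flags(n_streams, alpha):
    """Expected number of false flags per day across all streams."""
    return n_streams * alpha


def prob_any_flag(n_streams, alpha):
    """Probability of at least one false flag per day, assuming independence."""
    return 1.0 - (1.0 - alpha) ** n_streams


# Hypothetical setup: 200 syndrome-by-stratum streams, each tested at alpha = 0.01.
flags = expected_daily_flags(200, 0.01)
any_flag = prob_any_flag(200, 0.01)
```

Under these assumptions an analyst would see about two spurious flags on a typical day even when nothing is happening, which is the situation the description attributes to purely statistical alerting.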

Description

To recognize outbreaks so that early interventions can be applied, BioSense uses a modification of the EARS C2 method, stratifying days used to calculate the expected value by weekend vs weekday, and including a rate-based method that accounts for total visits. These modifications produce lower residuals (observed minus expected counts), but their effect on sensitivity has not been studied.

 

Objective

To evaluate several variations of a commonlyused control chart method for detecting injected signals in 2 BioSense System datasets.
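The weekday/weekend stratification can be sketched as follows. The sketch assumes the usual C2 form (a 7-day baseline separated from the test day by a 2-day guard band, with the statistic expressed in baseline standard deviations); the function name and the floor on the standard deviation are illustrative, and the rate-based variant that accounts for total visits is not shown.

```python
import statistics


def c2_statistic(counts, t, is_weekend, baseline=7, guard=2,
                 stratify=True, min_sd=1.0):
    """Return (observed - baseline mean) / baseline sd for day t.

    With stratify=True the baseline is the most recent `baseline` days of
    the same type (weekend or weekday) as day t, outside the guard band.
    Returns None when there is not enough history.
    """
    eligible = range(t - guard - 1, -1, -1)  # newest first, guard band skipped
    if stratify:
        eligible = [d for d in eligible if is_weekend[d] == is_weekend[t]]
    window = [counts[d] for d in list(eligible)[:baseline]]
    if len(window) < baseline:
        return None
    mean = statistics.mean(window)
    # min_sd is an assumed floor that avoids division by zero on flat baselines.
    sd = max(statistics.stdev(window), min_sd)
    return (counts[t] - mean) / sd


# Six weeks of flat data: 50 visits on weekdays, 20 on weekends (day 0 = Monday).
counts = [50 if d % 7 < 5 else 20 for d in range(42)]
is_weekend = [d % 7 >= 5 for d in range(42)]
counts[40] = 30  # a Saturday with 10 extra visits
stratified = c2_statistic(counts, 40, is_weekend)
unstratified = c2_statistic(counts, 40, is_weekend, stratify=False)
```

In this toy series the stratified statistic is large and positive, while the unstratified one is negative because the baseline is dominated by weekdays. This is the residual reduction the description refers to; it does not by itself show how sensitivity changes on real data.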

Description

Estimation of representative spatial probabilities and expected counts from baseline data can cause problems in applying spatial scan statistics when observed events are sparse in a large percentage of the spatial zones (e.g., zip codes or census tracts) found in the data records. In applications of scan statistics to datasets with fine spatial resolution, such as census tracts or block groups, such highly skewed data distributions are likely to occur. If the spatial distribution estimation process does not handle the zones with low counts correctly, bias in the determination of statistically significant clusters will occur.

In any 8-week baseline period, some of the sparse-data zones have no counts at all. If ignored, the zero-count spatial zones will result in division by zero in the log-likelihood ratio evaluation. The traditional method of setting a floor on the expected counts in each spatial zone leads to a loss of sensitivity when the zero-count zones make up a significant fraction of all the zones. One alternative method for estimating spatial probabilities is to add one count to the sum of baseline counts in each spatial zone. This method has been used with good results in a study of spatial cluster detection using medical 911 call data from San Diego County. However, when this method was applied to data with a more highly skewed spatial distribution, issues were uncovered that led to this investigation of alternatives.

 

Objective

Modifications to spatial scan statistics are investigated for prospective cluster detection at fine resolution with highly skewed spatial distributions having many spatial zones with very few cases. Several alternative methods for estimating spatial probabilities and expected counts from a baseline data window are evaluated with the Poisson spatial scan statistic and the space-time permutation scan statistic, using goodness-of-fit statistics and cluster rates to compare performance.
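The add-one estimate described above, and the reason zero-count zones matter, can be shown in a short sketch. It assumes the standard Kulldorff form of the Poisson log-likelihood ratio for a single candidate cluster; the function names and the baseline counts are hypothetical, and the alternatives evaluated in the study are not reproduced here.

```python
import math


def spatial_probabilities(baseline_counts, add=1.0):
    """Per-zone probabilities after adding `add` counts to every zone."""
    total = sum(baseline_counts) + add * len(baseline_counts)
    return [(c + add) / total for c in baseline_counts]


def poisson_llr(observed_in, expected_in, total):
    """Poisson log-likelihood ratio for a candidate cluster (excess only).

    expected_in must be positive and less than total; an expected count of
    zero is exactly the division-by-zero case raised by empty baseline zones.
    """
    if observed_in <= expected_in:
        return 0.0
    outside_obs = total - observed_in
    outside_exp = total - expected_in
    llr = observed_in * math.log(observed_in / expected_in)
    if outside_obs > 0:
        llr += outside_obs * math.log(outside_obs / outside_exp)
    return llr


# Hypothetical 8-week baseline totals for six zones, three of them empty.
baseline = [0, 0, 12, 30, 0, 58]
probs = spatial_probabilities(baseline)

# Today: 10 cases overall, 3 of them in zone 0, which had no baseline cases.
today_total = 10
expected_zone0 = probs[0] * today_total
score = poisson_llr(3, expected_zone0, today_total)
```

With many empty zones, the added counts claim a large share of the total probability mass, which suggests one way the add-one estimate could distort expected counts under a highly skewed distribution.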

Description

A U.S. Department of Defense program is underway to assess health surveillance in resource-poor settings and to evaluate the Early Warning Outbreak Reporting System. This program has included several information-gathering trips, including one to Lao PDR in September 2006.

 

Objective

This modeling effort will provide guidance for policy and planning decisions in developing countries in the event of an acute respiratory illness epidemic, particularly an outbreak with pandemic potential.

Description

The increased threat of bioterrorism and naturally occurring diseases, such as pandemic influenza, continually forces public health authorities to review methods for evaluating data and reports. The objective of biosurveillance is to automatically process large amounts of information in order to rapidly provide the user with situational awareness. Most systems currently deployed in health departments use only statistical algorithms to filter data for decision-making. These algorithms are capable of high sensitivity, but this sensitivity comes at the cost of excessive false positives [2], especially when multiple syndrome groups and data types are processed.

Objective

An intelligent information fusion approach is proposed to identify and provide early alerting of naturally-occurring disease outbreaks, as well as bioterrorist attacks, while reducing false positives. The proposed system statistically preprocesses information from multiple sources and fuses it in a manner comparable with the domain expert's decision-making process. Currently, system users lower the false alarm rate by "explaining away" the statistical data anomalies with alternative hypotheses derived from external, non-syndromic knowledge. We seek to incorporate this heuristic decision-making into a probabilistic network that accepts the outputs of statistical algorithms in a hybrid model of domain knowledge and data inference.
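The idea of feeding algorithm outputs into a probabilistic model can be illustrated with a deliberately simplified sketch. It is a naive Bayes odds update that treats each piece of evidence as conditionally independent, not the proposed network, and every probability in it is a made-up value.

```python
def posterior_outbreak(prior, evidence):
    """Posterior probability of an outbreak given independent evidence.

    evidence is a list of (p_given_outbreak, p_given_no_outbreak) pairs,
    one per observed finding, such as a statistical alert or an external
    fact that offers an alternative explanation.
    """
    odds = prior / (1.0 - prior)
    for p_outbreak, p_no_outbreak in evidence:
        odds *= p_outbreak / p_no_outbreak  # multiply by the likelihood ratio
    return odds / (1.0 + odds)


# Hypothetical inputs: a 1% prior and one respiratory alert that fires
# 80% of the time during outbreaks and 5% of the time otherwise.
alert_only = posterior_outbreak(0.01, [(0.8, 0.05)])

# The same alert plus an external finding that is more likely when there is
# no outbreak (likelihood ratio below 1), which lowers the posterior.
explained = posterior_outbreak(0.01, [(0.8, 0.05), (0.3, 0.6)])
```

A finding with a likelihood ratio below one plays the role of the "explaining away" step that users currently perform by hand. A full network would model dependencies between findings instead of assuming independence.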

Description

CDC’s BioSense system provides near-real-time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Spatial approaches depend strongly on having reliable estimated values for counts among the geographic sub-regions. If estimates are poor, algorithms will find irrelevant clusters, and clusters of importance may be missed. While many studies have focused on improved computation time and more general cluster shapes, our effort focused on finding anomalies that are correct according to available BioSense data history.

 

Objective

We applied spatial scan statistics to data from CDC’s BioSense system and examined the effect of the spatial prediction method on determination of anomalous disease clusters. The objectives were to decide on a reliable spatial estimation method for one BioSense data source and to establish criteria for making this decision using other sources.
