
Bayesian Methods

Description

Scientists have used many chief complaint (CC) classification techniques in biosurveillance, including keyword search, weighted keyword search, and naïve Bayes. These techniques may use a CC-to-syndrome or a CC-to-symptom-to-syndrome classification approach. In the former, we classify a CC directly into syndrome categories. In the latter, we first classify a CC into symptom categories and then use a syndrome definition, a combination of one or more symptoms, to determine whether the chief complaint belongs in a particular syndrome category. One approach to CC-to-symptom-to-syndrome classification uses manually weighted keyword search and Boolean operations to build syndrome classifiers. Its limitations are that it does not address uncertainty in the data and that it must be parameterized by hand. A CC-to-symptom-to-syndrome approach that is both probabilistic and based on machine learning addresses these limitations.

Objective

Design, build and evaluate a symptom-based probabilistic chief complaint classifier for the Real-time Outbreak and Disease Surveillance System.
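
To make the CC-to-symptom-to-syndrome idea concrete, the following is a minimal sketch, not the classifier built in this work. It assumes a word-level naïve Bayes model per symptom with add-one smoothing, and it assumes a syndrome is defined as "any one of its symptoms is present", with symptoms treated as independent. The training examples, the symptom names, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_symptom_model(examples):
    """Fit a word-level naive Bayes model for ONE symptom.

    examples: list of (chief_complaint_text, has_symptom) pairs (toy data).
    """
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab


def symptom_probability(model, text):
    """Return P(symptom present | chief complaint) under the model."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_post = {}
    for label in (True, False):
        lp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][word] + 1) / denom)
        log_post[label] = lp
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[True] - top) / norm


def syndrome_probability(symptom_models, text):
    """P(at least one symptom present), assuming independent symptoms."""
    p_none = 1.0
    for model in symptom_models:
        p_none *= 1.0 - symptom_probability(model, text)
    return 1.0 - p_none


# Hypothetical toy training data for two symptoms.
cough_model = train_symptom_model([
    ("cough and fever", True), ("dry cough", True),
    ("ankle pain", False), ("fell off ladder", False),
])
dyspnea_model = train_symptom_model([
    ("short of breath", True), ("difficulty breathing", True),
    ("ankle pain", False), ("headache", False),
])
respiratory_models = [cough_model, dyspnea_model]

p_cough_cc = syndrome_probability(respiratory_models, "cough")
p_ankle_cc = syndrome_probability(respiratory_models, "ankle pain")
```

Unlike a manually weighted keyword search, the word weights here are estimated from labeled complaints, and the output is a probability rather than a yes/no match. With so little training data the absolute values mean little; only the ordering of the two complaints is informative.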

Submitted by elamb on
Description

Current state-of-the-art outbreak detection methods [1-3] combine spatial, temporal, and other covariate information from multiple data streams to detect emerging clusters of disease. However, these approaches use fixed methods and models for analysis and cannot improve their performance over time. Here we consider two methods for overcoming this limitation, learning a prior over outbreak regions and learning outbreak models from user feedback, using the recently proposed multivariate Bayesian scan statistic (MBSS) framework [1]. Given a set of outbreak types {Ok}, a set of space-time regions S, and the multivariate dataset D, MBSS computes the posterior probability Pr(H1(S, Ok) | D) of each outbreak type in each region, using Bayes’ Theorem to combine the prior probabilities Pr(H1(S, Ok)) and the data likelihoods Pr(D | H1(S, Ok)). Each outbreak type can have a different prior distribution over regions, as well as a different model for its effects on the multiple streams. The set of outbreak types, as well as the region priors and outbreak models for each type, can be learned incrementally from labeled data or user feedback.
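
The Bayes' Theorem step described above can be sketched as follows. This is an illustration only: it assumes the hypotheses are mutually exclusive and that a null hypothesis of no outbreak (called H0 here, which the text above does not name) completes the set. The region names, outbreak types, priors, and likelihood values are all made up for the example.

```python
def mbss_posteriors(prior_h1, lik_h1, prior_h0, lik_h0):
    """Combine priors and likelihoods into posteriors via Bayes' Theorem.

    prior_h1, lik_h1: dicts keyed by (region, outbreak_type) holding
        Pr(H1(S, Ok)) and Pr(D | H1(S, Ok)).
    prior_h0, lik_h0: Pr(H0) and Pr(D | H0) for the assumed null hypothesis.
    Returns (posterior of H0, dict of posteriors for each (S, Ok)).
    """
    joint = {key: prior_h1[key] * lik_h1[key] for key in prior_h1}
    evidence = prior_h0 * lik_h0 + sum(joint.values())  # Pr(D)
    post_h1 = {key: value / evidence for key, value in joint.items()}
    post_h0 = prior_h0 * lik_h0 / evidence
    return post_h0, post_h1


# Hypothetical inputs: two regions, two outbreak types, uniform region prior.
keys = [("S1", "flu"), ("S1", "anthrax"), ("S2", "flu"), ("S2", "anthrax")]
prior_h1 = {key: 0.025 for key in keys}
# Likelihoods given relative to the null, so lik_h0 is 1.0.
lik_h1 = dict(zip(keys, [50.0, 2.0, 1.0, 0.5]))

post_h0, post_h1 = mbss_posteriors(prior_h1, lik_h1, prior_h0=0.9, lik_h0=1.0)
most_likely = max(post_h1, key=post_h1.get)
```

Learning a region prior, in this picture, amounts to replacing the uniform values in `prior_h1` with values estimated from labeled data or feedback; the combination step itself does not change.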

Objective

We argue that the incorporation of machine learning algorithms is a natural next step in the evolution and improvement of disease surveillance systems. We consider how learning can be incorporated into one recently proposed multivariate detection method, and demonstrate that learning can enable systems to substantially improve detection performance over time.

Submitted by elamb on
Description

A Bayesian network (BN) is a probabilistic graphical model that represents dependencies and relationships among variables. The structure of the network and its conditional probabilities capture an expert’s view of a system. BNs have been applied in the public health domain for research purposes, but have not been used directly by the end users of public health systems. As BN technology becomes more widely accepted in public health, data fusion visualization becomes a critical component of overall system design. The tools developed apply computer-assisted analysis to BNs in the public health domain, provide a concise view of the data for better decision support, and shorten the decision-making phase, allowing rapid dissemination of information to public health.

Objective

This paper describes the use and visualization of BNs to better assist public health users. The Data Fusion Visualization (DFV) provides an intuitive graphical interface that supports users in three ways. First, it provides a seamless drill-down interpretation of a dataset. Second, it provides an intuitive interpretation of a BN. Finally, by abstracting the visualization from the underlying model, the DFV can mask multiple inter-operating BNs behind a single visualization. The DFV thus provides a graphical representation of BN data fusion.

Submitted by elamb on
Description

Syndromic surveillance needs to be (1) transparent, (2) actionable, and (3) flexible. Traditional frequentist approaches to syndromic surveillance, such as CUSUM charts and scan statistics, tend to fail on all three criteria. First, the validity of the assumptions is generally difficult to check and the methods are hard to modify; second, the false positive rate makes it impossible to be both sensitive to true signals and resistant to spurious ones; and third, the implementation usually requires significant hand-tuning to adjust background rates for known seasonal effects and other identifiable influences.

Objective

This paper describes a Bayesian approach to syndromic surveillance. The method provides more interpretable inference than traditional frequentist approaches. Bayesian methods avoid many of the problems associated with alpha levels and multiple comparisons, and make better use of prior information. The technique is illustrated on simulated data.

Submitted by elamb on
Description

Biosurveillance is an area that provides real-time or near-real-time data sets with a rich structure. A new wave of interest lies in using medical data, such as the percentage or count of influenza-like illness (ILI) cases observed during emergency room visits, as an intelligence function, since many different bioterrorist agents present with flu-like symptoms. Developing a control technique for ILI, however, is a complex process that must contend with the unpredictable timing of influenza emergence, the severity of the outbreak, and the effectiveness of epidemic interventions. Furthermore, the need to detect the beginning of an epidemic on-line, as data arrive sequentially one observation at a time, makes the problems surrounding ILI even more challenging. Statistical tools for analyzing these data currently fall well short of capturing all their important structural details. Tools from statistical process control are, on the face of it, ideally suited to the task, since they address the exact problem of detecting a sudden shift against a background of random variability. Bayesian statistical methods are ideally suited to settings with partial but imperfect information on the statistical parameters describing time series data, such as those gathered in BioSense and Sentinel settings.

Objective

This paper presents a Bayesian approach to quality control that uses a sequential update technique to build a fast detection method for influenza outbreaks and potential intentional releases of biological agents. The objective is to find evidence of outbreaks against a background in which markers of possible intentional release are non-stationary and serially dependent. This work uses the US Sentinel ILI data to find this evidence and to address some issues related to the control of infectious diseases. A sensitivity analysis is conducted through simulation to assess the timeliness, correct alarm rate, and missed alarm rate of our technique.
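
As a rough illustration of what a sequential Bayesian update for process control can look like, the sketch below monitors a series one observation at a time. It is not the method of this paper: it assumes a conjugate normal model with known observation variance and independent observations, so it deliberately ignores the non-stationarity and serial dependence mentioned above. The function name, the threshold `z`, and the ILI percentages are hypothetical.

```python
import math


def sequential_monitor(observations, prior_mean, prior_var, obs_var, z=3.0):
    """Flag observations that exceed the posterior predictive upper limit.

    The baseline mean has a normal prior; each in-control observation
    updates that prior (normal-normal conjugate update). Observations that
    raise an alarm are not absorbed into the baseline.
    Returns (alarm indices, final posterior mean, final posterior variance).
    """
    mean, var = prior_mean, prior_var
    alarms = []
    for t, y in enumerate(observations):
        # Predictive distribution of the next observation: N(mean, var + obs_var).
        predictive_sd = math.sqrt(var + obs_var)
        if y > mean + z * predictive_sd:
            alarms.append(t)
            continue
        # Conjugate update: precisions add, mean is precision-weighted.
        post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        mean = post_var * (mean / var + y / obs_var)
        var = post_var
    return alarms, mean, var


# Hypothetical weekly ILI percentages with a jump in the last two weeks.
ili_percent = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 4.5, 5.0]
alarms, baseline_mean, baseline_var = sequential_monitor(
    ili_percent, prior_mean=2.0, prior_var=1.0, obs_var=0.04
)
```

The control limit tightens as data accumulate, because the posterior variance of the baseline shrinks with each in-control observation. Handling seasonality or autocorrelation would require a richer model than this one.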

Submitted by elamb on
Description

Non-temporal Bayesian network outbreak detection methods look only at data from the most recent day. For example, PANDA-CDCA (PC), a Bayesian network disease outbreak detection system that models 12 diseases, looks only at data from the last 24 hours to determine how likely it is that an outbreak is occurring. A system that looks only at each day's data might signal an outbreak one day and not signal it the next. Cooper et al. obtained such results when evaluating the ability of PC to detect a laboratory-validated outbreak of influenza. We hypothesized that temporal modeling would attenuate this problem.

Objective

A temporal method for outbreak detection using a Bayesian network is presented and evaluated.
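
The abstract does not describe the temporal method itself. Purely to illustrate why carrying belief across days can steady a day-by-day signal, the sketch below uses a generic two-state forward filter (outbreak or no outbreak). This is a swapped-in textbook technique, not the PANDA-CDCA extension evaluated here; the transition probabilities and daily likelihoods are invented.

```python
def memoryless_posterior(lik_outbreak, lik_none, prior=0.01):
    """Posterior from a single day's data with a fixed prior (non-temporal)."""
    numerator = prior * lik_outbreak
    return numerator / (numerator + (1.0 - prior) * lik_none)


def forward_filter(daily_likelihoods, p_start=0.01, p_onset=0.01, p_persist=0.9):
    """Daily P(outbreak | all data so far) under a two-state Markov model.

    daily_likelihoods: list of (Pr(day's data | outbreak),
                                Pr(day's data | no outbreak)).
    p_start: belief on the day before the first observation.
    p_onset: chance an outbreak starts on a given day.
    p_persist: chance an ongoing outbreak continues to the next day.
    """
    belief = p_start
    history = []
    for lik_outbreak, lik_none in daily_likelihoods:
        # Predict: carry yesterday's belief forward through the transitions.
        prior = belief * p_persist + (1.0 - belief) * p_onset
        # Update: weigh the prediction by today's data.
        numerator = prior * lik_outbreak
        belief = numerator / (numerator + (1.0 - prior) * lik_none)
        history.append(belief)
    return history


# Hypothetical daily likelihoods: evidence appears, dips for a day, returns.
days = [(1.0, 1.0), (8.0, 1.0), (0.5, 1.0), (8.0, 1.0), (8.0, 1.0)]
temporal = forward_filter(days)
non_temporal = [memoryless_posterior(lo, ln) for lo, ln in days]
```

In this toy run the non-temporal posterior returns to the same low value on every day with identical evidence, while the filtered posterior accumulates support across consecutive days, so a single weak day does not reset the assessment.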

Submitted by elamb on
Description

This paper describes a Bayesian algorithm for diagnosing the CDC Category A diseases, namely, anthrax, smallpox, tularemia, botulism and hemorrhagic fever, using emergency department chief complaints. The algorithm was evaluated on real data and on semi-synthetic data, and this paper summarizes the results of that evaluation.

Submitted by elamb on