Machine Learning

Presented December 13, 2018.

For public health surveillance, is machine learning worth the effort? What methods are relevant? Do you need special hardware? This talk was motivated by these and other questions asked by ISDS members. It will focus on providing practical—and slightly opinionated—advice about how to determine whether machine learning could be a useful tool for your problem.

Presented November 16, 2018.

The current opioid overdose/addiction crisis in the United States presents a challenge to public health intervention due to a lack of data on current and past incidence. Little is known about when and where overdoses are occurring, or how current patterns compare with the past. Marin County, California, is addressing this gap by designing a novel cloud-based system that identifies opioid overdoses for both surveillance and outreach purposes using county-owned Emergency Medical Services (EMS) data.

Description

Much attention has been given recently to the purported ability of social media to provide early warning, situational awareness, and event characterization during a biological event of national concern. The National Biosurveillance Integration Center's (NBIC) innovation project on Social Media Analysis seeks to demonstrate the viability of extracting relevant health information from social media data, with the ultimate goal of establishing an operational social media system for biological event surveillance. Early work in this project has focused on demonstrating the relevance of social media to the biosurveillance problem through data analysis and algorithm development. Preliminary assessments of a commercial social media product also yielded valuable insights into the system architecture required to support such an operational tool. In addition to continued analysis of data utility (algorithm development) and system architecture, future work will include development of a comprehensive concept of operations (CONOPS) for implementation and use of a social media capability within the NBIC.

Objective

Through ongoing and future projects, we will examine the utility of social media data for biosurveillance, including machine learning approaches for algorithm development, as well as the system and organizational architectures required to implement an operational system.

Submitted by knowledge_repo… on
Description

Scientists have used many chief complaint (CC) classification techniques in biosurveillance, including keyword search, weighted keyword search, and naïve Bayes. These techniques may use CC-to-syndrome or CC-to-symptom-to-syndrome classification approaches. In the former approach, we classify a CC directly into syndrome categories. In the latter approach, we first classify a CC into symptom categories. Then, we use a syndrome definition, a combination of one or more symptoms, to determine whether or not a chief complaint belongs in a particular syndrome category. One approach to CC-to-symptom-to-syndrome classification uses manually weighted keyword search and Boolean operations to build syndrome classifiers. This approach has two limitations: it does not address uncertainty in the data, and the system is manually parameterized. A CC-to-symptom-to-syndrome approach that is both probabilistic and uses machine learning addresses these limitations.
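The two-stage idea can be illustrated with a minimal sketch. It is not the classifier described in this abstract: the training complaints, the symptom names, the `febrile_respiratory` syndrome definition, and the choice to combine symptom probabilities by multiplying them (which assumes independence) are all hypothetical. Each symptom gets a small naïve Bayes model with add-one smoothing, and the syndrome probability is derived from the symptom probabilities.

```python
import math
from collections import Counter

# Hypothetical toy training set: (chief complaint, symptoms a coder assigned).
TRAINING = [
    ("fever and cough", {"fever", "cough"}),
    ("cough x3 days", {"cough"}),
    ("high fever", {"fever"}),
    ("ankle pain after fall", set()),
    ("sob and cough", {"cough"}),
    ("laceration to hand", set()),
]

# Hypothetical syndrome definition: every listed symptom must be present.
SYNDROMES = {"febrile_respiratory": ["fever", "cough"]}

VOCAB = {tok for text, _ in TRAINING for tok in text.lower().split()}


def symptom_probability(symptom, complaint):
    """Return P(symptom present | complaint) from a smoothed naive Bayes model."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, symptoms in TRAINING:
        label = symptom in symptoms
        docs[label] += 1
        counts[label].update(text.lower().split())

    log_score = {}
    for label in (True, False):
        total = sum(counts[label].values())
        # Smoothed class prior, then one smoothed likelihood term per known token.
        score = math.log((docs[label] + 1) / (len(TRAINING) + 2))
        for tok in complaint.lower().split():
            if tok in VOCAB:  # tokens never seen in training are ignored
                score += math.log((counts[label][tok] + 1) / (total + len(VOCAB)))
        log_score[label] = score

    # Normalize in a numerically stable way.
    top = max(log_score.values())
    weight = {label: math.exp(s - top) for label, s in log_score.items()}
    return weight[True] / (weight[True] + weight[False])


def syndrome_probability(syndrome, complaint):
    """Combine symptom probabilities, assuming symptoms are independent."""
    p = 1.0
    for symptom in SYNDROMES[syndrome]:
        p *= symptom_probability(symptom, complaint)
    return p
```

Calling `syndrome_probability("febrile_respiratory", "fever and cough")` returns a probability rather than a yes/no keyword match, so downstream alerting can apply its own threshold.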

Objective

Design, build and evaluate a symptom-based probabilistic chief complaint classifier for the Real-time Outbreak and Disease Surveillance System.

Submitted by elamb on
Description

Early and reliable detection of anomalies is a critical challenge in disease surveillance. Most surveillance systems collect data from multiple data streams, but the majority of monitoring is performed at the univariate time-series level. Purely statistical methods used in disease surveillance look at each time series separately and tend to generate a large number of false alarms. Support Vector Machines can be used to develop rich multivariate models that detect abnormal relationships between different time series, leading to a greatly reduced number of false alarms.

Objective

This paper describes a novel method for reliable detection of disease outbreaks. The methodology and initial results obtained on ESSENCE data are presented.

Submitted by elamb on
Description

Free-text emergency department triage chief complaints (CCs) are a popular data source used by many syndromic surveillance systems because of their timeliness, availability, and relevance. The lack of standardization of CC vocabulary poses a major technical challenge to any automatic CC classification approach. This challenge can be partially addressed by several methods, for example, medical thesauri, spell checking, manually created synonym lists, and supervised machine learning techniques that operate directly on free text. Current approaches, however, ignore the fact that medical terms appearing in CCs are often semantically related. Our research exploits such semantic relations through a medical ontology in the context of automatic CC classification for syndromic surveillance.
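One way semantic relations could help is sketched below. This is only an illustration and not the method of the paper: the `IS_A` links, the `SYNDROME_OF` mapping, and the `classify` function are hypothetical and are not drawn from any real medical ontology. A term that no keyword list mentions is still classified if one of its ancestors in the is-a hierarchy maps to a syndrome.

```python
# Hypothetical is-a links (child -> parent); not taken from a real ontology.
IS_A = {
    "dyspnea": "respiratory symptom",
    "wheezing": "respiratory symptom",
    "cough": "respiratory symptom",
    "vomiting": "gastrointestinal symptom",
    "diarrhea": "gastrointestinal symptom",
    "respiratory symptom": "symptom",
    "gastrointestinal symptom": "symptom",
}

# Hypothetical mapping from ontology concepts to syndrome categories.
SYNDROME_OF = {
    "respiratory symptom": "respiratory",
    "gastrointestinal symptom": "gastrointestinal",
}


def ancestors(term):
    """Return the chain of parents of a term, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def classify(complaint):
    """Return the set of syndromes reachable from any token via is-a links."""
    found = set()
    for token in complaint.lower().split():
        for concept in [token] + ancestors(token):
            if concept in SYNDROME_OF:
                found.add(SYNDROME_OF[concept])
                break  # the nearest matching ancestor decides for this token
    return found
```

In this sketch, adding a new term to the hierarchy is enough to have it classified; no syndrome keyword list needs to change.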

Objective

This paper presents a novel approach that uses a medical ontology to classify free-text CCs into syndrome categories.

Submitted by elamb on
Description

This paper describes a hybrid (event-based and indicator-based) surveillance platform designed to streamline the collaboration between domain experts and machine learning algorithms for detection, prediction and response to health-related events (such as disease outbreaks).

Submitted by elamb on
Description

Current state-of-the-art outbreak detection methods [1-3] combine spatial, temporal, and other covariate information from multiple data streams to detect emerging clusters of disease. However, these approaches use fixed methods and models for analysis, and cannot improve their performance over time. Here we consider two methods for overcoming this limitation, learning a prior over outbreak regions and learning outbreak models from user feedback, using the recently proposed multivariate Bayesian scan statistic (MBSS) framework [1]. Given a set of outbreak types {O_k}, a set of space-time regions S, and the multivariate dataset D, MBSS computes the posterior probability Pr(H_1(S, O_k) | D) of each outbreak type in each region, using Bayes’ Theorem to combine the prior probabilities Pr(H_1(S, O_k)) and the data likelihoods Pr(D | H_1(S, O_k)). Each outbreak type can have a different prior distribution over regions, as well as a different model for its effects on the multiple streams. The set of outbreak types, as well as the region priors and outbreak models for each type, can be learned incrementally from labeled data or user feedback.
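The Bayes' Theorem step can be sketched as follows. The numbers, region names, and function names are hypothetical, and the sketch assumes a null hypothesis of no outbreak whose prior and likelihood enter the normalizing constant, which the abstract does not spell out. The `learn_region_prior` helper is a simple smoothed frequency estimate standing in for the learning step, not the authors' procedure.

```python
def mbss_posteriors(prior_null, likelihood_null, priors, likelihoods):
    """Combine priors and likelihoods into posteriors with Bayes' theorem.

    priors and likelihoods are dicts keyed by (region, outbreak_type).
    Returns (posterior of the null hypothesis, dict of outbreak posteriors).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = prior_null * likelihood_null + sum(joint.values())
    posterior_null = prior_null * likelihood_null / evidence
    return posterior_null, {h: j / evidence for h, j in joint.items()}


def learn_region_prior(labeled_regions, all_regions, pseudo_count=1.0):
    """Estimate a prior over regions from past labeled outbreaks (assumed form).

    Each region starts with pseudo_count so unseen regions keep nonzero mass.
    """
    total = len(labeled_regions) + pseudo_count * len(all_regions)
    return {
        region: (labeled_regions.count(region) + pseudo_count) / total
        for region in all_regions
    }


# Hypothetical inputs: two regions, two outbreak types.
priors = {
    ("A", "respiratory"): 0.05,
    ("B", "respiratory"): 0.03,
    ("A", "gastrointestinal"): 0.02,
}
likelihoods = {
    ("A", "respiratory"): 0.20,
    ("B", "respiratory"): 0.02,
    ("A", "gastrointestinal"): 0.05,
}
posterior_null, posteriors = mbss_posteriors(0.90, 0.01, priors, likelihoods)
most_likely = max(posteriors, key=posteriors.get)
```

Because all posteriors share one normalizing constant, they can be ranked across regions and outbreak types, and a prior updated from feedback shifts that ranking without changing the likelihood models.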

Objective

We argue that the incorporation of machine learning algorithms is a natural next step in the evolution and improvement of disease surveillance systems. We consider how learning can be incorporated into one recently proposed multivariate detection method, and demonstrate that learning can enable systems to substantially improve detection performance over time.

Submitted by elamb on
Description

Spatial scan finds the most anomalous region, that is, the region showing the largest increase in observed counts compared to the expected baseline. As there can be infinitely many regions to search, most state-of-the-art algorithms assume a specific shape for the attack region (circles for Kulldorff and rectangles for Ultra-Fast Spatial Scan Statistics). This assumption might reduce detection power, as real-world attacks don't follow standard geometric shapes.
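To make the shape assumption concrete, here is a minimal brute-force sketch of a rectangular scan on a grid. It is neither Kulldorff's circular scan nor the Ultra-Fast algorithm named above: it exhaustively scores every axis-aligned rectangle using an expectation-based Poisson log-likelihood ratio, and the function names and grid layout are assumptions made for illustration.

```python
import math


def poisson_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio; zero if no increase."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def best_rectangle(counts, baselines):
    """Score every axis-aligned rectangle and return the most anomalous one.

    counts and baselines are equally sized lists of rows.
    Returns ((top, bottom, left, right), score) with inclusive indices.
    """
    rows, cols = len(counts), len(counts[0])
    best_region, best_score = None, 0.0
    for top in range(rows):
        for bottom in range(top, rows):
            for left in range(cols):
                for right in range(left, cols):
                    cells = [
                        (r, c)
                        for r in range(top, bottom + 1)
                        for c in range(left, right + 1)
                    ]
                    count = sum(counts[r][c] for r, c in cells)
                    baseline = sum(baselines[r][c] for r, c in cells)
                    score = poisson_score(count, baseline)
                    if score > best_score:
                        best_region = (top, bottom, left, right)
                        best_score = score
    return best_region, best_score
```

An outbreak that is L-shaped or follows a river would be covered only partly, or together with unaffected cells, by any single rectangle, which dilutes the score. That is the loss of detection power described above.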

Objective

We propose a discriminative random field approach for detecting a disease outbreak. Given observed data on a spatial grid, the goal is to label each node as either under attack or not under attack.

Submitted by elamb on