
Machine Learning

Description

Timely surveillance of disease outbreak events of public health concern currently requires detailed and time-consuming manual analysis by experts. Recently, in addition to traditional information sources, the World Wide Web has offered a new modality for surveillance, but the massive collection of multilingual texts that must be processed in real time presents an enormous challenge.


Objective

In this paper, we present a summary of the BioCaster system architecture for Web rumour surveillance, the rationale for the choices made in the system design, and an empirical evaluation of topic classification accuracy on a gold standard of English and Vietnamese news.
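For illustration, a minimal sketch of how topic classification accuracy might be measured against such a gold standard is shown below; the file name, label set, and the choice of character n-gram TF-IDF features with a multinomial Naive Bayes classifier are assumptions for this sketch, not a description of BioCaster's actual classifier.

```python
# Hypothetical sketch: evaluate topic classification accuracy against a
# gold-standard corpus of labeled news items. The data format, labels, and
# model choice are assumptions, not BioCaster's actual implementation.
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def load_gold_standard(path):
    """Each line: {"text": "...", "label": "relevant" | "reject"} (assumed format)."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    return [r["text"] for r in records], [r["label"] for r in records]

texts, labels = load_gold_standard("gold_standard_en_vi.jsonl")  # hypothetical file

# Character n-grams sidestep word-segmentation differences between
# English and Vietnamese.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
)
scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")
print(f"Mean topic-classification accuracy: {scores.mean():.3f}")
```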


Presented May 24, 2018.

Mauricio Santillana, MS, PhD describes machine learning methodologies that leverage Internet-based information from search engines, Twitter microblogs, crowd-sourced disease surveillance systems, electronic medical records, and historical synchronicities in disease activity across spatial regions to successfully monitor and forecast disease outbreaks in multiple locations around the globe in near real time.
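As a rough illustration of the general approach (not the presenter's actual models), a nowcasting step that combines Internet-based signals into a disease-activity estimate could be sketched as a regularized regression; the input file, column names, and the LASSO choice below are assumptions.

```python
# Hypothetical sketch: nowcast weekly influenza-like-illness (ILI) activity
# from Internet-based signals (search-query volumes, Twitter mention counts)
# using a regularized linear model. File and column names are assumed.
import pandas as pd
from sklearn.linear_model import LassoCV

df = pd.read_csv("weekly_signals.csv", parse_dates=["week"])  # hypothetical file
signal_cols = [c for c in df.columns if c.startswith(("search_", "twitter_"))]

train = df[df["week"] < "2018-01-01"]
test = df[df["week"] >= "2018-01-01"]

# LASSO selects the subset of query/microblog signals that track ILI best.
model = LassoCV(cv=5).fit(train[signal_cols], train["ili_rate"])
nowcast = model.predict(test[signal_cols])
print(pd.DataFrame({"week": test["week"], "nowcast_ili": nowcast}).head())
```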


Description

The choice of outbreak detection algorithm and its configuration can result in important variations in the performance of public health surveillance systems. Our work aims to characterize the performance of detectors by outbreak type. We use Bayesian networks (BN) to model the relationships between the determinants of outbreak detection and detection performance, based on an extensive study of simulated data.

Objective

To predict the performance of outbreak detection algorithms under different circumstances, which will guide method selection and algorithm configuration in surveillance systems; to characterize how detection performance depends on the type and severity of an outbreak; and to develop quantitative evidence about the determinants of detection performance.
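As a hedged illustration of the modeling idea, a toy Bayesian network relating outbreak characteristics and algorithm choice to detection success might look like the sketch below (using pgmpy); the structure, states, and probabilities are invented for illustration and do not reflect the networks derived from the study's simulated data.

```python
# Hypothetical sketch: a small Bayesian network relating outbreak
# characteristics and algorithm choice to detection success.
# Structure, states, and CPD values are illustrative only.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([
    ("outbreak_magnitude", "detected"),
    ("algorithm", "detected"),
])

cpd_magnitude = TabularCPD("outbreak_magnitude", 2, [[0.5], [0.5]])  # 0 = low, 1 = high
cpd_algorithm = TabularCPD("algorithm", 2, [[0.5], [0.5]])           # 0 = C1, 1 = EWMA
cpd_detected = TabularCPD(
    "detected", 2,
    # P(detected | magnitude, algorithm); columns: (low,C1),(low,EWMA),(high,C1),(high,EWMA)
    [[0.7, 0.6, 0.2, 0.1],   # not detected
     [0.3, 0.4, 0.8, 0.9]],  # detected
    evidence=["outbreak_magnitude", "algorithm"],
    evidence_card=[2, 2],
)
model.add_cpds(cpd_magnitude, cpd_algorithm, cpd_detected)
assert model.check_model()

# Query: how likely is detection of a high-magnitude outbreak with EWMA?
infer = VariableElimination(model)
print(infer.query(["detected"], evidence={"outbreak_magnitude": 1, "algorithm": 1}))
```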

Description

Early detection of influenza outbreaks is critical to public health officials. Case detection is the foundation of outbreak detection. A previous study by Elkin et al. demonstrated that individual emergency department (ED) reports support better detection of influenza cases than chief complaints. Our recent study, in which ED reports were processed by Bayesian networks with an expert-constructed network structure, showed high accuracy in detecting influenza cases.

Objective

To compare seven machine learning algorithms with an expert-constructed Bayesian network for detecting patients with influenza syndrome.
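A minimal sketch of such a comparison, assuming cross-validated AUC over binary findings extracted from ED reports, appears below; the seven stand-in algorithms and the synthetic feature matrix are placeholders, since the abstract does not list the algorithms or features actually used.

```python
# Hypothetical sketch: compare several off-the-shelf classifiers on
# NLP-extracted findings from ED reports (binary features), scoring by AUC.
# The seven algorithms and the random data below are stand-ins only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 30))   # placeholder: presence/absence of findings
y = rng.integers(0, 2, size=500)         # placeholder: influenza syndrome label

classifiers = {
    "naive_bayes": BernoulliNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "svm": SVC(probability=True),
    "knn": KNeighborsClassifier(),
    "neural_net": MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:20s} mean AUC = {auc:.3f}")
```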

Description

Global biosurveillance is an extremely important, yet challenging task. One form of global biosurveillance comes from harvesting open source online data (e.g., news, blogs, reports, RSS feeds). The information derived from these data can be used for timely detection and identification of biological threats all over the world. However, the more inclusive the data harvesting procedure is, to ensure that all potentially relevant articles are collected, the more irrelevant data is harvested as well. This issue becomes even more complex when the online data is in a non-native language. Foreign-language articles not only create language-specific issues for Natural Language Processing (NLP), but also add significant translation costs. Previous work shows success in combining monolingual classifiers in specific applications, e.g., the legal domain. A critical component of a comprehensive, online-harvesting biosurveillance system is the capability to distinguish relevant foreign-language articles from irrelevant ones based on the initial article information collected, without the additional cost of full text retrieval and translation.

Objective:

The objective is to develop an ensemble of machine learning algorithms to identify multilingual online articles that are relevant to biosurveillance. Language morphology varies widely across languages and must be accounted for when designing algorithms. Here, we compare the performance of a word embedding-based approach and a topic modeling approach, combined with machine learning algorithms, to determine the best method for Chinese, Arabic, and French.
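A hedged sketch of this comparison appears below: the same classifier is trained on averaged word embeddings and on document-topic proportions, and cross-validated accuracy is compared. The tiny English toy corpus, whitespace tokenizer, and model settings are placeholders; real multilingual work would need language-appropriate tokenization (especially for Chinese and Arabic) and much larger corpora.

```python
# Hypothetical sketch: compare a word-embedding representation and a
# topic-model representation for classifying article snippets as relevant
# or irrelevant to biosurveillance. Corpus, tokenizer, and settings are
# placeholders only.
import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["avian influenza outbreak reported near poultry farm",
         "new smartphone model released this week",
         "cholera cases rising after flooding in the region",
         "local football team wins championship match"] * 50
labels = np.array([1, 0, 1, 0] * 50)
tokens = [t.split() for t in texts]  # placeholder tokenizer

# Representation 1: average word embeddings per article.
w2v = Word2Vec(tokens, vector_size=50, min_count=1, seed=0)
X_emb = np.array([np.mean([w2v.wv[w] for w in doc], axis=0) for doc in tokens])

# Representation 2: document-topic proportions from LDA.
counts = CountVectorizer().fit_transform(texts)
X_lda = LatentDirichletAllocation(n_components=5, random_state=0).fit_transform(counts)

for name, X in [("word embeddings", X_emb), ("topic model", X_lda)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name:15s} mean accuracy = {acc:.3f}")
```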

Description

A socio-marker is a measurable indicator of the social conditions in which a patient is embedded and to which the patient is exposed, analogous to a biomarker indicating the presence or severity of a disease state. Social factors are among the most important determinants of health and play a critical role in explaining health outcomes. Socio-markers can help medical practitioners and researchers reliably identify high-risk individuals in a timely manner.

Objective:

Asthma is one of the most common chronic childhood diseases in the United States. In addition to its pervasiveness, pediatric asthma shows high sensitivity to the environment. Combining a medical-social dataset with machine learning methods, we demonstrate how socio-markers play an important role in identifying patients at risk of hospital revisits for pediatric asthma within a year.
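As a sketch of how socio-markers might be combined with clinical features in such a model, the example below compares cross-validated AUC with and without socio-marker columns; the input file, column names, and gradient-boosting classifier are assumptions, not the study's actual features or model.

```python
# Hypothetical sketch: combine clinical features with socio-markers
# (neighborhood- or household-level social indicators) to predict a
# hospital revisit within one year for pediatric asthma. Column names
# and the model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("asthma_cohort.csv")  # hypothetical linked medical-social dataset

clinical = ["age", "prior_ed_visits", "controller_med_use"]
socio_markers = ["neighborhood_poverty_rate", "housing_violations", "caregiver_education"]

model = GradientBoostingClassifier()
for name, cols in [("clinical only", clinical),
                   ("clinical + socio-markers", clinical + socio_markers)]:
    auc = cross_val_score(model, df[cols], df["revisit_within_1yr"],
                          cv=5, scoring="roc_auc").mean()
    print(f"{name:26s} mean AUC = {auc:.3f}")
```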

Description

At the Governor’s Opioid Addiction Crisis Datathon in September 2017, a team of Booz Allen data scientists participated in a two-day hackathon to develop a prototype surveillance system for business users to locate areas of high risk across multiple indicators in the State of Virginia. We addressed 1) how different geographic regions experience the opioid overdose epidemic differently, by clustering similar counties on socioeconomic indicators, and 2) how to facilitate better data sharing between health care providers and law enforcement. We believe this inexpensive, open source surveillance approach could be applied by states across the nation, particularly those with high rates of death due to drug overdoses and those with significant increases in such deaths.
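A minimal sketch of the county-clustering step is shown below; the indicator columns, the number of clusters, and the input file are assumptions for illustration rather than the datathon team's actual choices.

```python
# Hypothetical sketch: group Virginia counties by socioeconomic indicators
# with k-means so overdose patterns can be compared within clusters of
# similar counties. Indicator columns and k are illustrative assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

counties = pd.read_csv("va_county_indicators.csv")  # hypothetical file
indicators = ["median_income", "unemployment_rate", "uninsured_rate",
              "pct_rural", "opioid_prescribing_rate"]

X = StandardScaler().fit_transform(counties[indicators])
counties["cluster"] = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(X)
print(counties.groupby("cluster")[indicators].mean())
```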

Objective:

A team of data scientists from Booz Allen competed in an opioid hackathon and developed a prototype opioid surveillance system using data science methods. This presentation intends to 1) describe the strengths and weaknesses of our data science approach, 2) demonstrate the prototype applications built, and 3) discuss next steps for local implementation of a similar capability.

Description

Currently, there is an abundance of data coming from most surveillance environments and applications. Identifying and filtering relevant messages from this ocean of big data, and then processing these informative datasets to gain knowledge, are the two real challenges in today’s applications.

The use of analytics has revolutionized many areas. At LongRiver Infotech, we have used various machine learning techniques (regression, classification, text analytics, decision trees, clustering, etc.) in different types of applications. These methodologies are abstracted into a generic platform that can be put to use in the many public health and surveillance applications enumerated here.

Objective

To summarize ways in which analytics, machine learning (ML), and natural language processing (NLP) can improve accuracy and efficiency in biosurveillance and public health practice. We also discuss the use of this framework in typical surveillance applications (integration with devices/sensors, Web/mobile, clinical records, Internet queries, and social/news media).


This presentation, given August 3, 2017, describes work toward applying machine learning methods to CDC’s autism surveillance program. CDC’s population-based autism surveillance is labor-intensive and costly, as it requires clinicians to manually review children’s medical and educational records for descriptions of autism symptoms. Using the words in these records, our team is building algorithms to predict which children will meet the surveillance case definition for autism.
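As a generic illustration of this kind of record-text classification (not CDC’s actual algorithm), a sketch using TF-IDF features and logistic regression is shown below; the input file and field names are assumptions.

```python
# Hypothetical sketch: predict whether a child's abstracted record text
# meets the autism surveillance case definition, using TF-IDF features and
# logistic regression. This is a generic illustration, not CDC's algorithm.
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

with open("abstracted_records.jsonl", encoding="utf-8") as f:  # hypothetical file
    records = [json.loads(line) for line in f]
texts = [r["record_text"] for r in records]
labels = [r["meets_case_definition"] for r in records]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(model, texts, labels, cv=5, scoring="roc_auc")
print(f"Mean cross-validated AUC: {scores.mean():.3f}")
```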