
Natural Language Processing (NLP)

Description

This presentation introduces the U.S. Department of Homeland Security (DHS) National Biosurveillance Integration System (NBIS) and the analytics functionality within NBIS that integrates and analyzes structured and unstructured data streams across domains to provide inter-agency analysts with an integrated view of threat scenarios. The integration of human and animal cases of avian influenza is used to demonstrate initial capability.

Description

Syndromic surveillance systems often classify patients into syndromic categories based on emergency department (ED) chief complaints. There is no standard set of syndromes for syndromic surveillance, and the available syndromic case definitions vary substantially in the findings that constitute each definition. The use of fever in defining syndromic categories is arbitrary and unsystematic. We determined whether chief complaints accurately indicate whether a patient has any of five febrile syndromes: febrile respiratory, febrile gastrointestinal, febrile rash, febrile neurological, or febrile hemorrhagic.
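
Chief-complaint classifiers of this kind are frequently implemented as keyword matching over the free-text complaint. The Python sketch below illustrates the general idea with invented keyword lists; these are not the case definitions evaluated in this study.

import re

# Illustrative keyword lists only; not the study's actual case definitions.
SYNDROME_KEYWORDS = {
    "febrile respiratory": {"cough", "shortness of breath", "sore throat", "pneumonia"},
    "febrile gastrointestinal": {"vomiting", "diarrhea", "abdominal pain"},
    "febrile rash": {"rash", "lesions"},
    "febrile neurological": {"headache", "stiff neck", "confusion", "seizure"},
    "febrile hemorrhagic": {"bleeding", "hematemesis", "melena"},
}
FEVER_TERMS = {"fever", "febrile", "chills"}

def classify_chief_complaint(text: str) -> list:
    """Map a free-text ED chief complaint to zero or more febrile syndromes."""
    normalized = re.sub(r"[^a-z ]", " ", text.lower())
    if not any(term in normalized for term in FEVER_TERMS):
        return []  # no fever mentioned, so no febrile syndrome applies
    return [
        syndrome
        for syndrome, keywords in SYNDROME_KEYWORDS.items()
        if any(keyword in normalized for keyword in keywords)
    ]

print(classify_chief_complaint("fever and cough x 3 days"))       # ['febrile respiratory']
print(classify_chief_complaint("vomiting and diarrhea, febrile"))  # ['febrile gastrointestinal']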

Description

The mainstay of recording patient data is the free text of the electronic medical record (EMR). While the chief complaint and history of presenting illness are recorded in the patient’s ‘own words’, the rest of the electronic note is written in the provider’s words. Providers often use boilerplate templates from EMR pull-downs to document information on the patient in the form of checklists, check boxes, yes/no responses, and free-text answers to questions. When these templates are used to record symptoms, demographic information, or medical, social, or travel history, they represent an important source of surveillance data [1]. There is a dearth of literature on the use of natural language processing to extract data from templates in the EMR.



Objective:

To highlight the importance of templates in extracting surveillance data from the free text of electronic medical records using natural language processing (NLP) techniques.
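
As a rough illustration of why templated text lends itself to automated extraction, the Python sketch below recovers yes/no and checkbox responses from a hypothetical templated note using regular expressions. The template wording and field names are assumptions; real EMR templates vary by vendor and site.

import re

# Hypothetical template wording; real EMR templates differ by vendor and site.
NOTE = """
Travel History: Yes  Country visited: Liberia
Fever: [x]   Cough: [ ]   Rash: [ ]
Contact with sick persons: No
"""

YES_NO = re.compile(r"(?P<field>[A-Za-z ]+):\s*(?P<value>Yes|No)\b")
CHECKBOX = re.compile(r"(?P<field>[A-Za-z ]+):\s*\[(?P<mark>[xX ])\]")

def extract_template_fields(note: str) -> dict:
    """Pull yes/no and checkbox responses out of templated free text."""
    fields = {}
    for m in YES_NO.finditer(note):
        fields[m.group("field").strip()] = m.group("value") == "Yes"
    for m in CHECKBOX.finditer(note):
        fields[m.group("field").strip()] = m.group("mark").lower() == "x"
    return fields

print(extract_template_fields(NOTE))
# {'Travel History': True, 'Contact with sick persons': False,
#  'Fever': True, 'Cough': False, 'Rash': False}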

 

Description

Global biosurveillance is an extremely important yet challenging task. One form of global biosurveillance comes from harvesting open-source online data (e.g., news, blogs, reports, RSS feeds). The information derived from these data can be used for timely detection and identification of biological threats around the world. However, the more inclusive the harvesting procedure is made, to ensure that all potentially relevant articles are collected, the more irrelevant data is also harvested. The problem becomes even more complex when the online data are in a non-native language. Foreign-language articles not only create language-specific issues for Natural Language Processing (NLP), but also add significant translation costs. Previous work shows success in the use of combinatory monolingual classifiers in specific applications, e.g., the legal domain. A critical component of a comprehensive online-harvesting biosurveillance system is the ability to separate relevant foreign-language articles from irrelevant ones based on the initial article information collected, without the additional cost of full-text retrieval and translation.

Objective:

The objective is to develop an ensemble of machine learning algorithms to identify multilingual online articles that are relevant to biosurveillance. Language morphology varies widely and must be accounted for when designing such algorithms. Here, we compare the performance of a word embedding-based approach and a topic modeling approach, each combined with machine learning classifiers, to determine the best method for Chinese, Arabic, and French.
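
The Python sketch below outlines one way such a comparison could be set up: the same downstream classifier is trained once on averaged word-embedding features and once on topic-distribution features from LDA. The toy corpus, labels, and hyperparameters are placeholders rather than the authors' data or settings, and real multilingual input would also require language-appropriate tokenization.

# Compare word-embedding features vs. topic-model features for relevance
# classification; corpus and settings below are illustrative placeholders.
import numpy as np
from gensim.models import Word2Vec
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

docs = [
    "avian influenza outbreak reported in poultry farms",    # relevant
    "hospital confirms cluster of hemorrhagic fever cases",  # relevant
    "stock markets rally after central bank announcement",   # irrelevant
    "football team wins the national championship final",    # irrelevant
] * 10  # repeated so cross-validation has enough samples
labels = np.array([1, 1, 0, 0] * 10)

# (a) Word-embedding features: average the word vectors of each document.
tokenized = [d.split() for d in docs]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=20, seed=0)
emb_features = np.array(
    [np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokenized]
)

# (b) Topic-model features: per-document topic distributions from LDA.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_features = lda.fit_transform(counts)

clf = LogisticRegression(max_iter=1000)
for name, X in [("word embeddings", emb_features), ("topic model", topic_features)]:
    scores = cross_val_score(clf, X, labels, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")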

Description

The Global Public Health Intelligence Network (GPHIN) is a non-traditional, all-hazards, multilingual surveillance system introduced in 1997 by the Government of Canada in collaboration with the World Health Organization [1]. GPHIN software collects news articles, media releases, and incident reports and analyzes them for information about communicable diseases, natural disasters, product recalls, radiological events, and other public health crises. Since 2016, the Public Health Agency of Canada (PHAC) and the National Research Council Canada (NRC) have collaborated to replace GPHIN with a modular platform that incorporates modern natural language processing techniques to support more ambitious situational awareness goals.

Objective:

To rebuild the software that underpins the Global Public Health Intelligence Network using modern natural language processing techniques to support recent and future improvements in situational awareness capability.

Description

Recently, a growing number of studies have used Twitter to track the spread of infectious disease. These investigations show that there are reliable spikes in traffic for keywords associated with the spread of infectious diseases such as influenza [1], as well as other syndromes [2]. However, little research has used social media to monitor chronic conditions such as asthma, which do not spread from sufferer to sufferer. We therefore test the feasibility of using Twitter for asthma surveillance, applying techniques from NLP and machine learning to achieve a deeper understanding of what users tweet about asthma, rather than relying only on keyword search.

Objective

We present a content analysis project that uses natural language processing to aid Twitter-based syndromic surveillance of asthma.
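
The Python sketch below illustrates the content-analysis idea: rather than counting every tweet containing the word "asthma", a supervised classifier distinguishes tweets that report a personal asthma experience from other mentions. The example tweets, labels, and categories are invented for illustration and are not the project's annotation scheme.

# Toy supervised classifier separating personal asthma reports from other mentions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_tweets = [
    "my asthma is acting up again cant stop wheezing",       # personal experience
    "forgot my inhaler at home and now i cant breathe",      # personal experience
    "new study links air pollution to asthma in children",   # news / other mention
    "asthma awareness walk downtown this saturday",           # other mention
]
train_labels = [1, 1, 0, 0]  # 1 = personal asthma experience, 0 = other mention

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_tweets, train_labels)

new_tweets = ["woke up wheezing, asthma attack at 3am", "asthma awareness month starts today"]
print(model.predict(new_tweets))  # with enough labeled data: personal vs. other mention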

Description

Health surveillance systems provide important capabilities to detect, monitor, respond to, prevent, and report on a variety of conditions across multiple data owners. Some of their most fundamental functions include data warehousing and transfer, descriptive statistics, geographic analysis, and data mining and querying. We observe that, despite significant variety among surveillance systems, many still report duplicative data sources, use basic forms of analysis, and provide rudimentary functionality.

Objective

To identify analytic gaps and duplication across U.S. government, international agencies, non-profit and academic health surveillance systems, programs, and initiatives in four areas: Analytics, Data Sources, Statistics, and System Requirements.

 
