
Natural Language Processing (NLP)

Description

Since their introduction to the US market in 2007, electronic cigarettes (e-cigarettes) have posed considerable challenges to both public health authorities and government regulators, especially given the debate – in both the scientific world and the community at large – regarding the potential advantages (e.g. helping individuals quit smoking) and disadvantages (e.g. renormalizing smoking) associated with the product [1]. Similarly, hookah – a kind of waterpipe used to smoke flavored tobacco – has increased in popularity in recent years, is known to be particularly popular among younger people, and has prompted a range of regulatory responses [2]. One important – and currently largely unexplored – area of research involves exploring consumer perceptions and experiences of these emerging tobacco products. In this work, we use online health discussion forums in conjunction with text mining and novel data visualization techniques to investigate consumer perceptions and experiences of e-cigarettes and hookah, focusing on the automatic identification of symptoms associated with each product, and consumer motivations for product use. Previous related research has focused on using text mining to analyze e-cigarette or hookah related Twitter posts [3,4] and on the qualitative identification of e-cigarette related symptoms from online discussion forums [5]. The research reported in this abstract is – to the best of our knowledge – the first time that text mining techniques have been used with online health forums to understand e-cigarette or hookah use.

Objective

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers’ experiences and perceptions of electronic cigarettes and hookah.

 

Submitted by Magou on
Description

Characterizing mentions found in clinical texts that support, refute, or represent uncertainty for suspected pneumonia is one area where automated Natural Language Processing (NLP) screening algorithms could be improved. Mentions of uncertainty and negation commonly occur in clinical texts, and opportunities exist to extend existing algorithms [1] and taxonomies [2]. In general there are three main sources of uncertainty found in healthcare: 1) probability or risk; 2) ambiguity – lack of reliability, credibility or adequacy of the information; and, 3) complexity – aspects of the phenomenon that make it difficult to comprehend [3].
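To make the supports / refutes / uncertain distinction concrete, the following is a minimal Python sketch of trigger-based labeling of a sentence that mentions pneumonia. It is not the algorithm or taxonomy cited in this abstract: the trigger phrases, the function name `classify_mention`, and the rule of looking only at text before the mention are all assumptions made for illustration.

```python
import re

# Hypothetical trigger phrases, chosen for illustration only; they are not
# taken from the algorithms or taxonomies cited in the abstract.
NEGATION_TRIGGERS = ["no evidence of", "negative for", "without", "no"]
UNCERTAINTY_TRIGGERS = ["possible", "cannot rule out", "suspicious for", "may represent"]


def _has_trigger(text, triggers):
    """Return True if any trigger phrase occurs in text as whole words."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in triggers)


def classify_mention(sentence, target="pneumonia"):
    """Label a sentence as 'supports', 'refutes', or 'uncertain' for the target.

    Returns None when the target term does not appear in the sentence.
    """
    lowered = sentence.lower()
    idx = lowered.find(target)
    if idx == -1:
        return None
    # Assumption: only text preceding the first mention modifies it.
    window = lowered[:idx]
    # Uncertainty is checked first so that hedged phrases are not
    # mislabeled as plain negation.
    if _has_trigger(window, UNCERTAINTY_TRIGGERS):
        return "uncertain"
    if _has_trigger(window, NEGATION_TRIGGERS):
        return "refutes"
    return "supports"
```

A sentence such as "Cannot rule out pneumonia." would be labeled `uncertain` under these assumed triggers, while "No evidence of pneumonia." would be labeled `refutes`. A real system would also need to handle triggers that follow the mention, scope termination, and the ambiguity and complexity sources of uncertainty described above.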

Objective

We sought to identify relevant evidence that supports, refutes or contributes uncertainty when reviewing cases of suspected pneumonia and to characterize its interaction with uncertainty phenomena found in clinical texts.

 

 

Submitted by Magou on
Description

Major depressive disorder has a lifetime prevalence of 16.6% in the United States. Social media platforms – e.g. Twitter, Facebook, Reddit – are potential resources for better understanding and monitoring population-level mental health status over time. Based on DSM-5 diagnostic criteria, our research aims to develop a natural language processing-based system for monitoring major depressive disorder at the population level using public social media data.

Objective

We aim to develop an annotation scheme and corpus of depression-related tweets to serve as a test-bed for the development of natural language processing algorithms capable of automatically identifying depression-related symptoms from Twitter feeds.

Submitted by teresa.hamby@d… on
Description

Despite numerous successes in using social media to detect foodborne illness and to predict influenza trends, the use of social media as a public health tool has yet to gain widespread adoption. While social media data cannot directly diagnose illness, aggregate trends in symptom proliferation may readily be observed. Such trends may allow a health agency to watch for signs and symptoms related to target conditions within its jurisdiction. Further, social media surveillance offers a distinct advantage in immediacy and sensitivity: it does not depend on infected individuals seeking care for reportable illnesses, so information is not delayed by the handling, transfer, and processing of reports. These advantages may enable the earlier preparation and initiation of scaled response sequences during public health emergencies. Such data may also yield additional evidence through shared symptoms, rumors, and observations crucial to an epidemiological investigation.

Objective

To formally introduce ChatterGrabber, an open-source, natural language processing-based toolset for public health social media surveillance. ChatterGrabber is designed to collect and categorize a high volume of content at a low cost, providing a readily deployable solution for epidemiologists to track emergent outbreaks in the field and a signal for syndromic surveillance.

 

Submitted by Magou on
Description

Natural language processing algorithms that accurately screen clinical documents for suspected pneumonia must extract and reason about whether these mentions provide evidence that supports, refutes, or represents uncertainty. Our efforts extend existing algorithms [1] and taxonomies [2] that can be leveraged by NLP tools for more accurate handling of uncertainty for suspected pneumonia case review.

Objective

We sought to classify evidence that supports, refutes, or contributes uncertainty when reviewing cases of suspected pneumonia. We extend an existing taxonomy of uncertainty to classify these phenomena with the goal of improving existing Natural Language Processing (NLP) algorithms.

Submitted by Magou on
Description

Processing free-text clinical information in an electronic medical record (EMR) may enhance surveillance systems for early identification of influenza-like illness (ILI) outbreaks. However, processing clinical text using natural language processing (NLP) poses a challenge in preserving the semantics of the original information recorded. In this study, we discuss several NLP and technical issues as well as potential solutions for implementation in syndromic surveillance systems.

Objective

To review the natural language processing (NLP) and technical challenges encountered in an automated influenza-like illness (ILI) surveillance system.

Submitted by teresa.hamby@d… on
Description

Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Recently, Wikipedia access logs (e.g., McIver 2014, Generous 2014) have been shown to be effective in this arena. Much richer Wikipedia data are available, though, including the entire Wikipedia article content and edit histories.

We study two different aspects of Wikipedia content as it relates to unfolding disease events: 1) we demonstrate how to capture case, death, and hospitalization counts from the article text, and 2) we show there are valuable time series data present in the tables found in certain articles.
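As a rough illustration of the first aspect, the sketch below pulls case, death, and hospitalization counts out of free text with a regular expression. This is an assumed, simplified approach in Python, not the method used in the study: the pattern, the qualifier list, and the function name `extract_counts` are hypothetical.

```python
import re

# Hypothetical pattern: a number (with optional thousands separators),
# optional qualifiers, then a count type. Not the study's actual method.
COUNT_PATTERN = re.compile(
    r"\b(?P<number>\d{1,3}(?:,\d{3})+|\d+)\s+"
    r"(?:(?:confirmed|suspected|probable|reported|total)\s+)*"
    r"(?P<kind>cases?|deaths?|hospitali[sz]ations?)",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return a list of (label, count) tuples found in text, in order."""
    results = []
    for match in COUNT_PATTERN.finditer(text):
        # Strip thousands separators before converting to an integer.
        number = int(match.group("number").replace(",", ""))
        kind = match.group("kind").lower()
        if kind.startswith("case"):
            label = "cases"
        elif kind.startswith("death"):
            label = "deaths"
        else:
            label = "hospitalizations"
        results.append((label, number))
    return results


sample = "As of 1 June, there were 1,234 confirmed cases and 56 deaths, with 78 hospitalizations."
print(extract_counts(sample))
```

A pattern like this misses counts written as words ("twelve deaths") or phrased with the number after the noun, so it should be read as a starting point rather than a complete extractor.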

We argue that Wikipedia data can be used not only for disease surveillance but also as a centralized repository system for collecting disease-related data in near real-time.

Objective

To improve traditional outbreak surveillance systems by utilizing the content of Wikipedia articles.

Submitted by teresa.hamby@d… on
Description

When hazardous materials or products emerge in the market, injury prevention researchers take action to promote awareness and legislation with the goal of preventing further injuries. This cannot be achieved without reliable data on trends and outcomes identifying large cohorts with the injury of interest. Lags in providing such data will delay knowledge sharing to prevent avoidable and potentially fatal injuries.

Glass tables and earth magnets are two examples of consumer products with potential for significant injuries, particularly to children. Magnet toys caused a large number of injuries with associated morbidity and mortality. For months there were no available data to support policy or prevention initiatives. Similarly, certain disease and injury mechanisms such as penetrating oral trauma are not included as structured data and cannot be collected using ICD-9/ICD-10 codes. Data on these types of injury mechanisms exist exclusively within the clinical narrative.

Objective

• Describe injury-related surveillance using clinical narratives within electronic health records

• Present a user-friendly, physician-operated, transferable natural language processing (NLP) module that can identify injury-related events from electronic health record narratives

• Present a variety of use cases and results

Submitted by teresa.hamby@d… on