
Natural Language Processing (NLP)

Description

There are a number of Natural Language Processing (NLP) annotation and Information Extraction (IE) systems and platforms that have been used successfully within the medical domain. Although these groups share components of their systems, there has been no successful effort in the medical domain to codify and standardize the syntax or semantics between systems to allow for interoperability among annotation tools, NLP tools, IE tools, corpus evaluation tools, and encoded clinical documents. A successful interoperability standard has two components: an information model and a semantic model.

Objective

The Consortium for Healthcare Informatics Research, a Department of Veterans Affairs (VA) Office of Research and Development program, is sponsoring the development of a standard ontology and information model for Natural Language Processing interoperability within the biomedical domain.

Description

Mortality is an indicator of the severity of an event's impact on the population. In France, mortality surveillance is part of the syndromic surveillance system SurSaUD and is carried out by Santé publique France, the French public health agency. The set-up of an Electronic Death Registration System (EDRS) in 2007 made it possible to receive medical causes of death in real time in free-text format. This data source was considered reactive and valuable enough to support a reactive mortality surveillance system based on medical causes of death (1). The system monitors Mortality Syndromic Groups (MSGs), where an MSG is defined as a cluster of medical causes of death (pathologies, syndromes, symptoms) that meets the objectives of early detection and impact assessment of events (2). Since causes of death are entered in free-text format, their automatic classification into MSGs requires natural language processing methods. The use of these methods to classify medical information and support health surveillance has grown steadily over the last two decades (3).

Objective: This study aims to implement and evaluate two methods for automatically classifying free-text medical causes of death into Mortality Syndromic Groups (MSGs) for use in reactive mortality surveillance.
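To make the task concrete, a minimal sketch of what rule-based classification into MSGs could look like. The MSG names, keyword lists, and the accent-stripping step are illustrative assumptions, not the actual SurSaUD definitions:

```python
import unicodedata

# Hypothetical keyword dictionaries; the real MSG definitions are
# far more extensive and carefully curated.
MSG_KEYWORDS = {
    "influenza": ["grippe", "influenza"],
    "heat-related": ["hyperthermie", "coup de chaleur", "heat stroke"],
}

def normalize(text):
    """Lowercase and strip accents so 'Détresse' matches 'detresse'."""
    text = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in text if not unicodedata.combining(c))

def classify_cause(free_text):
    """Return every MSG whose keywords appear in a free-text cause of death."""
    text = normalize(free_text)
    return [msg for msg, words in MSG_KEYWORDS.items()
            if any(normalize(w) in text for w in words)]
```

For example, `classify_cause("Grippe A avec pneumopathie")` assigns the cause to the hypothetical "influenza" group, while an unmatched cause yields an empty list.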

Description

Opioid overdoses have emerged over the last five to ten years as a major public health concern. The high potential for fatal events, disease transmission, and addiction all contribute to negative outcomes. However, what is currently known about opioid use and overdose is generally gathered from emergency room data, public surveys, and mortality data. In addition, opioid overdose is not a reportable condition, so standardized state or national procedures for surveillance and reporting have not been developed, and local government monitoring is frequently not specific enough to capture and track all opioid overdoses. Traditional means of data collection for conditions such as heart disease, through hospital networks or insurance companies, do not necessarily apply to opioid overdoses, given the often short course of addiction and the lack of consistent health care visits. Overdose patients are also reluctant to follow up or provide contact information for law enforcement or personal reasons, and data collected months or years after an overdose are of no use for short-term outreach. Therefore, given the potentially brief timeline from addiction or use to negative outcome, the current project set out to create a near real-time surveillance and treatment/outreach system for opioid overdoses using an existing EMS data collection framework.

Objective: To develop and implement a classification algorithm to identify likely acute opioid overdoses from text fields in emergency medical services (EMS) records.
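One simple form such a classification algorithm can take is a keyword co-occurrence rule over the EMS narrative. The trigger patterns below are illustrative assumptions, not the study's actual algorithm, which would be tuned against annotated records:

```python
import re

# Hypothetical trigger patterns; a production classifier would be
# validated against annotated EMS narratives.
DRUG_PATTERN = re.compile(
    r"\b(heroin|fentanyl|opioid|oxycodone|methadone)\b", re.I)
EVIDENCE_PATTERN = re.compile(
    r"\b(narcan|naloxone|overdose|unresponsive|pinpoint pupils)\b", re.I)

def likely_opioid_overdose(narrative: str) -> bool:
    """Flag a record when a drug mention and overdose evidence co-occur."""
    return bool(DRUG_PATTERN.search(narrative)) and \
           bool(EVIDENCE_PATTERN.search(narrative))
```

Requiring both a drug mention and clinical evidence (rather than either alone) trades some sensitivity for specificity, which matters when flagged cases trigger outreach.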

Description

Emergency department (ED) syndromic surveillance relies on a chief complaint, which is often a free-text field, and may contain misspelled words, syntactic errors, and healthcare-specific and/or facility-specific abbreviations. Cleaning of the chief complaint field may improve syndrome capture sensitivity and reduce misclassification of syndromes. We are building a spell-checker, customized with language found in ED corpora, as our first step in cleaning our chief complaint field. This exercise would elucidate the value of pre-processing text and would lend itself to future work using natural language processing (NLP) techniques, such as topic modeling. Such a tool could be extensible to other datasets that contain free-text fields, including electronic reportable disease lab and case reporting.

Objective: To share progress on a custom spell-checker for emergency department chief complaint free-text data and demonstrate a spell-checker validation Shiny application.
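A custom spell-checker of this kind is often built in the style of a corpus-based corrector: train word frequencies on the domain corpus, then map an unknown token to its most frequent known neighbor within one edit. A minimal sketch, assuming a toy chief-complaint corpus in place of the ED corpora described above:

```python
import re
from collections import Counter

def train(corpus_text):
    """Word frequencies from a chief-complaint corpus (toy sample here)."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word, freqs):
    """Pick the most frequent known candidate; fall back to the word itself."""
    if word in freqs:
        return word
    candidates = [w for w in edits1(word) if w in freqs] or [word]
    return max(candidates, key=lambda w: freqs[w])

freqs = train("abd pain vomiting fever cough nausea vomiting fever")
```

With this toy corpus, `correct("vomitting", freqs)` resolves to "vomiting". Training on real ED text is what makes the tool "custom": domain abbreviations like "abd" become known words instead of correction targets.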

Description

Despite considerable effort since the turn of the century to develop Natural Language Processing (NLP) methods and tools for detecting negated terms in chief complaints, few standardised methods have emerged, and those that have (e.g. the NegEx algorithm) are confined to local implementations with customised solutions. Important reasons for this lack of progress include (a) limited shareable datasets for developing and testing methods, (b) jurisdictional data silos, and (c) the gap between resource-constrained public health practitioners and technical solution developers, typically university researchers and industry developers. To address these three problems, ISDS, funded by a grant from the Defense Threat Reduction Agency, organised a consultancy meeting at the University of Utah designed to bring together (a) representatives from public health departments, (b) university researchers focused on computational methods for public health surveillance, (c) members of public-health-oriented non-governmental organisations, and (d) industry representatives, with the goal of developing a roadmap for validated, standardised and portable resources (methods and datasets) for negation detection in the clinical text used for public health surveillance.

Objective: This abstract describes an ISDS initiative to bring together public health practitioners and analytics solution developers from both academia and industry to define a roadmap for the development of algorithms, tools, and datasets to improve the capabilities of current text processing algorithms to identify negated terms (i.e. negation detection).

Description

We are developing a Bayesian surveillance system for real-time surveillance and characterization of outbreaks that incorporates a variety of data elements, including free-text clinical reports. An existing natural language processing (NLP) system, Topaz, is being used to extract clinical data from the reports. Moving the NLP system from a research project to a real-time service has presented many challenges.

 

Objective

Adapt an existing NLP system to be a useful component in a system performing real-time surveillance.

Description

Patient consultations recorded as voice dictations are frequently stored electronically as free-text transcriptions. Information stored as free text is not computer-tractable. Advances in artificial intelligence permit the conversion of free text into structured information suitable for statistical analysis.

 

Objective

This paper describes DMReporter, a medical language processing system that automatically extracts information pertaining to diabetes (demography, numerical measurement values, medication list, and diagnoses) from the free text of physicians’ notes and stores it in a structured format in a MySQL database.
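As an illustration of the kind of extraction involved, a sketch of pulling one numerical measurement (HbA1c) out of free text with a regular expression. The pattern is an assumption for illustration, not DMReporter's actual rule set; the structured values would then be loaded into the MySQL database:

```python
import re

# Hypothetical pattern for one measurement type. It tolerates common
# phrasings such as "HbA1c was 7.2%" or "HbA1c: 8 %".
HBA1C_RE = re.compile(
    r"\bHbA1c\s*(?:of|was|is|[:=])?\s*(\d{1,2}(?:\.\d)?)\s*%", re.I)

def extract_hba1c(note: str):
    """Return HbA1c percentages found in a free-text note, as floats."""
    return [float(m.group(1)) for m in HBA1C_RE.finditer(note)]
```

Each measurement type (weight, blood pressure, glucose, and so on) would need its own patterns, which is why full systems like DMReporter combine many such rules with medication and diagnosis lexicons.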

Description

Pro-WATCH (protecting war fighters using algorithms for text processing to capture health events), a syndromic surveillance project for veterans of Operation Enduring Freedom (OEF)/Operation Iraqi Freedom (OIF), includes a task to identify medically unexplained symptoms (MUS). The v3NLP entity extraction tool is being customized to identify symptoms within VA clinical documents and then refined to assign durations. The identification of medically unexplained symptoms themselves and the aggregation of this information across documents by patient are not addressed here.

Objective

Pro-WATCH (protecting war fighters using algorithms for text processing to capture health events), a syndromic surveillance project, includes a task to identify medically unexplained symptoms. To address part of this project, the v3NLP entity extraction tool is being customized to identify symptoms and then to assign duration assertions. The v3NLP tool was recently enhanced to find problems, treatments, and tests for the i2b2/VA challenge; the problem-finding capability is being further refined to find symptoms. Machine learning models will be developed, using an annotated corpus currently in development, to find duration assertions.
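As a sketch of the rule-based baseline such duration assignment might start from (the pattern and unit conversions below are illustrative assumptions, not the Pro-WATCH models):

```python
import re

# Hypothetical duration phrases near a symptom mention,
# e.g. "fatigue for 3 weeks" or "headache x 2 days".
DURATION_RE = re.compile(
    r"\b(?:for|x|over the past|lasting)\s+(\d+)\s+(day|week|month|year)s?\b",
    re.I)

def duration_days(text: str):
    """Return durations mentioned in the text, normalized to days."""
    unit_days = {"day": 1, "week": 7, "month": 30, "year": 365}
    return [int(n) * unit_days[u.lower()]
            for n, u in DURATION_RE.findall(text)]
```

Normalizing to a single unit makes it easy to apply chronicity thresholds (e.g. symptoms persisting beyond some number of days), which is the kind of assertion the planned machine learning models would make more robustly than fixed patterns.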

Description

Current methods for influenza surveillance include laboratory-confirmed case reporting, sentinel physician reporting of Influenza-Like Illness (ILI), and chief-complaint monitoring from emergency departments (EDs).

These methods have drawbacks. Testing for the presence of the influenza virus is costly and slow. Sentinel physician reporting is subject to incomplete and delayed reporting. Chief complaint (CC) based surveillance is limited in that a patient’s chief complaint will not capture all of the patient’s signs and symptoms.

A possible solution to the cost, delay, incompleteness, and (for CC) low specificity of current influenza surveillance methods is automated surveillance of ILI using clinician-provided free-text ED reports.
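To make the idea concrete, ILI detection over NLP output can be reduced to the common surveillance case definition of fever plus cough or sore throat. A sketch, assuming an upstream NLP system has already extracted the positively asserted symptoms from an ED report:

```python
def meets_ili_definition(findings):
    """ILI per the common surveillance case definition:
    fever plus cough or sore throat.

    `findings` is the set of positively asserted symptoms that an
    upstream NLP system extracted from a free-text ED report.
    """
    findings = {f.lower() for f in findings}
    return "fever" in findings and bool({"cough", "sore throat"} & findings)
```

The hard part, of course, is the upstream step: mapping the clinician's free text to asserted findings while handling negation ("denies fever") and misspellings, which is exactly where the NLP system earns its keep.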

 

Objective

This paper describes an automated ILI reporting system based on natural language processing of transcribed ED notes and its impact on public health practice at the Allegheny County Health Department.

Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata - for example, whether the entity is negated or hypothetical - using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis. PyConTextKit is an extension of an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with an acute pulmonary embolism and patients with findings consistent with seven syndromes. Public health practitioners are beginning to have access to clinical symptoms, findings, and diagnoses from the EMR. Making use of these data is difficult because much of it is in the form of free text. Natural language processing techniques can be leveraged to make sense of this text, but such techniques often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for the user to perform concept identification for surveillance.

 

Objective

We describe the development of a web-based application - PyConTextKit - to support text mining of clinical reports for public health surveillance.
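A schematic of the window-based lexical-clue idea behind ConText, greatly simplified: the cue lists below are toy examples, and the real algorithm uses directional rules and scope-terminating terms rather than a symmetric window.

```python
# Toy cue lists; ConText's actual lexicons are much larger and
# distinguish pre- vs post-entity cues.
MODIFIERS = {
    "negated": {"no", "denies", "without"},
    "hypothetical": {"if", "should", "possible"},
}

def annotate(text, entity, window=4):
    """Assign ConText-style metadata flags to an entity by scanning a
    token window on either side of each mention."""
    tokens = text.lower().split()
    flags = {name: False for name in MODIFIERS}
    for i, tok in enumerate(tokens):
        if tok == entity.lower():
            nearby = tokens[max(0, i - window):i + window + 1]
            for name, cues in MODIFIERS.items():
                if any(cue in nearby for cue in cues):
                    flags[name] = True
    return flags
```

For instance, `annotate("patient denies fever", "fever")` flags the mention as negated, whereas `annotate("patient reports fever", "fever")` leaves both flags false. A web front end like PyConTextKit essentially wraps this kind of annotation loop so non-programmers can configure cue lists and review results.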
