Natural Language Processing (NLP)

Description

We previously experimented with tracking influenza in ER chief complaint data using existing syndromic surveillance tools. We identified several deficiencies in these tools: poor natural language processing, inefficient user interfaces, frequent (thus costly) false alarms, and one-size-fits-all approaches to syndromes. Furthermore, we were surprised that some epidemiologists we spoke with had relatively little faith in existing surveillance tools, and so we set out to build one that would address their concerns: DADAR (Data Analysis, Detection, And Response).

Objective

To develop an adaptable platform for periodically loading semi-structured medical text, extracting syndromic information using advanced natural language processing, detecting outbreaks in the data (including the ability to tune sensitivity vs. specificity on a syndrome-by-syndrome basis so as to reduce the rate of false alarms), generating timely cartographic surveillance reports, and providing tools to quickly validate or rule out syndromic alerts.
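
The idea of tuning sensitivity against specificity per syndrome can be illustrated with a minimal sketch. It is not DADAR's detection method, which the abstract does not describe. It assumes a simple z-score test on daily counts, and the `THRESHOLDS` table, the syndrome names, and the `is_alert` function are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical per-syndrome alert thresholds, in standard deviations above baseline.
# A lower value is more sensitive; a higher value is more specific (fewer false alarms).
THRESHOLDS = {"influenza-like illness": 2.0, "rash": 3.5}
DEFAULT_THRESHOLD = 3.0


def is_alert(syndrome, baseline_counts, today_count):
    """Return True if today's count exceeds the syndrome's z-score threshold.

    baseline_counts must contain at least two daily counts.
    """
    threshold = THRESHOLDS.get(syndrome, DEFAULT_THRESHOLD)
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # Flat baseline: any increase over the mean is treated as an alert.
        return today_count > mu
    return (today_count - mu) / sigma > threshold
```

With the same baseline and the same count for today, a syndrome with a low threshold raises an alert while one with a high threshold does not, which is the trade-off the objective refers to.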

Submitted by knowledge_repo… on
Description

The variability of free-text emergency department (ED) data is problematic for biosurveillance, and current methods of identifying search terms for symptoms of interest are inefficient as well as time- and labor-intensive. Our ad hoc approach to term identification for the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT) begins with the development of clinical case definitions, from which we build automated syndrome queries in Structured Query Language (SQL). The queries are used to search free-text clinical data from EDs, with the goal of identifying free-text terms that match the case definitions. The free-text search terms were initially collected from epidemiologists and clinical and technical staff at NC DETECT through informal review of ED data. Over time, we reviewed individual cases missed by our queries and identified additional search terms. We also manually reviewed records to find misspellings, abbreviations, and acronyms for known search terms (e.g., dypnea, diff. br., and SHOB for dyspnea), and developed a pre-processor to clean text prior to syndromic classification. The purpose of this project was to develop and test a more standardized approach to search term identification.
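
A text-cleaning pre-processor of the kind mentioned above might look like the following sketch. It is not the NC DETECT pre-processor. The three dyspnea variants come from the text; the `VARIANTS` table and the function names are hypothetical, and a real table would be far larger.

```python
import re

# Hypothetical variant table: misspellings, abbreviations, and acronyms
# mapped to a canonical search term.
VARIANTS = {
    "dypnea": "dyspnea",
    "diff. br.": "dyspnea",
    "shob": "dyspnea",
}


def _compile(variants):
    # Longest variants first so multi-word forms are tried before shorter ones.
    keys = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # Lookarounds instead of \b because some variants end in a period.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


_PATTERN = _compile(VARIANTS)


def clean_chief_complaint(text):
    """Lowercase, collapse whitespace, and replace known variants."""
    lowered = " ".join(text.lower().split())
    return _PATTERN.sub(lambda m: VARIANTS[m.group(0)], lowered)
```

For example, `clean_chief_complaint("Pt c/o SHOB x2 days")` yields `"pt c/o dyspnea x2 days"`, so a downstream query only needs to search for the canonical term.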

 

Objective

This paper describes and applies a new method for identifying biosurveillance search terms using the Semantic Network of the Unified Medical Language System.

Submitted by elamb on
Description

A major goal of biosurveillance is the timely detection of an infectious disease outbreak. Once a disease has been identified, another very important goal is to find all known cases of the disease to assist public health investigators. Natural language processing (NLP) systems may be able to assist in identifying epidemiological variables and decrease time-consuming manual review of records.

 

Objective

To identify epidemiologically important factors such as infectious disease exposure history, travel or specific variables from unstructured data using NLP methods.

Submitted by elamb on
Description

Free-text emergency department triage chief complaints (CCs) are a popular data source for many syndromic surveillance systems because of their timeliness, availability, and relevance. The lack of standardization of CC vocabulary poses a major technical challenge to any automatic CC classification approach. This challenge can be partially addressed by several methods, for example, medical thesauri, spell checking, manually created synonym lists, and supervised machine learning techniques that operate directly on free text. Current approaches, however, ignore the fact that medical terms appearing in CCs are often semantically related. Our research exploits such semantic relations through a medical ontology in the context of automatic CC classification for syndromic surveillance.

 

Objective

This paper presents a novel approach of using a medical ontology to classify free-text CCs into syndrome categories.
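
As a rough illustration of how semantic relations can drive classification, the sketch below walks is-a links in a toy ontology until it reaches a concept tied to a syndrome. It is not the authors' method. The `IS_A` and `SYNDROME_OF` tables, their contents, and the substring matching are all assumptions made for the example.

```python
# Hypothetical toy ontology: child concept -> parent concept (is-a relation).
IS_A = {
    "wheezing": "breathing difficulty",
    "shortness of breath": "breathing difficulty",
    "breathing difficulty": "respiratory symptom",
    "cough": "respiratory symptom",
    "vomiting": "gastrointestinal symptom",
    "diarrhea": "gastrointestinal symptom",
}

# Hypothetical mapping from high-level concepts to syndrome categories.
SYNDROME_OF = {
    "respiratory symptom": "respiratory",
    "gastrointestinal symptom": "gastrointestinal",
}


def concept_chain(concept):
    """Return the concept followed by its ancestors along is-a links."""
    chain = [concept]
    while chain[-1] in IS_A:
        parent = IS_A[chain[-1]]
        if parent in chain:  # guard against accidental cycles
            break
        chain.append(parent)
    return chain


def classify(chief_complaint):
    """Return the sorted syndrome categories reachable from terms in the CC."""
    text = " ".join(chief_complaint.lower().split())
    syndromes = set()
    for term in IS_A:
        if term in text:  # naive substring match, for illustration only
            for concept in concept_chain(term):
                if concept in SYNDROME_OF:
                    syndromes.add(SYNDROME_OF[concept])
    return sorted(syndromes)
```

Here "wheezing" never appears in a syndrome definition, yet `classify("Wheezing since last night")` returns `["respiratory"]` because the term is related to a respiratory concept two levels up.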

Submitted by elamb on
Description

Malaria control programs suffer from weak and fragmented surveillance of the wide range of information required to manage the disease effectively and efficiently. A cyberenvironment (a computational framework to manage, integrate, analyze, and visualize the data resources) can improve both surveillance and outcomes.

 

Objective

This paper presents an ontology of a cyberenvironment for malaria surveillance. The ontology encapsulates a comprehensive natural language enumeration of the requirements of the cyberenvironment using a structured terminology. It can be used to systematically analyze and prioritize the functions of the cyberenvironment. It will help the medical, individual, environmental, and strategic management of malaria.

Submitted by elamb on
Description

An N-gram is a subsequence of n consecutive items from a given sequence, where n is a positive integer (1, 2, 3, and so on) and the items can be letters or words. N-gram models are widely used in statistical natural language processing [3]. In the syndromic surveillance context, N-grams can be used to cluster or classify natural language data. They can also help in the design of kernels for machine learning algorithms, such as support vector machines, that learn from text data. This work calculates the similarity percentages of emergency department (ED) or telehealth (TH) reasons to syndromic fingerprints using N-grams. We define “reasons similarity” as the percentage of N-grams derived from the reasons field of an ED or TH record that match the fingerprint of a syndrome. The fingerprint of a syndrome is a list of frequent N-grams related to that syndrome. It is constructed by collecting a large sample of classified reasons data for a particular syndrome, calculating all of the N-grams for this set, and then selecting the most frequent N-grams to form a profile, or fingerprint. N-gram generation may require extensive processing time, especially for large files, but this issue has been addressed by using parallel computation.
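
The fingerprint and similarity definitions above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses character N-grams with n = 3, takes the record's own N-gram count as the denominator of the percentage, and leaves out the parallel computation. The function names and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all character N-grams of the whitespace-normalized, lowercased text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_fingerprint(reasons, n=3, top_k=50):
    """Collect N-grams over classified reasons and keep the top_k most frequent."""
    counts = Counter()
    for reason in reasons:
        counts.update(char_ngrams(reason, n))
    return {gram for gram, _ in counts.most_common(top_k)}


def reasons_similarity(reason, fingerprint, n=3):
    """Percentage of the record's N-grams that appear in the fingerprint."""
    grams = char_ngrams(reason, n)
    if not grams:
        return 0.0
    matched = sum(1 for gram in grams if gram in fingerprint)
    return 100.0 * matched / len(grams)
```

A record can then be scored against each syndrome's fingerprint and assigned to the syndrome with the highest similarity, or to none if every score falls below a chosen cutoff.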

Objective

The objective of this work is to identify syndromic fingerprints in reasons for entering an emergency department (ED) or calling telehealth (TH). It also demonstrates that these fingerprints are valuable for classification.

Submitted by elamb on
Description

With increased penetration of clinical information system products and increased interest in clinical data exchange, a variety of clinicians’ notes are becoming available for surveillance. Chief complaints have been studied extensively, and emergency department notes have received some attention, but narrative clinic visit notes have received little.

 

Objective

To assess the performance of an unmodified, general-purpose natural language processing system in detecting fever, and to assess the feasibility of parsing visit notes for syndromic surveillance.

Submitted by elamb on