Natural Language Processing (NLP)

Presented November 27, 2018.

We previously experimented with tracking influenza in ER chief complaint data using existing syndromic surveillance tools. We identified several deficiencies in these tools: poor natural language processing, inefficient user interfaces, frequent (thus costly) false alarms, and one-size-fits-all approaches to syndromes. Furthermore, we were surprised that some epidemiologists we spoke with had relatively little faith in existing surveillance tools, and so we set out to build one that would address their concerns: DADAR (Data Analysis, Detection, And Response).

Objective

To develop an adaptable platform for periodically loading semi-structured medical text, extracting syndromic information using advanced natural language processing, detecting outbreaks in the data (including the ability to tune sensitivity vs. specificity on a syndrome-by-syndrome basis so as to reduce the rate of false alarms), generating timely cartographic surveillance reports, and providing tools to quickly validate or rule out syndromic alerts.

Submitted by knowledge_repo… on Wed, 08/22/2018 - 21:29

The variability of free-text emergency department (ED) data is problematic for biosurveillance, and current methods of identifying search terms for symptoms of interest are inefficient as well as time- and labor-intensive. Our ad hoc approach to term identification for the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT) begins with the development of clinical case definitions, from which we build automated syndrome queries in Structured Query Language (SQL). The queries are used to search free-text clinical data from EDs, with the goal of identifying free-text terms that match the case definitions. The free-text search terms were initially collected from epidemiologists and clinical and technical staff at NC DETECT through informal review of ED data. Over time, we reviewed individual cases missed by our queries and identified additional search terms. We also manually reviewed records to find misspellings, abbreviations, and acronyms for known search terms (e.g., dypnea, diff. br. and SHOB for dyspnea), and developed a pre-processor to clean text prior to syndromic classification. The purpose of this project was to develop and test a more standardized approach to search term identification.
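
The text-cleaning step described above can be illustrated with a minimal sketch. It is not the NC DETECT pre-processor: the table name `VARIANTS` and the function `clean_text` are hypothetical, and only the three dyspnea variants are taken from the abstract. The sketch assumes cleaning amounts to lowercasing, collapsing whitespace, and replacing known variants with a canonical term.

```python
import re

# Hypothetical variant table. Only these dyspnea variants appear in the
# abstract; a real table would be far larger.
VARIANTS = {
    "dypnea": "dyspnea",
    "diff. br.": "dyspnea",
    "shob": "dyspnea",
}


def _variant_pattern(variant):
    # Match the variant only when it is not embedded in a longer word.
    return re.compile(r"(?<!\w)" + re.escape(variant) + r"(?!\w)")


# Try longer variants first so multi-word abbreviations win over substrings.
_PATTERNS = [
    (_variant_pattern(variant), canonical)
    for variant, canonical in sorted(VARIANTS.items(), key=lambda kv: -len(kv[0]))
]


def clean_text(text):
    """Lowercase, collapse whitespace, and replace known variants."""
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    for pattern, canonical in _PATTERNS:
        cleaned = pattern.sub(canonical, cleaned)
    return cleaned


if __name__ == "__main__":
    print(clean_text("Pt c/o SHOB  x2 days"))
```

Keeping the variants in a plain lookup table means newly discovered misspellings can be added without touching the matching logic.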

Objective

This paper describes and applies a new method for identifying biosurveillance search terms using the Semantic Network of the Unified Medical Language System.

Referenced File

Using_Umls_Semantic_Network_To_Identify_Search_Terms_For_Biosurveillance.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

A major goal of biosurveillance is the timely detection of an infectious disease outbreak. Once a disease has been identified, another very important goal is to find all known cases of the disease to assist public health investigators. Natural language processing (NLP) systems may be able to assist in identifying epidemiological variables and decrease time-consuming manual review of records.

Objective

To identify epidemiologically important factors such as infectious disease exposure history, travel or specific variables from unstructured data using NLP methods.

Referenced File

Using_Nlp_On_Va_Electronic_Medical_Records_To_Facilitate_Epidemiologic_Case_Investigations.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

Free-text emergency department triage chief complaints (CCs) are a popular data source used by many syndromic surveillance systems because of their timeliness, availability, and relevance. The lack of standardization of CC vocabulary poses a major technical challenge to any automatic CC classification approach. This challenge can be partially addressed by several methods, for example, medical thesauri, spell checking, manually created synonym lists, and supervised machine learning techniques that operate directly on free text. Current approaches, however, ignore the fact that medical terms appearing in CCs are often semantically related. Our research exploits such semantic relations through a medical ontology in the context of automatic CC classification for syndromic surveillance.
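
As a rough illustration of how semantic relations might be used, the sketch below walks "is-a" links in a toy hierarchy until it reaches a concept tied to a syndrome category. This is not the authors' method, which the abstract does not detail. The tables `IS_A` and `SYNDROME_OF`, the function `classify`, and every term and link in them are invented for illustration and do not come from a real ontology.

```python
# Toy "is-a" hierarchy. Every term and link is illustrative only.
IS_A = {
    "wheezing": "respiratory symptom",
    "dyspnea": "respiratory symptom",
    "vomiting": "gastrointestinal symptom",
    "diarrhea": "gastrointestinal symptom",
}

# Hypothetical mapping from high-level concepts to syndrome categories.
SYNDROME_OF = {
    "respiratory symptom": "respiratory",
    "gastrointestinal symptom": "gastrointestinal",
}


def classify(chief_complaint):
    """Return the set of syndrome categories reachable from the CC's tokens."""
    syndromes = set()
    for token in chief_complaint.lower().split():
        concept = token
        seen = set()
        # Climb the hierarchy until a syndrome-linked concept or a dead end.
        while concept is not None and concept not in seen:
            seen.add(concept)
            if concept in SYNDROME_OF:
                syndromes.add(SYNDROME_OF[concept])
                break
            concept = IS_A.get(concept)
    return syndromes


if __name__ == "__main__":
    print(sorted(classify("wheezing and vomiting")))
```

Unlike a flat synonym list, a hierarchy lets a term added under an existing concept inherit that concept's syndrome category without a separate mapping entry.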

Objective

This paper presents a novel approach that uses a medical ontology to classify free-text CCs into syndrome categories.

Referenced File

Ontology-Based_Automatic_Chief_Complaints_Classification_For_Syndromic_Surveillance.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

Malaria control programs suffer from weak and fragmented surveillance of the wide range of information required to manage the disease effectively and efficiently. A cyberenvironment, that is, a computational framework to manage, integrate, analyze, and visualize the data resources, can improve surveillance and its outcomes.

Objective

This paper presents an ontology of a cyberenvironment for malaria surveillance. The ontology encapsulates a comprehensive natural language enumeration of the requirements of the cyberenvironment using a structured terminology. It can be used to systematically analyze and prioritize the functions of the cyberenvironment. It will help the medical, individual, environmental, and strategic management of malaria.

Referenced File

Ontology_Of_A_Cyberenvironment_For_Malaria_Surveillance.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

Objective

To demonstrate how natural language processing (NLP) of clinical records can contribute to case detection and characterization in biosurveillance.

Referenced File

Natural_Language_Processing_Can_It_Help_Detect_Cases_And_Characterize_Outbreaks_.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

An N-gram is a sub-sequence of n items from a given sequence, where n can be 1, 2, and so on, and the items can be letters or words. N-gram models are widely used in statistical natural language processing [3]. In the syndromic surveillance context, N-grams can be used to cluster or classify natural language data. They can also help in the design of kernels for machine learning algorithms, such as support vector machines, that learn from text data. This work uses N-grams to calculate the similarity percentages of emergency department (ED) or telehealth (TH) reasons to syndromic fingerprints. We define “reasons similarity” as the percentage of N-grams derived from the reasons field of an ED or TH record that match the fingerprint of a syndrome. The fingerprint of a syndrome is a list of frequent N-grams related to this syndrome. It is constructed by collecting a large sample of classified reasons data for a particular syndrome, calculating all of the N-grams for this set, and then selecting the most frequent N-grams to form a profile, or fingerprint. N-gram generation may require extensive processing time, especially for large files, but this issue has been addressed by using parallel computation.
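
The fingerprint construction and similarity measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses character N-grams (the abstract also allows word N-grams), takes the record's own N-gram count as the denominator of the percentage, and runs serially rather than in parallel. The function names and the parameters `n` and `top_k` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_fingerprint(reasons, n=3, top_k=50):
    """Count n-grams across classified reasons and keep the top_k most frequent."""
    counts = Counter()
    for reason in reasons:
        counts.update(char_ngrams(reason, n))
    return {gram for gram, _ in counts.most_common(top_k)}


def reasons_similarity(reason, fingerprint, n=3):
    """Percentage of the record's n-grams that appear in the fingerprint."""
    grams = char_ngrams(reason, n)
    if not grams:
        return 0.0  # Assumption: text shorter than n has no similarity.
    matched = sum(1 for gram in grams if gram in fingerprint)
    return 100.0 * matched / len(grams)


if __name__ == "__main__":
    # Made-up sample reasons, for illustration only.
    fingerprint = build_fingerprint(["cough and fever", "fever cough"], top_k=100)
    print(reasons_similarity("fever", fingerprint))
```

A record could then be assigned to the syndrome whose fingerprint yields the highest similarity, or left unclassified when no score passes a chosen cutoff.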

Objective

The objective of this work is to identify syndromic fingerprints in reasons for entering an emergency department (ED) or calling telehealth (TH). It also demonstrates that these fingerprints are valuable for classification.

Referenced File

Identifying_Syndromic_Fingerprints_In_Reason_Fields_In_Emergency_Department_Or_Telehealth_Records_Using_N-Grams_For_Similarity_Analysis.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

Objective

We asked to what extent computerized processing of the full free-text clinical documentation could enhance syndrome detection compared to the sole use of structured data elements from a comprehensive electronic medical record.

Referenced File

Free-Text_Processing_To_Enhance_Detection_Of_Acute_Respiratory_Infections.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40

With increased penetration of clinical information system products and increased interest in clinical data exchange, a variety of clinicians' notes are becoming available for surveillance. Chief complaints have been studied extensively, and emergency department notes have received some attention, but narrative clinic visit notes have received little.

Objective

To assess the performance of an unmodified, general purpose natural language processing system to detect fever, and to assess the feasibility of parsing visit notes for syndromic surveillance.

Referenced File

Fever_Detection_In_Clinic_Visit_Notes_Using_A_General_Purpose_Processor.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40
