Chapman Wendy

Description

A number of Natural Language Processing (NLP) annotation and Information Extraction (IE) systems and platforms have been used successfully within the medical domain. Although the groups behind these systems share components, there has not been a successful effort in the medical domain to codify and standardize either the syntax or the semantics between systems to allow for interoperability between annotation tools, NLP tools, IE tools, corpus evaluation tools and encoded clinical documents. A successful interoperability standard has two components: an information model and a semantic model.

Objective

The Consortium for Healthcare Informatics Research, a Department of Veterans Affairs (VA) Office of Research and Development initiative, is sponsoring the development of a standard ontology and information model for Natural Language Processing interoperability within the biomedical domain.

Submitted by uysz on
Description

We are developing a Bayesian system for real-time surveillance and characterization of outbreaks that incorporates a variety of data elements, including free-text clinical reports. An existing natural language processing (NLP) system called Topaz is being used to extract clinical data from the reports. Moving the NLP system from a research project to a real-time service has presented many challenges.

 

Objective

Adapt an existing NLP system to be a useful component in a system performing real-time surveillance.

Submitted by hparton on
Description

Ontologies representing knowledge from the public health and surveillance domains currently exist. However, they focus on infectious diseases (Infectious Disease Ontology), reportable diseases (PHSkb, retired) and internet surveillance from news text (BioCaster ontology), or are commercial products (OntoReason public health ontology). From the perspective of biosurveillance text mining, these ontologies do not adequately represent the kind of knowledge found in clinical reports. Our project aims to fill this gap by developing a stand-alone ontology for the public health/biosurveillance domain, which (1) provides a starting point for standard development, (2) is straightforward for public health professionals to use for text analysis, and (3) can be easily plugged into existing syndromic surveillance systems.

 

Objective

To develop an application ontology - the extended syndromic surveillance ontology - to support text mining of ER and radiology reports for public health surveillance. The ontology encodes syndromes, diagnoses, symptoms, signs and radiology results relevant to syndromic surveillance (with a special focus on bioterrorism).

Submitted by hparton on
Description

In 2010, as rules for the Centers for Medicare & Medicaid Services Electronic Health Record (EHR) Incentive Programs (Meaningful Use) (1) were finalized, ISDS became aware of a trend towards new EHR systems capturing or sending emergency department (ED) chief complaint (CC) data as structured variables without including the free text. This perceived shift in technology was occurring in the absence of consensus-based technical requirements for syndromic surveillance and of survey data on the value of free-text CC to public health practice. On 1/31/11, ISDS, in collaboration with CDC BioSense, recommended a core set of data for public health syndromic surveillance (PHSS) to support public health's participation in Meaningful Use.

Objective

This study was conducted to better support a requirement for ED CC as free text by investigating the relationship between the unstructured, free-text form of CC data and its usefulness in public health practice. It also aimed to better inform health IT standardization practices, specifically those related to Meaningful Use, by describing how US public health agencies use unstructured, free-text EHR data to monitor, assess, investigate and manage issues of public health interest.

Submitted by elamb on
Description

Mining text for real-time syndromic surveillance usually requires a comprehensive knowledge base (KB) that contains detailed information about concepts relevant to the domain, such as disease names, symptoms, drugs, and radiology findings. Two such resources are the BioCaster Ontology [1] and the Extended Syndromic Surveillance Ontology (ESSO) [2]. However, both resources are difficult to manipulate, customize, reuse and extend without knowledge of ontology development environments (like Protégé) and Semantic Web standards (like RDF and OWL). The cKASS software tool provides an easy-to-use, adaptable environment for extending and modifying existing syndrome definitions via a web-based graphical user interface, which does not require knowledge of complex ontology-editing environments or Semantic Web standards. Further, cKASS allows for - indeed encourages - the sharing of user-defined syndrome definitions, with collaborative features that will enhance the ability of the surveillance community to quickly generate new definitions in response to emerging threats.

Objective

We describe cKASS (clinical Knowledge Authoring & Sharing Service), a system designed to facilitate the authoring and sharing of knowledge resources that can be applied to syndromic surveillance.

Submitted by elamb on
Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata - for example, whether the entity is negated or hypothetical - using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis process. PyConTextKit is an extension of an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with an acute pulmonary embolism and to identify patients with findings consistent with seven syndromes. Public health practitioners are beginning to have access to clinical symptoms, findings, and diagnoses from the EMR. Making use of this data is difficult because much of it is in the form of free text. Natural language processing techniques can be leveraged to make sense of this text, but such techniques often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for the user to perform concept identification for surveillance.
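
The window-based idea described above can be illustrated with a minimal Python sketch. This is not the PyConTextKit or ConText implementation: the trigger lists, the `annotate` function, and the five-token window are all assumptions made for illustration, and the sketch only looks at text to the left of the entity, whereas the real algorithm also handles triggers after the entity and terms that end a trigger's scope.

```python
import re

# Hypothetical trigger lists for illustration; real lexicons are far larger.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
HYPOTHETICAL_TRIGGERS = ("if", "should")


def annotate(sentence, entity, window=5):
    """Return metadata for the first mention of `entity`, or None if absent.

    Only the `window` tokens immediately before the entity are inspected.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    entity_tokens = entity.lower().split()
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            # Pad with spaces so triggers match whole tokens only.
            left = " " + " ".join(tokens[max(0, i - window):i]) + " "
            return {
                "entity": entity,
                "negated": any(f" {t} " in left for t in NEGATION_TRIGGERS),
                "hypothetical": any(f" {t} " in left for t in HYPOTHETICAL_TRIGGERS),
            }
    return None


if __name__ == "__main__":
    print(annotate("Patient denies fever or chills.", "fever"))
    print(annotate("Return if fever develops.", "fever"))
```

Under these assumptions, the first call marks "fever" as negated and the second marks it as hypothetical.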

 

Objective

We describe the development of a web-based application - PyConTextKit - to support text mining of clinical reports for public health surveillance.

Submitted by elamb on
Description

The Extended Syndromic Surveillance Ontology (ESSO) is an open source terminological ontology designed to facilitate the text mining of clinical reports in English [1,2]. At the core of ESSO are 279 clinical concepts (for example, fever, confusion, headache, hallucination, fatigue) grouped into eight syndrome categories (rash, hemorrhagic, botulism, neurological, constitutional, influenza-like-illness, respiratory, and gastrointestinal). In addition to syndrome groupings, each concept is linked to synonyms, variant spellings and UMLS Concept Unique Identifiers. ESSO builds on the Syndromic Surveillance Ontology [3], a resource developed by a working group of eighteen researchers representing ten syndromic surveillance systems in North America. ESSO encodes almost three times as many clinical concepts as the Syndromic Surveillance Ontology, and incorporates eight syndrome categories, in contrast to the Syndromic Surveillance Ontology's four (influenza-like-illness, constitutional, respiratory and gastrointestinal). The new clinical concepts and syndrome groupings in ESSO were developed by a board-certified infectious disease physician (author JD) in conjunction with an informaticist (author MC).
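
The concept structure described above (a clinical concept linked to syndrome categories, synonyms, variant spellings and UMLS identifiers) could be represented roughly as follows. This is a sketch, not ESSO's actual schema or content: the `Concept` class, the helper functions, the synonym and spelling entries, the syndrome assignments, and the placeholder CUIs are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    syndromes: tuple
    synonyms: tuple = ()
    variant_spellings: tuple = ()
    cuis: tuple = ()  # UMLS Concept Unique Identifiers


def build_lookup(concepts):
    """Map every surface form (name, synonym, variant spelling) to its concept."""
    lookup = {}
    for concept in concepts:
        for term in (concept.name, *concept.synonyms, *concept.variant_spellings):
            lookup[term.lower()] = concept
    return lookup


def syndromes_in(text, lookup):
    """Return the syndrome categories of all concepts mentioned in `text`."""
    lowered = text.lower()
    found = set()
    for term, concept in lookup.items():
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found.update(concept.syndromes)
    return found


# Illustrative entries only; the CUIs are placeholders, not real identifiers.
CONCEPTS = [
    Concept("fever", ("constitutional",), synonyms=("febrile",),
            variant_spellings=("feaver",), cuis=("C0000001",)),
    Concept("headache", ("neurological",),
            variant_spellings=("head ache",), cuis=("C0000002",)),
]

if __name__ == "__main__":
    lookup = build_lookup(CONCEPTS)
    print(syndromes_in("Pt febrile with head ache", lookup))
```

Keeping synonyms and variant spellings on the concept, rather than in a separate table, means a single lookup resolves any surface form to its syndrome categories.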

Objective

In order to evaluate and audit these new syndrome definitions, we initiated a survey of syndromic surveillance practitioners. We present the results of an online survey designed to evaluate syndrome definitions encoded in the Extended Syndromic Surveillance Ontology.

Submitted by elamb on
Description

A major goal of biosurveillance is the timely detection of an infectious disease outbreak. Once a disease has been identified, another very important goal is to find all known cases of the disease to assist public health investigators. Natural language processing (NLP) systems may be able to assist in identifying epidemiological variables and decrease time-consuming manual review of records.

 

Objective

To identify epidemiologically important factors such as infectious disease exposure history, travel or specific variables from unstructured data using NLP methods.

Submitted by elamb on
Description

There exists no standard set of syndromes for syndromic surveillance, and available syndromic case definitions demonstrate substantial heterogeneity of findings constituting the definition. Many syndromic case definitions require the presence of a syndromic finding (e.g., cough or diarrhea) and a fever.
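
A case definition of this form (a syndromic finding plus a fever) amounts to a simple conjunction, sketched below in Python. The finding lists and the `febrile_syndromes` function are assumptions for illustration; actual case definitions vary between surveillance systems, which is the heterogeneity noted above.

```python
# Hypothetical finding lists; real case definitions differ between systems.
SYNDROMIC_FINDINGS = {
    "febrile respiratory": {"cough"},
    "febrile gastrointestinal": {"diarrhea"},
}


def febrile_syndromes(findings):
    """Return the febrile syndromes whose definition is met by `findings`.

    A syndrome is met only when fever AND at least one of its
    syndromic findings are present.
    """
    present = {f.lower() for f in findings}
    if "fever" not in present:
        return set()
    return {
        name
        for name, required in SYNDROMIC_FINDINGS.items()
        if present & required
    }


if __name__ == "__main__":
    print(febrile_syndromes(["Fever", "Cough"]))
    print(febrile_syndromes(["cough"]))
```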

 

Objective

Automated syndromic surveillance systems often use chief complaints as input. Our objective was to determine whether chief complaints accurately represent whether a patient has any of the following febrile syndromes: febrile respiratory, febrile gastrointestinal, febrile rash, febrile neurological, or febrile hemorrhagic.

Submitted by elamb on
Description

Case detection from chief complaints suffers from low to moderate sensitivity. Emergency Department (ED) reports contain detailed clinical information that could improve case detection ability and enhance outbreak characterization. We developed a text processing system called Topaz that could be used to answer questions from ED reports, such as: How many new patients have come to the ED with acute lower respiratory symptoms? Of the respiratory patients, how many had a productive cough or wheezing? How many of the respiratory patients have a past history of asthma?

 

Objective

To evaluate how well a text processing system called Topaz can identify acute episodes of 55 clinical conditions described in ED notes.

Submitted by elamb on