
A web-based platform to support text mining of clinical reports for public health surveillance

Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata about each one (for example, whether the entity is negated or hypothetical) using simple lexical clues that occur in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which expedites the development of new resources and simplifies the subsequent analysis. PyConTextKit extends an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with acute pulmonary embolism and patients with findings consistent with seven syndromes.

Public health practitioners are beginning to gain access to clinical symptoms, findings, and diagnoses from the electronic medical record (EMR). Making use of these data is difficult because much of the information is recorded as free text. Natural language processing techniques can be leveraged to make sense of this text, but they often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for users to perform concept identification for surveillance.
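To make the idea of lexical clues in a surrounding window concrete, here is a minimal Python sketch of ConText-style negation and hypothetical detection. It is not the pyConText or PyConTextKit API: the trigger lists, the `WINDOW` size, the terminator words, and the `annotate`/`Finding` names are hypothetical and chosen only for illustration. The sketch handles only forward-looking triggers (a trigger modifies entities to its right), whereas the full ConText lexicon is much larger and also includes backward-looking triggers.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny trigger lexicons (illustrative only).
TRIGGERS = {
    "negated": ["no", "denies", "without", "negative for", "no evidence of"],
    "hypothetical": ["if", "should", "return if"],
}
# Hypothetical words that end a trigger's scope early.
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed maximum scope, in tokens, to the right of a trigger


@dataclass
class Finding:
    entity: str
    negated: bool = False
    hypothetical: bool = False


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return the start indices where a (possibly multi-word) phrase occurs."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def annotate(sentence, entity):
    """Mark an entity as negated/hypothetical if it starts inside a trigger's scope."""
    tokens = tokenize(sentence)
    entity_starts = find_phrase(tokens, entity)
    if not entity_starts:
        return None  # entity not mentioned in this sentence
    finding = Finding(entity)
    for category, phrases in TRIGGERS.items():
        for phrase in phrases:
            for start in find_phrase(tokens, phrase):
                scope_start = start + len(phrase.split())
                scope_end = scope_start
                # Grow the scope rightward until the window is full,
                # the sentence ends, or a terminator word is reached.
                while (scope_end < len(tokens)
                       and scope_end - scope_start < WINDOW
                       and tokens[scope_end] not in TERMINATORS):
                    scope_end += 1
                if any(scope_start <= e < scope_end for e in entity_starts):
                    setattr(finding, category, True)
    return finding


if __name__ == "__main__":
    for entity in ("fever", "cough"):
        print(annotate("Patient denies fever but reports cough.", entity))
    # Finding(entity='fever', negated=True, hypothetical=False)
    # Finding(entity='cough', negated=False, hypothetical=False)
```

The terminator list is what keeps "denies" from reaching across "but" to "cough"; without it, a fixed window alone would wrongly negate the second finding in short sentences like this one.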


Objective

We describe the development of PyConTextKit, a web-based application that supports text mining of clinical reports for public health surveillance.

Submitted by elamb on