Text Mining

Description

Despite considerable effort since the turn of the century to develop Natural Language Processing (NLP) methods and tools for detecting negated terms in chief complaints, few standardised methods have emerged. Those that have (e.g., the NegEx algorithm) are confined to local implementations with customised solutions. Important reasons for this lack of progress include (a) limited shareable datasets for developing and testing methods, (b) jurisdictional data silos, and (c) the gap between resource-constrained public health practitioners and technical solution developers, typically university researchers and industry developers. To address these three problems, ISDS, funded by a grant from the Defense Threat Reduction Agency, organised a consultancy meeting at the University of Utah designed to bring together (a) representatives from public health departments, (b) university researchers focused on the development of computational methods for public health surveillance, (c) members of public-health-oriented non-governmental organisations, and (d) industry representatives, with the goal of developing a roadmap for the development of validated, standardised and portable resources (methods and datasets) for negation detection in clinical text used for public health surveillance.

Objective

This abstract describes an ISDS initiative to bring together public health practitioners and analytics solution developers from both academia and industry to define a roadmap for the development of algorithms, tools, and datasets to improve the capabilities of current text processing algorithms to identify negated terms (i.e. negation detection).
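As a concrete illustration of the kind of method the roadmap targets, a NegEx-style rule can be sketched in a few lines: a negation trigger opens a scope that negates target terms until a termination word ends it. The trigger and termination lists below are illustrative stand-ins, not the published NegEx lexicon.

```python
import re

# Sketch of a NegEx-style rule (an assumed simplification, not the published
# NegEx algorithm): a negation trigger opens a scope that marks subsequent
# target terms as negated, until a termination term or the end of the text.
NEGATION_TRIGGERS = {"no", "denies", "without", "not"}
TERMINATION_TERMS = {"but", "however", "although"}

def negated_terms(chief_complaint, targets):
    """Return the target terms that fall inside a negated scope."""
    tokens = re.findall(r"[a-z]+", chief_complaint.lower())
    negated, in_scope = set(), False
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            in_scope = True          # open a negation scope
        elif tok in TERMINATION_TERMS:
            in_scope = False         # close the scope
        elif in_scope and tok in targets:
            negated.add(tok)
    return negated
```

For example, `negated_terms("denies fever but reports cough", {"fever", "cough"})` marks only `fever` as negated, because `but` terminates the scope opened by `denies`.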

Description

Ontologies representing knowledge from the public health and surveillance domains currently exist. However, they focus on infectious diseases (the Infectious Disease Ontology), reportable diseases (PHSkb, now retired), or internet surveillance of news text (the BioCaster ontology), or are commercial products (the OntoReason public health ontology). From the perspective of biosurveillance text mining, these ontologies do not adequately represent the kind of knowledge found in clinical reports. Our project aims to fill this gap by developing a stand-alone ontology for the public health/biosurveillance domain which (1) provides a starting point for standards development, (2) is straightforward for public health professionals to use for text analysis, and (3) can be easily plugged into existing syndromic surveillance systems.

 

Objective

To develop an application ontology - the extended syndromic surveillance ontology - to support text mining of ER and radiology reports for public health surveillance. The ontology encodes syndromes, diagnoses, symptoms, signs and radiology results relevant to syndromic surveillance (with a special focus on bioterrorism).
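To illustrate the "plug in" use case, here is a minimal sketch of how a surveillance pipeline might consult such an ontology to map report terms to syndrome classes. The three-concept table is invented for illustration; it is not the actual extended syndromic surveillance ontology.

```python
# Minimal sketch of consulting an application ontology from a syndromic
# surveillance pipeline: terms recognised in a report are mapped to concept
# classes and their associated syndromes. The tiny concept table below is
# illustrative only, not the extended syndromic surveillance ontology.
ONTOLOGY = {
    "cough": {"class": "Symptom", "syndromes": {"respiratory"}},
    "infiltrate": {"class": "RadiologyFinding", "syndromes": {"respiratory"}},
    "diarrhea": {"class": "Symptom", "syndromes": {"gastrointestinal"}},
}

def syndromes_for_report(terms):
    """Union of syndromes implied by the recognised terms in one report."""
    found = set()
    for term in terms:
        concept = ONTOLOGY.get(term.lower())
        if concept:
            found |= concept["syndromes"]
    return found
```

A downstream system can then count reports per syndrome without embedding any clinical vocabulary of its own.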

Description

Emerging event detection is the process of automatically identifying novel and emerging ideas from text with minimal human intervention. With the rise of social networks like Twitter, topic detection has begun leveraging measures of user influence to identify emerging events. Twitter's highly skewed follower/followee structure lends itself to an intuitive model of influence, yet in a context like the Emerging Infections Network (EIN), a sentinel surveillance listserv of over 1,400 infectious disease experts, developing a useful model of authority is less clear-cut. Who should we listen to on the EIN? To explore this, we annotated a body of important EIN discussions and tested how well three models of user authority performed in identifying those discussions. In previous work we proposed a process by which only posts on specific "important" topics are read, drastically reducing the number of posts that must be read: a set of "bellwether" users is found who act as indicators for "important" topics, and only posts relating to those topics are then read. That approach considers only the patterns of user participation, not the text of messages. Our text analysis approach follows that of Cataldi et al. [1], using the idea of semantic "energy" to identify emerging topics within Twitter posts. Authority is calculated via PageRank and used to weight each author's contribution to the semantic energy of all terms occurring within some time interval t_i. A decay parameter d defines the impact of prior time steps on the current interval.
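A simplified sketch of the authority-weighted energy computation might look as follows. The authority scores, posts, and the exact decay scheme are illustrative; this is not Cataldi et al.'s full formulation.

```python
# Simplified sketch of authority-weighted term "energy" in the spirit of
# Cataldi et al.: each author's PageRank score weights the terms they use in
# interval t_i, and a decay parameter d discounts energy carried over from
# earlier intervals. Scores and posts here are illustrative values.
def term_energy(posts, authority, prior_energy, d=0.5):
    """posts: list of (author, tokens); returns {term: energy} for t_i."""
    energy = {term: d * e for term, e in prior_energy.items()}  # decayed history
    for author, tokens in posts:
        weight = authority.get(author, 0.0)   # e.g. a PageRank score
        for tok in tokens:
            energy[tok] = energy.get(tok, 0.0) + weight
    return energy
```

Emerging topics would then be flagged where a term's energy rises sharply relative to its decayed history.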

Objective

To explore how different models of user influence or authority perform when detecting emerging events within a small-scale community of infectious disease experts.

Description

An expanded ambulatory health record, the Comprehensive Ambulatory Patient Encounter Record (CAPER), will provide multiple types of data for use in DoD ESSENCE. A new type of data not previously available is the Reason for Visit (ROV), a free-text field analogous to the Chief Complaint (CC). Intake personnel ask patients why they have come to the clinic and record their responses. Traditionally, the text should reflect the patient's actual statement; in reality, the staff often "translate" the statement and add jargon. Text parsing maps key words or phrases to specific syndromes; challenges exist given the vagaries of the English language and local idiomatic usage. Still, CC analysis by text parsing has been successful in civilian settings [1]. However, it was necessary to modify the parsing to reflect the characteristics of CAPER data and of the covered population. For example, consider the Shock/Coma syndrome. Loss of consciousness is relatively common in military settings due to prolonged standing, exertion in hot weather with dehydration, etc., whereas the main concern is shock/coma due to infectious causes. To reduce false-positive mappings, the parser now excludes terms such as syncope, fainting, electric shock, road march, parade formation, immunization, blood draw, diabetes, and hypoglycemic.
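The exclusion logic can be sketched roughly as follows; both term lists are illustrative excerpts drawn from the examples above, not the fielded parser's vocabulary.

```python
# Rough sketch of syndrome mapping with exclusion terms, as described for the
# CAPER Reason for Visit parser: a record maps to Shock/Coma only when a key
# phrase matches and no exclusion term is present. Both lists are
# illustrative excerpts, not the actual parser's term sets.
SHOCK_COMA_KEYWORDS = {"shock", "coma", "unconscious", "unresponsive"}
SHOCK_COMA_EXCLUSIONS = {"syncope", "fainting", "electric", "road march",
                         "parade", "immunization", "blood draw", "diabetes",
                         "hypoglycemic"}

def maps_to_shock_coma(reason_for_visit):
    """True if the free text maps to Shock/Coma after exclusion filtering."""
    text = reason_for_visit.lower()
    if any(excl in text for excl in SHOCK_COMA_EXCLUSIONS):
        return False          # benign military-setting cause; suppress mapping
    return any(kw in text for kw in SHOCK_COMA_KEYWORDS)
```

So "passed out after road march" is suppressed, while "found unresponsive" still maps to the syndrome.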

Objective

Rather than rely on diagnostic codes as the core data source for alert detection, this project sought to develop a Chief Complaint (CC) text parser to use in the U.S. Department of Defense (DoD) version of the Electronic Surveillance System for Early Notification of Community-Based Epidemics (ESSENCE), thereby providing an alternate evidence source. A secondary objective was to compare the diagnostic and CC data sources for complementarity.

Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata - for example, whether the entity is negated or hypothetical - using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis process. PyConTextKit is an extension of an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with an acute pulmonary embolism and to identify patients with findings consistent with seven syndromes. Public health practitioners are beginning to have access to clinical symptoms, findings, and diagnoses from the EMR. Making use of these data is difficult because much of it is free text. Natural language processing techniques can be leveraged to make sense of this text, but such techniques often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for the user to perform concept identification for surveillance.
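The windowed-cue idea behind ConText can be sketched as follows; the cue lists and window size are illustrative, not the ConText lexicon or PyConTextKit's implementation.

```python
# Minimal sketch of windowed cue checking in the spirit of ConText: for each
# entity found in a report, scan a fixed window of preceding tokens for
# negation or hypothetical cues and attach the result as metadata. Cue lists
# and window size are illustrative, not the ConText lexicon.
NEGATION_CUES = {"no", "denies", "without", "negative"}
HYPOTHETICAL_CUES = {"if", "possible", "rule", "suspect"}

def annotate_entity(tokens, entity_index, window=4):
    """Return metadata flags for the entity at tokens[entity_index]."""
    start = max(0, entity_index - window)
    context = set(tokens[start:entity_index])   # preceding window only
    return {
        "entity": tokens[entity_index],
        "negated": bool(context & NEGATION_CUES),
        "hypothetical": bool(context & HYPOTHETICAL_CUES),
    }
```

For instance, in "patient denies chest pain" the entity "pain" is flagged negated, while in "possible pneumonia noted" the entity "pneumonia" is flagged hypothetical.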

 

Objective

We describe the development of a web-based application - PyConTextKit - to support text mining of clinical reports for public health surveillance.

Description

Event-based biosurveillance is the practice of monitoring diverse information sources to detect events pertaining to human, plant, and animal health. Online documents, such as news articles, newsletters, and (micro-)blog entries, are primary information sources for this practice. Document classification is an important step in filtering this information, and machine learning methods have been applied successfully to the task.
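As a toy example of such a filtering step, a multinomial Naive Bayes classifier can be trained to separate relevant from irrelevant documents; the training data and labels below are invented for illustration, and this is not any specific deployed system.

```python
import math
from collections import Counter, defaultdict

# Toy multinomial Naive Bayes for filtering biosurveillance documents into
# "relevant" vs "irrelevant". A minimal sketch of machine-learning document
# classification; training data and labels are made up for illustration.
def train(docs):
    """docs: list of (label, tokens). Returns label priors and term counts."""
    counts = defaultdict(Counter)
    labels = Counter()
    for label, tokens in docs:
        labels[label] += 1
        counts[label].update(tokens)
    return labels, counts

def classify(tokens, labels, counts, alpha=1.0):
    """Pick the label maximising log P(label) + sum log P(token|label)."""
    vocab = {t for c in counts.values() for t in c}
    total = sum(labels.values())
    best, best_lp = None, -math.inf
    for label in labels:
        lp = math.log(labels[label] / total)
        denom = sum(counts[label].values()) + alpha * len(vocab)
        for tok in tokens:
            lp += math.log((counts[label][tok] + alpha) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

In practice the same structure scales to thousands of labelled news items, which is where the challenges surveyed in this review arise.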

 

Objective

The objective of this literature review is to identify current challenges in document classification for event-based biosurveillance and to consider the efforts needed and the research opportunities.

Description

Commonly used syndromic surveillance methods based on the spatial scan statistic first classify disease cases into broad, pre-existing symptom categories ("prodromes") such as respiratory or fever, then detect spatial clusters where the recent case count of some prodrome is unexpectedly high. Novel emerging infections may have very specific and anomalous symptoms which should be easy to detect even if the number of cases is small. However, typical spatial scan approaches may fail to detect a novel outbreak if the resulting cases are not classified to any known prodrome. Alternatively, detection may be delayed because cases are lumped into an overly broad prodrome, diluting the outbreak signal.
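The prodrome-bucketing step, and the failure mode it creates, can be sketched as follows; the keyword lists, regions, and alarm threshold are illustrative.

```python
from collections import Counter

# Sketch of the prodrome-based counting step that the semantic scan statistic
# generalises: cases are bucketed by (region, prodrome), and a region alarms
# when a recent count far exceeds its baseline. A case whose complaint fits
# no prodrome contributes nothing -- the failure mode described above.
# Keyword lists, regions, and the threshold are illustrative.
PRODROME_KEYWORDS = {"respiratory": {"cough", "wheezing"},
                     "fever": {"fever", "febrile"}}

def bucket_cases(cases):
    """cases: list of (region, complaint_tokens) -> Counter keyed by (region, prodrome)."""
    counts = Counter()
    for region, tokens in cases:
        for prodrome, keywords in PRODROME_KEYWORDS.items():
            if set(tokens) & keywords:
                counts[(region, prodrome)] += 1
        # anomalous complaints matching no prodrome are silently dropped
    return counts

def alarms(counts, baseline, ratio=2.0):
    """Flag (region, prodrome) pairs whose count exceeds ratio x baseline."""
    return [key for key, n in counts.items()
            if n >= ratio * baseline.get(key, 1.0)]
```

Note how a cluster of cases with a novel complaint (one matching no prodrome keyword) would never raise an alarm here, which is exactly the gap the semantic scan statistic is designed to close.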

 

Objective

We propose a new text-based spatial event detection method, the semantic scan statistic, which uses free-text data from Emergency Department chief complaints to detect, localize, and characterize newly emerging outbreaks of disease.

Objective

This analysis had two objectives. First, to apply text processing methods to free-text clinician notes extracted from the VA electronic medical record for automated detection of influenza-like illness (ILI). Second, to determine whether data from free-text clinical documents can enhance the predictive ability of case detection models based on coded data.

Description

Protecting U.S. animal populations requires constant monitoring of disease events and of conditions that might lead to disease emergence, both domestically and globally. Since 1999, the Center for Emerging Issues (CEI) has actively monitored global information sources to provide early detection, impact assessments, and increased awareness of emerging disease events and conditions. The importance of these activities was reinforced after September 11, 2001, and these processes are now part of the U.S. Department of Agriculture's response to Homeland Security Presidential Directive 9. Electronic information sources available through the Internet have recently changed the way animal health information is gathered, processed, and shared. To respond to these changes, CEI developed a dynamic system containing automated and semi-automated components that process information from various sources to identify, track, and evaluate emerging disease situations.

 

Objective

This paper describes a system of automatic and semiautomatic processes for data gathering, assessment, and event tracking used by the CEI to enhance monitoring of global animal health events and conditions.

Description

Case detection from chief complaints suffers from low to moderate sensitivity. Emergency Department (ED) reports contain detailed clinical information that could improve case detection ability and enhance outbreak characterization. We developed a text processing system called Topaz that could be used to answer questions from ED reports, such as: How many new patients have come to the ED with acute lower respiratory symptoms? Of the respiratory patients, how many had a productive cough or wheezing? How many of the respiratory patients have a past history of asthma?
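Assuming report-level extractions of the kind such a system might produce, the questions above reduce to simple filters over structured records; every record and field name here is hypothetical, for illustration only.

```python
# Hypothetical structured extractions from ED reports, of the kind a text
# processing system such as Topaz might emit. Records and field names are
# invented for illustration; they are not Topaz's actual output format.
reports = [
    {"id": 1, "findings": {"acute lower respiratory"}, "history": set()},
    {"id": 2, "findings": {"acute lower respiratory", "productive cough"},
     "history": {"asthma"}},
    {"id": 3, "findings": {"abdominal pain"}, "history": set()},
]

# The surveillance questions above become simple filters over the records.
respiratory = [r for r in reports if "acute lower respiratory" in r["findings"]]
with_cough = [r for r in respiratory if "productive cough" in r["findings"]]
with_asthma = [r for r in respiratory if "asthma" in r["history"]]
```

Once the free text has been reduced to such structured extractions, answering "how many respiratory patients had a productive cough?" is a one-line query.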

 

Objective

To evaluate how well a text processing system called Topaz can identify acute episodes of 55 clinical conditions described in ED notes.
