
Conway Mike

Description

Despite considerable effort since the turn of the century to develop Natural Language Processing (NLP) methods and tools for detecting negated terms in chief complaints, few standardised methods have emerged. The methods that have emerged (e.g. the NegEx algorithm) are confined to local implementations with customised solutions. Important reasons for this lack of progress include (a) a shortage of shareable datasets for developing and testing methods, (b) jurisdictional data silos, and (c) the gap between resource-constrained public health practitioners and technical solution developers, typically university researchers and industry developers. To address these three problems, ISDS, funded by a grant from the Defense Threat Reduction Agency, organized a consultancy meeting at the University of Utah designed to bring together (a) representatives from public health departments, (b) university researchers focused on the development of computational methods for public health surveillance, (c) members of public health oriented non-governmental organisations, and (d) industry representatives. The goal was to develop a roadmap for validated, standardised and portable resources (methods and data sets) for negation detection in clinical text used for public health surveillance.

Objective: This abstract describes an ISDS initiative to bring together public health practitioners and analytics solution developers from both academia and industry to define a roadmap for developing algorithms, tools, and datasets that improve the ability of current text processing algorithms to identify negated terms (i.e. negation detection).

Description

Mining text for real-time syndromic surveillance usually requires a comprehensive knowledge base (KB) containing detailed information about concepts relevant to the domain, such as disease names, symptoms, drugs, and radiology findings. Two such resources are the BioCaster Ontology [1] and the Extended Syndromic Surveillance Ontology (ESSO) [2]. However, both of these resources are difficult to manipulate, customize, reuse and extend without knowledge of ontology development environments (like Protégé) and Semantic Web standards (like RDF and OWL). The cKASS software tool provides an easy-to-use, adaptable environment for extending and modifying existing syndrome definitions through a web-based graphical user interface, which requires no knowledge of complex ontology-editing environments or Semantic Web standards. Further, cKASS allows, and indeed encourages, the sharing of user-defined syndrome definitions, with collaborative features that will enhance the ability of the surveillance community to quickly generate new definitions in response to emerging threats.

Objective

We describe cKASS (clinical Knowledge Authoring & Sharing Service), a system designed to facilitate the authoring and sharing of knowledge resources that can be applied to syndromic surveillance.

Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata (for example, whether the entity is negated or hypothetical) using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis process. PyConTextKit extends an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with acute pulmonary embolism and to identify patients with findings consistent with seven syndromes. Public health practitioners are beginning to gain access to clinical symptoms, findings, and diagnoses from the electronic medical record (EMR). Making use of this data is difficult because much of it is in the form of free text. Natural language processing techniques can be leveraged to make sense of this text, but such techniques often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for users to perform concept identification for surveillance.
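To make the idea of "lexical clues in the window of text surrounding the entity" concrete, here is a minimal Python sketch of ConText-style trigger matching. It is not PyConTextKit's code or API: the trigger phrases, the two categories, the five-token window, and the terminator words are all hypothetical choices for illustration, and only forward-looking triggers (a trigger that precedes the target) are handled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical trigger lexicon: each phrase signals a modifier category for
# targets that FOLLOW it. Not taken from PyConTextKit.
TRIGGERS = {
    "no": "negated",
    "denies": "negated",
    "without": "negated",
    "rule out": "hypothetical",
    "if": "hypothetical",
}

# Hypothetical terminators: these words end a trigger's scope.
TERMINATORS = {"but", "however"}

WINDOW = 5  # assumed scope: up to 5 tokens after the trigger


@dataclass
class Annotation:
    target: str
    categories: set[str] = field(default_factory=set)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_phrase(tokens: list[str], phrase: list[str]) -> list[int]:
    """Return every start index at which `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def annotate(sentence: str, target: str) -> Annotation | None:
    """Tag the first occurrence of `target` with any modifier categories in scope."""
    tokens = tokenize(sentence)
    hits = find_phrase(tokens, tokenize(target))
    if not hits:
        return None
    start = hits[0]
    result = Annotation(target)
    for trigger, category in TRIGGERS.items():
        trigger_tokens = tokenize(trigger)
        for j in find_phrase(tokens, trigger_tokens):
            scope_start = j + len(trigger_tokens)
            # The target must begin inside the window that follows the trigger.
            if not scope_start <= start < scope_start + WINDOW:
                continue
            # A terminator between trigger and target cuts the scope short.
            if any(tok in TERMINATORS for tok in tokens[scope_start:start]):
                continue
            result.categories.add(category)
    return result
```

For example, `annotate("Patient denies fever or cough.", "fever").categories` is `{"negated"}`, whereas in `"No fever but reports cough."` the target `cough` receives no categories because `but` closes the scope of `no`. Real ConText implementations also handle backward-looking triggers and richer categories (such as experiencer and temporality), which this sketch omits.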

 

Objective

We describe the development of PyConTextKit, a web-based application to support text mining of clinical reports for public health surveillance.

Description

The Extended Syndromic Surveillance Ontology (ESSO) is an open-source terminological ontology designed to facilitate the text mining of clinical reports in English [1,2]. At the core of ESSO are 279 clinical concepts (for example, fever, confusion, headache, hallucination, fatigue) grouped into eight syndrome categories (rash, hemorrhagic, botulism, neurological, constitutional, influenza-like illness, respiratory, and gastrointestinal). In addition to syndrome groupings, each concept is linked to synonyms, variant spellings and UMLS Concept Unique Identifiers. ESSO builds on the Syndromic Surveillance Ontology [3], a resource developed by a working group of eighteen researchers representing ten syndromic surveillance systems in North America. ESSO encodes almost three times as many clinical concepts as the Syndromic Surveillance Ontology and incorporates eight syndrome categories, in contrast to the four in the Syndromic Surveillance Ontology (influenza-like illness, constitutional, respiratory and gastrointestinal). The new clinical concepts and syndrome groupings in ESSO were developed by a board-certified infectious disease physician (author JD) in conjunction with an informaticist (author MC).
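The concept structure described above (concepts carrying synonyms and variant spellings, each assigned to one or more syndrome categories) can be pictured as a small lookup table. The Python sketch below is an illustration under stated assumptions, not ESSO's actual file format or content: the three concepts, their synonyms, and their syndrome assignments are hypothetical, and the UMLS CUI field is deliberately left empty rather than filled with guessed identifiers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    preferred_term: str
    syndromes: frozenset[str]
    synonyms: frozenset[str] = frozenset()
    cui: str | None = None  # UMLS CUI; deliberately left empty in this sketch


# Hypothetical entries for illustration only; not copied from ESSO.
CONCEPTS = [
    Concept("fever", frozenset({"constitutional", "influenza-like illness"}),
            frozenset({"pyrexia", "febrile"})),
    Concept("headache", frozenset({"neurological"}), frozenset({"cephalgia"})),
    Concept("cough", frozenset({"respiratory", "influenza-like illness"}),
            frozenset({"coughing"})),
]


def build_lexicon(concepts: list[Concept]) -> dict[str, Concept]:
    """Map every preferred term and synonym (lowercased) to its concept."""
    lexicon: dict[str, Concept] = {}
    for concept in concepts:
        for term in {concept.preferred_term, *concept.synonyms}:
            lexicon[term.lower()] = concept
    return lexicon


def syndromes_in(text: str, lexicon: dict[str, Concept]) -> dict[str, set[str]]:
    """Return {syndrome: preferred terms of matched concepts} for one report."""
    found: dict[str, set[str]] = {}
    lowered = text.lower()
    for term, concept in lexicon.items():
        # Whole-word match, so "cough" does not fire inside "coughing".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            for syndrome in concept.syndromes:
                found.setdefault(syndrome, set()).add(concept.preferred_term)
    return found


LEXICON = build_lexicon(CONCEPTS)
```

With these hypothetical entries, `syndromes_in("Pt febrile with severe cephalgia.", LEXICON)` maps `fever` to both the constitutional and influenza-like illness categories and `headache` to the neurological category. Keying the lexicon on surface strings keeps synonym expansion in one place, which is the kind of grouping the survey asks practitioners to audit.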

Objective

To evaluate and audit these new syndrome definitions, we initiated a survey of syndromic surveillance practitioners. We present the results of this online survey, which was designed to evaluate the syndrome definitions encoded in the Extended Syndromic Surveillance Ontology.

Description

In recent years, individuals have been using social network sites like Facebook, Twitter, and Reddit to discuss health-related topics. These platforms have consequently become new sources of data for research and applications such as disease surveillance. Reddit, in particular, can potentially provide more in-depth contextual insights than Twitter, and Reddit members potentially discuss more diverse topics than Facebook members. However, identifying relevant discussions remains a challenge in large datasets like Reddit. Thus, much previous research using Reddit data has focused on a few selected, topically oriented sub-communities. Although such an approach yields topically focused datasets, a large portion of related data can be missed. In this research, we examine all sub-communities in which members discuss e-cigarettes in order to determine whether investigating these other sub-communities could result in a better smoking surveillance system.
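One way to find e-cigarette discussions outside the few topically focused sub-communities is to scan every post regardless of subreddit and tally the matches per subreddit. The Python sketch below assumes posts are available as dictionaries with `subreddit` and `body` keys and uses a hypothetical keyword pattern; it illustrates the general idea rather than the method used in this study.

```python
import re
from collections import Counter

# Hypothetical keyword pattern for e-cigarette / vaping mentions.
ECIG_PATTERN = re.compile(
    r"\b(?:e-?cigs?|e-?cigarettes?|vap(?:e|es|ed|ing))\b", re.IGNORECASE
)


def mentions_by_subreddit(posts):
    """Count posts that mention e-cigarettes, grouped by subreddit."""
    counts = Counter()
    for post in posts:
        if ECIG_PATTERN.search(post["body"]):
            counts[post["subreddit"]] += 1
    return counts


def share_outside(counts, focus_subreddits):
    """Fraction of matching posts that fall outside the focus subreddits."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    outside = sum(n for sub, n in counts.items() if sub not in focus_subreddits)
    return outside / total
```

Calling `share_outside` with the set of topically focused subreddits a study would normally restrict itself to gives a rough sense of how much relevant discussion that restriction would miss. A plain keyword pattern will also produce false positives, so matched posts would still need review.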

Objective

We aim to explore how to effectively leverage social media for surveillance of electronic cigarette (e-cigarette) use and vaping. This study examines how members of the social media platform Reddit use topically oriented sub-communities for e-cigarette discussions.

Description

Nearly 100 people per day die from opioid overdose in the United States. Further, prescription opioid abuse is assumed to be responsible for a 15-year increase in opioid overdose deaths. At the same time, the increasing use of social media brings increasing opportunities to seek and share information. For instance, 80% of Internet users obtain health information online, including from popular social interaction sites like Reddit (http://www.reddit.com), which had more than 82.5 billion page views in 2015 [3]. On Reddit, members often share information and include URLs to supplement it. Understanding how often URLs are shared, and which types of URLs are shared, can improve our knowledge of information seeking and sharing behaviors, as well as of the domains of information shared on social media. Such knowledge has the potential to improve public health surveillance practice. We use Reddit to track opioid-related discussions and then investigate the types of URLs shared among Reddit members in those discussions.
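Measuring the frequency of URL sharing and the domains being shared reduces to two steps: extract URLs from each comment, then normalise each URL to its host name and count. The Python sketch below is a minimal illustration, not this study's pipeline; the URL regular expression is a simplification (it treats any `http://` or `https://` run of non-whitespace as a URL), and the example inputs use placeholder domains.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified URL pattern: http(s) scheme followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")
# Punctuation that often trails a URL in running text.
TRAILING_PUNCTUATION = ".,;:!?)]'\""


def url_stats(comments):
    """Return (number of comments containing a URL, Counter of shared domains)."""
    with_url = 0
    domains = Counter()
    for text in comments:
        urls = [u.rstrip(TRAILING_PUNCTUATION) for u in URL_PATTERN.findall(text)]
        if urls:
            with_url += 1
        for url in urls:
            # .hostname is lowercased and excludes any port number.
            host = urlparse(url).hostname or ""
            # Treat "www.example.com" and "example.com" as the same domain.
            if host.startswith("www."):
                host = host[len("www."):]
            domains[host] += 1
    return with_url, domains
```

Dividing the first return value by the number of comments gives the URL-sharing rate; grouping the domain counts into types (for example, news, government, or other Reddit threads) would need a separate, manually defined mapping.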

Objective

We aim to understand (1) the frequency of URL sharing and (2) the types of URLs shared in opioid-related discussions on the social media platform Reddit.

Description

Since their introduction to the US market in 2007, electronic cigarettes (e-cigarettes) have posed considerable challenges to both public health authorities and government regulators, especially given the debate, in both the scientific world and the community at large, regarding the potential advantages (e.g. helping individuals quit smoking) and disadvantages (e.g. renormalizing smoking) associated with the product [1]. Similarly, hookah, a kind of waterpipe used to smoke flavored tobacco, has increased in popularity in recent years, is known to be particularly popular among younger people, and has prompted a range of regulatory responses [2]. One important and currently largely unexplored area of research is consumer perceptions and experiences of these emerging tobacco products. In this work, we use online health discussion forums in conjunction with text mining and novel data visualization techniques to investigate consumer perceptions and experiences of e-cigarettes and hookah, focusing on the automatic identification of symptoms associated with each product and on consumer motivations for product use. Previous related research has focused on using text mining to analyze e-cigarette- or hookah-related Twitter posts [3,4] and on the qualitative identification of e-cigarette-related symptoms from online discussion forums [5]. To the best of our knowledge, the research reported in this abstract is the first to apply text mining techniques to online health forums to understand e-cigarette or hookah use.

Objective

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers’ experiences and perceptions of electronic cigarettes and hookah.

 

Description

In recent years, the use of social media has increased at an unprecedented rate. For example, the popular social media platform Reddit (http://www.reddit.com) had 83 billion page views from over 88,000 active sub-communities (subreddits) in 2015. Reddit members made over 73 million individual posts and over 725 million associated comments in the same year [1]. We use Reddit to track opium-related discussions because Reddit allows throwaway, unidentifiable accounts, which suit stigmatized discussions that may not be appropriate for identifiable accounts. Reddit members converse on a forum-like platform, and members who have achieved a certain status within the community can create new topically focused groups called subreddits.

Objective

We aim to develop an automated method to track opium-related discussions on the social media platform Reddit. As a first step towards this goal, we use a keyword-based approach to track how often Reddit members discuss opium-related issues.
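A keyword-based tracker of this kind can be sketched in a few lines of Python: match each post against a small term list and bucket the matches by calendar month. The term list here is hypothetical (the abstract does not give the study's keywords), and the `created_utc` (Unix timestamp) and `body` field names are assumptions about how the Reddit data are stored.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical keyword list; not the study's actual terms.
OPIUM_TERMS = ["opium", "poppy tea", "poppy pods"]
OPIUM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in OPIUM_TERMS) + r")\b",
    re.IGNORECASE,
)


def monthly_counts(posts):
    """Count keyword-matching posts per calendar month (UTC), keyed 'YYYY-MM'."""
    counts = Counter()
    for post in posts:
        if OPIUM_PATTERN.search(post["body"]):
            created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
            counts[created.strftime("%Y-%m")] += 1
    return counts
```

Normalising these counts by the total number of posts per month would separate genuine changes in discussion from overall growth in Reddit activity.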


Since the 1990s, tobacco control strategies, at least in the United States and some developed countries, have had considerable success in reducing the number of new smokers and encouraging existing smokers to quit, through the creation of a regulatory infrastructure designed to monitor tobacco sales, limit advertising for tobacco products, and "denormalize" smoking in public places.