Data Analytics

Description

PyConTextKit is a web-based platform that extracts entities from clinical text and provides relevant metadata - for example, whether the entity is negated or hypothetical - using simple lexical clues occurring in the window of text surrounding the entity. The system provides a flexible framework for clinical text mining, which in turn expedites the development of new resources and simplifies the resulting analysis process. PyConTextKit is an extension of an existing Python implementation of the ConText algorithm, which has been used successfully to identify patients with an acute pulmonary embolism and to identify patients with findings consistent with seven syndromes. Public health practitioners are beginning to have access to clinical symptoms, findings, and diagnoses from the electronic medical record (EMR). Making use of these data is difficult because much of the information is free text. Natural language processing techniques can make sense of this text, but they often require technical expertise. PyConTextKit provides a web-based interface that makes it easier for the user to perform concept identification for surveillance.
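The idea of tagging an entity from lexical clues in a surrounding window can be illustrated with a minimal sketch. This is not the PyConTextKit or ConText implementation: the trigger lists, the function name `tag_entity`, and the five-token backward window are all assumptions made for illustration, and real ConText lexicons also handle forward-looking triggers and scope-terminating terms.

```python
import re

# Hypothetical trigger lists; real ConText lexicons are far larger.
NEGATION_TRIGGERS = {"no", "denies", "without"}
HYPOTHETICAL_TRIGGERS = {"if", "should"}


def tag_entity(text, entity, window=5):
    """Find `entity` in `text` and inspect up to `window` preceding tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    target = entity.lower().split()
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Only the tokens before the entity are examined in this sketch.
            preceding = set(tokens[max(0, i - window):i])
            return {
                "found": True,
                "negated": bool(preceding & NEGATION_TRIGGERS),
                "hypothetical": bool(preceding & HYPOTHETICAL_TRIGGERS),
            }
    return {"found": False, "negated": False, "hypothetical": False}


result = tag_entity("Patient denies chest pain or fever.", "chest pain")
```

Here `result["negated"]` is true because "denies" falls inside the window before "chest pain".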

 

Objective

We describe the development of a web-based application - PyConTextKit - to support text mining of clinical reports for public health surveillance.

Submitted by elamb on
Description

Syndromic surveillance systems use electronic health-related data to support near-real-time disease surveillance. Over the last 10 years, the use of influenza-like illness (ILI) syndromes defined from emergency department (ED) data has become an increasingly accepted strategy for public health influenza surveillance at the local and national levels. However, various ILI definitions exist, and few studies have used patient-level data to describe their validity for influenza specifically.

Objective

Estimate and compare the accuracy of various ILI syndromes for detecting lab-confirmed influenza in children.

Submitted by elamb on
Description

Influenza-like illness (ILI) data is collected via an Influenza Sentinel Provider Surveillance Network at the state level. Because participation is voluntary, locations of the sentinel providers may not reflect optimal geographic placement. This study analyzes two different geographic placement schemes - a maximal coverage model (MCM) and a K-median model, two location-allocation models commonly used in geographic information systems. The MCM chooses sites in areas with the densest population. The K-median model chooses sites which minimize the average distance traveled by individuals to their nearest site. We have previously shown how a placement model can be used to improve population coverage for ILI surveillance in Iowa when considering the sites recruited by the Iowa Department of Public Health. We extend this work by evaluating different surveillance placement algorithms with respect to outbreak intensity and timing (i.e., being able to capture the start, peak and end of the influenza season).
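The two placement models can be sketched on toy data. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the common textbook formulation of maximal coverage (maximize the population within a fixed service radius, solved greedily) and a brute-force K-median (minimize population-weighted distance to the nearest chosen site). The function names, coordinates, populations, and radius are all hypothetical.

```python
from itertools import combinations
from math import dist


def greedy_max_coverage(sites, towns, k, radius):
    """Greedily pick k sites, each adding the most not-yet-covered population.

    towns is a list of ((x, y), population) pairs.
    """
    chosen, covered = [], set()
    for _ in range(k):
        def gain(site):
            return sum(pop for j, (loc, pop) in enumerate(towns)
                       if j not in covered and dist(site, loc) <= radius)

        best = max((s for s in sites if s not in chosen), key=gain)
        chosen.append(best)
        covered |= {j for j, (loc, _) in enumerate(towns)
                    if dist(best, loc) <= radius}
    return chosen


def k_median(sites, towns, k):
    """Exhaustively pick the k sites minimizing population-weighted distance."""
    def cost(subset):
        return sum(pop * min(dist(s, loc) for s in subset)
                   for loc, pop in towns)

    return min(combinations(sites, k), key=cost)


# Hypothetical candidate sites and towns on a line.
towns = [((0, 0), 100), ((1, 0), 80), ((10, 0), 50), ((20, 0), 10)]
sites = [(0, 0), (10, 0), (20, 0)]

mcm_sites = greedy_max_coverage(sites, towns, k=2, radius=2)
kmed_sites = k_median(sites, towns, k=2)
```

On this small example both models select the sites at `(0, 0)` and `(10, 0)`. The exhaustive K-median search is only practical for a handful of candidates; realistic problems need a heuristic or an integer-programming solver.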

 

Objective

To evaluate the performance of several sentinel surveillance site placement algorithms for ILI surveillance systems. We explore how these different approaches perform by capturing both the overall intensity and timing of influenza activity in the state of Iowa.

Submitted by elamb on
Description

INDICATOR is a multi-stream open source platform for biosurveillance and outbreak detection, currently focused on Champaign County in Illinois. It has been in production since 2008 and is currently receiving data from emergency department, patient advisory nurse, outpatient convenient care clinic, school absenteeism, animal control, and weather sources. Historical data from some of these sources goes back to 2006.

 

Objective

To examine the correlation between different types of surveillance signals and climate information obtained from a well-defined geographic area.

Submitted by elamb on
Description

Block 3 of the U.S. military's Electronic Surveillance System for Early Notification of Community-Based Epidemics (ESSENCE) affords routine access to multiple sources of data. These include administrative clinical encounter records in the Comprehensive Ambulatory Patient Encounter Record (CAPER); records of filled prescription orders in the Pharmacy Data Transaction Service, developed at the Department of Defense (DoD) Pharmacoeconomic Center; laboratory test orders and results in HL7 format; and others. CAPER records include a free-text Reason for Visit field, which is analogous to chief complaint text in civilian records and is entered by screening personnel rather than the treating healthcare provider. Other CAPER data fields are related to case severity. DoD ESSENCE treats the multiple, recently available data sources separately, requiring users to integrate algorithm results from the various evidence types themselves. This project used a Bayes Network approach to create an ESSENCE module for analytic integration, combining medical expertise with analysis of 4 years of data using documented outbreaks.
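To illustrate what analytic integration of several evidence types can look like, the sketch below fuses alerts from three sources with the simplest possible Bayes network: a single outbreak node whose children are conditionally independent given the outbreak state. This is not the ESSENCE module. The source names, the prior, and every probability are invented for illustration, and the function `posterior_outbreak` is hypothetical.

```python
def posterior_outbreak(prior, evidence, likelihoods):
    """Return P(outbreak | evidence) under a conditional-independence assumption.

    evidence maps source -> bool (did that source's algorithm alert?).
    likelihoods maps source -> (P(alert | outbreak), P(alert | no outbreak)).
    """
    odds = prior / (1 - prior)
    for source, alerted in evidence.items():
        p_out, p_none = likelihoods[source]
        if alerted:
            odds *= p_out / p_none
        else:
            odds *= (1 - p_out) / (1 - p_none)
    return odds / (1 + odds)


# Invented numbers, for illustration only.
likelihoods = {
    "encounters": (0.8, 0.1),
    "pharmacy": (0.6, 0.2),
    "laboratory": (0.5, 0.05),
}
all_alert = {"encounters": True, "pharmacy": True, "laboratory": True}
none_alert = {"encounters": False, "pharmacy": False, "laboratory": False}

p_high = posterior_outbreak(0.01, all_alert, likelihoods)
p_low = posterior_outbreak(0.01, none_alert, likelihoods)
```

With these made-up inputs, three concurrent alerts raise a 1% prior to roughly 71%, while three quiet sources push it well below the prior. A full Bayes network would also model dependencies between sources, which this sketch deliberately ignores.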

 

Objective

The project objective was to develop and test a decision support module using the multiple data sources available in the U.S. DoD version of ESSENCE.

Submitted by elamb on
Description

Event-based biosurveillance is the practice of monitoring diverse information sources to detect events pertaining to human, plant, and animal health. Online documents, such as news articles, newsletters, and (micro-) blog entries, are its primary information sources. Document classification is an important step in filtering this information, and machine learning methods have been successfully applied to the task.

 

Objective

The objective of this literature review is to identify current challenges in document classification for event-based biosurveillance and to consider the efforts needed to address them and the research opportunities they present.

Submitted by elamb on
Description

Commonly used syndromic surveillance methods based on the spatial scan statistic first classify disease cases into broad, pre-existing symptom categories ("prodromes") such as respiratory or fever, then detect spatial clusters where the recent case count of some prodrome is unexpectedly high. Novel emerging infections may have very specific and anomalous symptoms which should be easy to detect even if the number of cases is small. However, typical spatial scan approaches may fail to detect a novel outbreak if the resulting cases are not classified to any known prodrome. Alternatively, detection may be delayed because cases are lumped into an overly broad prodrome, diluting the outbreak signal.

 

Objective

We propose a new text-based spatial event detection method, the semantic scan statistic, which uses free-text data from Emergency Department chief complaints to detect, localize, and characterize newly emerging outbreaks of disease.

Submitted by elamb on
Description

Emergency departments (EDs) supply critical infrastructure to provide medical care in the event of a disaster or disease outbreak, including seasonal and pandemic influenza [1]. EDs are already overcrowded and stretched to near capacity, and influenza activity further increases patient volumes and ED crowding [2,3]; the high ED patient volumes expected during a true influenza pandemic represent a significant threat to the nation's healthcare infrastructure [4]. EDs' ability to manage both seasonal and pandemic influenza surges depends on coupling early detection with graded rapid response. Although many EDs have devised influenza response measures, the potential utility of coupling early warning systems with various response strategies for managing influenza outbreaks in the ED setting has not been rigorously studied. While practical use of traditional surveillance systems has been limited by the several-week lag associated with reporting, new internet-based surveillance tools, such as Google Flu Trends (GFT), report surveillance data in near-real time, thus allowing rapid integration into healthcare response planning [5].

Objective

Google Flu Trends (GFT) is a novel internet-based influenza surveillance system that uses search engine query data to estimate influenza activity. This study assesses the temporal correlation of city GFT data with both confirmed cases of influenza and standard crowding indices from one inner-city emergency department (ED).
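One way a temporal correlation between two weekly series could be examined is to compute the Pearson correlation at several lags. The sketch below is an assumption about how such an analysis might be set up, not the study's method. The series values are invented, and `lagged_correlations` is a hypothetical helper that assumes two series of equal length.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlations(signal, reference, max_lag):
    """Correlate signal[t] with reference[t + lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = signal[:len(signal) - lag]
        y = reference[lag:]
        results[lag] = pearson(x, y)
    return results


# Invented weekly values: the reference trails the signal by one week.
search_signal = [1, 3, 7, 4, 2, 1, 1, 2]
confirmed_cases = [0, 1, 3, 7, 4, 2, 1, 1]

by_lag = lagged_correlations(search_signal, confirmed_cases, max_lag=2)
best_lag = max(by_lag, key=by_lag.get)
```

In this constructed example the correlation peaks at a lag of one week, which is how a leading indicator would show up in such a scan.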

Submitted by elamb on
Description

Time-of-arrival (TOA) surveillance methodology consists of identifying clusters of patients arriving at a hospital emergency department (ED) with similar complaints within a short temporal interval. TOA monitoring of ED visit data is currently conducted by the Florida Department of Health at the county level for multiple subsyndromes [1]. In 2011, North Carolina's NC DETECT system and CDC's BioSense Program collaborated to enhance and adapt this capability for 10 hospital-based Public Health Epidemiologists (PHEs), an ED-based monitoring group established in 2003, for North Carolina's largest hospital systems. At present, PHE hospital systems cover approximately 44% of general/acute care hospital beds and 32% of all emergency department visits statewide. We present findings from TOA monitoring in one hospital system.
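The core TOA idea, several patients with a similar complaint arriving within a short interval, can be sketched with a sliding window over arrival times. This is a simplified illustration, not the Florida or NC DETECT implementation: the fixed 60-minute window, the minimum cluster size of three, the complaint labels, and the function name `toa_clusters` are all assumptions, and a production system would typically use statistical thresholds rather than a fixed count.

```python
from collections import defaultdict


def toa_clusters(visits, window_minutes=60, min_size=3):
    """Flag windows in which at least min_size same-complaint visits arrive.

    visits is an iterable of (arrival_minute, complaint) pairs.
    Returns (complaint, first_arrival, last_arrival, count) tuples.
    """
    by_complaint = defaultdict(list)
    for minute, complaint in visits:
        by_complaint[complaint].append(minute)

    clusters = []
    for complaint, times in by_complaint.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most window_minutes.
            while times[end] - times[start] > window_minutes:
                start += 1
            count = end - start + 1
            if count >= min_size:
                clusters.append((complaint, times[start], times[end], count))
    return clusters


# Invented visits: minutes since midnight and a complaint category.
visits = [
    (0, "co exposure"), (20, "co exposure"), (45, "co exposure"),
    (300, "co exposure"), (10, "fall"), (500, "fall"),
]
flagged = toa_clusters(visits)
```

Only the three "co exposure" arrivals within 45 minutes are flagged; the later isolated visit and the two unrelated falls are not.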

Objective

To describe collaborations between the North Carolina Division of Public Health and the Centers for Disease Control and Prevention (CDC) in implementing time-of-arrival (TOA) surveillance to monitor for exposure-related visits to emergency departments (ED) in small groups of North Carolina hospitals.

Submitted by elamb on
Description

The spatial scan statistic [1] is the most widely used measure of cluster strength. Evaluating all possible subsets of regions in a large dataset is computationally infeasible. Many heuristics have recently appeared that compute approximate solutions maximizing the logarithm of the likelihood ratio. The Fast Subset Scan [2] finds the optimal irregularly shaped spatial cluster exactly; however, the solution may not be connected. The spatial cluster detection problem was formulated as the classic knapsack problem [3] and modeled as a bi-objective unconstrained combinatorial optimization problem. Dynamic programming relies on the principle that, in an optimal sequence of decisions or choices, each sub-sequence must also be optimal. During the search for a solution, it avoids full enumeration by pruning, at an early stage, partial solutions that cannot possibly lead to optimal solutions.
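To make the infeasibility of full enumeration concrete, the sketch below scores every proper subset of regions with the Poisson log-likelihood ratio commonly used for the spatial scan statistic. It is a brute-force baseline for illustration, not the proposed dynamic programming algorithm, and it ignores the connectivity constraint entirely. The case counts and populations are invented, and the function names are hypothetical.

```python
from itertools import combinations
from math import log


def poisson_llr(c, e, total):
    """Log-likelihood ratio for a zone with c observed and e expected cases.

    total is the overall case count; zones with no excess score 0.
    """
    if c <= e:
        return 0.0
    inside = c * log(c / e)
    outside = (total - c) * log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def best_subset(cases, population):
    """Score all 2**n - 2 proper, non-empty subsets of n regions."""
    total_cases, total_pop = sum(cases), sum(population)
    best = (0.0, ())
    for size in range(1, len(cases)):
        for subset in combinations(range(len(cases)), size):
            c = sum(cases[i] for i in subset)
            # Expected cases are proportional to the subset's population.
            e = total_cases * sum(population[i] for i in subset) / total_pop
            best = max(best, (poisson_llr(c, e, total_cases), subset))
    return best


# Invented data: region 0 has a clear excess of cases.
cases = [30, 5, 5, 5]
population = [100, 100, 100, 100]

score, cluster = best_subset(cases, population)
```

The search correctly isolates region 0, but the number of subsets doubles with every added region, so a map with even 50 regions would require on the order of 10^15 evaluations.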

Objective

We propose a fast, exact algorithm, based on constrained dynamic programming, for the detection and inference of arbitrarily shaped connected spatial clusters in aggregated area maps.

Submitted by elamb on