
Data Analytics

Description

The research reported in this paper is part of a larger effort to achieve a better signal-to-noise ratio, and hence better accuracy, in pharmacovigilance applications. Because adverse drug reactions occur relatively infrequently, the causal relation between a reaction and any measured signal is weak. We hypothesize that grouping related signals can increase the detection rate and reduce the false alarm rate.
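To illustrate why grouping can help, the sketch below uses the proportional reporting ratio (PRR), a common disproportionality statistic for spontaneous adverse event reports. The text does not say PRR is the measure used in this work, so treat it as an assumption. All counts are hypothetical, and pooling assumes no single report mentions more than one term from the group.

```python
import math

# Hypothetical report totals in a spontaneous-reporting database.
DRUG_REPORTS = 1_000       # reports mentioning the drug of interest
OTHER_REPORTS = 100_000    # reports mentioning all other drugs


def prr_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (PRR, lower, upper) for a 2x2 table of report counts.

    a: drug of interest with the reaction    b: drug of interest, no reaction
    c: other drugs with the reaction         d: other drugs, no reaction
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR).
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr, math.exp(math.log(prr) - z * se), math.exp(math.log(prr) + z * se)


def table(a_drug: int, c_other: int):
    """Build the 2x2 table from reaction counts for the drug and for other drugs."""
    return a_drug, DRUG_REPORTS - a_drug, c_other, OTHER_REPORTS - c_other


# One reaction term on its own (hypothetical counts).
single = prr_with_ci(*table(3, 100))
# Three related terms with the same counts, pooled; assumes no report
# mentions more than one term from the group.
grouped = prr_with_ci(*table(3 * 3, 3 * 100))

print("single  PRR=%.2f  95%% CI [%.2f, %.2f]" % single)
print("grouped PRR=%.2f  95%% CI [%.2f, %.2f]" % grouped)
```

Both tables give a PRR of 3, but the pooled table's larger counts narrow the confidence interval enough that its lower bound clears 1, while the single term's lower bound does not.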

Objective

ICD-9 codes are commonly used to identify disease cohorts but are often found to be inadequate. Data available in structured databases (lab test results, medications, etc.) can supplement the diagnosis codes. In this study, we describe an automated method that uses these related data items, without additional manual annotation, to identify patient cohorts more accurately.
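A minimal sketch of how structured data can supplement diagnosis codes, assuming a simple evidence-count rule rather than the automated method described in this study. The diabetes example, the ICD-9 prefix `250`, the HbA1c cut-off of 6.5%, the medication list, and the `min_evidence` parameter are all illustrative assumptions.

```python
from dataclasses import dataclass, field

# Illustrative medication list for a hypothetical diabetes cohort.
DIABETES_MEDS = {"metformin", "insulin", "glipizide"}


@dataclass
class Patient:
    pid: str
    icd9_codes: set = field(default_factory=set)
    hba1c_percent: list = field(default_factory=list)
    medications: set = field(default_factory=set)


def evidence_count(p: Patient) -> int:
    """Count independent lines of evidence for diabetes (0 to 3)."""
    has_code = any(code.startswith("250") for code in p.icd9_codes)  # ICD-9 250.xx
    has_lab = any(v >= 6.5 for v in p.hba1c_percent)                # HbA1c >= 6.5%
    has_med = bool(p.medications & DIABETES_MEDS)
    return int(has_code) + int(has_lab) + int(has_med)


def in_cohort(p: Patient, min_evidence: int = 2) -> bool:
    """Hypothetical rule: require at least `min_evidence` sources to agree."""
    return evidence_count(p) >= min_evidence


patients = [
    Patient("A", icd9_codes={"250.00"}),                            # code only
    Patient("B", hba1c_percent=[7.2], medications={"metformin"}),   # no code
    Patient("C", icd9_codes={"250.02"}, hba1c_percent=[8.1]),
]
print([p.pid for p in patients if in_cohort(p)])
```

Under this rule, patient A (a diagnosis code with no supporting data) is excluded, while patient B (no code, but concordant lab and medication evidence) is included.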

Submitted by hparton on
Description

Informatics advances in recent years have increased the availability of various sources of health-monitoring information to agencies responsible for disease surveillance. These sources differ in clinical relevance and reliability, and range from streaming statistical indicator evidence to outbreak reports. Information-gathering advances have outpaced the capability to combine this disparate evidence for routine decision support. In view of the need for analytical tools to manage an increasingly complex data environment, a fusion module based on Bayesian networks (BN) was developed in 2011 for the Dept. of Defense (DoD) Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE). In 2012 this module was expanded with syndromic queries, data-sensitive algorithm selection, and hierarchical fusion network training [1]. Subsequent efforts have produced a full fusion-enabled version of ESSENCE for beta testing, further upgrades, and a software specification for live DoD integration. Beta test reviewers cited the reduced alert burden and the detailed evidence underlying each alert. However, only 39 reported historical events were available for training and calibrating the three networks, which were designed to fuse evidence for the influenza-like illness, gastrointestinal, and fever syndrome categories. The current presentation describes advances that formalize the network training, calibrate the component alerting algorithms and decision nodes of each BN together, and implement a validation strategy aimed at both ESSENCE public health users and the machine learning community.
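The ESSENCE network structures are not given here, but the core idea of Bayesian evidence fusion can be sketched with the simplest Bayesian network, a naive Bayes model in which each component alerting algorithm is conditionally independent given the outbreak state. The algorithm names, hit and false-alarm rates, and the prior below are hypothetical placeholders, not calibrated ESSENCE values.

```python
import math

# Hypothetical component algorithms: (P(alert | outbreak), P(alert | no outbreak)).
ALGORITHMS = {
    "ili_ed_visits": (0.80, 0.10),
    "ili_otc_sales": (0.60, 0.05),
    "lab_positives": (0.70, 0.02),
}


def posterior_outbreak(alerts: dict, prior: float = 0.01) -> float:
    """Naive Bayes fusion: P(outbreak | observed alert pattern).

    Each algorithm contributes a log-likelihood ratio, assuming the
    algorithms are conditionally independent given the outbreak state.
    """
    log_odds = math.log(prior / (1 - prior))
    for name, fired in alerts.items():
        p_hit, p_false = ALGORITHMS[name]
        if fired:
            log_odds += math.log(p_hit / p_false)
        else:
            log_odds += math.log((1 - p_hit) / (1 - p_false))
    return 1 / (1 + math.exp(-log_odds))


all_fired = posterior_outbreak({n: True for n in ALGORITHMS})
one_fired = posterior_outbreak(
    {"ili_ed_visits": True, "ili_otc_sales": False, "lab_positives": False})
none_fired = posterior_outbreak({n: False for n in ALGORITHMS})
print(f"{all_fired:.3f} {one_fired:.4f} {none_fired:.4f}")
```

With these placeholder rates, a lone ED-visit alert barely moves the posterior above the 1% prior, while concordant alerts from all three sources push it above 0.9. This is one way fusion can reduce alert burden while keeping the evidence behind each alert explicit.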

Objective

This presentation aims to narrow the gap between multivariate analytic surveillance tools and their acceptance and utility in public health practice. We developed procedures to verify, calibrate, and validate an evidence fusion capability based on a combination of clinical and syndromic indicators and on limited knowledge of historical outbreak events.

Submitted by elamb on
Description

Open source software (OSS) is rapidly becoming part of more public health applications. Mobile health (mHealth) initiatives and the need for electronic processes to support healthcare (eHealth) provide particularly good examples of government use of OSS. The growth of global and national mHealth and eHealth needs has spurred innovation in software development. In resource-limited areas that lack the infrastructure for sophisticated computing tools but where cellular technology is prevalent, mHealth solutions can move communities into the digital age. The monetary cost of licensing and maintaining proprietary software systems has been a common challenge for these end users, and OSS helps address it. OSS has already been used to further certain global public health initiatives, but more needs to be done. For instance, the World Health Organization (WHO) International Health Regulations (IHR), adopted in 2005, required member countries to implement certain core public health capacities by June 2012. Broader adoption of OSS could improve the efficiency of IHR implementation, and of global public health initiatives in general, because it provides a free, modifiable software option that can be adapted to meet specific requirements.

Objective

Provide an overview of common open source software (OSS) licenses used in public health applications, and discuss how OSS can help improve global public health security.

Submitted by elamb on
Description

Despite the infections, hospitalizations, and deaths caused by influenza each year, the ability to predict the timing of influenza outbreaks has remained elusive. Public health practitioners have lacked a reliable, easy-to-implement method for predicting the onset of a period of elevated influenza incidence in a community. We (a team of statisticians, epidemiologists, and clinicians) have developed a model that helps public health practitioners create simple, adaptable, data-driven rules to define a period of increased disease incidence in a given location. We call this method the Above Local Elevated Respiratory Illness Threshold (ALERT) algorithm. It is a simple method that defines a period of elevated disease incidence in any community or hospital that systematically collects surveillance data on a particular disease.
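A simplified sketch of a threshold rule in the spirit of ALERT, assuming a period starts in the first week whose count reaches a threshold and ends in the week before counts fall back below it, with a hypothetical minimum duration (`min_weeks`) so that brief dips do not end the period early. The published algorithm's exact start and end rules and its procedure for choosing the threshold from historical data may differ. The helper `fraction_captured` is an illustrative way to summarize how much of a season a candidate threshold covers.

```python
def elevated_periods(counts, threshold, min_weeks=1):
    """Return (start_week, end_week) pairs, inclusive, of elevated periods.

    A period starts in the first week whose count reaches `threshold`. Once it
    has lasted at least `min_weeks` weeks, it ends in the week before the first
    week whose count drops below `threshold`.
    """
    periods = []
    start = None
    for week, count in enumerate(counts):
        if start is None:
            if count >= threshold:
                start = week
        elif count < threshold and week - start >= min_weeks:
            periods.append((start, week - 1))
            start = None
    if start is not None:  # still elevated at the end of the data
        periods.append((start, len(counts) - 1))
    return periods


def fraction_captured(counts, periods):
    """Share of all counts that fall inside the elevated periods."""
    inside = sum(sum(counts[s:e + 1]) for s, e in periods)
    return inside / sum(counts)


# Hypothetical weekly counts of lab-confirmed influenza cases.
weekly = [0, 1, 2, 5, 9, 12, 8, 4, 6, 3, 1, 0]
periods = elevated_periods(weekly, threshold=4, min_weeks=3)
print(periods, round(fraction_captured(weekly, periods), 3))  # [(3, 8)] 0.863
```

Raising the threshold shortens the elevated period but captures a smaller share of cases; comparing that trade-off across past seasons is one way a practitioner might pick a locally appropriate threshold.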

Objective

Our objective was to develop a simple, easy-to-use algorithm to predict the onset of a period of elevated influenza incidence in a community using surveillance data.

Submitted by elamb on
Description

In 1911, Christophers developed an early-warning system for malaria epidemics in Punjab based on rainfall, fever-related deaths, and wheat prices. Since then, researchers and practitioners have continued to search for determinants of the spatial and temporal variability of malaria in order to improve systems for forecasting disease burden. Malaria thrives in poor tropical and subtropical countries where resources are limited. Accurate disease prediction and early warning of increased disease burden can give public health and clinical services the information needed to implement targeted malaria control and prevention approaches that make effective use of limited resources. However, malaria forecasting models do not typically include clinical predictors, such as the type of antimalarial treatment.

Objective

The objective of the research was to identify the most accurate models for forecasting malaria at six different sentinel sites in Uganda, using environmental and clinical data sources.
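One simple way to compare candidate forecasting models at a single site is by their error on held-out data. The sketch below ranks models by mean absolute error (MAE); the choice of MAE, the model names, and the numbers are made up for illustration and are not the study's evaluation criteria or results.

```python
def mean_absolute_error(observed, forecast):
    """Average absolute difference between observed and forecast counts."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def most_accurate(observed, forecasts_by_model):
    """Return the name of the lowest-MAE model and every model's error."""
    errors = {name: mean_absolute_error(observed, fc)
              for name, fc in forecasts_by_model.items()}
    return min(errors, key=errors.get), errors


# Made-up held-out case counts at one site and two hypothetical models' forecasts.
observed = [10, 12, 15]
forecasts = {
    "model_a": [8, 14, 11],
    "model_b": [11, 12, 14],
}
best, errors = most_accurate(observed, forecasts)
print(best, errors)
```

Repeating the comparison separately at each sentinel site allows a different model to be selected where local conditions differ.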

Submitted by elamb on
Description

Influenza epidemics occur seasonally but with spatiotemporal variations in peak incidence. Many modeling studies examine transmission dynamics [1], but relatively few have examined spatiotemporal prediction of future outbreaks [2]. Bootsma et al. [3] examined past influenza epidemics and found that the timing of public health interventions strongly affected morbidity and mortality. Predicting when and where high influenza incidence will occur would give public health professionals additional lead time to plan mitigation strategies. Such predictions are most valuable when their positive predictive value is high, that is, when false positives are infrequent.
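Positive predictive value depends on how rare high-incidence weeks are, not only on the predictor's accuracy. The sketch below applies Bayes' rule, PPV = sensitivity × prevalence / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)); the sensitivity, specificity, and prevalence values are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical: high-incidence weeks make up 10% of all region-weeks.
print(round(ppv(0.90, 0.90, 0.10), 3))  # 0.5
print(round(ppv(0.90, 0.98, 0.10), 3))  # 0.833
```

At the same sensitivity, raising specificity from 0.90 to 0.98 lifts PPV from 0.5 to about 0.83, which is why keeping false positives infrequent matters so much when high-incidence periods are uncommon.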

Objective

We use advanced data mining techniques and integrate evidence from multiple sources to predict influenza incidence levels several weeks in advance, and we display the results on a map to help public health professionals prepare mitigation measures.

Submitted by elamb on
Description

A new TB case can be classified as: 1) a source case for transmission leading to other, secondary active TB cases; 2) a secondary case, resulting from recent transmission; or 3) an isolated case, uninvolved in recent transmission (i.e., neither source nor recipient). Source and secondary cases require more intense intervention because they are involved in a chain of transmission; thus, accurate and rapid classification of new patients should help public health personnel prioritize control activities effectively. However, the currently accepted classification method, DNA fingerprint analysis, takes many weeks to produce results, so public health personnel often rely solely on intuition to identify the cases most likely to be involved in transmission. Various clinical and socio-demographic features are known to be associated with TB transmission. Because these data are readily available at the time of diagnosis, they can be used to rapidly estimate the probabilities that a case is a source, secondary, or isolated case.
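A model that outputs a probability for each of the three classes can be sketched as a multinomial logistic (softmax) model with "isolated" as the reference class. This is an assumed model form; the feature names and coefficients below are hypothetical placeholders, not fitted values from this study. A real model would estimate them from cases whose class is known from DNA fingerprinting.

```python
import math

CLASSES = ("source", "secondary", "isolated")

# Hypothetical placeholder coefficients on binary (0/1) features;
# "isolated" is the reference class, so all of its coefficients are zero.
COEFFICIENTS = {
    "source": {"intercept": -1.5, "cavitary_disease": 1.2, "smear_positive": 0.9},
    "secondary": {"intercept": -1.0, "age_under_25": 0.8},
    "isolated": {},
}


def class_probabilities(features: dict) -> dict:
    """Multinomial logistic (softmax) probabilities for each case class."""
    scores = {}
    for cls in CLASSES:
        coefs = COEFFICIENTS[cls]
        score = coefs.get("intercept", 0.0)
        for name, value in features.items():
            score += coefs.get(name, 0.0) * value
        scores[cls] = score
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


new_case = {"cavitary_disease": 1, "smear_positive": 1, "age_under_25": 0}
print({cls: round(p, 3) for cls, p in class_probabilities(new_case).items()})
```

Because the output is a full probability distribution rather than a single label, staff could rank new cases by their combined source and secondary probability while DNA fingerprinting results are pending.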

Objective

To develop and validate a prediction model that estimates the probability that a newly diagnosed tuberculosis (TB) case is involved in an ongoing chain of transmission, based on the case's clinical and socio-demographic attributes available at the time of diagnosis.

Submitted by elamb on
Description

Time series data involving counts are frequently encountered in many biomedical and public health applications. For example, in disease surveillance, the occurrence of rare infections over time is often monitored by public health officials, and the time series data collected can be used for the purpose of monitoring changes in disease activity. For rare diseases with low infection rates, the observed counts typically contain a high frequency of zeros (zero-inflated), but the counts can also be very large (overdispersed) during an outbreak period. Failure to account for zero-inflation and overdispersion in the data may result in misleading inference and the detection of spurious associations.
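The zero-inflated negative binomial (ZINB) distribution is one standard way to describe counts with both excess zeros and overdispersion. The sketch below implements its probability mass function, parameterized by a zero-inflation probability `pi`, a negative binomial mean, and a size (dispersion) parameter. It covers only the marginal distribution and omits the serial dependence a time-series model would add, and it is not necessarily the model developed in this study; parameter values are hypothetical.

```python
import math


def nb_pmf(y: int, mean: float, size: float) -> float:
    """Negative binomial P(Y = y) with the given mean and size (dispersion).

    Variance is mean + mean**2 / size, so a small size means strong overdispersion.
    """
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mean))
             + y * math.log(mean / (size + mean)))
    return math.exp(log_p)


def zinb_pmf(y: int, pi: float, mean: float, size: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    nb = nb_pmf(y, mean, size)
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb


# Hypothetical parameters: 60% structural zeros, NB mean 5, strong overdispersion.
params = dict(pi=0.6, mean=5.0, size=0.5)
print(round(zinb_pmf(0, **params), 3), round(zinb_pmf(20, **params), 4))
```

A plain Poisson model with the same overall mean of 2 would put far less probability on both zero and large counts, which is how ignoring zero-inflation and overdispersion can produce misleading inference.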

Objective

The purpose of this study is to develop novel statistical methods to analyze zero-inflated and overdispersed time series consisting of count data.

Submitted by elamb on
Description

In 2010, as the rules for the Centers for Medicare & Medicaid Services (CMS) Electronic Health Record (EHR) Incentive Programs (Meaningful Use) (1) were finalized, ISDS became aware of a trend toward new EHR systems capturing or sending emergency department (ED) chief complaint (CC) data as structured variables without the free text. This perceived shift in technology was occurring in the absence of both consensus-based technical requirements for syndromic surveillance and survey data on the value of free-text CC to public health practice. On January 31, 2011, ISDS, in collaboration with CDC BioSense, recommended a core set of data for public health syndromic surveillance (PHSS) to support public health's participation in Meaningful Use.

Objective

This study was conducted to support a requirement for ED CC as free text by investigating the relationship between the unstructured, free-text form of CC data and its usefulness in public health practice. A further aim was to inform health IT standardization practices, specifically those related to Meaningful Use, by describing how US public health agencies use unstructured, free-text EHR data to monitor, assess, investigate, and manage issues of public health interest.

Submitted by elamb on
Description

Syndromic surveillance monitors syndrome data (a syndrome being a specific collection of clinical symptoms) as indicators of a potential disease outbreak. Advanced surveillance systems have been implemented globally for early detection of infectious disease outbreaks and bioterrorist attacks. However, such systems often face challenges such as the need to (i) incorporate situation-specific characteristics, such as covariate information for certain diseases; (ii) accommodate the spatial and temporal dynamics of the disease; and (iii) provide analysis and visualization tools that help detect unexpected patterns. New methods that improve the overall detection capabilities of these systems while also minimizing the number of false positives can have a broad social impact.
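For context, a common baseline that new detection methods are compared against is a C1-style control-chart rule from CDC's Early Aberration Reporting System (EARS): flag a day when its count exceeds the mean of the previous seven days by more than three standard deviations. The sketch below is a simplified version, not the method proposed here; the `min_sd` floor is an added assumption to avoid dividing by zero, and the counts are hypothetical.

```python
import statistics


def c1_alerts(counts, baseline=7, threshold=3.0, min_sd=0.5):
    """Indices of days whose count exceeds the trailing mean by > threshold SDs.

    The baseline is the `baseline` days immediately before each day (no guard band).
    """
    alerts = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]
        mu = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)  # floor avoids division by zero
        if (counts[t] - mu) / sd > threshold:
            alerts.append(t)
    return alerts


# Hypothetical daily counts of ED visits matching one syndrome.
daily = [10, 12, 11, 9, 10, 11, 12, 30, 11, 10]
print(c1_alerts(daily))  # [7]
```

This rule uses no covariates, no spatial information, and a short purely temporal baseline, and once a spike enters the baseline window it inflates the standard deviation for the following week; these are the kinds of limitations that the challenges listed above describe.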

Submitted by elamb on