
Data Analytics

Description

Analyses produced by epidemiologists and public health practitioners are susceptible to bias from a number of sources including missing data, confounding variables, and statistical model selection. It often requires a great deal of expertise to understand and apply the multitude of tests, corrections, and selection rules, and these tasks can be time-consuming and burdensome. To address this challenge, Aptima began development of CARRECT, the Collaborative Automation Reliably Remediating Erroneous Conclusion Threats system. When complete, CARRECT will provide an expert system that can be embedded in an analyst’s workflow. CARRECT will support statistical bias reduction and improved analyses and decision making by engaging the user in a collaborative process in which the technology is transparent to the analyst.

Objective

The objective of the CARRECT software is to make cutting-edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users.

 

Submitted by uysz on
Description

Temporal alerting algorithms commonly used in syndromic surveillance systems are often adjusted for data features such as cyclic behavior but are subject to overfitting or misspecification errors when applied indiscriminately. In a project for the Armed Forces Health Surveillance Center to enable multivariate decision support, we obtained 4.5 years of outpatient, prescription, and laboratory test records from all US military treatment facilities. A proof-of-concept project phase produced 16 events corroborated by multiple sources of evidence, which we used to compare the detection performance of alerting algorithms. We used representative streams from each data source to compare the sensitivity of 6 algorithms to injected spikes, and we used all data streams from the 16 known events to compare the algorithms for detection timeliness.
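The spike-injection comparison described above can be illustrated with a minimal sketch. It does not reproduce any of the 6 algorithms from the study, which the abstract does not name. The detector here is a simple moving-baseline z-score, and the function names, the 28-day baseline, the threshold of 3, and the synthetic background series are all assumptions made for illustration.

```python
import random
import statistics


def zscore_alert(counts, day, baseline=28, threshold=3.0):
    """Return True if counts[day] exceeds the mean of the preceding
    `baseline` days by more than `threshold` standard deviations."""
    window = counts[day - baseline:day]
    mu = statistics.mean(window)
    # Floor the standard deviation so flat baselines do not divide by zero.
    sigma = max(statistics.stdev(window), 1.0)
    return (counts[day] - mu) / sigma > threshold


def injection_sensitivity(background, spike_size, trials=200, baseline=28, seed=0):
    """Fraction of single-day injected spikes that the detector flags.

    Each trial adds `spike_size` extra cases to one randomly chosen day
    of a copy of `background` and checks whether that day alerts.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        day = rng.randrange(baseline, len(background))
        series = list(background)
        series[day] += spike_size
        if zscore_alert(series, day, baseline):
            detected += 1
    return detected / trials


# Hypothetical background: one year of daily counts with a weekday effect.
_rng = random.Random(42)
background = [20 + (5 if d % 7 < 5 else 0) + _rng.randint(-3, 3) for d in range(365)]

for size in (5, 10, 20):
    print(size, injection_sensitivity(background, size))
```

Running the same injection procedure against several detectors, on streams with different cyclic structure, is one way to match algorithms to data types as the objective below describes.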

Objective

For a multi-source decision support application, we sought to match univariate alerting algorithms to surveillance data types to optimize detection performance.

Submitted by uysz on
Description

Each year, influenza results in increased Emergency Department crowding, which can be mitigated through early detection linked to an appropriate response. Although current surveillance systems, such as Google Flu Trends, yield near real-time influenza surveillance, few demonstrate the ability to forecast impending influenza cases.

Objective

We sought to develop a practical influenza forecast model, based on real-time, geographically focused, and easy-to-access data, to provide individual medical centers with advance warning of the number of influenza cases, thus allowing sufficient time to implement an intervention. We also evaluated how the addition of a real-time influenza surveillance system, Google Flu Trends, would affect the forecasting capabilities of this model.

Submitted by teresa.hamby@d… on
Description

Spatial methods are an important component of epidemiological research motivated by a strong correlation between disease spread and ecological factors. Our case studies examine the relationship between environmental conditions, such as climate and location, and vector distribution and abundance. Therefore, GIS can be used as a platform for integrating local environmental and meteorological variables into the analysis of disease spread, which would help in surveillance and decision making.

Objective

Use GIS to illustrate and understand the association between environmental factors and spread of infectious diseases.

Submitted by teresa.hamby@d… on
Description

The multivariate linear-time subset scan (MLTSS) extends previous spatial and subset scanning methods to achieve timely and accurate event detection in massive multivariate datasets, efficiently optimizing a likelihood ratio statistic over proximity-constrained subsets of locations and all subsets of the monitored data streams. However, some disease outbreaks may only affect a subpopulation of the monitored population (age group, gender, individuals engaging in a specific high-risk behavior, etc.), and MLTSS is unable to use this additional information to enhance detection ability.
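To make the idea of efficiently optimizing a likelihood ratio statistic over subsets concrete, here is a minimal sketch of the unconstrained, single-stream linear-time subset scan. It is not MLTSS or MD-Scan: it omits proximity constraints, multiple data streams, and subpopulation attributes. It assumes the expectation-based Poisson statistic, for which the best subset is always a set of top-ranked records when records are sorted by the ratio of count to baseline. The function names and the example numbers are illustrative assumptions.

```python
import math
from itertools import combinations


def poisson_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio for aggregate
    count C and expected baseline B: C*log(C/B) + B - C if C > B, else 0."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def ltss_best_subset(counts, baselines):
    """Find the highest-scoring subset by scanning only the n prefixes
    of the records sorted by count/baseline (baselines must be positive)."""
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_subset = 0.0, []
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = poisson_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_subset = score, sorted(order[:k])
    return best_score, best_subset


def brute_force_best_score(counts, baselines):
    """Exhaustive search over all non-empty subsets, for checking only."""
    n = len(counts)
    best = 0.0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            best = max(best, poisson_score(c, b))
    return best


# Hypothetical observed counts and expected baselines for five locations.
counts = [12, 5, 30, 8, 9]
baselines = [10.0, 6.0, 12.0, 8.0, 10.0]
print(ltss_best_subset(counts, baselines))
```

The sorted scan evaluates n subsets instead of 2^n, which is what makes scanning over massive datasets tractable. The brute-force function is included only to check the result on small inputs.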

Objective

We present Multidimensional Subset Scan (MD-Scan), a new method for early outbreak detection and characterization using multivariate case data from individuals in a population. MD-Scan extends previous work on multivariate event detection by identifying the characteristics of the affected subpopulation, and enables more timely and accurate detection while maintaining computational tractability.

 

Submitted by teresa.hamby@d… on
Description

One of the primary goals of this research was to characterize the viability of biosurveillance models to provide operationally relevant information to decision makers, in order to identify areas for future research. Two critical characteristics differentiate this work from other infectious disease modeling reviews. First, we reviewed models that attempted to predict the disease event, not merely its transmission dynamics. Second, we considered models involving pathogens of concern as determined by the US National Select Agent Registry.

Background: A rich and diverse field of infectious disease modeling has emerged over the past 60 years and has advanced our understanding of population- and individual-level disease transmission dynamics, including risk factors, virulence and spatio-temporal patterns of disease spread. Recent modeling advances include biostatistical methods, and massive agent-based population, biophysical, ordinary differential equation, and ecological-niche models. Diverse data sources are being integrated into these models as well, such as demographics, remotely-sensed measurements and imaging, environmental measurements, and surrogate data such as news alerts and social media. Yet, there remains a gap in the sensitivity and specificity of these models, not only in tracking infectious disease events but also in predicting their occurrence.

Objective

The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents and can forecast the occurrence of a disease event.

Submitted by teresa.hamby@d… on
Description

Early detection of influenza outbreaks is critical to public health officials. Case detection is the foundation for outbreak detection. A previous study by Elkin et al. demonstrated that using individual emergency department (ED) reports can better detect influenza cases than using chief complaints. Our recent study using ED reports processed by Bayesian networks (with an expert-constructed network structure) showed high accuracy in detecting influenza cases.

Objective

Compare 7 machine learning algorithms with an expert-constructed Bayesian network on detection of patients with influenza syndrome.

Submitted by teresa.hamby@d… on
Description

The New York City (NYC) Department of Health and Mental Hygiene (DOHMH) receives daily ED data from 49 of NYC’s 52 hospitals, representing approximately 95% of ED visits citywide. Chief complaint (CC) is categorized into syndrome groupings using text recognition of symptom keywords and phrases. Hospitals are not required to notify the DOHMH of any changes to procedures or health information systems (HIS). Previous work found that CC word count varied over time within and among EDs. The variations seen in CC word count may affect the quality and type of data received by the DOHMH, thereby affecting the ability to detect syndrome visits consistently.

Objective

To identify changes in emergency department (ED) syndromic surveillance data by analyzing trends in chief complaint (CC) word count; to compare these changes to coding changes reported by EDs; and to examine how these changes might affect the ability of syndromic surveillance systems to identify syndromes in a consistent manner.

Submitted by teresa.hamby@d… on
Description

We present the EpiEarly, EpiGrid, and EpiCast tools for mechanistically based biological decision support. The range of tools covers coarse-, medium-, and fine-grained models. The coarse-grained tool (EpiEarly), which uses only aggregated time-series data, provides a statistic quantifying epidemic growth potential and associated uncertainties. The medium-grained, geographically resolved model (EpiGrid) is based on differential-equation simulations of disease and epidemic progression in the presence of various human interventions, and is geared toward understanding the role of infection control, early vs. late diagnosis, vaccination, etc. in outbreak control. A fine-grained hybrid-agent epidemic model (EpiCast) with diurnal agent travel and contagion allows analysis of the importance of contact networks, travel, and detailed intervention strategies for the control of outbreaks and epidemics.

Objective

We will demonstrate tools that allow mechanistic constraints on disease progression and epidemic spread to play off against interventions, mitigation, and control measures. The fundamental mechanisms of disease progression and epidemic spread provide important constraints on interpreting how epidemic case counts change with time and geography in the context of ongoing interventions, mitigations, and controls. Models such as these that account for the effects of human actions can also allow evaluation of the importance of categories of epidemic and disease controls.

Submitted by elamb on
Description

Uncertainty Quantification (UQ), the ability to quantify the impact of sample-to-sample variations and model misspecification on predictions and forecasts, is a critical aspect of disease surveillance. While quantifying the impact of stochastic uncertainty in the data is well understood, quantifying the impact of model misspecification is significantly harder. For the latter, one needs a "universal model" against which more restrictive parametric models are compared.

Objective

We present a mathematical framework for non-parametric estimation of the force of infection, together with statistical upper and lower confidence bands. The resulting estimates allow one to assess how well simpler models, such as SEIR, fit the observed time series of incidence data.
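For readers unfamiliar with the parametric side of this comparison, the following minimal sketch shows a deterministic SEIR model in which the force of infection takes the standard form beta * I / N. It does not implement the non-parametric estimator or the confidence bands, which the abstract does not specify. The function name, the forward-Euler integration, and all parameter values are illustrative assumptions.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infectious, days, dt=0.1):
    """Integrate a deterministic SEIR model with forward-Euler steps.

    beta:  transmission rate per day
    sigma: rate of leaving the exposed class (1 / latent period)
    gamma: recovery rate (1 / infectious period)
    Returns (daily_incidence, final_state), where incidence counts
    new infectious cases per day and final_state is (S, E, I, R).
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    daily_incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            # Parametric force of infection assumed by SEIR.
            force_of_infection = beta * i / population
            new_exposed = force_of_infection * s * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
            new_cases += new_infectious
        daily_incidence.append(new_cases)
    return daily_incidence, (s, e, i, r)


# Hypothetical parameters: R0 = beta / gamma = 2, five-day latent period.
incidence, final_state = simulate_seir(
    beta=0.5, sigma=0.2, gamma=0.25,
    population=10_000, initial_infectious=10, days=300,
)
print(max(incidence), final_state)
```

An incidence curve generated this way is the kind of restrictive parametric prediction that could be checked against non-parametric confidence bands for the force of infection.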

Submitted by elamb on