Data Analytics

Description

Health surveillance is well established for infectious diseases, but less so for non-communicable diseases. When spatio-temporal methods are used, the choice of method often appears to be driven by arbitrary criteria rather than by detection performance. Our aim is to use a theoretical simulation framework with known spatio-temporal clusters to investigate the sensitivity and specificity of several traditional (e.g. SaTScan and CUSUM) and Bayesian (including BaySTDetect and DCluster) statistical methods for spatio-temporal cluster detection of non-communicable disease.

Objective: To determine the merits of different surveillance methods for cluster detection, in particular when used in conjunction with small area data. This will be investigated using a simulation framework, with a view to supporting further surveillance work using real small area data.

Submitted by elamb on
Description

Climate warming, globalization, and social and economic crises activate natural foci of vector-borne infections, among which Lyme disease (Ixodic tick borreliosis, ITB), transmitted by Ixodes ticks, holds a special place. More than 5,000 cases are registered in the United States every year. In European countries, the number of cases may reach 8,000-10,000 per year. The incidence rate of ITB is 39.4 per 100,000 population in France and 36.6 in Bulgaria. In Ukraine, 10-70% of ticks are infected with Borrelia, and 10% to 42.2% of the Ukrainian population has had contact with the causative agent of ITB. Mathematical modeling, as an element of monitoring natural focal infections, makes it possible to assess the epidemiological potential of foci in the region and in individual territories, to forecast trends in the epidemic process, and to determine the main priorities and directions for ITB prevention. The most modern and effective simulation method is multi-agent simulation, which is built on the concept of an intelligent agent: a robot-like entity that interacts purposefully with other similar elements and with the external environment under given conditions. An intelligent agent is a simulation model of an active element whose state and goal-directed behavior vary depending on the state and behavior of other agents and the environment, by analogy with the intelligent behavior of a living organism (including a human) under similar conditions. Because the epidemic process of Lyme disease is characterized by vector transmission, a heterogeneous tick population, variable pathogen infectivity, a heterogeneous environment, and seasonal changes in tick activity, classical statistical methods cannot predict morbidity dynamics with high accuracy.
The multi-agent approach to simulating the epidemic process of Lyme disease makes it possible to account for all of the above features. Because the dynamics of the modeled system emerge from the behavior of local objects (humans and ticks), we expect a model built with the multi-agent approach to yield more accurate morbidity forecasts. The multi-agent model will make it possible not only to compute a forecast, but also to reveal the factors that contribute most to the increase in Lyme disease incidence.

Objective: The objective of this research is to develop a model for forecasting Lyme disease dynamics using an intelligent multi-agent approach, in order to support effective preventive and control measures.

Submitted by elamb on
Description

Current biosurveillance systems run multiple univariate statistical process control (SPC) charts to detect increases in multiple data streams. This approach is easy to implement and easy to interpret: by examining the alarms from each control chart, it is easy to identify which data stream is causing the alarm. However, testing multiple data streams simultaneously can lead to multiple testing problems that inflate the combined false alarm probability. Although methods such as the Bonferroni correction can address the multiple testing problem by lowering the false alarm probability in each control chart, these approaches can be extremely conservative. Biosurveillance systems often make use of variations of popular univariate SPC charts such as the Shewhart chart, the cumulative sum (CUSUM) chart, and the exponentially weighted moving average (EWMA) chart. In these control charts, an alarm is signaled when the charting statistic exceeds a pre-defined control limit. With the standard SPC charts, the false alarm rate is specified using the in-control average run length (ARL0). If multiple charts are used, the resulting multiple testing problem is often addressed using family-wise error rate (FWER) based methods, which are known to be conservative. A new temporal method is proposed for early event detection in multiple data streams. The proposed method uses p-values instead of the control limits that are commonly used with standard SPC charts. In addition, the proposed method uses the false discovery rate (FDR) for error control instead of the standard ARL0 used with conventional SPC charts. By using FDR for error control, the proposed method takes advantage of more powerful and up-to-date procedures for handling the multiple testing problem than FWER-based methods.

Objective: To propose a computationally simple, fast, and reliable temporal method for early event detection in multiple data streams.
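
The contrast between FWER-based and FDR-based error control described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each stream is compared against a normal baseline to obtain a one-sided p-value, and it uses the Benjamini-Hochberg step-up procedure as one well-known FDR procedure. The function names and the example p-values are hypothetical.

```python
import math


def upper_tail_p_value(observed, baseline_mean, baseline_sd):
    """One-sided p-value for an increase, ASSUMING a normal baseline."""
    z = (observed - baseline_mean) / baseline_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bonferroni(p_values, alpha=0.05):
    """FWER control: flag stream i only if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """FDR control: flag the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / m (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    # Stream indices sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    flagged = [False] * m
    for i in order[:k]:
        flagged[i] = True
    return flagged


# Hypothetical p-values for five data streams on a single day.
p_values = [0.001, 0.012, 0.02, 0.03, 0.5]
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

With these made-up inputs, Bonferroni flags only the first stream, while Benjamini-Hochberg flags the first four, which illustrates why FWER-based control is described as conservative.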

Submitted by elamb on
Description

In 2004, Santé publique France, the French Public Health Agency, set up a reactive all-cause mortality surveillance system based on the administrative part of the death certificate, with two objectives: (1) to detect unexpected or unusual variations in mortality and (2) to provide a first evaluation of the mortality impact of events. In 2007, an Electronic Death Registration System (EDRS) was implemented, enabling electronic transmission of the medical causes of death to the agency in real time. To date, 12% of deaths are registered electronically. A pilot study demonstrated that these data were valuable for a reactive mortality surveillance system based on causes of death. A strategy has thus been developed for the routine analysis of the medical causes of death, with the objectives of early detection of expected and unexpected outbreaks and reactive evaluation of their impact. This system will make it possible to assess which causes account for an excess of deaths when one is observed.

Objective: The aim of this study is to present the syndromic groups that will be routinely monitored for the reactive mortality surveillance based on free-text medical causes of death.

Submitted by elamb on
Description

Surveillance of influenza epidemics is a priority for risk assessment and pandemic preparedness. Mapping epidemics can be challenging because influenza infections are incompletely ascertained, ascertainment can vary spatially, and often a denominator is not available. Rapid, more refined geographic or spatial intelligence could facilitate better preparedness and response.

Objective: Using the epidemic of influenza type A in 2016 in Australia, we demonstrated a simple but statistically sound adaptive method of automatically representing the spatial intensity and evolution of an influenza epidemic that could be applied to a laboratory surveillance count data stream that does not have a denominator.

Submitted by elamb on
Description

Hepatitis C virus (HCV) infection is a leading cause of liver disease-related morbidity and mortality in the United States. Monitoring the burden of chronic HCV infection requires robust methods to identify patients with infection. Insurance claims data are a potentially rich source of information about disease burden, but often lack the laboratory results necessary to define chronic HCV infection. We developed a machine learning-based algorithm to identify patients with chronic HCV infection using health insurance claims alone and compared it with a previously developed ICD-9 code-based algorithm.

Objective: We developed a machine learning-based algorithm to identify patients with chronic hepatitis C infection in health insurance claims data.

Submitted by elamb on
Description

The evolution of a communicable disease in a human population is not entirely predictable. However, the spreading process can be assumed to vary smoothly in time. The time-dependent infection process can be linked to observations of the epidemic’s evolution by convolving it with a stochastic delay model. In retrospective analyses of epidemics, when the observations are the dates on which patients exhibited symptoms, the delay is the incubation period. In the case of biosurveillance data, the delay is the sum of the incubation period and a (hospital) visit delay, modeled as independent random variables. A model for observational error is also required. The time-dependent infection/spread rate may be inferred from observations by a deconvolution process. Because the infection rate varies smoothly in time, it can be represented by a low-dimensional parametric model, and the inference may be performed with relatively little data. For large outbreaks, the data may be available early in the epidemic, allowing timely modeling of the outbreak. Short-term forecasts using the model could thereafter be used for medical planning.

 

Objective

We present a statistical method to characterize an epidemic of a communicable disease from a time series of patients exhibiting symptoms. Characterization is defined as estimating an unobserved, time-dependent infection rate and associated parameters that completely define the evolution of an epidemic. The problem is posed as one of Bayesian inference, where parameters are inferred with quantified uncertainty. The method is demonstrated on synthetic and historical epidemic data. 
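
The forward model described above, in which an infection process is convolved with a delay distribution, can be sketched as follows. This is only an illustration of the convolution step under simplifying assumptions (discrete days, delays given as hypothetical probability mass functions indexed by day); it does not implement the Bayesian inference or the observational error model, and the function names are invented for this example.

```python
def combine_delays(pmf_a, pmf_b):
    """PMF of the sum of two INDEPENDENT delays (e.g. incubation + visit
    delay): the discrete convolution of the two PMFs."""
    total = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            total[i + j] += pa * pb
    return total


def convolve_with_delay(infections, delay_pmf):
    """Expected number of observed cases per day, given new infections per
    day and a delay PMF (delay_pmf[d] = probability of a d-day delay).
    Output is truncated to the same length as `infections`."""
    n = len(infections)
    expected = [0.0] * n
    for t in range(n):
        for d, prob in enumerate(delay_pmf):
            if t - d >= 0:
                expected[t] += infections[t - d] * prob
    return expected


# Hypothetical inputs: 10 people infected on day 0, none afterwards.
infections = [10, 0, 0, 0, 0]
incubation_pmf = [0.0, 0.5, 0.5]   # symptoms appear after 1 or 2 days
visit_pmf = [0.5, 0.5]             # visit on the same day or 1 day later

symptom_onsets = convolve_with_delay(infections, incubation_pmf)
visits = convolve_with_delay(infections, combine_delays(incubation_pmf, visit_pmf))
print("Expected symptom onsets per day:", symptom_onsets)
print("Expected visits per day:        ", visits)
```

Inference runs in the opposite direction: given the observed series, one searches for the infection curve whose convolution with the delay model best explains the data.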

Submitted by hparton on
Description

We are developing a Bayesian surveillance system for real-time surveillance and characterization of outbreaks that incorporates a variety of data elements, including free-text clinical reports. An existing natural language processing (NLP) system called Topaz is being used to extract clinical data from the reports. Moving the NLP system from a research project to a real-time service has presented many challenges.

 

Objective

Adapt an existing NLP system to be a useful component in a system performing real-time surveillance.

Submitted by hparton on
Description

Current practices of automated case detection fall at the extremes of diagnostic accuracy and timeliness. With regard to diagnostic accuracy, electronic laboratory reporting (ELR) is at one extreme and syndromic surveillance is at the other. With regard to timeliness, syndromic surveillance can be immediate, whereas ELR is delayed by 7 days from the initial patient visit. A plausible solution, a middle way between the extremes of diagnostic precision and timeliness in current case detection practices, is an automated Bayesian diagnostic system that uses all available data types, for example, free-text ED reports, radiology reports, and laboratory reports. We have built such a solution: Bayesian case detection (BCD). As a probabilistic system, BCD operates across the spectrum of diagnostic accuracy, that is, it outputs the degree of certainty for every diagnosis. In addition, BCD incorporates multiple data types as they appear during the course of a patient encounter or lifetime, with no degradation in the ability to perform diagnosis.

 

Objective

This paper describes the architecture and evaluation of our recently developed automated BCD system.
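
The idea of a diagnostic probability that is revised as each new data type arrives can be illustrated with a repeated application of Bayes' rule. This sketch is not the BCD architecture: it assumes a single disease, binary findings, and conditional independence of findings given disease status, and every probability below is a made-up placeholder.

```python
def update_posterior(prior, p_finding_if_disease, p_finding_if_no_disease):
    """Bayes' rule for one observed finding:
    P(D | e) = P(e | D) P(D) / [P(e | D) P(D) + P(e | not D) P(not D)]."""
    numerator = prior * p_finding_if_disease
    denominator = numerator + (1.0 - prior) * p_finding_if_no_disease
    return numerator / denominator


# Hypothetical evidence stream for one patient encounter, in arrival order:
# (source, P(finding | disease), P(finding | no disease)).
evidence = [
    ("free-text ED report mentions fever and cough", 0.90, 0.10),
    ("radiology report notes an infiltrate", 0.60, 0.05),
    ("laboratory report is positive", 0.95, 0.02),
]

probability = 0.01  # hypothetical prior probability of the disease
history = [probability]
for source, p_if_disease, p_if_no_disease in evidence:
    probability = update_posterior(probability, p_if_disease, p_if_no_disease)
    history.append(probability)
    print(f"after {source}: P(disease) = {probability:.3f}")
```

A probability is available after the first report and is refined as later reports arrive, so a downstream consumer can choose its own trade-off between acting early and acting with more certainty.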

Submitted by hparton on
Description

The Real-Time Biosurveillance Program (RTBP) introduces modern surveillance technology to health departments in Sri Lanka and Tamil Nadu, India. Triage data from each patient visit (basic demographics, signs, symptoms, preliminary diagnoses) are recorded on paper at health facilities. Case records are transmitted daily to a central database using the RTBP mobile phone application. In India, data entry is done by medical professionals, but in Sri Lanka, due to staffing constraints, the same duty is performed by lower-cost personnel with limited domain knowledge. This results in noticeable differences in data entry error rates between the two locations. Most of these errors are due to systematic and subjective misinterpretations of the handwritten doctor notes by the data entry personnel. If not identified and remedied quickly, these errors can adversely affect the accuracy and timeliness of health event detection. There is a need to support system managers in their efforts to maintain high reliability of the data used for public health surveillance.

 

Objective

We present a method for automated detection of systematic data entry errors in real-time biosurveillance.

Submitted by hparton on