Data Analytics

Description

Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other’s ‘tweets’, or short messages limited to 140 characters. The service has more than 190 million registered users and processes about 55 million tweets per day. Despite a high level of chatter, the Twitter stream does contain useful information, and, because tweets are often sent from handheld platforms on location, they convey more immediacy than messages on other social networking systems.

Objective

This paper describes a system that uses Twitter to estimate influenza-like illness levels by geographic region.

Submitted by teresa.hamby@d… on

Description

Group A Streptococcal (GAS) pharyngitis, the most common bacterial cause of acute pharyngitis, causes more than half a billion cases annually worldwide. Treatment with antibiotics provides symptomatic benefit and reduces complications, missed work days and transmission. Physical examination alone is an unreliable way to distinguish GAS from other causes of pharyngitis, so the 4-point Centor score, based on history and physical examination, is used to classify GAS risk. Still, patients with pharyngitis are often misclassified, leading to inappropriate antibiotic treatment of those with viral disease and to under-treatment of those with bona fide GAS. One key problem, even when clinical guidelines are followed, is that diagnostic accuracy for GAS pharyngitis is affected by the prior probability of disease, which in turn is related to exposure. Point-of-care clinicians rarely have access to valuable biosurveillance-derived contextualizing information when making clinical management decisions.

Objective

The objective of this study was to measure the value of integrating real-time contemporaneous local disease incidence (biosurveillance) data with a clinical score, to more accurately identify patients with Group A Streptococcal (GAS) pharyngitis.
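The role of prior probability can be illustrated with Bayes' rule in odds form. The sketch below is not the study's method, which this summary does not describe; it only shows how the same clinical finding yields different post-test probabilities when local incidence shifts the pre-test probability. The function name, the likelihood ratio, and both pre-test probabilities are hypothetical illustration values.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio (Bayes' rule in odds form)."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical values for illustration only; none come from the study.
HYPOTHETICAL_LR = 2.0  # assumed likelihood ratio for some clinical score value
low_incidence = posttest_probability(0.10, HYPOTHETICAL_LR)   # quiet period
high_incidence = posttest_probability(0.40, HYPOTHETICAL_LR)  # local outbreak

print(f"low local incidence:  {low_incidence:.3f}")
print(f"high local incidence: {high_incidence:.3f}")
```

With an identical score, the post-test probability is much higher when the assumed local incidence is high, which is why contextual biosurveillance data could change a management decision.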

Submitted by teresa.hamby@d… on

Description

The burden of asthma is a major public health issue of wide interest to public health practitioners, health care providers, policy makers, and researchers. The literature on forecasting adverse respiratory health events such as asthma attacks is limited, and the field is not yet well defined; more research is needed on forecasting the demand for hospital respiratory services.

Objective

This paper describes a framework for creating a time series data set of daily asthma admissions, weather and air quality factors, and then generating suitable lags for predictive multivariate quantile regression models (QRMs). It also demonstrates the use of root mean square error (RMSE) and receiver operating characteristic (ROC) error measures in selecting suitable predictive models.
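The lag-generation and RMSE steps can be sketched as follows. This is a minimal illustration using only the standard library, not the paper's framework: the helper names, the column naming scheme, and the sample values are assumptions, and fitting the quantile regression models themselves is left out.

```python
import math
from typing import Dict, List, Optional, Sequence


def make_lags(series: Sequence[float], lags: Sequence[int],
              name: str) -> Dict[str, List[Optional[float]]]:
    """Return one lagged copy of `series` per lag; unavailable leading values are None."""
    lagged: Dict[str, List[Optional[float]]] = {}
    n = len(series)
    for lag in lags:
        if lag < 1:
            raise ValueError("lags must be positive integers")
        padding: List[Optional[float]] = [None] * min(lag, n)
        lagged[f"{name}_lag{lag}"] = padding + list(series[: max(n - lag, 0)])
    return lagged


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily temperatures, used only to show the output shape.
temperature = [10.0, 12.0, 11.0, 13.0]
features = make_lags(temperature, lags=[1, 2], name="temp")
error = rmse([3.0, 5.0], [1.0, 5.0])
```

Each lagged column aligns day t with the value observed on day t minus the lag, so rows containing None would be dropped before model fitting.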

Submitted by uysz on

Description

The spatial scan statistic detects significant spatial clusters of disease by maximizing a likelihood ratio statistic over a large set of spatial regions. Several recent approaches have extended spatial scan to multiple data streams. Burkom aggregates actual and expected counts across streams and applies the univariate scan statistic, thus assuming a constant risk for the affected streams. Kulldorff et al. separately apply the univariate statistic to each stream and then aggregate scores across streams, thus assuming independent risks for each affected stream. Neill proposes a ‘fast subset scan’ approach, which maximizes the scan statistic over proximity-constrained subsets of locations, improving the timeliness of detection for irregularly shaped clusters. In the univariate event detection setting, many commonly used scan statistics satisfy the ‘linear-time subset scanning’ (LTSS) property, enabling exact and efficient detection of the highest-scoring space-time clusters.
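The LTSS property can be sketched in the univariate case. The code below assumes the expectation-based Poisson score, which this summary does not name explicitly, and ignores the proximity constraint; the function names and the sample counts are hypothetical. Under LTSS, sorting locations by the ratio of count to baseline and scoring only the top-k prefixes finds the same maximum as an exhaustive search over all subsets.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio (assumed score function)."""
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def ltss_best_subset(counts: Sequence[float],
                     baselines: Sequence[float]) -> Tuple[float, List[int]]:
    """Score only the n prefixes of locations sorted by count/baseline (baselines > 0)."""
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i], reverse=True)
    best_score, best_subset = 0.0, []
    total_count = total_baseline = 0.0
    for k, i in enumerate(order, start=1):
        total_count += counts[i]
        total_baseline += baselines[i]
        score = ebp_score(total_count, total_baseline)
        if score > best_score:
            best_score, best_subset = score, sorted(order[:k])
    return best_score, best_subset


def brute_force_best_score(counts: Sequence[float],
                           baselines: Sequence[float]) -> float:
    """Exhaustive search over all non-empty subsets, for comparison only."""
    best = 0.0
    for size in range(1, len(counts) + 1):
        for subset in combinations(range(len(counts)), size):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            best = max(best, ebp_score(c, b))
    return best


# Hypothetical counts and baselines for five locations.
counts = [12, 5, 30, 8, 3]
baselines = [10, 6, 15, 8, 4]
ltss_score, ltss_subset = ltss_best_subset(counts, baselines)
exhaustive_score = brute_force_best_score(counts, baselines)
```

The sorted scan evaluates n subsets instead of 2^n - 1, which is what makes exact detection feasible when the number of locations is large.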

Objective

We extend the recently proposed ‘fast subset scan’ framework from univariate to multivariate data, enabling computationally efficient detection of irregular space-time clusters even when the numbers of spatial locations and data streams are large. These fast algorithms enable us to perform a detailed empirical comparison of two variants of the multivariate spatial scan statistic, demonstrating the tradeoffs between detection power and characterization accuracy.

Submitted by teresa.hamby@d… on

Description

Most, if not all, disease surveillance systems are federated in the sense that hospitals, doctors’ offices, and pharmacies are the source of most surveillance data. Although a health department may request or mandate that these organizations report data, we are not aware of any requirements about the method of data collection, audits, or other measures of quality control.

Because of the heterogeneity of, and lack of control over, the processes by which the data are generated, data sources in a federated disease surveillance system are black boxes whose reliability, completeness, and accuracy are not fully understood by the recipient.

In this paper, we use the variance-to-mean ratio of daily counts of surveillance events as a metric of data quality. We use thermometer sales data as an example of data from a federated disease surveillance system. We test a hypothesis that removing stores with higher baseline variability from pooled surveillance data will improve the signal-to-noise ratio of thermometer sales for an influenza outbreak.
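The variance-to-mean ratio and the store-filtering step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the use of the sample variance, the threshold value, the function names, and the store data are all hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def variance_to_mean_ratio(daily_counts: Sequence[float]) -> float:
    """Sample variance of daily counts divided by their mean."""
    if len(daily_counts) < 2:
        raise ValueError("need at least two daily counts")
    average = mean(daily_counts)
    if average == 0:
        raise ValueError("mean of daily counts is zero; ratio is undefined")
    return variance(daily_counts) / average


def pool_low_variability_stores(baseline_counts: Dict[str, Sequence[float]],
                                threshold: float) -> Tuple[List[str], List[float]]:
    """Keep stores whose baseline ratio is at or below `threshold`, then sum by day."""
    kept = sorted(store for store, counts in baseline_counts.items()
                  if variance_to_mean_ratio(counts) <= threshold)
    pooled = [sum(day) for day in zip(*(baseline_counts[s] for s in kept))]
    return kept, pooled


# Hypothetical baseline thermometer sales for two stores over five days.
baseline = {
    "store_a": [4, 5, 6, 5, 5],
    "store_b": [0, 20, 1, 15, 4],
}
ratio_a = variance_to_mean_ratio(baseline["store_a"])
ratio_b = variance_to_mean_ratio(baseline["store_b"])
kept_stores, pooled_counts = pool_low_variability_stores(baseline, threshold=2.0)
```

A ratio near 1 is what a Poisson count process would produce, so a store with a much larger baseline ratio contributes disproportionate noise to the pooled series.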

 

Objective

We developed a novel method for monitoring the quality of data in a federated disease surveillance system, which we define as ‘a surveillance system in which a set of organizations that are not owned or controlled by public health provide data.’

Submitted by hparton on

Description

Emergency room visits are hard to predict and control, and the services incur high costs. Chronically ill patients would not need urgent care for their chronic illness if they were properly managed in primary care. We track the frequency of emergency room visits by chronically ill patients when the primary complaint of record is their chronic condition. We use a record of institutional insurance claims collected in over 400 hospitals in California between 2006 and 2010. We identify dimensions of the data along which utilization differs significantly between strata. We found particularly significant differences in resource utilization by the type of insurance coverage carried by the patient and by the patient's age. We studied diabetes, asthma, and arthritis patients from 8 age groups spanning ages 5 to 85, and 13 insurance payer types.

Objective

To study patterns of utilization of emergency care resources by the chronically ill in order to identify opportunities to improve efficiency and quality of care.

Submitted by elamb on