Data Mining

Description

Over 300 independent practices transmit monthly quality reports to a data warehouse using an automated process that summarizes patient information into quality measures. All practices have implemented an electronic health record (EHR) that captures clinical information to be aggregated for population reporting and is designed to assist providers by generating point-of-care reminders and simplifying ordering and documentation.

Objective

Comparison of automated EHR-derived data with manually abstracted patient information on smoking status and cessation intervention.
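As an illustration of how such a comparison could be quantified, the sketch below computes percent agreement and Cohen's kappa between automated and manually abstracted smoking status. The abstract does not state which agreement statistics were used, so the choice of kappa, the function name `agreement`, and the 0/1 encoding are assumptions made for this example.

```python
def agreement(ehr, manual):
    """Return (percent_agreement, cohen_kappa) for two equal-length lists of 0/1 labels.

    `ehr` holds the automated EHR-derived status, `manual` the chart-abstracted
    status (1 = smoker documented, 0 = not documented). Hypothetical encoding.
    """
    if len(ehr) != len(manual) or not ehr:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ehr)
    # Observed agreement: share of patients where both sources give the same label.
    observed = sum(a == b for a, b in zip(ehr, manual)) / n
    # Agreement expected by chance, from each source's marginal rate of 1s.
    p_ehr = sum(ehr) / n
    p_man = sum(manual) / n
    expected = p_ehr * p_man + (1 - p_ehr) * (1 - p_man)
    # Convention for this sketch: if chance agreement is already 1, report kappa as 1.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa


observed, kappa = agreement([1, 1, 0, 0], [1, 0, 0, 0])
print(observed, kappa)
```

Kappa is reported alongside raw agreement because raw agreement alone can look high when one status dominates the sample.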

Submitted by uysz on
Description

Obesity can lead to the death of at least 2.8 million people each year [1], yet the rate of obesity around the world has continuously increased over the past 30 years [1]. Societal changes, including increased food consumption and decreased physical activity, have been identified as two of the main drivers behind the current obesity pandemic [2]. Examining socio-cultural factors (i.e., attitudes or perceptions of cultural groups) [3] associated with food consumption and weight loss can provide important insights to guide effective interventions and a novel surveillance approach to characterize population obesity trends from sociological perspectives. The primary goal of this study is to examine socio-cultural factors associated with food consumption and weight loss by conducting sentiment analysis on related online chatter. The secondary goal is to discuss the potential implications of being exposed to this chatter in the online environment. The scientific evidence supporting the use of social media to understand socio-cultural factors and their potential implications can be summarized in two assertions. First, online chatter, including discussions on social media, has been shown to be an effective data source for understanding public interests [4,5]. Second, prolonged participation in social media has been suggested to have impacts on users [6-8].

Objective: We aim to better understand socio-cultural factors (i.e., attitudes or perceptions of cultural groups) associated with food consumption and weight loss via sentiment analysis on tweets, short messages from Twitter.
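For readers unfamiliar with sentiment analysis, the following minimal sketch shows a lexicon-based scorer applied to short messages. It is not the method used in the study, which the abstract does not specify; the word lists `POSITIVE` and `NEGATIVE` and the function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicons; a real analysis would use a validated lexicon or model.
POSITIVE = {"love", "great", "delicious", "happy", "proud"}
NEGATIVE = {"hate", "guilty", "awful", "sad", "ashamed"}


def sentiment_score(text):
    """Count positive words minus negative words in a lower-cased message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in ["I love this delicious salad", "Feeling guilty and sad after dessert"]:
    print(message, "->", label(message))
```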

Submitted by elamb on
Description

Overweight and obesity are recognized as among the greatest modern public health problems [1], yet the worldwide prevalence of obesity has nearly doubled over the past 30 years [2]. As part of a strategy to control the obesity pandemic, the WHO recommends obesity surveillance at the population level [3]. Empirical studies have shown the importance of social networks in obesity [4], and new strategies focusing on social interactions and environments have been proposed [5] to prevent a further increase in obesity prevalence. With the increasing use of the internet, online social networks, interactions, and environments (i.e., online social relational factors) deserve more attention. Nearly three-quarters of Americans go online daily [6], for purposes such as connecting with others via social network sites [7]. As with face-to-face interactions, studies have suggested that social interactions and networks on the internet can influence behavior change [8]. Previous studies typically examine only a few selected social networking sites (example studies [9,10]), although individuals may be members of multiple sites. To better leverage online social relational factors for characterizing and monitoring population obesity trends, we investigate the other communities that weight management community members belong to and their level of participation, a first step toward utilizing online multifactorial social interactions and environments.

Objective: We aim to better understand the online social interactions and environments of individuals interested in weight management on Reddit, a social media platform.
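One way to picture the analysis of members' other communities and their level of participation is the sketch below. It assumes posts are available as `(user, community)` pairs; the function name `other_communities`, the input layout, and the community names are hypothetical, not taken from the study.

```python
from collections import Counter, defaultdict


def other_communities(posts, target):
    """Summarize where members of `target` are also active.

    posts  : iterable of (user, community) pairs, one per post (assumed layout)
    target : name of the weight management community of interest
    Returns (members, activity): for each other community, the number of
    target members active there and their total number of posts there.
    """
    posts_by_user = defaultdict(Counter)
    for user, community in posts:
        posts_by_user[user][community] += 1

    members, activity = Counter(), Counter()
    for counts in posts_by_user.values():
        if target not in counts:
            continue  # only users who posted in the target community count as members
        for community, n_posts in counts.items():
            if community != target:
                members[community] += 1
                activity[community] += n_posts
    return members, activity


sample = [("a", "wm"), ("a", "cooking"), ("a", "cooking"),
          ("b", "wm"), ("b", "running"), ("c", "cooking")]
members, activity = other_communities(sample, "wm")
print(dict(members), dict(activity))
```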

Submitted by elamb on
Description

Influenza epidemics occur seasonally but with spatiotemporal variations in peak incidence. Many modeling studies examine transmission dynamics [1], but relatively few have examined spatiotemporal prediction of future outbreaks [2]. Bootsma et al. [3] examined past influenza epidemics and found that the timing of public health interventions strongly affected morbidity and mortality. Being able to predict when and where high influenza incidence levels will occur before they happen would provide additional lead time for public health professionals to plan mitigation strategies. These predictions are especially valuable when the positive predictive value is high and, consequently, false positives are infrequent.

Objective

Advanced techniques in data mining and integrating evidence from multiple sources are used to predict levels of influenza incidence several weeks in advance and display results on a map in order to help public health professionals prepare mitigation measures.
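The positive predictive value mentioned in the description is the share of predicted high-incidence periods that actually turned out to be high. A minimal sketch of that calculation follows; the function name and the boolean input format are assumptions made for illustration.

```python
def positive_predictive_value(predicted, observed):
    """PPV = true positives / (true positives + false positives).

    predicted, observed : equal-length sequences of booleans, one per
    region-week (hypothetical unit), True meaning "high incidence".
    Returns None when nothing was predicted positive.
    """
    pairs = list(zip(predicted, observed))
    true_pos = sum(1 for p, o in pairs if p and o)
    false_pos = sum(1 for p, o in pairs if p and not o)
    if true_pos + false_pos == 0:
        return None
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value([True, True, False, True],
                                [True, False, False, True])
print(ppv)
```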

Submitted by elamb on
Description

The status of each Intensive Care Unit (ICU) patient is routinely monitored, and a number of vital signs are recorded at sub-second frequencies, which results in large amounts of data. We propose an approach to transform this stream of raw vital measurements into a sparse sequence of discrete events. Each such event represents a significant departure of an observed vital sequence from the null distribution learned from reference data. Any substantial departure may be indicative of an upcoming adverse health episode. Our method searches the space of such events for correlations with near-future changes in health status. Automatically extracted events with significant correlations can be used to predict impending undesirable changes in the patient's health. The ultimate goal is to equip ICU physicians with a surveillance tool that will issue probabilistic alerts of upcoming patient status escalations far enough in advance to take preventative actions before undesirable conditions actually set in.

 

Objective

To present a statistical data mining approach designed to (1) identify change points in vital signs that may be indicative of impending critical health events in ICU patients and (2) assess the utility of these change points in predicting the critical events.
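The idea of turning a dense vital-sign stream into sparse events can be sketched as follows. This example assumes a simple Gaussian null distribution summarized by the mean and standard deviation of reference data, and flags only the onset of each departure; the actual null model and detection rule used by the authors are not given in the abstract, and the name `extract_events` and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def extract_events(reference, stream, threshold=3.0):
    """Return a sparse list of (index, direction) events.

    reference : measurements used to learn the null distribution (assumed Gaussian)
    stream    : new measurements to monitor
    An event is emitted only when the stream first departs from the null by
    more than `threshold` standard deviations, not for every departed sample.
    """
    mu, sigma = mean(reference), stdev(reference)
    if sigma == 0:
        raise ValueError("reference data has zero variance")
    events = []
    in_event = False
    for i, value in enumerate(stream):
        z = (value - mu) / sigma
        departed = abs(z) > threshold
        if departed and not in_event:
            events.append((i, "high" if z > 0 else "low"))
        in_event = departed
    return events


heart_rate_reference = [70, 72, 71, 69, 70, 71, 68, 72]
print(extract_events(heart_rate_reference, [70, 71, 90, 92, 71, 50, 70]))
```

Emitting only onsets keeps the event sequence sparse, which is what makes the later search for correlations with health status changes tractable.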

Submitted by elamb on
Description

Epidemic dynamics of dengue fever are driven by complex interactions between hosts, vectors and viruses that are influenced by environmental and climatic factors [1]. The development of new methods to identify such specific characteristics becomes crucial to better understand and control spatiotemporal transmission. We concentrated our efforts on applying sequential pattern mining [2] to an epidemiological and meteorological dataset to identify potential drivers of dengue fever outbreaks.

Objective

We used a data mining method based on sequential pattern extraction to identify local meteorological drivers of dengue fever epidemics in French Guiana.
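To give a flavor of sequential pattern extraction, the sketch below counts how many sequences contain an ordered pair "item A, then item B within a few time steps" and keeps the pairs that meet a minimum support. It is a deliberately simplified stand-in, not the algorithm cited in the abstract; the function name `frequent_pairs`, the `max_gap` parameter, and the item labels are hypothetical.

```python
from collections import Counter


def frequent_pairs(sequences, min_support, max_gap=2):
    """Find ordered pairs (a, b) where b occurs 1..max_gap steps after a.

    sequences   : list of sequences; each sequence is a list of sets of items,
                  one set per time step (e.g. per week in one area)
    min_support : minimum number of sequences that must contain the pair
    """
    support = Counter()
    for sequence in sequences:
        seen = set()  # count each pair at most once per sequence
        for i, earlier in enumerate(sequence):
            for later in sequence[i + 1 : i + 1 + max_gap]:
                for a in earlier:
                    for b in later:
                        seen.add((a, b))
        support.update(seen)
    return {pair: count for pair, count in support.items() if count >= min_support}


weeks_by_area = [
    [{"heavy_rain"}, {"high_humidity"}, {"outbreak"}],
    [{"heavy_rain"}, {"outbreak"}],
    [{"dry"}, {"outbreak"}],
]
print(frequent_pairs(weeks_by_area, min_support=2))
```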

Submitted by knowledge_repo… on
Description

The Indiana Public Health Emergency Surveillance System (PHESS) currently receives approximately 5,000 near real-time chief complaint messages from 55 hospital emergency departments daily. The Indiana State Department of Health (ISDH) partners with the Regenstrief Institute to process, batch, and transmit data every three hours. The Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) tool is used to analyze these chief complaint data and visualize generated alerts [1].

 

The ISDH syndromic surveillance team discovered that certain chief complaints of interest were coded into the “other” syndrome and not visible in typical daily alert data. Staff determined that even a single chief complaint containing keywords related to specific reportable diseases could be of significant public health value and should be made available to investigating epidemiologists [2].

 

In addition, data quality is critical to the success of the program and must be evaluated to ensure optimal system performance.  Metrics related to data flow and completeness were identified to serve as indicators of hospital connectivity or coding problems.  These measures included the percent change in daily admits and the proportion of chief complaints missing the patient address.

Objective

This paper describes the development of targeted query tools and processes designed to maximize the extraction of information from, and improve the quality of, the hospital emergency department chief complaint data stream utilized by the Indiana State Department of Health (ISDH) for syndromic surveillance.
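The two data quality measures named in the description, percent change in daily admits and the proportion of chief complaints missing the patient address, could be computed along the following lines. The record layout, the `address` field name, and the choice of baseline for the percent change are assumptions for this sketch; the abstract does not describe them.

```python
def percent_change(today_count, baseline_count):
    """Percent change in daily admits relative to a baseline count.

    The baseline (for example, the previous day or a recent average) is an
    assumption; returns None when the baseline is zero.
    """
    if baseline_count == 0:
        return None
    return 100.0 * (today_count - baseline_count) / baseline_count


def proportion_missing(records, field="address"):
    """Share of records whose `field` is absent, None, or blank."""
    if not records:
        return None
    missing = sum(1 for r in records if not (r.get(field) or "").strip())
    return missing / len(records)


messages = [
    {"complaint": "fever", "address": "12 Main St"},
    {"complaint": "cough", "address": ""},
    {"complaint": "rash", "address": "4 Oak Ave"},
    {"complaint": "fall", "address": "9 Elm Rd"},
]
print(percent_change(80, 100), proportion_missing(messages))
```

A sharp drop in admits or a jump in missing addresses for one hospital would then point to a connectivity or coding problem worth investigating.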

Submitted by elamb on
Description

Syndromic surveillance is focused upon organizing data into categories to detect medium to large scale clusters of illness. Detection often requires that a critical threshold be surpassed. Data mining searches through data to identify records containing keywords. New Hampshire has combined data mining with syndromic surveillance since January 2003 to improve detection capacity.

 

Objective

1. Understand the principles behind the use of syndromic surveillance and data mining. 2. Understand how New Hampshire's unique approach combining data mining with syndromic surveillance has enhanced disease surveillance efforts. 3. Describe the steps and code necessary to implement and enhance data mining.
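The keyword-based data mining described above can be sketched as a search over free-text chief complaints that returns every matching record, regardless of whether a syndrome threshold was crossed. The keyword list, the `(record_id, complaint)` layout, and the function name `flag_records` are hypothetical; the abstract does not reproduce New Hampshire's actual code.

```python
import re

# Hypothetical keywords of interest; a real list would come from epidemiologists.
KEYWORDS = ["measles", "meningitis", "anthrax"]


def flag_records(records, keywords=KEYWORDS):
    """Return (record_id, matched_keywords) for complaints containing any keyword.

    records : iterable of (record_id, chief_complaint_text) pairs (assumed layout)
    Matching is case-insensitive and on whole words only.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    flagged = []
    for record_id, complaint in records:
        hits = sorted({match.lower() for match in pattern.findall(complaint)})
        if hits:
            flagged.append((record_id, hits))
    return flagged


visits = [
    (1, "FEVER AND RASH, POSSIBLE MEASLES"),
    (2, "ankle injury"),
    (3, "r/o Meningitis"),
]
print(flag_records(visits))
```

Unlike threshold-based syndromic detection, this approach surfaces even a single matching record.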

Submitted by elamb on
Description

Objective

This paper describes a series of data mining techniques used to gather, analyze, and disseminate large amounts of data from numerous sources in English as well as Chinese. The objective of the analysis is to identify locations where the data may indicate a current or future outbreak of the A-H5N1 strain of the influenza virus.

Submitted by elamb on