
Polgreen Philip

Description

Dengue is a mosquito-borne viral disease, and there is considerable evidence that case numbers are rising and that the geographical distribution of the disease is widening, both within the United States and around the world.

The accuracy and reporting frequency of dengue morbidity and mortality information vary geographically, and reported figures often underestimate the actual number of dengue infections. As traditional methods of disease surveillance may not accurately capture the true impact of this disease, it is important to gather professional observations and opinions from individuals in the public health, medical, and vector control fields of practice. Prediction markets are one way of supplementing traditional surveillance and quantifying the observations and predictions of professionals in the field.

Prediction markets have been successfully used to forecast future events, including future influenza activity. For these markets, we divided the possible outcomes for each question into multiple mutually exclusive contracts to forecast dengue-related events. This differed from many previous prediction markets that offered single sets of yes-no questions and used ‘real’ money in the form of educational grants. However, with more detailed contracts, we were able to generate more refined predictions of dengue activity.
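
As a rough illustration of how a set of mutually exclusive contracts can be read as a forecast, the sketch below rescales contract prices so that they sum to one. The contract labels, the prices, and the function name implied_probabilities are hypothetical and are not taken from the dengue markets described above; this is one common way to interpret such prices, not necessarily the procedure used in this project.

```python
def implied_probabilities(prices):
    """Rescale prices of mutually exclusive contracts so they sum to 1.

    `prices` maps a contract label to its latest price (hypothetical data).
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("at least one contract must have a positive price")
    return {label: price / total for label, price in prices.items()}


# Hypothetical contracts for a single question; exactly one can pay out.
example_prices = {
    "fewer than 100 cases": 0.15,
    "100 to 499 cases": 0.45,
    "500 or more cases": 0.30,
}
forecast = implied_probabilities(example_prices)
most_likely = max(forecast, key=forecast.get)
```

Splitting a question into several contracts yields a full distribution over outcome ranges rather than a single yes-or-no probability.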

 

Objective

The objective of this project is to use prediction markets to forecast the spread of dengue.

Submitted by hparton on
Description

Influenza-like illness (ILI) data is collected by an Influenza Sentinel Provider Surveillance Network at the state (Iowa, USA) level. Historically, the Iowa Department of Public Health has maintained 19 different influenza sentinel surveillance sites. Because participation is voluntary, locations of the sentinel providers may not reflect optimal geographic placement. This study analyzes two different geographic placement algorithms - a maximal coverage model (MCM) and a K-median model. The MCM operates as follows: given a specified radius of coverage for each of the n candidate surveillance sites, we greedily choose the m sites that result in the highest population coverage. In previous work, we showed that the MCM can be used for site placement. In this paper, we introduce an alternative to the MCM - the K-median model. The K-median model, often called the P-median model in geographic literature, operates by greedily choosing the m sites which minimize the sum of the distances from each person in a population to that person’s nearest site. In other words, it minimizes the average travel distance for a population.
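
The two greedy procedures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: locations are planar (x, y) points, distance is Euclidean, each population point carries a weight, m does not exceed the number of candidate sites, and the names greedy_mcm and greedy_k_median are hypothetical. The toy data are made up.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_mcm(candidates, population, radius, m):
    """Pick m sites; each step adds the site covering the most
    not-yet-covered population within `radius`."""
    chosen, covered = [], set()
    for _ in range(m):
        best, best_gain = None, -1.0
        for j, site in enumerate(candidates):
            if j in chosen:
                continue
            gain = sum(w for i, (pt, w) in enumerate(population)
                       if i not in covered and dist(pt, site) <= radius)
            if gain > best_gain:
                best, best_gain = j, gain
        chosen.append(best)
        covered |= {i for i, (pt, _) in enumerate(population)
                    if dist(pt, candidates[best]) <= radius}
    return chosen


def greedy_k_median(candidates, population, m):
    """Pick m sites; each step adds the site that most reduces the
    population-weighted sum of distances to the nearest chosen site."""
    chosen = []
    nearest = [math.inf] * len(population)
    for _ in range(m):
        best, best_cost = None, math.inf
        for j, site in enumerate(candidates):
            if j in chosen:
                continue
            cost = sum(w * min(nearest[i], dist(pt, site))
                       for i, (pt, w) in enumerate(population))
            if cost < best_cost:
                best, best_cost = j, cost
        chosen.append(best)
        nearest = [min(nearest[i], dist(pt, candidates[best]))
                   for i, (pt, _) in enumerate(population)]
    return chosen


# Toy data: ((x, y), population weight) and candidate site coordinates.
population = [((0, 0), 100), ((1, 0), 50), ((10, 0), 80), ((10, 1), 10)]
candidates = [(0, 0), (10, 0), (5, 0)]
mcm_sites = greedy_mcm(candidates, population, radius=2, m=2)
k_median_sites = greedy_k_median(candidates, population, m=2)
```

Both functions return indices into the candidate list. The MCM depends on the chosen coverage radius, while the K-median objective has no radius parameter and penalizes every unit of travel distance.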

 

Objective

This paper describes an experiment to evaluate the performance of several alternative surveillance site placement algorithms with respect to the standard ILI surveillance system in Iowa.

Submitted by hparton on
Description

Prediction markets have been successfully used to forecast future events in other fields. We adapted this method to provide estimates of the likelihood of H5N1 influenza-related events.

 

Objective

The purpose of this study is to compare the results of an H5N1 influenza prediction market model with a standard statistical model.

Submitted by hparton on
Description

Emerging event detection is the process of automatically identifying novel and emerging ideas from text with minimal human intervention. With the rise of social networks like Twitter, topic detection has begun leveraging measures of user influence to identify emerging events. Twitter's highly skewed follower/followee structure lends itself to an intuitive model of influence, yet in a context like the Emerging Infections Network (EIN), a sentinel surveillance listserv of over 1400 infectious disease experts, it is less clear how to develop a useful model of authority. Who should we listen to on the EIN? To explore this, we annotated a body of important EIN discussions and tested how well three models of user authority performed in identifying those discussions. In previous work we proposed a process by which only posts that are based on specific "important" topics are read, thus drastically reducing the number of posts that need to be read. The process works by finding a set of "bellwether" users that act as indicators for "important" topics; only posts relating to these topics are then read. This approach does not consider the text of messages, only the patterns of user participation. Our text analysis approach follows that of Cataldi et al.[1], using the idea of semantic "energy" to identify emerging topics within Twitter posts. Authority is calculated via PageRank and used to weight each author's contribution to the semantic energy of all terms occurring within some interval t_i. A decay parameter d defines the impact of prior time steps on the current interval.
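
To make the weighting idea concrete, the sketch below computes PageRank authority over a reply graph and uses it to weight term counts per interval. It is a simplified illustration, not the formula of Cataldi et al.[1] or the exact method used here: the reply graph (an edge from the replier to the author being answered), the whitespace tokenizer, the function names, and the particular way the decay parameter d discounts earlier intervals are all assumptions.

```python
from collections import Counter


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank; `edges` are (source, target) pairs."""
    n = len(nodes)
    out = {u: [t for s, t in edges if s == u] for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def weighted_term_scores(posts, authority):
    """Sum of authority-weighted term counts for one interval.

    `posts` is a list of (author, text) pairs from that interval.
    """
    scores = Counter()
    for author, text in posts:
        for term, count in Counter(text.lower().split()).items():
            scores[term] += authority[author] * count
    return scores


def energy(history, term, d):
    """Current score minus a decay-weighted average of earlier scores.

    `history` lists per-interval scores, oldest first; d > 0, and the
    interval k steps back gets weight d**k (an assumed decay scheme).
    """
    current = history[-1][term]
    past = history[:-1][::-1]  # most recent earlier interval first
    if not past:
        return current
    weights = [d ** k for k in range(1, len(past) + 1)]
    baseline = sum(w * h[term] for w, h in zip(weights, past)) / sum(weights)
    return current - baseline


# Hypothetical users and posts.
users = ["a", "b", "c"]
replies = [("b", "a"), ("c", "a"), ("a", "b")]
authority = pagerank(users, replies)
interval_0 = [("b", "flu season"), ("c", "flu shot")]
interval_1 = [("a", "measles outbreak measles"), ("b", "flu")]
history = [weighted_term_scores(p, authority) for p in (interval_0, interval_1)]
```

Under this scheme a term gains energy when high-authority members start using it more than they did in recent intervals, and values of d below 1 make older intervals count less toward the baseline.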

Objective

To explore how different models of user influence or authority perform when detecting emerging events within a small-scale community of infectious disease experts.

Submitted by elamb on
Description

The time series of syphilis cases has been studied at the country and state level on a yearly basis, and it has been found that syphilis has a periodicity of approximately 10 years. However, to inform prevention efforts, it is important to understand the short-term dynamics of disease activity.

 

Objective

(i) To forecast syphilis cases per state in the US to support early containment of outbreaks. (ii) For each state, to determine which other states are most correlated with it, in order to find "bellwether" states that can inform surveillance efforts. (iii) To determine a small collection of states whose syphilis incidence patterns are most closely correlated with those of all the states.
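
Objectives (ii) and (iii) can be illustrated with plain Pearson correlation between state-level case series. The sketch below is only one possible reading of "most correlated": it ignores time lags, and the state labels, case counts, and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_correlated(series):
    """For each state, the other state whose series correlates best."""
    return {
        s: max((t for t in series if t != s),
               key=lambda t: pearson(series[s], series[t]))
        for s in series
    }


def rank_by_mean_correlation(series):
    """States ordered by their mean correlation with all other states."""
    def mean_corr(s):
        others = [t for t in series if t != s]
        return sum(pearson(series[s], series[t]) for t in others) / len(others)
    return sorted(series, key=mean_corr, reverse=True)


# Made-up monthly case counts for three hypothetical states.
cases = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],
    "C": [5, 3, 4, 1, 2],
}
partners = most_correlated(cases)
ranking = rank_by_mean_correlation(cases)
```

Taking the first few entries of the ranking gives a simple candidate for the small collection of states in objective (iii).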

Submitted by elamb on
Description

Time series data involving counts are frequently encountered in many biomedical and public health applications. For example, in disease surveillance, the occurrence of rare infections over time is often monitored by public health officials, and the time series data collected can be used for the purpose of monitoring changes in disease activity. For rare diseases with low infection rates, the observed counts typically contain a high frequency of zeros (zero-inflated), but the counts can also be very large (overdispersed) during an outbreak period. Failure to account for zero-inflation and overdispersion in the data may result in misleading inference and the detection of spurious associations.
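
One standard way to represent both features in a single distribution is the zero-inflated negative binomial (ZINB), sketched below. It is shown only to make the two terms concrete; it ignores dependence over time and is not necessarily the method developed in this study. The parameterization (mean mu, dispersion r, extra-zero probability pi) and the function names are assumptions.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu and dispersion r.

    The variance is mu + mu**2 / r, so a small r means strong overdispersion.
    """
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    base = (1 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


# Hypothetical parameters: many zeros plus occasional large counts.
mu, r, pi = 3.0, 0.5, 0.4
p_zero_plain = nb_pmf(0, mu, r)
p_zero_inflated = zinb_pmf(0, mu, r, pi)
```

Compared with the plain negative binomial, the inflated model places more mass at zero while keeping the heavy right tail that accommodates outbreak periods.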

 

Objective

The purpose of this study is to develop novel statistical methods to analyze zero-inflated and overdispersed time series consisting of count data.

Submitted by elamb on
Description

Public health officials and epidemiologists have been attempting to eradicate syphilis for decades, but national incidence rates are again on the rise. It has been suggested that the syphilis epidemic in the US is a "rare example of unforced, endogenous oscillations in disease incidence, with an 8-11-yr period that is predicted by the natural dynamics of syphilis infection, to which there is partially protective immunity." While the time series of aggregate case counts seems to support this claim, between 1990 and 2010 there seems to have been a significant change in the spatial distribution of the syphilis epidemic. It is unclear whether this change can also be attributed to "endogenous" factors or whether it is due to exogenous factors such as behavioral changes (e.g., the widespread use of the internet for anonymous sexual encounters). For example, it has been pointed out that levels of syphilis in 1989 were abnormally high in counties in North Carolina (NC) immediately adjacent to highways. The hypothesis was that this may be due to truck drivers and prostitution, and/or the emerging cocaine market. Our results indicate that syphilis distribution in NC has changed since 1989, diffusing away from highway counties.

 

Objective

To study the spatial distribution of syphilis at the county level for specific states and nationally, and to determine how this might have changed over time in order to improve disease surveillance.

Submitted by elamb on
Description

The spread of infectious diseases is facilitated by human travel. Infectious diseases are often introduced into a population by travelers and then spread among susceptible individuals. Likewise, uninfected susceptible travelers can move into populations sustaining the spread of an infectious disease.

Several disease-modeling efforts have incorporated travel data (e.g., air, train, or subway traffic) as well as census data, all in an effort to better understand the spread of infectious diseases. Unfortunately, most travel data is not fine-grained enough to capture individual movements over long periods and large spaces. It does not, for example, document what happens when people get off a train or airplane. Thus, other methods have been suggested to measure how people move, including tracking the movement of currency and tracking the movement of individuals using cell phone data. Although these data are finer grained, they have their own limitations (e.g., sparseness) and are not generally available for research purposes.

FourSquare is a social media application that permits users to "check-in" (i.e., record their current location at stores, restaurants, etc.) via their mobile telephones in exchange for incentives (e.g., location-specific coupons). FourSquare and similar applications (Gowalla, Yelp, etc.) generally broadcast each check-in via Twitter or Facebook; in addition, some GPS-enabled mobile Twitter clients add explicit geocodes to individual tweets.

Here, we propose the use of geocoded social media data as a real-time, fine-grained proxy for human travel.

 

Objective

To use sequential, geocoded social media data as a proxy for human movement to support both disease surveillance and disease modeling efforts.
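
A minimal sketch of how sequential geocoded posts might be turned into movement records: group check-ins by user, sort them by time, and measure the great-circle distance between consecutive locations. The record layout (user, timestamp, latitude, longitude), the function names, and the sample coordinates are assumptions for illustration, and no real social media API is used.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def movements(checkins):
    """Map each user to a list of (t_from, t_to, km) between
    consecutive check-ins, ordered by timestamp."""
    by_user = defaultdict(list)
    for user, ts, lat, lon in checkins:
        by_user[user].append((ts, lat, lon))
    result = {}
    for user, points in by_user.items():
        points.sort()
        result[user] = [
            (a[0], b[0], haversine_km(a[1], a[2], b[1], b[2]))
            for a, b in zip(points, points[1:])
        ]
    return result


# Hypothetical check-ins: (user, timestamp, latitude, longitude).
sample = [
    ("u1", 2, 41.5868, -93.6250),  # approximate coordinates, Des Moines
    ("u1", 1, 41.6611, -91.5302),  # approximate coordinates, Iowa City
    ("u2", 1, 41.6611, -91.5302),
]
legs = movements(sample)
```

Aggregating such legs across many users by origin and destination region would give the kind of flow estimate that a disease model could consume. Users with a single check-in contribute no legs.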

Submitted by elamb on
Description

Influenza is a major cause of mortality. In developed countries, mortality is at its highest during winter months, not only as a result of deaths from influenza and pneumonia but also as a result of deaths attributed to other diseases (e.g., cardiovascular disease). Understandably, much of the surveillance of influenza follows predefined geographic regions (e.g., census regions or state boundaries). However, the spread of influenza and its resulting mortality do not respect such boundaries.

 

Objective

To cluster cities in the United States based on their levels of mortality from influenza and pneumonia.

Submitted by elamb on
Description

Influenza-like illness (ILI) data is collected via an Influenza Sentinel Provider Surveillance Network at the state level. Because participation is voluntary, locations of the sentinel providers may not reflect optimal geographic placement. This study analyzes two different geographic placement schemes - a maximal coverage model (MCM) and a K-median model, two location-allocation models commonly used in geographic information systems. The MCM chooses sites in areas with the densest population. The K-median model chooses sites which minimize the average distance traveled by individuals to their nearest site. We have previously shown how a placement model can be used to improve population coverage for ILI surveillance in Iowa when considering the sites recruited by the Iowa Department of Public Health. We extend this work by evaluating different surveillance placement algorithms with respect to outbreak intensity and timing (i.e., being able to capture the start, peak and end of the influenza season).

 

Objective

To evaluate the performance of several sentinel surveillance site placement algorithms for ILI surveillance systems. We explore how these different approaches perform by capturing both the overall intensity and timing of influenza activity in the state of Iowa.

Submitted by elamb on