Segre Alberto

Description

Influenza-like illness (ILI) data is collected by an Influenza Sentinel Provider Surveillance Network at the state level (Iowa, USA). Historically, the Iowa Department of Public Health has maintained 19 influenza sentinel surveillance sites. Because participation is voluntary, the locations of the sentinel providers may not reflect optimal geographic placement. This study analyzes two geographic placement algorithms: a maximal coverage model (MCM) and a K-median model. The MCM operates as follows: given a specified radius of coverage for each of the n candidate surveillance sites, we greedily choose the m sites that result in the highest population coverage. In previous work, we showed that the MCM can be used for site placement. In this paper, we introduce an alternative to the MCM, the K-median model. The K-median model, often called the P-median model in the geographic literature, operates by greedily choosing the m sites that minimize the sum of the distances from each person in a population to that person’s nearest site. In other words, it minimizes the average travel distance for a population.
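The greedy MCM selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `greedy_max_coverage`, the planar `(x, y, population)` input format, and the use of straight-line distance are all hypothetical choices made for the example.

```python
from math import hypot

def greedy_max_coverage(people, candidates, radius, m):
    """Greedily pick up to m candidate sites that maximize covered population.

    people:     list of (x, y, population) tuples (assumed planar coordinates)
    candidates: list of (x, y) candidate site locations
    radius:     coverage radius shared by every candidate site
    """
    # Precompute which population points fall within each candidate's radius.
    covers = [
        {i for i, (px, py, _) in enumerate(people)
         if hypot(px - cx, py - cy) <= radius}
        for cx, cy in candidates
    ]
    chosen, covered = [], set()

    def gain(j):
        # Population newly covered if candidate j were added now.
        return sum(people[i][2] for i in covers[j] - covered)

    for _ in range(min(m, len(candidates))):
        remaining = [j for j in range(len(candidates)) if j not in chosen]
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining site adds any coverage
        chosen.append(best)
        covered |= covers[best]

    total = sum(people[i][2] for i in covered)
    return chosen, total
```

Each round picks the site with the largest marginal gain in covered population, so sites that mostly overlap an already chosen site are naturally deprioritized.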

Objective

This paper describes an experiment to evaluate the performance of several alternative surveillance site placement algorithms with respect to the standard ILI surveillance system in Iowa.

Submitted by hparton on
Description

Emerging event detection is the process of automatically identifying novel and emerging ideas from text with minimal human intervention. With the rise of social networks like Twitter, topic detection has begun leveraging measures of user influence to identify emerging events. Twitter's highly skewed follower/followee structure lends itself to an intuitive model of influence, yet in a context like the Emerging Infections Network (EIN), a sentinel surveillance listserv of over 1400 infectious disease experts, it is less clear how to develop a useful model of authority. Who should we listen to on the EIN? To explore this, we annotated a body of important EIN discussions and tested how well three models of user authority performed in identifying those discussions. In previous work we proposed a process by which only posts on specific "important" topics are read, thus drastically reducing the number of posts that need to be read. The process works by finding a set of "bellwether" users that act as indicators for "important" topics; only posts relating to these topics are then read. This approach does not consider the text of messages, only the patterns of user participation. Our text analysis approach follows that of Cataldi et al.[1], using the idea of semantic "energy" to identify emerging topics within Twitter posts. Authority is calculated via PageRank and used to weight each author's contribution to the semantic energy of all terms occurring within some interval t_i. A decay parameter d defines the impact of prior time steps on the current interval.
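To make the authority-weighted energy idea concrete, the sketch below computes PageRank by power iteration over a hypothetical "who replies to whom" graph and then accumulates a per-term score across intervals. It is a deliberately simplified stand-in, not the formulation of Cataldi et al. or of this study: the exponential decay rule, the graph construction, and the names `pagerank` and `term_energy` are assumptions made for illustration.

```python
from collections import defaultdict

def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank; links maps each user to the users they point to."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling user: spread its rank evenly over all users.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

def term_energy(intervals, authority, d=0.5):
    """Accumulate authority-weighted term scores over time.

    intervals: list (oldest first) of lists of (author, [terms]) posts
    authority: dict mapping author -> authority score (e.g. from pagerank)
    d:         assumed exponential decay applied to earlier intervals
    """
    energy = defaultdict(float)
    for posts in intervals:
        # Shrink everything carried over from earlier intervals.
        for term in energy:
            energy[term] *= d
        # Each post adds its author's authority once per distinct term.
        for author, terms in posts:
            for term in set(terms):
                energy[term] += authority.get(author, 0.0)
    return dict(energy)
```

Terms whose score in the latest interval is high relative to their history would be the candidates for "emerging" topics; how that threshold is set is left open here.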

Objective

To explore how different models of user influence or authority perform when detecting emerging events within a small-scale community of infectious disease experts.

Submitted by elamb on
Description

Public health officials and epidemiologists have been attempting to eradicate syphilis for decades, but national incidence rates are again on the rise. It has been suggested that the syphilis epidemic in the US is a "rare example of unforced, endogenous oscillations in disease incidence, with an 8-11-yr period that is predicted by the natural dynamics of syphilis infection, to which there is partially protective immunity." While the time series of aggregate case counts seems to support this claim, between 1990 and 2010 there seems to have been a significant change in the spatial distribution of the syphilis epidemic. It is unclear whether this change can also be attributed to "endogenous" factors or whether it is due to exogenous factors such as behavioral changes (e.g., the widespread use of the internet for anonymous sexual encounters). For example, it has been pointed out that levels of syphilis in 1989 were abnormally high in counties in North Carolina (NC) immediately adjacent to highways. The hypothesis was that this may be due to truck drivers and prostitution, and/or the emerging cocaine market. Our results indicate that syphilis distribution in NC has changed since 1989, diffusing away from highway counties.

Objective

To study the spatial distribution of syphilis at the county level for specific states and nationally, and to determine how this might have changed over time in order to improve disease surveillance.

Submitted by elamb on
Description

The spread of infectious diseases is facilitated by human travel. Infectious diseases are often introduced into a population by travelers and then spread among susceptible individuals. Likewise, uninfected susceptible travelers can move into populations sustaining the spread of an infectious disease.

Several disease-modeling efforts have incorporated travel data (e.g., air, train, or subway traffic) as well as census data, all in an effort to better understand the spread of infectious diseases. Unfortunately, most travel data is not fine-grained enough to capture individual movements over long periods and large spaces. It does not, for example, document what happens when people get off a train or airplane. Thus, other methods have been suggested to measure how people move, including tracking the movement of currency and tracking individuals using cell phone data. Although these data are finer-grained, they have their own limitations (e.g., sparseness) and are not generally available for research purposes.

FourSquare is a social media application that permits users to "check-in" (i.e., record their current location at stores, restaurants, etc.) via their mobile telephones in exchange for incentives (e.g., location-specific coupons). FourSquare and similar applications (Gowalla, Yelp, etc.) generally broadcast each check-in via Twitter or Facebook; in addition, some GPS-enabled mobile Twitter clients add explicit geocodes to individual tweets.

Here, we propose the use of geocoded social media data as a real-time fine-grained proxy for human travel.
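One way to picture how sequential check-ins could stand in for travel is to turn each user's time-ordered check-ins into origin-destination counts. The sketch below is a minimal illustration under assumptions: the `(user, timestamp, place)` record format and the function name `movement_flows` are hypothetical, and real data would first need geocodes mapped to places or regions.

```python
from collections import Counter, defaultdict

def movement_flows(checkins):
    """Count origin -> destination moves between consecutive check-ins.

    checkins: iterable of (user, timestamp, place) records, in any order
    """
    by_user = defaultdict(list)
    for user, ts, place in checkins:
        by_user[user].append((ts, place))

    flows = Counter()
    for events in by_user.values():
        events.sort()  # order each user's check-ins by timestamp
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:  # ignore repeated check-ins at one place
                flows[(origin, dest)] += 1
    return flows
```

The resulting counts could then feed a disease model as an estimate of relative movement between places, subject to the usual caveats about which users choose to check in.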

Objective

To use sequential, geocoded social media data as a proxy for human movement to support both disease surveillance and disease modeling efforts.

Submitted by elamb on
Description

Influenza-like illness (ILI) data is collected via an Influenza Sentinel Provider Surveillance Network at the state level. Because participation is voluntary, the locations of the sentinel providers may not reflect optimal geographic placement. This study analyzes two geographic placement schemes, a maximal coverage model (MCM) and a K-median model, both location-allocation models commonly used in geographic information systems. The MCM chooses sites in areas with the densest population. The K-median model chooses sites that minimize the average distance traveled by individuals to their nearest site. We have previously shown how a placement model can be used to improve population coverage for ILI surveillance in Iowa when considering the sites recruited by the Iowa Department of Public Health. We extend this work by evaluating different surveillance placement algorithms with respect to outbreak intensity and timing (i.e., being able to capture the start, peak and end of the influenza season).
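A greedy heuristic for the K-median objective described above might look like the sketch below. It is an assumption-laden illustration rather than the study's code: the name `greedy_k_median`, the planar `(x, y, population)` inputs, and straight-line distance are hypothetical, and a greedy pass does not guarantee the optimal site set.

```python
from math import hypot

def greedy_k_median(people, candidates, m):
    """Greedily pick m sites to reduce population-weighted distance to the nearest site.

    people:     list of (x, y, population) tuples (assumed planar coordinates)
    candidates: list of (x, y) candidate site locations
    """
    chosen = []
    # nearest[i] = distance from population point i to its closest chosen site
    nearest = [float("inf")] * len(people)

    def cost_with(j):
        # Total weighted distance if candidate j were added to the chosen set.
        cx, cy = candidates[j]
        return sum(w * min(d, hypot(px - cx, py - cy))
                   for (px, py, w), d in zip(people, nearest))

    for _ in range(min(m, len(candidates))):
        remaining = [j for j in range(len(candidates)) if j not in chosen]
        best = min(remaining, key=cost_with)
        chosen.append(best)
        cx, cy = candidates[best]
        nearest = [min(d, hypot(px - cx, py - cy))
                   for (px, py, _), d in zip(people, nearest)]

    total_pop = sum(w for _, _, w in people)
    avg_distance = sum(w * d for (_, _, w), d in zip(people, nearest)) / total_pop
    return chosen, avg_distance
```

Unlike a coverage model with a fixed radius, every person contributes to the objective here, so remote populations still pull sites toward them in proportion to their size.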

Objective

To evaluate the performance of several sentinel surveillance site placement algorithms for ILI surveillance systems. We explore how these different approaches perform by capturing both the overall intensity and timing of influenza activity in the state of Iowa.

Submitted by elamb on
Description

Alcohol abuse is one of the leading causes of preventable mortality in the United States. Binge drinking or excessive alcohol consumption, categorized as a pattern of drinking that brings a person's blood alcohol concentration to 0.08, has become a major cause for concern, especially in the 18-to-20-year-old population. Iowa City is home to the University of Iowa, a large public university of 30,000 students. On June 1, 2010 the city council enacted a new ordinance prohibiting persons under 21 from entering or remaining in bars (establishments whose primary purpose is the sale of alcoholic beverages) after 10:00 PM. Prior to the ordinance, Iowa City was the only municipality in the region where underage patrons were allowed on premises. The new ordinance was enacted largely in response to public safety concerns, including perceptions of increased violence and sexual assaults, especially at bar closing time.

Our hypothesis is that the under-21 ordinance also resulted in changes to travel behavior: prior to the ordinance, the campus bar culture constituted an "attractive nuisance," attracting a volatile mix of college students and non-locals of all ages.

Objective

To study alcohol-related arrests during the time surrounding the introduction of an alcohol-related ordinance in the Iowa City, IA area.

Submitted by elamb on
Description

The increasing use of the Internet to arrange sexual encounters presents challenges to public health agencies formulating STD interventions, particularly in the context of anonymous encounters. These encounters complicate or break traditional interventions. In previous work [1], we examined a corpus of anonymous personal ads seeking sexual encounters from the classifieds website Craigslist and presented a way of linking multiple ads posted across time to a single author. The key observation of our approach is that some ads are simply reposts of older ads, often updated with only minor textual changes. Under the presumption that these ads, when not spam, originate from the same author, we can use efficient near-duplicate detection techniques to cluster ads within some threshold similarity. Linking ads in this way allows us to preserve the anonymity of authors while still extracting useful information on the frequency with which authors post ads, as well as the geographic regions in which they seek encounters. While this process detects many clusters, the lack of a true corpus of authorship-linked ads makes it difficult to validate and tune the parameters of our system. Fortunately, many ad authors provide an obfuscated telephone number in ad text (e.g., 867-5309 becomes 8sixseven5three oh nine) to bypass Craigslist filters, which prohibit including phone numbers in personal ads. By matching phone numbers of this type across all ads, we can create a corpus of ad clusters known to be written by a single author. This authorship corpus can then be used to evaluate and tune our existing near-duplicate detection system, and in the future identify features for more robust authorship attribution techniques.
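The phone-number matching step described above can be sketched roughly as follows: spelled-out digits are mapped back to numerals, separators are dropped, and ads sharing the same recovered number are grouped. This is a minimal sketch under assumptions, not the authors' system: the word list, the 7-or-10-digit length rule, and the names `extract_phone` and `cluster_by_phone` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed mapping of spelled-out digits; "oh" is treated as zero.
WORD_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
_TOKEN = r"(?:\d|" + "|".join(WORD_DIGITS) + r")"
# A run of digit tokens with optional separators, not glued to other letters
# (so the "one" inside "phone" is not read as a digit).
_RUN_RE = re.compile(r"(?<![a-z0-9])(?:" + _TOKEN + r"[\s\-.()]*)+(?![a-z0-9])")
_TOKEN_RE = re.compile(_TOKEN)

def extract_phone(text):
    """Return the first 7- or 10-digit number found in text, or None."""
    for run in _RUN_RE.finditer(text.lower()):
        tokens = _TOKEN_RE.findall(run.group())
        digits = "".join(WORD_DIGITS.get(t, t) for t in tokens)
        if len(digits) in (7, 10):
            return digits
    return None

def cluster_by_phone(ads):
    """Group ad ids by recovered phone number; ads maps ad id -> ad text."""
    clusters = defaultdict(list)
    for ad_id, text in ads.items():
        phone = extract_phone(text)
        if phone:
            clusters[phone].append(ad_id)
    return dict(clusters)
```

Clusters built this way could serve as reference groupings against which a near-duplicate detector's similarity threshold is tuned.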

Objective

This paper constructs an authorship-linked collection, or corpus, of anonymous, sex-seeking ads found on the classifieds website Craigslist. This corpus is then used to validate an authorship attribution approach based on identifying near-duplicate text in ad clusters, providing insight into how often anonymous individuals post sex-seeking ads and where they meet for encounters.

Submitted by Magou on
Description

Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Recently, Wikipedia access logs (e.g., McIver 2014, Generous 2014) have been shown to be effective in this arena. Much richer Wikipedia data are available, though, including the entire Wikipedia article content and edit histories.

We study two different aspects of Wikipedia content as it relates to unfolding disease events: 1) we demonstrate how to capture case, death, and hospitalization counts from the article text, and 2) we show there are valuable time series data present in the tables found in certain articles.
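As a rough illustration of the first aspect, counts can be pulled from sentence text with a pattern that looks for a number followed shortly by a keyword. The sketch below is a simple pattern-based stand-in under assumptions, not the method used in the study: the regular expression, the two-word gap allowance, and the name `extract_counts` are hypothetical.

```python
import re

# A number (with optional thousands separators), up to two intervening words,
# then one of the target keywords.
_COUNT_RE = re.compile(
    r"(?<![\d,])(\d{1,3}(?:,\d{3})+|\d+)\s+"
    r"(?:[A-Za-z-]+\s+){0,2}?"
    r"(cases?|deaths?|hospitali[sz]ations?)\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return a list of (kind, count) pairs found in text, in order."""
    results = []
    for number, keyword in _COUNT_RE.findall(text):
        keyword = keyword.lower()
        if keyword.startswith("case"):
            kind = "cases"
        elif keyword.startswith("death"):
            kind = "deaths"
        else:
            kind = "hospitalizations"
        results.append((kind, int(number.replace(",", ""))))
    return results
```

Running such an extractor over successive revisions of an article would yield one observation per edit, which is one plausible route to a time series from edit histories.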

We argue that Wikipedia data can be used not only for disease surveillance but also as a centralized repository system for collecting disease-related data in near real-time.

Objective

To improve traditional outbreak surveillance systems by utilizing the content of Wikipedia articles.

Submitted by teresa.hamby@d… on