
Data Analytics

Description

Dengue fever (DF) is a vector-borne disease caused by a virus of the flavivirus family and carried by the Aedes aegypti mosquito, and it is one of the leading causes of illness and death in tropical regions of the world. Nearly 400 million people become infected each year, and roughly one-third of the world’s population lives in areas at risk. Dengue fever has been endemic in Colombia since the late 1970s and is a serious health problem for the country, with over 36 million people at risk. We used the Magdalena watershed of central Colombia as the site for this study because of its natural separation from other geographical regions of the country, its wide range of climatic conditions, and the fact that it includes the main urban centers of Colombia and houses 80% of the country’s population. Advances in the quality and types of remote sensing (RS) satellite imagery have made it possible to enhance or replace the field collection of environmental data such as precipitation, temperature, and land use, especially in remote areas of the world such as the mountainous regions of Colombia. We modeled DF cases by municipality against the environmental factors derived from the satellite data using boosted regression tree (BRT) analysis. BRT analysis has proven useful in a wide range of studies, from predicting forest productivity to modeling other vector-borne diseases such as leishmaniasis and Crimean-Congo hemorrhagic fever. Using this framework, we set out to determine the differences between using presence/absence data and case counts of DF in this type of analysis.
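The boosting idea behind BRT can be illustrated with a toy gradient-boosting loop over regression stumps (single-split trees). This is only a minimal sketch with invented covariate values, not the model fit in the study, which used full BRT software with tuned tree depth, shrinkage, and cross-validation:

```python
# Toy gradient boosting with regression stumps (illustrative only).

def fit_stump(X, residuals):
    """Find the single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - ml) ** 2 for r in left)
                   + sum((r - mr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]  # (feature, threshold, left_mean, right_mean)

def boost(X, y, n_trees=50, lr=0.1):
    """Fit an additive model: each new stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        pred = [p + lr * (ml if row[j] <= t else mr)
                for row, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (ml if row[j] <= t else mr)
                      for j, t, ml, mr in stumps)

# Invented training data: [temperature, precipitation] per municipality,
# with case counts that rise sharply in warm, wet conditions.
X = [[20, 100], [22, 120], [28, 200], [30, 220], [21, 110], [29, 210]]
y = [5, 6, 40, 45, 5, 42]
model = boost(X, y, n_trees=30, lr=0.3)
print(predict(model, [29, 205]) > predict(model, [21, 105]))  # True
```

Each stump fits the residuals of the current ensemble and the learning rate shrinks its contribution, which is the core of the boosted-tree approach; the same mechanism applies whether the response is a case count or a presence/absence indicator.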

Objective:

In this paper we used Boosted Regression Tree analysis coupled with environmental factors gathered from satellite data, such as temperature, elevation, and precipitation, to model the niche of Dengue Fever (DF) in Colombia.

Submitted by elamb on
Description

Global biosurveillance is an extremely important yet challenging task. One form of global biosurveillance comes from harvesting open-source online data (e.g., news, blogs, reports, RSS feeds). The information derived from these data can be used for timely detection and identification of biological threats around the world. However, the more inclusive the data harvesting procedure is, to ensure that all potentially relevant articles are collected, the more irrelevant data is harvested as well. The issue becomes even more complex when the online data is in a non-native language. Foreign-language articles not only create language-specific issues for natural language processing (NLP) but also add significant translation costs. Previous work shows success in the use of combinatory monolingual classifiers in specific applications, e.g., the legal domain. A critical component of a comprehensive online-harvesting biosurveillance system is the capability to separate relevant foreign-language articles from irrelevant ones based on the initial article information collected, without the additional cost of full-text retrieval and translation.
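The filtering step described above can be caricatured as an ensemble of monolingual relevance scorers. The lexicons, scores, and threshold below are invented for illustration; the actual system compares word-embedding and topic-model classifiers rather than keyword lookups:

```python
# Toy ensemble of monolingual relevance scorers (illustrative only).

def lexicon_scorer(lexicon):
    """Scorer: fraction of an article's tokens found in `lexicon`."""
    def score(tokens):
        return sum(t in lexicon for t in tokens) / max(len(tokens), 1)
    return score

def ensemble_relevant(tokens, scorers, threshold=0.25):
    """Average the scorers' outputs and apply a relevance threshold."""
    return sum(s(tokens) for s in scorers) / len(scorers) >= threshold

# Hypothetical French-language scorers: one keyed to disease terms (a crude
# stand-in for an embedding model), one keyed to outbreak-context terms
# (a crude stand-in for a topic model).
disease = lexicon_scorer({"grippe", "virus", "fièvre", "épidémie"})
context = lexicon_scorer({"hôpital", "cas", "décès", "alerte"})

headline = ["épidémie", "de", "grippe", "cas", "signalés", "à", "hôpital"]
print(ensemble_relevant(headline, [disease, context]))                 # True
print(ensemble_relevant(["match", "de", "football"], [disease, context]))  # False
```

Because the ensemble only needs headline-level tokens, a decision can be made before committing to full-text retrieval and translation, which is the cost-saving argument made above.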

Objective:

The objective is to develop an ensemble of machine learning algorithms to identify multilingual online articles that are relevant to biosurveillance. Language morphology varies widely across languages and must be accounted for when designing algorithms. Here, we compare the performance of a word embedding-based approach and a topic modeling approach combined with machine learning algorithms to determine the best method for Chinese, Arabic, and French.

Description

Indiana utilizes the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE) to collect and analyze data from participating hospital emergency departments (EDs). This real-time collection of health-related data is used to identify disease clusters and unusual disease occurrences. By Administrative Code, the Indiana State Department of Health (ISDH) requires electronic submission of chief complaints from patient visits to EDs. Submission of the discharge diagnosis is not required by Indiana Administrative Code, leaving coverage gaps. Our goal was to identify which areas of the state may see underreporting or incomplete surveillance due to the lack of the discharge diagnosis field.
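The gap analysis can be sketched as a simple completeness calculation: for each facility, the share of visit records carrying a discharge diagnosis, with low-completeness facilities flagged. The field names and the 50% cutoff are illustrative, not ISDH's actual criteria:

```python
# Flag facilities with low discharge-diagnosis completeness (illustrative).
from collections import defaultdict

def diagnosis_completeness(records):
    """Map each facility to the fraction of its records with a diagnosis."""
    total, with_dx = defaultdict(int), defaultdict(int)
    for rec in records:
        total[rec["facility"]] += 1
        if rec.get("discharge_dx"):      # None or "" counts as missing
            with_dx[rec["facility"]] += 1
    return {f: with_dx[f] / total[f] for f in total}

def coverage_gaps(records, cutoff=0.5):
    """Facilities whose completeness falls below the cutoff."""
    return sorted(f for f, share in diagnosis_completeness(records).items()
                  if share < cutoff)

visits = [
    {"facility": "ED-A", "discharge_dx": "J10.1"},
    {"facility": "ED-A", "discharge_dx": "A09"},
    {"facility": "ED-B", "discharge_dx": None},
    {"facility": "ED-B", "discharge_dx": ""},
    {"facility": "ED-B", "discharge_dx": "R50.9"},
]
print(coverage_gaps(visits))  # ['ED-B']
```

Aggregating the flagged facilities by county or district would then show which areas of the state rely on chief complaints alone.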

Objective:

To identify surveillance coverage gaps in emergency department (ED) and urgent care facility data due to missing discharge diagnoses.

Description

'Where we live' affects 'how we live'. Information about how one lives is collected through public health surveillance programs such as the Behavioral Risk Factor Surveillance System (BRFSS). An individual's health behavior and health status are influenced by the surrounding neighborhood environment as well as by their own traits. Meanwhile, the geographical information of subjects recruited into health behavior surveillance data is usually aggregated at administrative levels such as the county. Even if we do not know individuals' exact addresses, we can allocate them to random locations analogous to their real homes within a locality using a geo-imputation method. In this study, we assess the association between obesity and the built environment by applying random property allocation.
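Random property allocation can be sketched as follows. Real implementations draw from residential parcels or land-use polygons within the county; this toy version reduces each county to a bounding box, and all coordinates are invented:

```python
# Toy geo-imputation: place county-level respondents at random points
# inside their county's bounding box (all coordinates invented).
import random

def impute_locations(respondents, county_bounds, seed=0):
    """Assign each respondent a random (lon, lat) within their county's box."""
    rng = random.Random(seed)          # fixed seed keeps runs reproducible
    placed = []
    for person in respondents:
        min_lon, min_lat, max_lon, max_lat = county_bounds[person["county"]]
        placed.append({**person,
                       "lon": rng.uniform(min_lon, max_lon),
                       "lat": rng.uniform(min_lat, max_lat)})
    return placed

bounds = {"Marion": (-86.33, 39.63, -85.94, 39.93)}
people = [{"id": 1, "county": "Marion"}, {"id": 2, "county": "Marion"}]
for p in impute_locations(people, bounds):
    print(p["id"], round(p["lon"], 3), round(p["lat"], 3))
```

Once each respondent has coordinates, built-environment measures (for example, distance to parks or food outlets) can be computed around the imputed point rather than the county centroid.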

Objective:

This study aimed to assess the effects of urban physical environment on individual obesity using geographically aggregated health behavior surveillance data applying a geo-imputation method.

Description

National initiatives, such as Meaningful Use, are automating the detection and reporting of reportable disease events to public health, which has led to more complete, timely, and accurate public health surveillance data. However, electronic reporting has also led to significant increases in the number of cases reported to public health. For these data to be useful, they must be processed and made available to epidemiologists and investigators in a timely fashion for intervention and monitoring. To meet this challenge, the Utah Department of Health (UDOH)’s Disease Control and Prevention Informatics Program (DCPIP) has developed the Electronic Message Staging Area (EMSA). EMSA is a system capable of automatically filtering, processing, and evaluating incoming electronic laboratory reporting (ELR) messages for relevance to public health, and entering those laboratory results into Utah’s integrated disease surveillance system (UT-NEDSS) without impacting the overall efficiency of UT-NEDSS or increasing the workload of epidemiologists.
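The rules-engine idea can be sketched as a chain of predicates applied to each incoming message. The rules, field names, and codes below are invented for illustration; EMSA's actual logic is far richer and configuration-driven:

```python
# Toy rules-engine triage of ELR messages (codes and rules invented).

REPORTABLE_TESTS = {"LOINC-0001-1", "LOINC-0002-2"}   # hypothetical list

def rule_reportable(msg):
    """Keep only tests on the reportable-condition list."""
    return msg["test_code"] in REPORTABLE_TESTS

def rule_positive(msg):
    """Keep only positive/detected results."""
    return msg["result"].lower() in {"positive", "detected"}

def triage(messages, rules):
    """Accept a message only if every rule in the chain passes."""
    accepted, rejected = [], []
    for msg in messages:
        (accepted if all(rule(msg) for rule in rules) else rejected).append(msg)
    return accepted, rejected

msgs = [
    {"id": "m1", "test_code": "LOINC-0001-1", "result": "Positive"},
    {"id": "m2", "test_code": "LOINC-9999-9", "result": "Positive"},
    {"id": "m3", "test_code": "LOINC-0002-2", "result": "Negative"},
]
kept, dropped = triage(msgs, [rule_reportable, rule_positive])
print([m["id"] for m in kept])  # ['m1']
```

Because each rule is a small independent predicate, new filtering criteria can be added to the chain without human review of individual messages, which is the design goal stated in the objective.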

Objective:

The objective of this abstract is to illustrate how the Utah Department of Health processes a high volume of electronic data in an automated way, using a series of rules engines that do not require human intervention.

Description

Timely and accurate syndromic surveillance depends on continuous data feeds from healthcare facilities. Typical outlier detection methodologies in syndromic surveillance compare predicted counts for an interval to observed event counts, either to detect increases in volume associated with public health incidents or decreases in volume associated with compromised data transmission. Accurate predictions of total facility volume need to account for the significant variance associated with time of day and day of week; at the extreme are facilities that are open only during limited hours and on select days. Models that account for the cross-product of all hours and days carry a significant data burden. Timely detection of outages may require sub-hour aggregation, increasing this burden by increasing the number of intervals for which parameters must be estimated. Nonparametric models for the probability of message arrival offer an alternative approach to generating predictions: the data requirements are reduced by assuming some time-dependent structure in the data rather than allowing each interval to be independent of all others, which also allows predictions at sub-hour intervals.
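One way to sketch such a nonparametric model: smooth historical per-interval arrival indicators with a kernel over neighboring intervals, so nearby times share data instead of each interval being estimated independently. The bandwidth, cycle length, and outage threshold below are illustrative assumptions, not the parameters of the study's models:

```python
# Kernel-smoothed arrival probability by interval of the daily cycle
# (bandwidth, cycle length, and threshold are illustrative).
import math

def smoothed_arrival_prob(history, bandwidth=2.0):
    """history[i] = 0/1 arrival indicators for interval i across past days.
    Returns a kernel-smoothed P(arrival) for each interval."""
    n = len(history)
    probs = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            d = min(abs(i - j), n - abs(i - j))   # circular distance
            w = math.exp(-0.5 * (d / bandwidth) ** 2)
            num += w * sum(history[j])
            den += w * len(history[j])
        probs.append(num / den)
    return probs

def outage_flags(probs, observed, threshold=0.5):
    """Flag intervals where arrival was expected but nothing came in."""
    return [p >= threshold and not o for p, o in zip(probs, observed)]

# 8 intervals per day, 3 days of history: arrivals concentrated mid-day.
history = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1],
           [1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 0, 0]]
probs = smoothed_arrival_prob(history)
flags = outage_flags(probs, observed=[0, 0, 1, 0, 1, 1, 0, 0])
print([i for i, f in enumerate(flags) if f])  # [3]: expected but silent
```

Because the kernel borrows strength from adjacent intervals, each probability is estimated from many more observations than the interval's own history, which is what reduces the data burden at sub-hour resolution.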

Objective:

Characterize the behavior of nonparametric regression models for message arrival probability as outage detection tools.

Description

The epidemiological situation of natural foci of tick-borne infections (TBI) in Ukraine, as elsewhere in the world, is characterized by significant activation of these foci, driven by global climate change, growing human-induced impacts, and shortcomings in the organization and conduct of epidemiological surveillance. For the western region of Ukraine, the most important tick-borne zoonoses are tick-borne viral encephalitis (TBVE), Lyme disease (LD), human granulocytic anaplasmosis (HGA), and several others. Taking into account the increased incidence rates of these infections, we developed baseline criteria (indicators of natural infection of the main tick vectors and seroprevalence levels in the population with respect to the TBI pathogens in endemic areas) to identify areas with different risks of infection using GIS technologies.

Objective:

The main aim of this work is to estimate projected risks based on the incidence rates of natural focal infections and to expand the list of criteria used to characterize natural foci of tick-borne infections.

Description

The re-emergence of an infectious disease depends on social, political, behavioral, and disease-specific factors. Global disease surveillance is a prerequisite for the early detection that enables coordinated interventions in response to such events. Novel informatics tools built on publicly available data are constantly evolving with the incorporation of new data streams. Re-emerging Infectious Disease (RED) Alert is an open-source tool designed to help analysts develop a contextual framework when planning for future events, given what has occurred in the past. Geospatial methods assist researchers in making informed decisions by incorporating the power of place to better explain the relationships between variables.

Objective:

To apply spatial analysis to improve the awareness and use of surveillance data.

Description

Globally, there have been various studies assessing trends in Google search terms in the context of public health surveillance. However, there has been a predominant focus on individual health outcomes such as influenza, with limited evidence on the added value and practical impact on public health action for the range of diseases and conditions routinely monitored by existing surveillance programmes. A proposed advantage is improved timeliness relative to established surveillance systems. However, these studies did not compare performance against other syndromic data sources, which are often monitored daily and already offer early warning over traditional surveillance methods. Google search data could also contribute to assessing the wider population health impact of public health events by supporting estimation of the proportion of the population who are symptomatic but do not present to healthcare services.

Objective:

To carry out an observational study to explore what added value Google search data can provide to existing routine syndromic surveillance systems in England for a range of conditions of public health importance and summarise lessons learned for other countries.

Description

Effective clinical and public health practice in the twenty-first century requires access to data from an increasing array of information systems. However, the quality of data in these systems can be poor or “unfit for use.” Measuring and monitoring data quality is therefore an essential activity for clinical and public health professionals as well as researchers. Current methods for examining data quality rely largely on manual queries and processes conducted by epidemiologists. Better, automated tools for examining data quality are desired by the surveillance community.

Objective:

To extend an open source analytics and visualization platform for measuring the quality of electronic health data transmitted to syndromic surveillance systems.
