
Data Mining


Although infectious diseases initially tend to be geographically limited to a reservoir, their prevalence (including spread and intensity) subsequently varies in space because of underlying differences in the physical and biological conditions that support the pathogen, its vectors, and its reservoirs. Occurrence is influenced by factors such as spatial proximity, physical and social connectivity, and local environmental conditions that increase susceptibility[2]. Disease management therefore needs to account for historical data on geography, epidemiology, social structures, and network dynamics. Such large volumes of data raise challenges in processing, storage, and pattern identification. In addition, identifying the source of a disease occurrence and its pattern can be of immense value. Spatio-temporal data mining (ST-DM) of disease data can be an effective tool for endemic preparedness[3], as it extracts implicit knowledge, spatial and temporal relationships, and other patterns inherent in such databases. Here, a Core Region is defined as a set of spatial entities (e.g., counties), aggregated over time, that occur frequently at places with high values in a defined region (taking into account the areas of influence around them)[1].
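To make the Core Region definition concrete, the sketch below is a deliberately simplified illustration, not the MiSTIC algorithm itself. It assumes per-time-step values for each spatial entity (e.g., county case rates), a hypothetical neighbour map standing in for the "area of influence", a hypothetical `threshold` for what counts as a high value, and a hypothetical `min_support` for how often an entity must be high across time steps.

```python
from collections import Counter


def core_regions(values_by_time, neighbors, threshold, min_support):
    """Simplified, hypothetical core-region finder (not MiSTIC itself).

    values_by_time: list of dicts, one per time step, mapping entity -> value
    neighbors: dict mapping entity -> list of neighbouring entities
    threshold: minimum influence value for an entity to count as "high"
    min_support: minimum fraction of time steps in which an entity must be high
    """
    high_counts = Counter()
    for values in values_by_time:
        for entity, value in values.items():
            # Area of influence (assumed): the entity's own value plus its neighbours' values.
            influence = value + sum(values.get(n, 0.0) for n in neighbors.get(entity, []))
            if influence >= threshold:
                high_counts[entity] += 1
    n_steps = len(values_by_time)
    # Keep entities that are high frequently enough across the time steps.
    return {e for e, c in high_counts.items() if n_steps and c / n_steps >= min_support}


# Toy data (invented for illustration): four counties over three time steps.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
values_by_time = [
    {"A": 5, "B": 4, "C": 0, "D": 1},
    {"A": 6, "B": 3, "C": 1, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 9},
]
print(sorted(core_regions(values_by_time, neighbors, threshold=8, min_support=0.6)))
# → ['A', 'B']
```

County D has the single largest spike but is not reported, because it is high in only one of three time steps; A and B reinforce each other through their shared neighbourhood and stay high in two of three. This is the intuition behind preferring regions that recur over time rather than isolated peaks.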


This work applies a spatio-temporal data mining (ST-DM) method, MiSTIC (Mining Spatio-Temporally Invariant Cores)[1,6], to infectious disease surveillance by identifying (a) the extent of spatial spread of disease core regions across populations, i.e., the scale of disease prevalence, and (b) possible causes of the observed patterns, to support better prediction, detection, and management of infectious diseases and their outbreaks.

Submitted by Magou on

In recent years, individuals have been using social network sites such as Facebook, Twitter, and Reddit to discuss health-related topics. Consequently, these platforms have become new avenues for research and applications such as disease surveillance. Reddit, in particular, can potentially provide more in-depth contextual insights than Twitter, and Reddit members potentially discuss more diverse topics than Facebook members. However, identifying relevant discussions in large datasets like Reddit remains a challenge. As a result, much previous research using Reddit data has focused on a select few topically oriented sub-communities (subreddits). Although such an approach yields topically focused datasets, a large portion of related data can be missed. In this research, we examine all sub-communities in which members discuss e-cigarettes to determine whether investigating these additional sub-communities could result in a better smoking surveillance system.


We aim to explore how to effectively leverage social media for electronic cigarette (e-cigarette) and vaping surveillance. This study examines how members of the social media platform Reddit use topically oriented sub-communities for e-cigarette discussions.
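A minimal sketch of the "examine all sub-communities" idea: rather than starting from a fixed list of e-cigarette subreddits, scan comments from every subreddit with a keyword pattern and count matches per subreddit. The keyword pattern, the `(subreddit, text)` input format, the function name, and the sample comments are all hypothetical; the study's actual search terms and data pipeline are not described here, and a plain keyword match will produce false positives that a real system would need to filter.

```python
import re
from collections import Counter

# Hypothetical keyword pattern (not the study's actual search terms).
ECIG_PATTERN = re.compile(r"\b(e-?cig(arette)?s?|vap(e|es|ing|er|ers))\b", re.IGNORECASE)


def ecig_mentions_by_subreddit(comments):
    """Count e-cigarette-related comments per subreddit across ALL subreddits.

    comments: iterable of (subreddit, text) pairs.
    """
    counts = Counter()
    for subreddit, text in comments:
        if ECIG_PATTERN.search(text):
            counts[subreddit] += 1
    return counts


# Invented sample comments for illustration only.
comments = [
    ("electronic_cigarette", "Which coil do you use for vaping?"),
    ("AskReddit", "I switched from smoking to an e-cig last year."),
    ("AskReddit", "What is your favourite movie?"),
    ("stopsmoking", "Day 30! Vapes helped me quit."),
    ("electronic_cigarette", "New e-cigarette regulations are out."),
]
counts = ecig_mentions_by_subreddit(comments)

# Relevant comments that a study limited to one topical subreddit would miss.
focused = {"electronic_cigarette"}
missed = {s: c for s, c in counts.items() if s not in focused}
print(missed)
# → {'AskReddit': 1, 'stopsmoking': 1}
```

Even in this toy sample, half of the matching comments fall outside the topical subreddit, which is the kind of coverage gap the study sets out to measure.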

Submitted by elamb on

Nearly 100 people per day die from opioid overdose in the United States, and prescription opioid abuse is thought to be responsible for a 15-year increase in opioid overdose deaths. At the same time, the increasing use of social media brings increasing opportunities to seek and share information. For instance, 80% of Internet users obtain health information online, including from popular social interaction sites such as Reddit, which had more than 82.5 billion page views in 2015[3]. Reddit members often share information and include URLs to supplement it. Understanding how often URLs are shared and what types of URLs are shared can improve our knowledge of information seeking and sharing behaviors, as well as of the domains of information shared on social media. Such knowledge has the potential to improve public health surveillance practice. We use Reddit to track opioid-related discussions and then investigate the types of URLs that Reddit members share in those discussions.


We aim to understand (1) the frequency of URL sharing and (2) the types of URLs shared in opioid-related discussions on the social media platform Reddit.
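A minimal sketch of measuring (1) URL-sharing frequency and (2) the types of shared URLs in a set of comments. It extracts URLs with a simple regular expression, reduces each to its domain with Python's `urllib.parse.urlparse`, and maps domains to categories through a small hypothetical lookup table. The regex, the `DOMAIN_TYPES` category labels, the `url_stats` function, and the sample comments are assumptions for illustration, not the study's coding scheme.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple URL pattern (assumed): stops at whitespace and at ")" / "]" from Markdown links.
URL_PATTERN = re.compile(r"https?://[^\s)\]]+")

# Hypothetical domain -> type mapping; a real study would use its own coding scheme.
DOMAIN_TYPES = {"cdc.gov": "government", "en.wikipedia.org": "encyclopedia", "youtube.com": "video"}


def url_stats(comments):
    """Return (share of comments with at least one URL, domain counts, type counts)."""
    domains = Counter()
    with_url = 0
    for text in comments:
        urls = URL_PATTERN.findall(text)
        if urls:
            with_url += 1
        for url in urls:
            # Drop trailing sentence punctuation, then keep only the host name.
            host = urlparse(url.rstrip(".,;:!?")).netloc.lower()
            if host.startswith("www."):
                host = host[len("www."):]
            domains[host] += 1
    type_counts = Counter()
    for host, n in domains.items():
        type_counts[DOMAIN_TYPES.get(host, "other")] += n
    share = with_url / len(comments) if comments else 0.0
    return share, domains, type_counts


# Invented sample comments for illustration only.
comments = [
    "See https://www.cdc.gov/opioids/ for overdose data.",
    "Naloxone info: [link](https://en.wikipedia.org/wiki/Naloxone)",
    "Has anyone tapered off oxycodone?",
    "Two sources: https://www.cdc.gov/x and https://youtube.com/watch?v=abc.",
]
share, domains, type_counts = url_stats(comments)
print(share, dict(type_counts))
# → 0.75 {'government': 2, 'encyclopedia': 1, 'video': 1}
```

Counting at the domain level keeps the categorization manageable: many distinct URLs collapse onto a few hosts, and only those hosts need to be labeled.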

Submitted by elamb on