Semantic Analysis of Open Source Data for Syndromic Surveillance

Description

Social media messages are often short, informal, and ungrammatical. They frequently involve text, images, audio, or video, which makes useful information difficult to identify. This complexity reduces the efficacy of standard information extraction techniques [1]. However, recent advances in natural language processing (NLP), especially methods tailored to social media [2], have shown promise in improving real-time public health (PH) surveillance and emergency response [3]. Surveillance data derived from semantic analysis, combined with traditional surveillance processes, has the potential to improve event detection and characterization. The CDC Office of Public Health Preparedness and Response (OPHPR), Division of Emergency Operations (DEO), and the Georgia Tech Research Institute have collaborated to advance PH situational awareness (SA) by developing new approaches to semantic analysis of social media.

Objective

The objective of this analysis is to leverage recent advances in natural language processing (NLP) to develop new methods and system capabilities for processing social media (Twitter messages) for situational awareness (SA), syndromic surveillance (SS), and event-based surveillance (EBS). Specifically, we evaluated the use of human-in-the-loop semantic analysis to assist public health (PH) SA stakeholders in SS and EBS using massive amounts of publicly available social media data.

Submitted by Magou on