
Expanding a Gazetteer-based Approach for Geo-Parsing Text from Media Reports on Global Disease Outbreaks


HealthMap is a freely accessible, automated real-time system that monitors, organizes, integrates, filters, and maps online news about emerging diseases. The system performs geographic parsing (“geo-parsing”) of disease outbreaks by assigning incoming alerts to low-resolution geographic descriptions, such as country, with the help of a purpose-built gazetteer. However, the system is limited by the size of the gazetteer, which precludes high-resolution assignment of place. In this study, we use the prior knowledge encoded in the gazetteer to expand the capabilities of the geo-parsing system.
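To make the limitation concrete, the following is a minimal sketch of dictionary-based geo-parsing. It is not HealthMap's implementation: the toy `GAZETTEER`, the `geo_parse` function, and the sample alert are all hypothetical and serve only to show how a place name missing from the gazetteer goes undetected.

```python
import re

# Hypothetical toy gazetteer: place name -> (resolution level, country code).
# A real gazetteer would be far larger; these entries are illustrative only.
GAZETTEER = {
    "brazil": ("country", "BR"),
    "kenya": ("country", "KE"),
    "new york": ("city", "US"),
}


def geo_parse(text, gazetteer=GAZETTEER):
    """Return (name, resolution, country) for each gazetteer entry found in text,
    ordered by position of the match."""
    lowered = text.lower()
    matches = []
    for name, (level, country) in gazetteer.items():
        # Whole-word match so that a short name is not found inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, lowered):
            matches.append((m.start(), name, level, country))
    matches.sort()
    return [(name, level, country) for _, name, level, country in matches]


alert = "Cholera cases reported in Kisumu, Kenya, health officials said."
print(geo_parse(alert))  # → [('kenya', 'country', 'KE')]
```

In this sketch the alert is resolved only to the country level: "Kisumu" is not in the toy gazetteer, so the finer-grained location is lost, which is the kind of gap that pure dictionary lookup cannot close.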



Discovering geographic references in text is a task that human readers perform using both their lexical and contextual knowledge. Automating this task for real-time surveillance of informal sources of epidemic intelligence therefore requires efforts beyond dictionary-based pattern matching. Here, we describe an automated approach that learns the particular context in which outbreak locations appear and thereby extends the prior knowledge encoded in a gazetteer.

Submitted by elamb on