Skip to main content

Wikipedia Usage Estimates Prevalence of Influenza-like Illness in Near Real-Time

Description

The purpose of this work was to develop a novel method of estimating the amount of influenza-like illness (ILI) in a population, in near-real time, by using a source of information that is completely open to the public and free to access. We investigated the usefulness of data gathered from Wikipedia to estimate the prevalence of ILI in the United States, using data from the Centers for Disease Control and Prevention (CDC) as well as Google Flu Trends.

Introduction

Each year, there are an estimated 250,000–500,000 deaths worldwide that are attributed to seasonal influenza, with anywhere between 3,000–50,000 deaths occurring in the United States of America (US). In the US, the Centers for Disease Control and Prevention (CDC) continuously monitors the level of influenza-like illness (ILI) circulating in the population. While the CDC ILI data is considered to be a useful indicator of influenza activity, its availability has a known lag-time of between 7–14 days. To appropriately distribute vaccines, staff, and other healthcare commodities, it is critical to have up-to-date information about the prevalence of ILI in a population. To this end, we have created a method of estimating current ILI activity in the US by gathering information on the number of times particular Wikipedia articles have been viewed. Not only is the information held within Wikipedia articles very useful on its own, but statistics and trends surrounding the amount of usage of particular articles, frequency of article edits, region specific statistics, and countless other factors make the Wikipedia environment an area of interest for researchers. Furthermore, Wikipedia makes all of this information public and freely available, greatly increasing and expediting any potential research studies that aim to make use of their data.

 

Submitted by aising on