Arnold James

Description

Most countries do not report national notifiable disease data in a machine-readable format. Data are often published as files containing text, tables and graphs that summarize weekly or monthly disease counts. This is a problem when the information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health. While most nations likely store incident data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons that include technical, political, economic, and motivational issues [1]. A survey by Los Alamos National Laboratory (LANL) of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) publish reports as PDF files on a regular basis. Most of these PDF reports present data in a structured tabular format, while some report in natural language. The structure and format of the PDF reports change often, which makes identifying and parsing the desired data more complex. Not all websites publish in English, and typos and clerical errors are common. LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and to make it uniform and readily accessible.
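To make the normalization problem concrete (tabular PDF reports, shifting layouts, misspelled disease names), here is a minimal Python sketch. It assumes the table text has already been extracted from the PDF. The disease vocabulary, row pattern, week labels and 0.8 similarity cutoff are all hypothetical, and this is not Epi Archive's actual implementation. It uses only the standard library: `re` to recognize data rows and `difflib.get_close_matches` to map typos onto canonical names.

```python
from __future__ import annotations

import difflib
import re

# Hypothetical canonical vocabulary; a real system would curate one per country/language.
CANONICAL_DISEASES = ["Dengue", "Malaria", "Measles", "Tuberculosis", "Cholera"]

# Hypothetical row layout: a disease name followed by one integer count per week.
ROW_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ()-]*?)\s+((?:\d+\s*)+)$")


def normalize_name(raw: str, cutoff: float = 0.8) -> str | None:
    """Map a possibly misspelled disease name to the canonical vocabulary, or None."""
    matches = difflib.get_close_matches(raw.strip().title(), CANONICAL_DISEASES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_table(text: str, weeks: list[str]) -> list[dict]:
    """Turn extracted table text into uniform (disease, week, count) records."""
    records = []
    for line in text.splitlines():
        m = ROW_PATTERN.match(line)
        if not m:
            continue  # skip headers, footnotes and blank lines
        disease = normalize_name(m.group(1))
        if disease is None:
            continue  # unrecognized name; a real pipeline might flag it for review
        counts = [int(c) for c in m.group(2).split()]
        if len(counts) != len(weeks):
            continue  # column-count mismatch may signal a layout change
        for week, count in zip(weeks, counts):
            records.append({"disease": disease, "week": week, "count": count})
    return records


# Usage with a made-up report fragment containing a typo ("Maleria").
sample = """Disease  Week 1  Week 2  Week 3
Dengue    12  15  9
Maleria   3   0   1
"""
records = parse_table(sample, ["W1", "W2", "W3"])
```

Fuzzy matching against a fixed vocabulary is one simple way to absorb clerical errors. Rows that fail to match are skipped rather than guessed, so they could be routed to manual review.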

Objective

LANL has built software that automatically collects global notifiable disease data, synthesizes the data, and makes it available to humans and computers within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Submitted by elamb on
Description

Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health, as exemplified by the Biosurveillance Ecosystem (BSVE). While most nations likely store their data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons that include technical, political, economic, and motivational issues. For example, an attempt by LANL to obtain a weekly version of the monthly data openly reported by the Australian government resulted in an onerous bureaucratic reply. The obstacles to obtaining the data included paperwork to request data from each of the Australian states and territories, a long delay (up to 3 months), and extensive limitations on the data’s use that prohibit collaboration and sharing. This type of experience is not uncommon when contacting public health departments or ministries of health for data. A survey conducted by LANL of notifiable disease data reporting in 52 countries identified only 10 that report in a machine-readable format and 42 that regularly report in PDF files. Of the 42 nations that report in PDF files, 32 use a structured, tabular format and 10 use an unstructured format. As a result, LANL has developed a tool, Epi Archive (formerly known as EPIC), to automatically and continuously collect global notifiable disease data and make it readily accessible.

Objective

LANL has built a software program that automatically collects global notifiable disease data, particularly data stored in files, and makes it available and shareable within the Biosurveillance Ecosystem (BSVE) as a new data source. This will improve the prediction and early warning of disease events and support other applications.

Submitted by teresa.hamby@d… on