Applying classification and anomaly detection techniques to real-world data

Real-world public health data often present numerous challenges: limited background data, data dropouts, noise, and human error. The data from an emergency department (ED) in Urbana, IL include a diagnosis field containing multiple terms and notes separated by semicolons. There are over 7000 distinct terms, excluding the notes. Because the data begin in April 2009, there is not yet adequate background data to use some of the regression-based alerting algorithms. Values for some days are missing, so we also needed an algorithm that would tolerate data dropouts.
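
As a rough illustration of the preprocessing this kind of data calls for, the sketch below splits a semicolon-separated diagnosis field into terms, tallies terms per day, and lists days with no records. It is not the method used in this work: the function names, the lowercase normalization, and the sample records are all hypothetical, and it makes no attempt to separate free-text notes from diagnosis terms.

```python
from collections import Counter
from datetime import date, timedelta


def split_terms(diagnosis_field):
    """Split a semicolon-separated diagnosis field into cleaned terms.

    Assumption: trimming whitespace and lowercasing is enough to
    normalize a term; empty fragments are dropped.
    """
    return [t.strip().lower() for t in diagnosis_field.split(";") if t.strip()]


def daily_term_counts(records):
    """Map each visit date to a Counter of the terms seen on that day."""
    counts = {}
    for visit_date, field in records:
        counts.setdefault(visit_date, Counter()).update(split_terms(field))
    return counts


def missing_days(counts):
    """Return the dates between the first and last observed day that have no records."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    all_days = (first + timedelta(days=i) for i in range((last - first).days + 1))
    return [d for d in all_days if d not in counts]


# Hypothetical sample records: (visit date, raw diagnosis field).
records = [
    (date(2009, 4, 1), "Cough; Fever ;"),
    (date(2009, 4, 1), "fever; abdominal pain"),
    (date(2009, 4, 3), "Headache"),
]

counts = daily_term_counts(records)
gaps = missing_days(counts)
```

Keeping the gaps as an explicit list, rather than filling them with zero counts, lets a downstream detector distinguish a day with no reported visits from a day with no data at all.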

INDICATOR is a workflow-based biosurveillance system developed at the National Center for Supercomputing Applications. One of the fundamental concepts of INDICATOR is that the burden of cleaning and processing incoming data should be on the software, rather than on the health care providers.

Objective

This paper compares different approaches to classification and anomaly detection for data from an ED.

Referenced File

Applying_Classification_And_Anomaly_Detection_Techniques_To_Real_World_Data.pdf

Submitted by hparton on Tue, 06/18/2019 - 11:41