
Searching for Complex Patterns Using Disjunctive Anomaly Detection

Description

Modern biosurveillance data contains thousands of unique time series defined across various categorical dimensions (zipcodes, age groups, hospitals). Many algorithms are either too specific (tracking each time series independently often misses early signs of an outbreak) or too general (detections at the state level may lack the specificity needed to reflect the actual underlying process). Disease outbreaks often affect multiple values (disjunctive sets of zipcodes, hospitals, or age groups) along subsets of several dimensions of the data. It is not uncommon for outbreaks of different diseases to occur simultaneously (e.g. food poisoning and flu), which makes the individual events hard to detect and characterize. We propose the Disjunctive Anomaly Detection (DAD) algorithm to efficiently search across millions of potential clusters, each defined as a conjunction over dimensions and a disjunction over values along each dimension. For example, an anomalous cluster detectable by DAD may identify zipcode = {z1 or z2 or z3 or z5} and age_group = {child or senior} as showing unusual activity in the aggregate. This conjunctive-disjunctive language of cluster definitions enables DAD to find real-world outbreaks that are often missed by other state-of-the-art algorithms such as What’s Strange About Recent Events (WSARE) or Large Average Submatrix (LAS). DAD can identify multiple interesting clusters simultaneously and explains complex anomalies in data better than those alternatives.
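The cluster language described above can be illustrated with a minimal Python sketch. It is not the DAD search itself: it only shows how a single conjunctive-disjunctive cluster selects records and aggregates their counts, and why the number of candidate clusters grows so quickly. The toy records, the field names, and the helper functions `in_cluster`, `cluster_count`, and `num_candidate_clusters` are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy records: (zipcode, age_group, case_count). Not real data.
records = [
    ("z1", "child", 4),
    ("z2", "senior", 3),
    ("z3", "adult", 5),
    ("z4", "child", 2),
    ("z5", "senior", 6),
    ("z5", "adult", 1),
]

# Names of the categorical dimensions, in record order (assumed layout).
FIELDS = ("zipcode", "age_group")


def in_cluster(record, cluster):
    """Return True if the record matches every dimension in the cluster
    (conjunction) by taking any of that dimension's allowed values
    (disjunction)."""
    values = dict(zip(FIELDS, record))  # the trailing count is ignored
    return all(values[dim] in allowed for dim, allowed in cluster.items())


def cluster_count(records, cluster):
    """Aggregate the case counts of all records that fall in the cluster."""
    return sum(r[-1] for r in records if in_cluster(r, cluster))


def num_candidate_clusters(values_per_dimension):
    """Count clusters that pick a non-empty subset of values in each
    dimension: the product of (2**n - 1) over all dimensions."""
    total = 1
    for n in values_per_dimension.values():
        total *= 2 ** n - 1
    return total


# The example cluster from the text:
# zipcode = {z1 or z2 or z3 or z5} and age_group = {child or senior}
cluster = {
    "zipcode": {"z1", "z2", "z3", "z5"},
    "age_group": {"child", "senior"},
}

aggregate = cluster_count(records, cluster)
small_space = num_candidate_clusters({"zipcode": 5, "age_group": 3})
large_space = num_candidate_clusters({"zipcode": 20, "age_group": 3})
```

Under these assumptions, even 20 zipcodes and 3 age groups already yield several million candidate clusters, so exhaustive enumeration quickly becomes impractical and an efficient search is needed. A detector would additionally compare each aggregate against an expected baseline, which this sketch leaves out.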

Objective

The Disjunctive Anomaly Detection (DAD) algorithm can efficiently search multidimensional biosurveillance data to find multiple anomalous clusters that occur simultaneously in time and overlap across different data dimensions. We introduce extensions of DAD that handle rich cluster interactions and diverse data distributions.

Submitted by ynwang@ufl.edu on