Exploring the Value of Learned Representations for Automated Syndromic Definitions

Comprehensive medical syndrome definitions are critical for outbreak investigation, disease trend monitoring, and public health surveillance. However, because current definitions rely on keyword string-matching, they may miss distributional information in free text and medical codes that could be used to build a more general classifier. Here, we explore the idea that individual International Classification of Diseases (ICD) codes can be categorized by examining their contextual relationships with all other ICD codes. We extend previous work in representation learning with medical data by generating dense vector embeddings of the ICD codes found in emergency department (ED) visit records. The resulting representations capture information about disease co-occurrence that would typically require subject-matter expert (SME) involvement, and they support the development of more robust syndrome definitions.
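
As a rough illustration of how co-occurrence in visit records can yield dense code vectors, the sketch below builds a code-by-code co-occurrence matrix from a handful of made-up ED visits, reweights it with positive pointwise mutual information (PPMI), and factorizes it with a truncated SVD. This is a stand-in technique chosen for brevity, not necessarily the embedding method used in this work; the toy visits, the function names `embed_codes` and `cosine`, and the choice of two dimensions are all assumptions made for the example.

```python
import itertools

import numpy as np

# Hypothetical toy data: each inner list is the set of ICD-10 codes on one ED visit.
visits = [
    ["J10.1", "R05", "R50.9"],
    ["J10.1", "R50.9"],
    ["J11.1", "R05", "R50.9"],
    ["A09", "R11.2", "R19.7"],
    ["A09", "R19.7"],
    ["A09", "R11.2"],
    ["K52.9", "R11.2", "R19.7"],
]


def embed_codes(visits, dim=2):
    """Return (codes, vectors) using PPMI + truncated SVD (an assumed stand-in method)."""
    codes = sorted({code for visit in visits for code in visit})
    index = {code: i for i, code in enumerate(codes)}

    # Count how often each ordered pair of distinct codes shares a visit.
    counts = np.zeros((len(codes), len(codes)))
    for visit in visits:
        for a, b in itertools.permutations(set(visit), 2):
            counts[index[a], index[b]] += 1

    # Positive pointwise mutual information: log of observed over expected
    # co-occurrence, with negative or undefined values clipped to zero.
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

    # Keep the leading singular directions as dense embeddings.
    u, s, _ = np.linalg.svd(ppmi)
    vectors = u[:, :dim] * np.sqrt(s[:dim])
    return codes, vectors


def cosine(a, b):
    """Cosine similarity with a small constant to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


codes, vectors = embed_codes(visits, dim=2)
lookup = dict(zip(codes, vectors))
print("J10.1 vs J11.1:", round(cosine(lookup["J10.1"], lookup["J11.1"]), 3))
print("J10.1 vs A09:  ", round(cosine(lookup["J10.1"], lookup["A09"]), 3))
```

In this toy data the two influenza codes never appear on the same visit, yet they end up close together because they share the same symptom codes as context, while the gastroenteritis code lands elsewhere. That kind of grouping is what a syndrome definition could then be built on, for example by clustering the vectors or by retrieving the nearest neighbors of a seed code.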

Objective:

To better define and automate biosurveillance syndrome categorization using modern unsupervised vector embedding techniques.

Submitted by elamb on Thu, 01/25/2018 - 21:40