Syndromic surveillance of emergency department (ED) visit data is often based on computer algorithms that assign patient chief complaints (CC) and ICD codes to syndromes. The triage nurse note (NN) has also been used for surveillance. Previously we developed an “NGram” classifier for syndromic surveillance of ED CC in Italian, for detection of natural outbreaks and bioterrorism. The classifier is developed from a set of ED visits for which both the ICD diagnosis code and the CC are available, by measuring the associations of text fragments within the CC (e.g., 3 characters for a “3-gram”) with a syndromic group of ICD codes. We found good correlation between the daily visit volumes assigned by the ICD10 classifier and those estimated by the NGram classifier. However, because the CC was limited to the 23 options on a pick list, a simpler probabilistic approach might give results as good as or better than the NGram method. In addition to the CC, the Italian data included a free-text NN. We might be able to achieve improved performance by applying the NGram method to the NN, or to the CC supplemented by the NN.
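As a rough illustration of the idea of associating text fragments with a syndromic group of ICD codes, the sketch below weights each character 3-gram by the fraction of training visits containing it whose ICD code falls in the syndrome group, and scores a new CC by averaging the weights of its known 3-grams. The function names, the weighting scheme, and the example complaints are assumptions made for illustration only; they are not the published NGram classifier, whose association measure is not specified here.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams in a lower-cased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def train_ngram_weights(records, n=3):
    """Estimate a weight per n-gram from (text, in_syndrome) training pairs.

    in_syndrome is True when the visit's ICD code belongs to the syndrome group.
    The weight is the fraction of visits containing the n-gram that are in the
    syndrome group (an assumed, deliberately simple association measure).
    """
    seen = Counter()
    hits = Counter()
    for text, in_syndrome in records:
        for gram in char_ngrams(text, n):
            seen[gram] += 1
            if in_syndrome:
                hits[gram] += 1
    return {gram: hits[gram] / seen[gram] for gram in seen}


def score(text, weights, n=3):
    """Average the weights of the n-grams seen in training; 0.0 if none are known."""
    known = [weights[g] for g in char_ngrams(text, n) if g in weights]
    return sum(known) / len(known) if known else 0.0


# Hypothetical training pairs: (chief complaint text, ICD code in respiratory group?)
records = [
    ("febbre e tosse", True),
    ("tosse", True),
    ("trauma arto", False),
    ("dolore addominale", False),
]
weights = train_ngram_weights(records)
```

Summing such scores over all visits on a given day would give an estimated daily syndrome volume that could be compared with the ICD-based count. The same functions could be applied to the NN text, or to the CC and NN concatenated.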
Objective
Our objective was to compare the performance of the NGram CC classifier to two discrete classifiers based on probabilistic associations with the CC pick list items. Also, we wished to determine the performance of the NGram method applied to CC alone, NN alone, and CC plus NN.

