Optimizing Performance of an Ngram Method for Classifying Emergency Department Visits into the Respiratory Syndrome

Several methods are currently used to classify patients into syndromic groups based on the patient’s chief complaint (CC). We previously reported results from an “Ngram” text processing program for building classifiers, adapted from business research technology at AT&T Labs. The method first applies the ICD9 classifier to a training set of emergency department (ED) visits for which both the CC and the ICD9 code are known. A program then automatically generates a collection of CC substrings (Ngrams), each with an associated probability, from the training data. From this collection we build a CC classifier and use it to assign a classification probability to each patient. We previously presented data showing good correlation between the daily volumes measured by the Ngram and ICD9 classifiers.
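
The training step described above can be illustrated with a minimal sketch. It is not the authors' program: the substring length of 4, the use of the syndrome fraction as each Ngram's probability, the averaging rule in `score`, and the toy training data are all assumptions made here for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def ngrams(text, n=4):
    """Return the set of length-n character substrings of a normalized CC."""
    s = " ".join(text.upper().split())  # uppercase, collapse whitespace
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def train(visits, n=4):
    """Build a map from Ngram to the fraction of visits containing it that
    are in the syndrome.

    visits: iterable of (chief_complaint, in_syndrome) pairs, where
    in_syndrome is the label produced by the ICD9 classifier.
    """
    counts = defaultdict(lambda: [0, 0])  # ngram -> [syndrome visits, all visits]
    for cc, in_syndrome in visits:
        for g in ngrams(cc, n):
            counts[g][1] += 1
            if in_syndrome:
                counts[g][0] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def score(cc, model, n=4):
    """Assumed combination rule: mean probability over the Ngrams of the CC
    that were seen in training; 0.0 if none were seen."""
    probs = [model[g] for g in ngrams(cc, n) if g in model]
    return sum(probs) / len(probs) if probs else 0.0


# Hypothetical toy training set: (CC text, ICD9-based respiratory label).
training = [
    ("cough", True),
    ("cough and fever", True),
    ("ankle pain", False),
    ("chest pain", False),
]
model = train(training)
cough_score = score("cough", model)
ankle_score = score("ankle pain", model)
```

With this toy data, Ngrams that occur only in respiratory visits receive a probability of 1 and those seen only in other visits receive 0; a real training set would produce intermediate values for most Ngrams.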

Objective

Our objective was to determine the optimal sensitivity and specificity of the Ngram CC classifier for individual visits using receiver operating characteristic (ROC) curve analysis. Each point on the ROC curve corresponds to a different classification probability cutoff.
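
As a rough illustration of how cutoffs map to points on the ROC curve, the sketch below computes sensitivity and specificity at each candidate cutoff, treating the ICD9 label as the reference standard. The selection rule (maximizing Youden's J, sensitivity + specificity - 1) is an assumption, as the abstract does not state how the optimum was defined, and the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each distinct score.

    A visit is classified into the syndrome when its score >= cutoff.
    labels are the reference (ICD9-based) classifications.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative visit")

    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def best_cutoff(points):
    """Pick the point maximizing Youden's J (an assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Hypothetical classification probabilities and reference labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False]

points = roc_points(scores, labels)
best = best_cutoff(points)
```

Other criteria, such as fixing a minimum specificity and maximizing sensitivity, would select a different point on the same curve.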

Referenced File

Optimizing_Performance_Of_An_Ngram_Method_For_Classifying_Emergency_Department_Visits_Into_The_Respiratory_Syndrome.pdf

Submitted by elamb on Mon, 07/30/2018 - 08:40