The NGram CC Classifier: A Novel Method of Automatically Creating CC Classifiers Based on ICD9 Groupings


Syndromic surveillance of emergency department (ED) visit data is often based on computer algorithms that assign patient chief complaints (CCs) to syndromes. ICD9 codes may also be used to build visit classifiers for syndromic surveillance, but the ICD9 code is generally not available immediately, which limits its utility. ICD9 nonetheless has two advantages: an ICD9 classifier can be created rapidly and precisely as a subset of existing ICD9 codes, and ICD9 codes are independent of the spoken language. If an ICD9-based classifier could be used to automatically generate the code for a CC assignment algorithm, CC algorithms could be created and updated more rapidly and with less labor, and they could be created in multiple spoken languages.

We developed a method for doing this based on an “ngram” text-processing program adapted from business research technology (AT&T Labs). The method applies the ICD9 classifier to a training set of ED visits for which both the CC and the ICD9 code are known. A computerized procedure then automatically generates a collection of CC substrings with associated probabilities and, from these, a CC classifier program. The method includes specialized selection techniques and model pruning to automatically create a compact and efficient classifier.
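The training pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the AT&T Labs ngram program: the GI ICD9 prefixes, the character n-gram range (2 to 4), the Laplace-smoothed estimate of P(GI | substring), the minimum-count pruning rule, the max-probability decision threshold, and all function names are hypothetical choices. It also stops at a probability table plus a scoring function rather than emitting a separate classifier program.

```python
from collections import Counter

# Illustrative ICD9 prefixes for a GI syndrome (assumption, not the authors' code set).
GI_ICD9_PREFIXES = ("008", "009", "558", "787")


def icd9_is_gi(code: str) -> bool:
    """Hypothetical ICD9 classifier: true if the code starts with a GI prefix."""
    return code.replace(".", "").startswith(GI_ICD9_PREFIXES)


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> set[str]:
    """Distinct character n-grams of a lower-cased, space-padded chief complaint."""
    padded = " " + " ".join(text.lower().split()) + " "
    return {
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def train(visits, min_count: int = 2) -> dict[str, float]:
    """Estimate P(GI | n-gram) from (chief_complaint, icd9_code) pairs.

    The ICD9 classifier supplies the labels; n-grams seen in fewer than
    `min_count` visits are pruned to keep the model small.
    """
    total, positive = Counter(), Counter()
    for cc, code in visits:
        grams = char_ngrams(cc)
        total.update(grams)
        if icd9_is_gi(code):
            positive.update(grams)
    # Laplace-smoothed probability for each retained n-gram.
    return {g: (positive[g] + 1) / (n + 2) for g, n in total.items() if n >= min_count}


def score(model: dict[str, float], cc: str) -> float:
    """Highest P(GI | n-gram) among the chief complaint's retained n-grams."""
    return max((model[g] for g in char_ngrams(cc) if g in model), default=0.0)


def is_gi(model: dict[str, float], cc: str, threshold: float = 0.7) -> bool:
    return score(model, cc) >= threshold


# Toy visits for illustration only (not study data).
training = [
    ("vomiting", "787.01"),
    ("nausea and vomiting", "787.01"),
    ("diarrhea", "787.91"),
    ("abd pain diarrhea", "009.0"),
    ("chest pain", "786.50"),
    ("cough", "786.2"),
    ("headache", "784.0"),
    ("ankle injury", "845.00"),
]
model = train(training)
print(is_gi(model, "vomiting since last night"))  # True
print(is_gi(model, "chest pain"))  # False
```

Because the labels come from the ICD9 classifier rather than from hand-written CC keyword rules, the same procedure could in principle be rerun on visits recorded in another language without writing new keyword lists, which is the advantage the abstract emphasizes.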



Our objective was to determine how closely the performance of an ngram CC classifier for the gastrointestinal syndrome matched the performance of the ICD9 classifier.
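One hedged way to quantify how closely the ngram CC classifier matches the ICD9 classifier is to treat the ICD9 assignment as the reference standard and tabulate agreement over the same visits. The sketch below assumes one boolean flag per visit from each classifier. The choice of sensitivity, specificity, positive predictive value, and Cohen's kappa is an assumption, not necessarily the set of measures the authors reported, and the `Agreement` and `compare` names are hypothetical.

```python
import math
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    """Division that returns NaN instead of raising on a zero denominator."""
    return num / den if den else math.nan


@dataclass(frozen=True)
class Agreement:
    """2x2 table of the CC classifier against the ICD9 classifier (reference)."""

    tp: int  # both classifiers flag the visit
    fp: int  # CC classifier only
    fn: int  # ICD9 classifier only
    tn: int  # neither

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def kappa(self) -> float:
        """Cohen's kappa: agreement between the two classifiers beyond chance."""
        n = self.n
        observed = _ratio(self.tp + self.tn, n)
        expected = _ratio(
            (self.tp + self.fp) * (self.tp + self.fn)
            + (self.fn + self.tn) * (self.fp + self.tn),
            n * n,
        )
        return _ratio(observed - expected, 1 - expected)


def compare(cc_flags, icd9_flags) -> Agreement:
    """Cross-tabulate per-visit flags from the CC and ICD9 classifiers."""
    if len(cc_flags) != len(icd9_flags):
        raise ValueError("flag lists must be the same length")
    tp = fp = fn = tn = 0
    for cc, icd in zip(cc_flags, icd9_flags):
        if cc and icd:
            tp += 1
        elif cc:
            fp += 1
        elif icd:
            fn += 1
        else:
            tn += 1
    return Agreement(tp, fp, fn, tn)
```

Kappa is included alongside sensitivity and specificity because, at low syndrome prevalence, raw percent agreement is dominated by visits that both classifiers leave unflagged.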

Original Publication Year: 
Event/Publication Date: 
September, 2005

July 30, 2018


NSSP Community of Practice


