Improving System Ability to Identify Symptom Complexes in Free-Text Data


Text-based syndrome case definitions published by the Centers for Disease Control and Prevention (CDC) [1] form the basis for the syndrome queries used by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT). Public health epidemiologists identified keywords within these case definitions for use as search terms, with the goal of capturing symptom complexes from free-text chief complaint and triage note data for early event detection and situational awareness. Initial attempts at developing SQL queries incorporating these search terms returned many unwanted records, because there was no way to exclude terms embedded within unrelated free-text strings.

For example, a query containing the search term “h/a”, a common abbreviation for headache, also returns false positives such as “cough/asthma”, “skin rash/allergic reaction”, or “psych/anxiety”. Simple abbreviations without punctuation, such as “ha”, were even more problematic. A global wildcard ('%') matches zero or more characters of any type [2]. The term “ha” as a synonym for “headache” appears frequently in the data, but searching for this term bracketed by global wildcards returns any record in which the two letters appear together (e.g., pharyngitis, hand, hallucinations, toothache). Searching for common symptoms such as headache using simple abbreviations and global wildcards, with or without specialized punctuation, therefore returns many false positive records. We describe here the advanced application of SQL character set wildcards to address this problem.
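The false-positive problem can be reproduced with a small sketch. It is an illustration only, not the NC DETECT queries: the table name `visits`, the column `chief_complaint`, and the sample rows are hypothetical (the false-positive strings are taken from the examples above), and SQLite is used here simply because it ships with Python.

```python
import sqlite3

# Hypothetical chief complaints; not real NC DETECT data.
COMPLAINTS = [
    "h/a",
    "ha x 2 days",
    "cough/asthma",
    "skin rash/allergic reaction",
    "psych/anxiety",
    "pharyngitis",
    "hand injury",
    "hallucinations",
    "toothache",
]


def search(pattern):
    """Return complaints matching a LIKE pattern, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (chief_complaint TEXT)")
    conn.executemany(
        "INSERT INTO visits VALUES (?)", [(c,) for c in COMPLAINTS]
    )
    rows = conn.execute(
        "SELECT chief_complaint FROM visits "
        "WHERE chief_complaint LIKE ? ORDER BY rowid",
        (pattern,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Search terms bracketed by global wildcards.
slash_hits = search("%h/a%")
bare_hits = search("%ha%")

print(slash_hits)  # one true headache record plus three unrelated complaints
print(bare_hits)   # one true headache record plus four unrelated complaints
```

In this toy data set each query finds only one genuine headache record, and each misses the record written with the other abbreviation, while pulling in every string that happens to contain the same letters.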


This paper describes a novel approach to the construction of syndrome queries written in Structured Query Language (SQL). Through the advanced application of character set wildcards, we are able to increase the number of valid records identified by our queries while simultaneously decreasing the number of false positives.
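As a rough sketch of the idea, a character set wildcard can require that the abbreviation is not preceded or followed by a letter. The abstract does not show the authors' actual queries, so everything below is an assumption: the table and column names and the sample rows are hypothetical, and SQLite's `GLOB` operator (which supports `[^a-z]` character sets) stands in for a `LIKE` pattern with character set wildcards in a dialect that supports them, such as SQL Server.

```python
import sqlite3

# Hypothetical chief complaints; not real NC DETECT data.
SAMPLE = [
    "h/a",
    "ha x 2 days",
    "fever, HA",
    "cough/asthma",
    "skin rash/allergic reaction",
    "psych/anxiety",
    "pharyngitis",
    "hand injury",
    "hallucinations",
    "toothache",
]

# The text is lower-cased and padded with spaces so that a term at the very
# start or end of the string still has a non-letter neighbor to match.
# Assumed SQL Server analogue of the first condition:
#   ' ' + chief_complaint + ' ' LIKE '%[^a-z]ha[^a-z]%'
QUERY = """
SELECT chief_complaint
FROM visits
WHERE ' ' || lower(chief_complaint) || ' ' GLOB '*[^a-z]ha[^a-z]*'
   OR ' ' || lower(chief_complaint) || ' ' GLOB '*[^a-z]h/a[^a-z]*'
ORDER BY rowid
"""


def find_headache(complaints):
    """Return complaints where 'ha' or 'h/a' stands alone as a term."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (chief_complaint TEXT)")
    conn.executemany(
        "INSERT INTO visits VALUES (?)", [(c,) for c in complaints]
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [r[0] for r in rows]


headache_hits = find_headache(SAMPLE)
print(headache_hits)
```

On this sample, the query keeps the three records where the abbreviation stands alone and drops the strings in which the same letters are embedded in another word.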

Event/Publication Date: October, 2006

July 30, 2018
