Automated Classification of Alcohol Use by Text Mining of Electronic Medical Records

Electronic medical records (EMRs) are a potentially valuable source of information about a patient’s history of health risk behaviors, such as excessive alcohol consumption or smoking. This information is often found in the unstructured (i.e., free) text of physician notes. Health risk behaviors can be difficult to classify and analyze because there are no standardized formats for recording this type of information [1]. In addition, the completeness of the data may vary across clinics and physicians. Automated classification tools for this type of information could be useful for describing patterns within the population and developing disease risk prediction models.

Natural Language Processing (NLP) tools are currently used to process EMR free text in an automated and systematic way. However, these tools have primarily been applied to classify information about the presence or absence of disease diagnoses. The application of NLP tools to health risk behaviors, particularly alcohol use information from primary care EMRs, has thus far received limited attention.

Objective

The research objective was to develop and validate an automated system to extract and classify patient alcohol use based on unstructured (i.e., free) text in primary care electronic medical records (EMRs).
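
To make the extract-and-classify task concrete, the following is a minimal sketch of a rule-based classifier for a single free-text note. It is an illustration only and not the system described in this abstract: the patterns, the function name classify_alcohol_use, and the three category labels are all assumptions introduced here.

```python
import re

# Hypothetical patterns; the abstract does not specify any rules or categories.
# A negation cue followed, within the same sentence, by an alcohol term,
# or an explicit statement that the patient does not drink.
NEGATED = re.compile(
    r"\b(?:no|denies|denied)\b[^.]*\b(?:alcohol|etoh|drinking)\b"
    r"|\bnon-?drinker\b"
    r"|\bdoes not drink\b",
    re.IGNORECASE,
)
# Any mention of alcohol or a common beverage term.
MENTION = re.compile(
    r"\b(?:alcohol|etoh|beers?|wine|drinks?|drinking)\b",
    re.IGNORECASE,
)


def classify_alcohol_use(note: str) -> str:
    """Assign one of three illustrative labels to a free-text note."""
    # Check negation first so "denies alcohol use" is not counted as use.
    if NEGATED.search(note):
        return "no use documented"
    if MENTION.search(note):
        return "use documented"
    return "not mentioned"


if __name__ == "__main__":
    for text in (
        "Patient denies alcohol use.",
        "Drinks 2 beers nightly.",
        "Follow-up for hypertension.",
    ):
        print(text, "->", classify_alcohol_use(text))
```

A keyword approach like this is only a starting point. Validating an automated system would require comparing its labels against manually reviewed notes.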

Submitted by Magou on Fri, 06/02/2017 - 01:00