Performance of machine learning method to classify free-text medical causes of death

Mortality is an indicator of how severely an event affects the population. In France, mortality surveillance is part of the syndromic surveillance system SurSaUD and is carried out by Santé publique France, the French public health agency. The set-up of an Electronic Death Registration System (EDRS) in 2007 made it possible to receive medical causes of death in real time, in free-text format. This data source was considered timely and valuable for implementing a reactive mortality surveillance system based on medical causes of death (1). The system relies on the monitoring of Mortality Syndromic Groups (MSGs). An MSG is defined as a cluster of medical causes of death (pathologies, syndromes, symptoms) that meets the objectives of early detection and impact assessment of events (2). Since causes of death are entered in free-text format, their automatic classification into MSGs requires natural language processing methods. Over the last two decades, the use of these methods to classify medical information and to support health surveillance has increased steadily (3).
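
To make the classification task concrete, the following is a minimal sketch of how free-text causes of death could be mapped to MSGs with simple keyword matching. It is an illustration only, not one of the methods evaluated in this study: the MSG names, the keyword lists, and the functions `normalize` and `classify` are all hypothetical.

```python
import re
import unicodedata

# Hypothetical MSGs and keyword lists, invented for illustration only;
# they are not the groups or rules used in the study.
MSG_KEYWORDS = {
    "influenza": ["grippe", "influenza"],
    "heat_related": ["hyperthermie", "coup de chaleur", "deshydratation"],
}


def normalize(text):
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents).strip()


def classify(cause_text):
    """Return the sorted list of MSGs whose keywords occur in the text."""
    # Pad with spaces so keywords match whole words or whole phrases only.
    padded = f" {normalize(cause_text)} "
    return sorted(
        msg
        for msg, keywords in MSG_KEYWORDS.items()
        if any(f" {kw} " in padded for kw in keywords)
    )
```

Under these assumptions, a call such as `classify("Déshydratation sévère, coup de chaleur")` would assign the text to the hypothetical `heat_related` group. A keyword approach like this cannot handle misspellings, abbreviations, or negations, which is one reason free-text classification generally calls for more robust natural language processing methods.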

Objective: This study aims to implement and evaluate two methods for automatically classifying free-text medical causes of death into Mortality Syndromic Groups (MSGs), for use in reactive mortality surveillance.

Submitted by elamb on Tue, 06/18/2019 - 20:22