Patients with multiple disorders usually have long diagnosis lists, consisting of ICD-10 codes together with individual free-text descriptions. These text snippets are produced when physicians overwrite the standardized ICD code labels at the point of care. They provide highly compact expert descriptions within a 50-character text field and are frequently not assigned to a specific ICD-10 code. Because these lists are highly redundant, they would benefit from content-based categorization in different hospital-based application scenarios. This work demonstrates how to accurately group diagnosis lists via a combination of natural language processing and hierarchical clustering, achieving an overall F-measure of 0.87. In addition, it compresses the initial diagnosis list by up to 89%. The manuscript discusses pitfalls and challenges as well as the potential of a large-scale approach for tackling this problem.
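
To make the general idea of grouping short diagnosis snippets concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes character-trigram Jaccard distance as a stand-in for the natural language processing step and average-linkage agglomerative clustering cut at a fixed distance threshold. The function names, the threshold of 0.6, and the example snippets are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def trigrams(text):
    """Return the set of character trigrams of a normalized snippet."""
    norm = " ".join(text.lower().split())  # lowercase, collapse whitespace
    padded = f"  {norm} "                  # pad so word boundaries count
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard_distance(a, b):
    """Jaccard distance (1 - intersection/union) between two trigram sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster(snippets, threshold=0.6):
    """Average-linkage agglomerative clustering, stopped at `threshold`.

    The threshold is an assumed value, not one taken from the paper.
    """
    grams = [trigrams(s) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(jaccard_distance(grams[i], grams[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (x, y), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return [[snippets[i] for i in sorted(c)] for c in clusters]


# Invented example snippets, not data from the study.
diagnoses = [
    "Diabetes mellitus type 2",
    "diabetes mellitus typ 2",
    "Art. hypertension",
    "arterial hypertension",
    "Diabetes mell. type 2",
]
groups = cluster(diagnoses)
for group in groups:
    print(group)
```

With these invented inputs, the sketch is expected to place the three diabetes variants in one group and the two hypertension variants in another. A real system would likely need domain-specific normalization (abbreviations, negation, laterality) that this sketch does not attempt.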