Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database

FMUP authors
Participants from outside FMUP
- Dong, TM
- Sunderland, N
- Nightingale, A
- Fudulu, DP
- Chan, J
- Zhai, B
- Caputo, M
- Dimagli, A
- Mires, S
- Wyatt, M
- Benedetto, U
- Angelini, GD
Research units
Abstract
Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been used successfully in a variety of medical fields, it is not commonly used in echocardiography analysis.
Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes from a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use.
Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated.
Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability as measured by intra-class correlation (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level of inter-rater reliability (ICC = 0.75-0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance.
Conclusions: The NLP-based technique performed well in extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
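The abstract does not describe the internals of the NLP system, so the following is only a minimal rule-based sketch of the kind of conversion it reports: pulling continuous values (LVOT VTI, AV VTI, TR Vmax) and a discrete outcome (regurgitation severity) out of semi-structured text, then scoring discrete extractions against clinician labels. The regular expressions, report wording, units and function names are all assumptions made for illustration and are not the authors' rules.

```python
import re

# Hypothetical patterns: the label wording and units are assumptions,
# not taken from the study's actual report templates.
CONTINUOUS_PATTERNS = {
    "LVOT VTI": re.compile(r"LVOT\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "AV VTI": re.compile(r"\bAV\s+VTI\s*[:=]?\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "TR Vmax": re.compile(r"TR\s+Vmax\s*[:=]?\s*(\d+(?:\.\d+)?)\s*m/s", re.IGNORECASE),
}

# Hypothetical severity grammar for the discrete outcome.
SEVERITY_PATTERN = re.compile(
    r"\b(no|trivial|mild|moderate|severe)\s+"
    r"(mitral|aortic|tricuspid|pulmonary)\s+regurgitation",
    re.IGNORECASE,
)


def extract(report):
    """Turn one narrative report into a flat dict of structured fields."""
    row = {}
    for name, pattern in CONTINUOUS_PATTERNS.items():
        match = pattern.search(report)
        # Missing measurements stay None rather than being guessed.
        row[name] = float(match.group(1)) if match else None
    for severity, valve in SEVERITY_PATTERN.findall(report):
        row[f"{valve.lower()} regurgitation"] = severity.lower()
    return row


def accuracy(predicted, reference):
    """Fraction of discrete labels on which the two sources agree."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("label lists must be non-empty and of equal length")
    agreements = sum(p == r for p, r in zip(predicted, reference))
    return agreements / len(reference)


# Invented example text, not a real report.
sample = (
    "LVOT VTI: 21.5 cm. AV VTI 45 cm. TR Vmax = 2.8 m/s. "
    "Mild mitral regurgitation. No aortic regurgitation."
)
structured = extract(sample)
```

Here `structured` holds one row of the kind that could be loaded into a database table, and `accuracy` corresponds to the diagonal of a confusion matrix divided by its total, which is how an overall accuracy figure for discrete outcomes is usually obtained.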
Publication data
- ISSN/eISSN:
- 2306-5354, 2306-5354
- Type:
- Article
- Pages:
- 1307-
- Link to another resource:
- www.scopus.com
BIOENGINEERING-BASEL MDPI AG
Citations received in Scopus: 4
Affiliations
Keywords
- electronic health records (EHR); Big Data; unstructured data; echo report; echocardiography analysis; natural language processing (NLP); data extraction; validation
Associated projects
Stimulate continuous monitoring in personal and physical health.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (INNO4HEALTH). FCT. 2021
Feasibility, usability and utilisation evaluation studies of a mobile phone app for type 2 diabetes management.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (FoodFriend). FCT. 2022
Portuguese Public Hospitals Financial Performance between 2014-2020
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study (Financial Performance). 2023
Trends in Heart Failure Hospitalisations over a Sixteen-Year Period: Nationwide Data for Portugal
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study (Hospitalizações IC). 2022
The use of secondary data in Mental Health research
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2023
Health priorities in the European Union - a novel framework
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2023
Healthcare Human Resources and Quality Indicators: Approaches to Strengthening Primary Care.
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2022
A machine learning-based approach to support the assessment of clinical coded data quality in the context of Diagnosis-Related Groups classification systems
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2020
Cite this publication
Dong TM, Sunderland N, Nightingale A, Fudulu DP, Chan J, Zhai B, Freitas A, Caputo M, Dimagli A, Mires S, Wyatt M, Benedetto U, Angelini GD. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioeng. 2023. 10. (11): p. 1307-1307. IF: 4.600. (2).