Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules

FMUP authors
Participants from outside FMUP
- Coutinho-Almeida, Joao
- Saez, Carlos
Research units
Abstract
Background: The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement.

Objective: This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool utilizes Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standards in conjunction with Bayesian networks and expert rules to assess data quality in real-world obstetrics data.

Methods: A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020.

Results: The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. The Bayesian networks used in the tool showed high performance for various features, with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and its interoperable format as a FHIR Application Programming Interface (API) enable the possible deployment of real-time data quality assessment in obstetrics settings. Our initial assessments show promise: even when compared with physicians' assessments of real records, the tool can reach an AUROC of 88%, depending on the threshold defined.

Discussion: Our results also show that obstetrics clinical records are difficult to assess in terms of quality, and that assessments like ours could benefit from more categorical approaches to ranking records between bad and good quality.

Conclusion: This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool's capabilities, enhancing the tool's utility across diverse medical domains.

With the widespread use of healthcare information systems, a vast amount of health data is generated and stored in electronic health records (EHRs). These data have the potential to advance medical knowledge and improve patient care, but only if they are of high quality. Data quality requirements vary depending on the intended use, such as daily patient care, research, or management purposes. Poor data quality in EHRs can lead to incorrect healthcare decisions. Errors can occur at various stages, from data entry to processing and interpretation. Different approaches are needed to assess data quality based on its intended use. This article focuses on developing a tool to improve data quality in obstetrics using 3 main categories: completeness, plausibility, and conformance. Tested with data from 9 Portuguese hospitals, the tool uses methods like Bayesian networks and rule-based systems. Initial real-world testing showed promising results. However, assessing data quality remains complex and context dependent. Future research will refine the tool and expand its application. This work is a significant step towards ensuring high-quality EHR data for clinical and research purposes.
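The three categories named in the abstract (completeness, plausibility, and conformance) can be illustrated with a minimal rule-based sketch. This is not the tool described in the article and does not use Bayesian networks or FHIR; the field names, numeric ranges, and allowed values below are hypothetical and were chosen only to show how each category might be scored for a single record.

```python
from typing import Any, Dict

Record = Dict[str, Any]

# Hypothetical rules for illustration only; not taken from the article.
REQUIRED_FIELDS = ["maternal_age", "gestational_age_weeks", "delivery_type"]
PLAUSIBLE_RANGES = {"maternal_age": (10, 60), "gestational_age_weeks": (20, 45)}
ALLOWED_VALUES = {"delivery_type": {"vaginal", "cesarean", "instrumental"}}


def completeness(record: Record) -> float:
    """Fraction of required fields that are present and not None."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f) is not None)
    return present / len(REQUIRED_FIELDS)


def plausibility(record: Record) -> float:
    """Fraction of numeric fields whose value lies inside its assumed range.

    Returns 1.0 when there is nothing to check.
    """
    checked = [
        (f, lo, hi)
        for f, (lo, hi) in PLAUSIBLE_RANGES.items()
        if isinstance(record.get(f), (int, float))
    ]
    if not checked:
        return 1.0
    ok = sum(1 for f, lo, hi in checked if lo <= record[f] <= hi)
    return ok / len(checked)


def conformance(record: Record) -> float:
    """Fraction of coded fields whose value belongs to its allowed value set.

    Returns 1.0 when there is nothing to check.
    """
    checked = [f for f in ALLOWED_VALUES if record.get(f) is not None]
    if not checked:
        return 1.0
    ok = sum(1 for f in checked if record[f] in ALLOWED_VALUES[f])
    return ok / len(checked)


def assess(record: Record) -> Dict[str, float]:
    """Score one record on the three categories, each in [0, 1]."""
    return {
        "completeness": completeness(record),
        "plausibility": plausibility(record),
        "conformance": conformance(record),
    }


example = {"maternal_age": 31, "gestational_age_weeks": 52, "delivery_type": "Cesarean"}
scores = assess(example)
print(scores)
```

In this sketch the example record is complete, but the gestational age falls outside the assumed range and the delivery type does not match the assumed (case-sensitive) value set, so the plausibility and conformance scores drop while completeness stays at its maximum.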
Publication data
- ISSN/ISSNe:
- 2574-2531, 2574-2531
- Type:
- Article
- Pages:
- -
- Link to another resource:
- www.scopus.com
JAMIA Open Oxford University Press
Citations received in Web of Science: 1
Citations received in Scopus: 1
Documents
- No documents available
Affiliations
Keywords
- data quality; machine-learning; FHIR; real-world data; Bayesian networks
Associated projects
Impact of COVID-19 on cesarean section rates by Robson classification in Portuguese hospitals.
Principal investigator: Ricardo João Cruz Correia
Academic Observational Study (Cesarianas). 2020
Prediction and analysis of delivery type in Portuguese pregnant women using Bayesian networks.
Principal investigator: Pedro Pereira Rodrigues
Academic Observational Study (Redes Bayesianas). 2021
Prediction of neonatal outcome based on gestational age and birth weight: a risk grading for low-resource birth settings.
Principal investigator: Ricardo João Cruz Correia
Academic Observational Study (NEONATAL). 2019
Hospitalization or surveillance: early action in guiding patients with COVID-19.
Principal investigator: Pedro Pereira Rodrigues
Academic Observational Study (Orientação). 2020
Development of a COVID-19 risk scale through a probabilistic Monte Carlo R&D analysis in order to equip the Hospital de Ovar with contingency plans adapted for the management of pandemic cases. (COVID-19)
Principal investigator: Ricardo João Cruz Correia
Academic Clinical Study (COVID-19). FCT. 2020
Excess mortality during COVID-19 in 5 European countries and a critique of mortality data analysis
Principal investigator: Ricardo João Cruz Correia
Academic Clinical Study (Mortality). 2020
Identifying problems in the appointment scheduling system of a major Portuguese public hospital - Is there room for improvement?
Principal investigator: Pedro Pereira Rodrigues
Academic Clinical Study (Scheduling system). 2020
Congenital Heart Disease Detection Using Clinical Data and Auscultation Heart Sounds: a Machine Learning Approach
Principal investigator: Pedro Pereira Rodrigues
Academic Clinical Study. 2021
Impact of implementing Theory of Constraints combined with Lean thinking in healthcare services
Principal investigator: Pedro Pereira Rodrigues
Academic Clinical Study. 2023
Development and validation of a diagnostic model for obstructive sleep apnea: a Bayesian network approach
Principal investigator: Pedro Pereira Rodrigues
Academic Clinical Study. 2023
Cite this publication
Coutinho J, Saez C, Correia R, Rodrigues PP. Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules. JAMIA Open. 2024;7(3):ooae062.