Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications

FMUP authors
Participants from outside FMUP
- Souza, J
- Caballero, I
- Santos, J.
- Viana, J.
- Saez, C
Research units
Abstract
Background: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reuse. No gold-standard reference dataset or method for variability assessment is usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database.
Methods: We described and applied a set of multisource and temporal variability assessment methods in a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data was assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt temporal changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal.
Main findings: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously. Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors.
Conclusions: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, while also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage considering the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data.
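The two assessments named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name and number in it is hypothetical. It assumes that the SHR is an observed/expected ratio obtained by indirect standardization, that funnel limits follow the common normal approximation to the Poisson distribution (1 ± z/√E), and that the distance between the distributions of consecutive time batches is the Jensen-Shannon distance. The abstract does not state which limits or which distance were actually used.

```python
import math


def expected_count(reference_rates, hospital_volume):
    """Indirect standardization: reference rate x hospital volume, summed over strata.

    Both arguments are dicts keyed by stratum (for example an age-sex group).
    """
    return sum(reference_rates[s] * n for s, n in hospital_volume.items())


def funnel_limits(expected, z=3.09):
    """Approximate control limits for an observed/expected ratio around 1.

    Assumes Poisson counts with a normal approximation, so the standard
    error of the ratio is 1/sqrt(expected). z=1.96 gives roughly 95% limits
    and z=3.09 roughly 99.8% limits.
    """
    half_width = z / math.sqrt(expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def classify_hospital(observed, expected, z=3.09):
    """Return the SHR and whether it falls outside the funnel limits."""
    shr = observed / expected
    lower, upper = funnel_limits(expected, z)
    if shr < lower:
        return shr, "below limits"
    if shr > upper:
        return shr, "above limits"
    return shr, "in control"


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (log base 2, range 0..1) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def consecutive_distances(batches):
    """Distance between each time batch and the next; large values flag abrupt changes."""
    return [jensen_shannon_distance(a, b) for a, b in zip(batches, batches[1:])]


if __name__ == "__main__":
    # Hypothetical reference rates and hospital admissions per stratum.
    rates = {"F<65": 0.010, "F65+": 0.040, "M<65": 0.015, "M65+": 0.050}
    volume = {"F<65": 4000, "F65+": 1500, "M<65": 3500, "M65+": 1200}
    expected = expected_count(rates, volume)
    print(classify_hospital(observed=310, expected=expected))

    # Hypothetical monthly distributions of a coded variable over three categories.
    months = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20], [0.20, 0.30, 0.50]]
    print(consecutive_distances(months))
```

In a funnel plot, each hospital's SHR is plotted against its expected count, with the limits drawn as curves that narrow as the expected count grows. Hospitals outside the limits are candidates for a data quality review; falling outside the limits does not by itself show that an error exists.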
Publication data
- ISSN/eISSN: 1532-0480, 1532-0464
- Type: Article
- Pages: -
- Link to another resource: www.scopus.com
Journal of Biomedical Informatics, Academic Press Inc.
Citations received in Web of Science: 2
Citations received in Scopus: 3
Keywords
- Data quality; Clinical coding; Data variability; Clinical classification software; International classification of diseases
Associated projects
Stimulate continuous monitoring in personal and physical health.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (INNO4HEALTH). FCT. 2021
Studies evaluating the feasibility, usability and use of a mobile phone app for type 2 diabetes management.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (FoodFriend). FCT. 2022
Trends in Heart Failure Hospitalizations over a Sixteen-Year Period: Nationwide Data for Portugal
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study (Hospitalizações IC). 2022
Pulmonary sequelae of COVID-19 after recovery in critically ill patients: clinical, radiological and pulmonary function assessment
Principal Investigator: André da Silva Marques Pinto
Academic Clinical Study. 2022
Healthcare Human Resources and Quality Indicators: Approaches to Strengthening Primary Care.
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2022
A machine learning-based approach to support the assessment of clinical coded data quality in the context of Diagnosis-Related Groups classification systems
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2020
Cite this publication
Souza J, Caballero I, Santos J, Lobo M, Pinto A, Viana J, Saez C, Lopes F, Freitas A. Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications. J. Biomed. Inform. 2022. 136. 104242. IF: 4.500. (2).