Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal

Participants from outside FMUP
- Amaral, R.
- Costa, E.
Abstract
Background: Quantifying and dealing with lack of consistency in administrative databases (namely, under-coding) requires tracking patients longitudinally without compromising anonymity, which is often a challenging task.
Objective: This study aimed to (i) assess and compare different hierarchical clustering methods for identifying individual patients in an administrative database that does not easily allow tracking of episodes from the same patient; (ii) quantify the frequency of potential under-coding; and (iii) identify factors associated with such phenomena.
Method: We analysed the Portuguese National Hospital Morbidity Dataset, an administrative database registering all hospitalisations occurring in Mainland Portugal between 2011 and 2015. We applied different hierarchical clustering approaches (either isolated or combined with partitional clustering methods) to identify potential individual patients based on demographic variables and comorbidities. Diagnosis codes were grouped into the Charlson and Elixhauser comorbidity-defined groups. The algorithm displaying the best performance was used to quantify potential under-coding. A generalised mixed model (GML) of binomial regression was applied to assess factors associated with such potential under-coding.
Results: The hierarchical cluster analysis (HCA) + k-means clustering method, with comorbidities grouped according to the Charlson-defined groups, was the algorithm displaying the best performance (Rand Index of 0.99997). We identified potential under-coding in all Charlson comorbidity groups, ranging from 3.5% (overall diabetes) to 27.7% (asthma). Overall, being male, having a medical admission, dying during hospitalisation or being admitted to more specific and complex hospitals were associated with increased odds of potential under-coding.
Discussion: We assessed several approaches to identify individual patients in an administrative database and, subsequently, by applying the HCA + k-means algorithm, we tracked coding inconsistency and potentially improved data quality. We reported consistent potential under-coding in all defined groups of comorbidities and potential factors associated with such lack of completeness.
Conclusion: Our proposed methodological framework could both enhance data quality and act as a reference for other studies relying on databases with similar problems.
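As an illustration only, the two quantities reported in the abstract (the Rand Index used to compare clustering algorithms, and the frequency of potential under-coding) can be sketched in a few lines of Python. This is not the authors' implementation. The function names, the episode tuple layout, and in particular the under-coding rule (a comorbidity coded in an earlier episode of the same inferred patient but absent from a later one) are assumptions made for this sketch; the abstract does not state the exact rule used.

```python
from collections import defaultdict
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Share of episode pairs on which two partitions agree
    (both place the pair together, or both place it apart)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def potential_undercoding(episodes, comorbidity):
    """Count later episodes that omit a comorbidity already coded for the
    same inferred patient (hypothetical rule, assumed for this sketch).

    episodes: iterable of (cluster_id, order, coded_groups) tuples, where
    cluster_id is the inferred patient, order sorts episodes in time, and
    coded_groups is the set of comorbidity groups coded in that episode.
    Returns (flagged, eligible).
    """
    by_patient = defaultdict(list)
    for cluster_id, order, coded in episodes:
        by_patient[cluster_id].append((order, coded))

    flagged = eligible = 0
    for history in by_patient.values():
        history.sort(key=lambda item: item[0])
        seen_before = False
        for _, coded in history:
            if seen_before:
                eligible += 1
                if comorbidity not in coded:
                    flagged += 1
            if comorbidity in coded:
                seen_before = True
    return flagged, eligible


# Toy data (invented for illustration, not taken from the study).
true_patients = [0, 0, 1, 1, 2]
inferred_patients = [0, 0, 1, 2, 2]
toy_episodes = [
    ("A", 1, {"asthma"}),
    ("A", 2, set()),
    ("A", 3, {"asthma"}),
    ("B", 1, {"diabetes"}),
    ("B", 2, {"diabetes"}),
]
```

On the toy data, `rand_index(true_patients, inferred_patients)` compares the inferred patient clusters with the known ones, and `potential_undercoding(toy_episodes, "asthma")` returns the number of flagged episodes alongside the number of episodes eligible to be flagged, from which a percentage can be derived.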
Publication data
- ISSN/eISSN: 1833-3575, 1833-3583
- Type: Article
- Pages: 174-182
- Link to another resource: www.scopus.com
Health Information Management Journal, SAGE Publications Inc.
Keywords
- data quality; public health informatics; medical records; evaluation; health information management; under-coding; comorbidities; administrative database; clustering algorithms; unsupervised machine learning
Associated projects
Prevalence and Characterisation of Asthma Patients According to Disease Severity in Portugal (EPI-ASTHMA) - NCT05169619
Principal Investigator: João de Almeida Lopes da Fonseca
Observational Clinical Study (EPI-ASTHMA). AstraZeneca. 2021
Effect of a Mobile App on Improving Asthma Control in Adolescents and Adults With Persistent Asthma: A Pilot Randomized Multicentre, Superiority Clinical Trial (mINSPIRERS) - NCT05129527
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Clinical Trial (mINSPIRERS). 2021
Use of the Portuguese Severe Asthma Registry (Registo de Asma Grave Portugal) in observational studies.
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Observational Study (RAG). 2020
Prediction and analysis of delivery type in Portuguese pregnant women using Bayesian networks.
Principal Investigator: Pedro Pereira Rodrigues
Academic Observational Study (Redes Bayesianas). 2021
Stimulate continuous monitoring in personal and physical health.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (INNO4HEALTH). FCT. 2021
Studies assessing the feasibility, usability and use of a mobile phone app for the management of type 2 diabetes.
Principal Investigator: José Alberto da Silva Freitas
Academic Observational Study (FoodFriend). FCT. 2022
Hospitalisation or surveillance: early action in the guidance of patients with COVID-19.
Principal Investigator: Pedro Pereira Rodrigues
Academic Observational Study (Orientação). 2020
Clinical Research Collaboration Severe Heterogeneous Asthma Research collaboration, Patient-centered (CRC SHARP).
Principal Investigator: João de Almeida Lopes da Fonseca
Observational Clinical Study (SHARP). European Respiratory Society. 2021
Multidimensional phenotyping of severe asthma patients and its impact on disease control and therapeutic response - analysis from the Portuguese Severe Asthma Registry.
Principal Investigator: João de Almeida Lopes da Fonseca
Observational Clinical Study (RAG-SPP-GSK). SPPneumologia. 2022
BREATHE - An oBservational, pRimary data study to characterize severe AsThma pHenotypes and assEss disease burden across the EUCAN region.
Principal Investigator: João de Almeida Lopes da Fonseca
Observational Clinical Study (RAG-AZ-BREATHE). AstraZeneca. 2022
Identifying problems in the appointment scheduling system of a major Portuguese public hospital - Is there room for improvement?
Principal Investigator: Pedro Pereira Rodrigues
Academic Clinical Study (Scheduling system). 2020
Seroprevalence of SARS-CoV-2 and assessment of epidemiologic determinants in Portuguese municipal workers
Principal Investigator: Bernardo Manuel De Sousa Pinto
Academic Clinical Study (SARS-CoV-2). 2021
Congenital Heart Disease Detection Using Clinical Data and Auscultation Heart Sounds: a Machine Learning Approach
Principal Investigator: Pedro Pereira Rodrigues
Academic Clinical Study. 2021
Trends in Heart Failure Hospitalisations over a Sixteen-Year Period: Nationwide Data for Portugal
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study (Hospitalizações IC). 2022
Efficiency in Spine Care - Assessing outcomes and costs to inform healthcare improvement
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Clinical Study. 2022
Healthcare Human Resources and Quality Indicators: Approaches to Strengthening Primary Care.
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2022
Use of secondary data, health technology assessment methods and economic modelling applied to penicillin allergy
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Clinical Study. 2020
A machine learning-based approach to support the assessment of clinical coded data quality in the context of Diagnosis-Related Groups classification systems
Principal Investigator: José Alberto da Silva Freitas
Academic Clinical Study. 2020
Using different data sources for the identification of asthma patients and those at high risk of adverse outcomes
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Clinical Study. 2020
Phenotypes of Chronic Diseases of the Airways: Towards Multidimensional Data-Driven Profiling
Principal Investigator: João de Almeida Lopes da Fonseca
Academic Clinical Study. 2020
Cite this publication
Portela D, Amaral R, Rodrigues PP, Freitas A, Costa E, Fonseca J, Sousa B. Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal. Health Inf. Manage. J. 2023. 53(3): p. 174-182. IF: 3.200. (2).