Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal

Publication date: Ahead-of-print date:

FMUP authors

  • Diana Leite Portela Silva

    Author

  • Pedro Pereira Rodrigues

    Author

  • José Alberto Da Silva Freitas

    Author

  • João De Almeida Lopes Da Fonseca

    Author

  • Bernardo Manuel De Sousa Pinto

    Author

Participants from outside FMUP

  • Amaral, R.
  • Costa, E.

Research units

Abstract

Background: Quantifying and dealing with lack of consistency in administrative databases (namely, under-coding) requires tracking patients longitudinally without compromising anonymity, which is often a challenging task.

Objective: This study aimed to (i) assess and compare different hierarchical clustering methods for identifying individual patients in an administrative database that does not easily allow tracking of episodes from the same patient; (ii) quantify the frequency of potential under-coding; and (iii) identify factors associated with such under-coding.

Method: We analysed the Portuguese National Hospital Morbidity Dataset, an administrative database registering all hospitalisations occurring in Mainland Portugal between 2011 and 2015. We applied different hierarchical clustering approaches (either isolated or combined with partitional clustering methods) to identify potential individual patients based on demographic variables and comorbidities. Diagnosis codes were grouped into the Charlson and Elixhauser comorbidity defined groups. The algorithm displaying the best performance was used to quantify potential under-coding. A generalised mixed model (GML) of binomial regression was applied to assess factors associated with such potential under-coding.

Results: The hierarchical cluster analysis (HCA) + k-means clustering method, with comorbidities grouped according to the Charlson defined groups, was the algorithm displaying the best performance (Rand Index of 0.99997). We identified potential under-coding in all Charlson comorbidity groups, ranging from 3.5% (overall diabetes) to 27.7% (asthma). Overall, being male, having a medical admission, dying during hospitalisation or being admitted to more specific and complex hospitals were associated with increased odds of potential under-coding.

Discussion: We assessed several approaches to identify individual patients in an administrative database and, subsequently, by applying the HCA + k-means algorithm, we tracked coding inconsistency and potentially improved data quality. We reported consistent potential under-coding in all defined groups of comorbidities, as well as potential factors associated with such lack of completeness.

Conclusion: Our proposed methodological framework could both enhance data quality and act as a reference for other studies relying on databases with similar problems.
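The two quantities reported in the abstract, the Rand Index used to compare clustering results and the share of potentially under-coded episodes, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy data and, in particular, the under-coding rule are assumptions made for illustration: here a comorbidity group counts as potentially under-coded when it was recorded in an earlier episode of the same presumed patient (cluster) but is absent from a later one. The abstract does not state the exact rule used in the study.

```python
from collections import defaultdict
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair agrees when both partitions put the two items together,
    or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


def undercoding_rate(episodes, group):
    """Share of later episodes that omit a previously coded group.

    `episodes` is a list of (cluster_id, episode_order, coded_groups)
    tuples, where cluster_id stands for a presumed patient.
    ASSUMPTION: a group coded once should reappear in every later
    episode of the same cluster (reasonable only for chronic conditions).
    """
    by_cluster = defaultdict(list)
    for cluster_id, order, coded_groups in episodes:
        by_cluster[cluster_id].append((order, coded_groups))

    at_risk = 0   # episodes occurring after the group was first coded
    missing = 0   # of those, episodes where the group is absent
    for cluster_episodes in by_cluster.values():
        cluster_episodes.sort(key=lambda item: item[0])
        seen = False
        for _, coded_groups in cluster_episodes:
            if seen:
                at_risk += 1
                if group not in coded_groups:
                    missing += 1
            if group in coded_groups:
                seen = True
    return missing / at_risk if at_risk else 0.0


# Toy data (hypothetical): true patient ids vs. ids assigned by a clustering.
true_ids = [0, 0, 1, 1, 2]
predicted_ids = [0, 0, 1, 2, 2]
print(rand_index(true_ids, predicted_ids))  # 0.8

toy_episodes = [
    ("p1", 1, {"diabetes"}),
    ("p1", 2, set()),          # diabetes missing after being coded once
    ("p1", 3, {"diabetes"}),
    ("p2", 1, {"asthma"}),
    ("p2", 2, {"asthma"}),
]
print(undercoding_rate(toy_episodes, "diabetes"))  # 0.5
```

The pairwise loop is quadratic in the number of episodes, so it suits small illustrations only. For a national dataset a contingency-table formulation of the Rand Index would be needed.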

Publication data

ISSN/eISSN:
1833-3575, 1833-3583

Health Information Management Journal  SAGE Publications Inc.

Type:
Article
Pages:
174-182
Link to another resource:
www.scopus.com

Keywords

  • data quality; public health informatics; medical records; evaluation; health information management; under-coding; comorbidities; administrative database; clustering algorithms; unsupervised machine learning

Associated projects

Prevalence and Characterisation of Asthma Patients According to Disease Severity in Portugal (EPI-ASTHMA) - NCT05169619

Principal Investigator: João de Almeida Lopes da Fonseca

Observational Clinical Study (EPI-ASTHMA) . AstraZeneca . 2021

Effect of a Mobile App on Improving Asthma Control in Adolescents and Adults With Persistent Asthma: A Pilot Randomized Multicentre, Superiority Clinical Trial (mINSPIRERS) - NCT05129527

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Clinical Trial (mINSPIRERS) . 2021

Use of the Portuguese Severe Asthma Registry (Registo de Asma Grave Portugal) in observational studies.

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Observational Study (RAG) . 2020

Prediction and analysis of the mode of delivery in Portuguese pregnant women using Bayesian networks.

Principal Investigator: Pedro Pereira Rodrigues

Academic Observational Study (Redes Bayesianas) . 2021

Stimulate continuous monitoring in personal and physical health.

Principal Investigator: José Alberto da Silva Freitas

Academic Observational Study (INNO4HEALTH) . FCT . 2021

Studies evaluating the feasibility, usability and use of a mobile phone app for type 2 diabetes management.

Principal Investigator: José Alberto da Silva Freitas

Academic Observational Study (FoodFriend) . FCT . 2022

Hospitalisation or surveillance: early action in guiding patients with COVID-19.

Principal Investigator: Pedro Pereira Rodrigues

Academic Observational Study (Orientação) . 2020

Clinical Research Collaboration Severe Heterogeneous Asthma Research collaboration, Patient-centered (CRC SHARP).

Principal Investigator: João de Almeida Lopes da Fonseca

Observational Clinical Study (SHARP) . European Respiratory Society . 2021

Multidimensional phenotyping of severe asthma patients and its impact on disease control and therapeutic response - analysis from the Portuguese Severe Asthma Registry.

Principal Investigator: João de Almeida Lopes da Fonseca

Observational Clinical Study (RAG-SPP-GSK) . SPPneumologia . 2022

BREATHE - An oBservational, pRimary data study to characterize severe AsThma pHenotypes and assEss disease burden across the EUCAN region.

Principal Investigator: João de Almeida Lopes da Fonseca

Observational Clinical Study (RAG-AZ-BREATHE) . AstraZeneca . 2022

Identifying problems in the appointment scheduling system of a major Portuguese public hospital - Is there room for improvement?

Principal Investigator: Pedro Pereira Rodrigues

Academic Clinical Study (Scheduling system) . 2020

Seroprevalence of SARS-CoV-2 and assessment of epidemiologic determinants in Portuguese municipal workers

Principal Investigator: Bernardo Manuel De Sousa Pinto

Academic Clinical Study (SARS-CoV-2) . 2021

Congenital Heart Disease Detection Using Clinical Data and Auscultation Heart Sounds: a Machine Learning Approach

Principal Investigator: Pedro Pereira Rodrigues

Academic Clinical Study . 2021

Trends in Heart Failure Hospitalisations over a Sixteen-Year Period: Nationwide Data for Portugal

Principal Investigator: José Alberto da Silva Freitas

Academic Clinical Study (Hospitalizações IC) . 2022

Efficiency in Spine Care - Assessing outcomes and costs to inform healthcare improvement

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Clinical Study . 2022

Healthcare Human Resources and Quality Indicators: Approaches to Strengthening Primary Care.

Principal Investigator: José Alberto da Silva Freitas

Academic Clinical Study . 2022

Use of secondary data, health technology assessment methods and economic modelling applied to penicillin allergy

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Clinical Study . 2020

A machine learning-based approach to support the assessment of clinical coded data quality in the context of Diagnosis-Related Groups classification systems

Principal Investigator: José Alberto da Silva Freitas

Academic Clinical Study . 2020

Using different data sources for the identification of asthma patients and those at high risk of adverse outcomes

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Clinical Study . 2020

Phenotypes of Chronic Diseases of the Airways: Towards Multidimensional Data-Driven Profiling

Principal Investigator: João de Almeida Lopes da Fonseca

Academic Clinical Study . 2020
