Enhancing Cardiovascular Risk Prediction: Development of an Advanced Xgboost Model with Hospital-Level Random Effects

Publication date: 01/10/2024

FMUP authors

José Alberto Da Silva Freitas

Author

Participants from outside FMUP

Dong, Tim
Oronti, Iyabosola Busola
Sinha, Shubhra
Zhai, Bing
Chan, Jeremy
Fudulu, Daniel P.
Caputo, Massimo
Angelini, Gianni D.

Research units

Laboratório Associado RISE- Rede de Investigação em Saúde

Researcher

Fernando Carlos De Landér Schmitt
Medicina da Comunidade, Informação e Decisão em Saúde
RISE-Health

Researcher

Fernando Carlos De Landér Schmitt

Abstract

Background: Ensemble tree-based models such as Xgboost are highly prognostic in cardiovascular medicine, as measured by the Clinical Effectiveness Metric (CEM). However, their ability to handle correlated data, such as hospital-level effects, is limited.

Objectives: The aim of this work is to develop a binary-outcome mixed-effects Xgboost (BME) model that integrates random effects at the hospital level. To ascertain how well the model handles correlated data in cardiovascular outcomes, we aim to assess its performance and compare it to fixed-effects Xgboost and traditional logistic regression models.

Methods: A total of 227,087 patients over 17 years of age, undergoing cardiac surgery from 42 UK hospitals between 1 January 2012 and 31 March 2019, were included. The dataset was split into two cohorts: training/validation (n = 157,196; 2012-2016) and holdout (n = 69,891; 2017-2019). The outcome variable was 30-day mortality with hospitals considered as the clustering variable. The logistic regression, mixed-effects logistic regression, Xgboost and binary-outcome mixed-effects Xgboost (BME) were fitted to both standardized and unstandardized datasets across a range of sample sizes and the estimated prediction power metrics were compared to identify the best approach.

Results: The exploratory study found high variability in hospital-related mortality across datasets, which supported the adoption of the mixed-effects models. Unstandardized Xgboost BME demonstrated marked improvements in prediction power over the Xgboost model at small sample size ranges, but performance differences decreased as dataset sizes increased. Generalized linear models (glms) and generalized linear mixed-effects models (glmers) followed similar results, with the Xgboost models also excelling at greater sample sizes.

Conclusions: These findings suggest that integrating mixed effects into machine learning models can enhance their performance on datasets where the sample size is small.
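The abstract does not say how the BME model estimates its hospital-level random effects, so the following is only a minimal sketch of one common approach, not the authors' algorithm. It treats the output of a fixed-effects model as a fixed offset on the logit scale and estimates one penalized (shrunken) random intercept per hospital by Newton steps. The function name `hospital_intercepts`, the `prior_var` parameter, and all simulated values are assumptions made for illustration. The data are synthetic and unrelated to the study cohort.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hospital_intercepts(y, fixed_logit, hospital, n_hospitals,
                        prior_var=1.0, n_iter=25):
    """Estimate one shrunken intercept per hospital (hypothetical helper).

    Maximizes, separately for each hospital j, the Bernoulli log-likelihood
    of y given logit = fixed_logit + b[j], minus the Gaussian penalty
    b[j]**2 / (2 * prior_var). The fixed-effects logit is held constant.
    """
    b = np.zeros(n_hospitals)
    for _ in range(n_iter):
        p = sigmoid(fixed_logit + b[hospital])
        # Per-hospital gradient of the penalized log-likelihood.
        grad = np.bincount(hospital, weights=y - p,
                           minlength=n_hospitals) - b / prior_var
        # Per-hospital curvature (negative second derivative).
        curv = np.bincount(hospital, weights=p * (1.0 - p),
                           minlength=n_hospitals) + 1.0 / prior_var
        b += grad / curv  # Newton update
    return b


# Synthetic clustered data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n_hospitals, n_patients = 42, 20000
hospital = rng.integers(0, n_hospitals, size=n_patients)
true_b = rng.normal(0.0, 0.5, size=n_hospitals)

# Stand-in for the logit predicted by a fixed-effects model.
x = rng.normal(size=n_patients)
fixed_logit = -3.0 + 0.8 * x

y = rng.binomial(1, sigmoid(fixed_logit + true_b[hospital]))

b_hat = hospital_intercepts(y, fixed_logit, hospital, n_hospitals,
                            prior_var=0.25)

# Predicted risk for each patient, including the hospital-level effect.
risk = sigmoid(fixed_logit + b_hat[hospital])
```

In a full mixed-effects boosting scheme, a step like this would typically alternate with refitting the tree model using the current intercepts as an offset. A smaller `prior_var` shrinks the intercepts more strongly toward zero, which matters most for hospitals with few patients.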

Publication data

ISSN/eISSN: 2306-5354, 2306-5354
Type: Article
Pages: -
DOI: 10.3390/bioengineering11101039
PubMed: 39451414
Link to another resource: www.scopus.com

Affiliations

Dong, Tim: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England
Oronti, Iyabosola Busola: Univ Warwick, Sch Engn, Dept Stat, Stat & Risk Unit AS&RU, Coventry CV4 7AL, England
Sinha, Shubhra: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England
Freitas, Alberto: Univ Porto, Fac Med, P-4200319 Porto, Portugal
Zhai, Bing: Northumbria Univ, Newcastle Business Sch, Newcastle Upon Tyne NE1 8ST, England
Chan, Jeremy: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England
Fudulu, Daniel P.: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England
Caputo, Massimo: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England
Angelini, Gianni D.: Univ Bristol, Bristol Heart Inst, Translat Hlth Sci, Bristol BS2 8HW, England