Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data

Publication date:

FMUP authors

  • Pedro Pereira Rodrigues

    Author

Participants from outside FMUP

  • Pereira, RC
  • Abreu, P.

Abstract

Missing data can pose severe consequences in critical contexts, such as clinical research based on routinely collected healthcare data. This issue is usually handled with imputation strategies, but these tend to produce poor and biased results under the Missing Not At Random (MNAR) mechanism. A recent trend that has been showing promising results for MNAR is the use of generative models, particularly Variational Autoencoders. However, they have a limitation: the imputed values are the result of a single sample, which can be biased. To tackle it, an extension to the Variational Autoencoder that uses a partial multiple imputation procedure is introduced in this work. The proposed method was compared to 8 state-of-the-art imputation strategies, in an experimental setup with 34 datasets from the medical context, injected with the MNAR mechanism (10% to 80% rates). The results were evaluated through the Mean Absolute Error, with the new method being the overall best in 71% of the datasets, significantly outperforming the remaining ones, particularly for high missing rates. Finally, a case study of a classification task with heart failure data was also conducted, where this method induced improvements in 50% of the classifiers.
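The evaluation protocol described in the abstract (inject MNAR missingness, impute, score with the Mean Absolute Error on the removed entries) can be illustrated with a minimal sketch. It does not reproduce the Variational Autoencoder or the paper's partial multiple imputation procedure: a simple Gaussian sampler fitted to the observed values stands in for the generative model, and all function names (`inject_mnar`, `multiple_impute`, `mae_on_missing`) are hypothetical. The MNAR rule used here (masking the largest values of a feature) is one common way to make missingness depend on the unobserved value itself, and is an assumption rather than the authors' exact injection scheme.

```python
import random
from statistics import mean, pstdev


def inject_mnar(values, rate):
    """Mask the largest `rate` fraction of values (assumed MNAR rule:
    whether a value is missing depends on the value itself)."""
    n_missing = int(round(rate * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    missing = set(order[:n_missing])
    return [None if i in missing else v for i, v in enumerate(values)]


def multiple_impute(masked, draw, n_draws, rng):
    """Fill each missing entry with the average of `n_draws` stochastic samples.
    With n_draws=1 this reduces to single-sample imputation."""
    return [
        v if v is not None else mean(draw(rng) for _ in range(n_draws))
        for v in masked
    ]


def mae_on_missing(truth, masked, imputed):
    """Mean Absolute Error computed only over the entries that were masked."""
    errors = [abs(t - p) for t, m, p in zip(truth, masked, imputed) if m is None]
    return mean(errors)


# Toy data: one synthetic feature, 20% MNAR missing rate.
rng = random.Random(0)
truth = [rng.gauss(50.0, 10.0) for _ in range(200)]
masked = inject_mnar(truth, 0.2)

# Stand-in generative model: a Gaussian fitted to the observed values only.
observed = [v for v in masked if v is not None]
mu, sigma = mean(observed), pstdev(observed)


def draw(r):
    return r.gauss(mu, sigma)


single = multiple_impute(masked, draw, n_draws=1, rng=rng)
multi = multiple_impute(masked, draw, n_draws=20, rng=rng)

print("MAE, single sample:  ", mae_on_missing(truth, masked, single))
print("MAE, 20 draws averaged:", mae_on_missing(truth, masked, multi))
```

Averaging several draws reduces the sampling variance of each imputed value, which is the concern the abstract raises about single-sample imputation. It does not remove the bias caused by fitting only to the observed values under MNAR, which this stand-in sampler makes no attempt to correct.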

Publication data

ISSN/eISSN:
2168-2208, 2168-2194

IEEE Journal of Biomedical and Health Informatics, Institute of Electrical and Electronics Engineers Inc.

Type:
Article
Pages:
4218-4227
Link to another resource:
www.scopus.com

Citations received in Web of Science: 12

Citations received in Scopus: 18

Keywords

  • Medical services; Task analysis; Data models; Principal component analysis; Neural networks; Mathematical models; Mice; Healthcare data; missing data; missing not at random; partial multiple imputation; variational autoencoder

Associated projects

Prediction and analysis of the type of delivery in Portuguese pregnant women using Bayesian networks.

Principal Investigator: Pedro Pereira Rodrigues

Academic Observational Study (Bayesian Networks). 2021

Hospitalization or surveillance: early action in the guidance of patients with COVID-19.

Principal Investigator: Pedro Pereira Rodrigues

Academic Observational Study (Guidance). 2020

Identifying problems in the appointment scheduling system of a major Portuguese public hospital - Is there room for improvement?

Principal Investigator: Pedro Pereira Rodrigues

Academic Clinical Study (Scheduling system). 2020

Congenital Heart Disease Detection Using Clinical Data and Auscultation Heart Sounds: a Machine Learning Approach

Principal Investigator: Pedro Pereira Rodrigues

Academic Clinical Study. 2021