Abstract
Background Systemic sclerosis (SSc) is an autoimmune disease with high mortality with lung involvement being the primary cause of death. Progressive interstitial lung disease (ILD) leads to a decline in lung function (forced vital capacity, FVC% predicted) with risk of respiratory failure. These patients could benefit from an early and tailored pharmacological intervention. However, up to date, tools for prediction of individual FVC changes are lacking. In this paper, we aimed at developing a trustworthy machine learning system that is able to guide SSc management by providing not only robust FVC predictions, but also uncertainty quantification (i.e. the degree of certainty of model prediction) as well as similarity-based explainability for any patient P (i.e. a list of past SSc patients with similar FVC trajectories like P). We further aimed to identify the key clinical factors influencing the model’s predictions and to use model-guided data representation to identify SSc patients with similar sequential FVC measurements.
Methods We trained and evaluated machine learning (ML) models to predict SSc-ILD trajectory as measured by FVC% predicted values using the international SSc database managed by the European Scleroderma Trials and Research group (EUSTAR), which comprises clinical, laboratory and functional parameters. EUSTAR records patients’ data in annual assessment visits, and, given any visit, we aimed at predicting the FVC value of a patient’s subsequent visit, taking into account all available patient data (i.e. baseline and follow-up visit data up to the time point where we make the prediction). For training of our ML models, we included 2220 SSc patients that had at least 3 recorded visits in the EUSTAR database, were at least 18 years old, had confirmed ILD and sufficient clinical documentation. We developed sequential ML models implementing the attentive neural process formalism with either a recurrent (ANP RNN) or transformer encoder (ANP transformer) architecture. We compared these architectures with baseline sequential models including gated recurrent neural networks (RNNs) and multi-head self-attention transformer-based networks. Baseline non-sequential models included tree-based models such as gradient boosting trees, and regression-based models with varying regularization schemes. Our experiments used stratified 5-fold cross-validation to train and test the models using the average root mean squared error (RMSE), weighted RMSE, and mean absolute error (MAE) as performance metrics. We computed the coverage and Winkler score for uncertainty quantification, SHAP values for grading the input features importance and used the data embeddings of the ANP architectures for both similarity-based explainability and the identification of similar SSc patient journeys.
Results Patients’ baseline FVC scores ranged from 22 to 150% predicted with a mean (SD) of 90.53% predicted (21.52). Our deep learning models showed better performance for FVC forecasting, compared to tree- and regression-based models. The top performing ANP RNN architecture was able to closely model future FVC values with average (SD) performance of 8.240 (0.168) weighted RMSE and 6.94 (0.190) MAE that was further used as feature generator for a logistic regression trained to predict a FVC% decline of at least 10% points achieving 0.704 AUC score. In comparison, a naïve baseline using the mean FVC value as a predictor achieved much lower FVC forecasting capabilities, with 18.718 (0.317) weighted RMSE, and 17.619 (0.599) MAE. SHAP value analysis indicated that prior FVC measurements, diffusion of carbon monoxide (DLCO) values, skin involvement, age, anti-centromere positivity, dyspnea and CRP-elevation contributed most to deep-learning-based FVC predictions. Regarding uncertainty quantification, ANP RNN achieved 79% coverage (i.e. the model would provide uncertainty estimates that included the true future FVC value in 79 out of 100 predictions) out of the box, and 90% using an additional conformal prediction module with an corresponding Winkler score of 892 (indicating the width of the uncertainty estimate plus penalty for mistakes), smaller than any other model at the same coverage level. We further demonstrate how the data abstraction provided by the ANP RNN model (embeddings) allows for deriving similar patient trajectories (for similarity-based explanation).
Conclusions Our study demonstrates the feasibility of FVC forecasting and thus the ability to predict ILD trajectories in individual SSc patients using deep learning. We show that model predictions can be paired with uncertainty quantification and similarity-based model explainability, which are crucial elements for deploying trustworthy ML algorithms. Our study is thus an important first step towards reliable automated ILD trajectory (i.e. FVC%) prediction system with potential clinical utility.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work is funded by the Swiss National Science Foundation (project number 201184)
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
European Scleroderma Trials and Research group (EUSTAR) gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
ahmed.allam{at}uzh.ch; aronnorbert.horvath{at}uzh.ch; matthias.dittberner{at}uzh.ch; cecileclaire.trottet{at}uzh.ch;
elise.siegert{at}charite.de;
vincent.sobanski{at}univ-lille.fr;
carreira{at}h12o.es;
dagna.lorenzo{at}unisr.it;
vanessa.smith{at}ugent.be;
serena.guiducci{at}unifi.it;
nico.hunzelmann{at}uni-koeln.de;
a.m.hoffmann-vold{at}medisin.uio.no;
simona.truglia{at}uniroma1.it;
ana.gherghe{at}gmail.com;
svetlana.agachi{at}usmf.md;
Camillo.Ribi{at}chuv.ch;
michael.krauthammer{at}uzh.ch;
britta.maurer{at}insel.ch;
Data Availability
The raw dataset is owned by the EUSTAR group, and may be obtained by request after the approval and permission from EUSTAR board.
Acronyms
- DLCO
- Diffusion Capacity of the Lungs for Carbon Monoxide
- FVC
- Forced Vital Capacity
- SSc
- Systemic Sclerosis
- ILD
- Interstitial Lung Disease
- ANP
- Attentive Neural Processes
- RMSE
- Root mean-squared error
- NLL
- Negative Log-Likelihood
- MAE
- Mean absolute error