Abstract
Machine learning (ML) models hold promise for precision medicine by enabling personalized predictions from high-dimensional biomedical data. Yet transitioning models from prototyping to clinical application poses challenges. Confounders are a significant hurdle because they undermine the reliability, generalizability, and interpretability of ML models. Using hand grip strength (HGS) prediction from UK Biobank neuroimaging data as a case study, we demonstrate that confounder adjustment can have a greater impact on model performance than changes in features or algorithms. A ubiquitous and necessary approach to confounding is statistical. However, a purely statistical viewpoint overlooks the biomedical relevance of candidate confounders, i.e. their biological link and conceptual similarity to the actual variables of interest. This can lead to confounder adjustment that is not biomedically meaningful, which limits the usefulness of the resulting models both for biological insight and for clinical applicability. To address this, we propose a two-dimensional framework, the Confound Continuum, that combines the statistical association and the biomedical relevance, i.e. conceptual similarity, of a candidate confounder. Conceptual similarity is evaluated on a continuum that captures how much two variables overlap in their biological meaning, ranging from negligible links to expressing the same underlying biology. It thereby acknowledges the gradual nature of the biological link between candidate confounders and a predictive task. Our framework aims to raise awareness of the need to complement statistical confounder considerations with biomedical, conceptual domain knowledge (without going into causal considerations), and thereby offers a means to arrive at meaningful, informed confounder decisions.
The position of a candidate confounder in the two-dimensional grid of the Confound Continuum can support informed, context-specific confounder decisions, thereby not only enhancing the biomedical validity of predictions but also supporting the translation of predictive models into clinical practice.
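To make the two axes of the grid concrete, the following Python sketch places one candidate confounder at a pair of coordinates. It is a minimal illustration under stated assumptions, not the authors' implementation: the statistical axis is assumed to be the absolute Pearson correlation between the candidate and the prediction target (consistent with the use of absolute correlation values to compare association strengths), and the conceptual-similarity axis is assumed to be an expert rating on a 0-to-1 scale that is supplied by hand rather than computed from data. The names `ContinuumPosition` and `place_candidate` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContinuumPosition:
    """Coordinates of one candidate confounder in a two-dimensional grid (hypothetical structure)."""
    name: str
    abs_correlation: float        # statistical axis: |Pearson r| between candidate and target (assumption)
    conceptual_similarity: float  # biomedical axis: expert rating in [0, 1] (assumed scale)


def place_candidate(name, confounder, target, conceptual_similarity):
    """Return the grid position of a candidate confounder.

    The conceptual similarity is NOT derived from data here; it is assumed to be
    provided by domain experts (0 = negligible biological link, 1 = same underlying biology).
    """
    if not 0.0 <= conceptual_similarity <= 1.0:
        raise ValueError("conceptual_similarity must lie in [0, 1]")
    x = np.asarray(confounder, dtype=float)
    y = np.asarray(target, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("confounder and target must be 1-D arrays of equal length")
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient between candidate and target
    # Use the absolute value so associations of equal strength but opposite sign coincide
    return ContinuumPosition(name, float(abs(r)), float(conceptual_similarity))


# Toy usage with made-up numbers (not UK Biobank data)
candidate = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [10.0, 8.0, 6.0, 4.0, 2.0]
print(place_candidate("toy_candidate", candidate, target, conceptual_similarity=0.3))
```

Taking the absolute correlation places positive and negative associations of equal strength at the same point on the statistical axis. How the grid would be partitioned into decision regions is deliberately left open in this sketch, since no thresholds are specified here.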
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project-ID 431549029, Collaborative Research Centre CRC1451 on motor performance, project B05.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The human data used in this study were retrieved from the UK Biobank. The UK Biobank has approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB) (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics) and is allowed to obtain and disseminate data and samples from participants. The UK Biobank ethical regulations cover the work in this manuscript. All participants gave written informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Clarification of essential content; Shortening; More focus on key aspects; Updating of figures; Updating of supplementary material
1 Absolute correlation values are indicated to allow comparison of association strengths.
Data Availability
All individual data used in this study were obtained from the UK Biobank, a major biomedical database (www.ukbiobank.ac.uk) under application number 41655, and are available to all approved UK Biobank researchers.