PT - JOURNAL ARTICLE
AU - Liu, Yuntian
AU - Herrin, Jeph
AU - Huang, Chenxi
AU - Khera, Rohan
AU - Dhingra, Lovedeep Singh
AU - Dong, Weilai
AU - Mortazavi, Bobak J.
AU - Krumholz, Harlan M.
AU - Lu, Yuan
TI - Non-exercise Machine Learning Models for Maximal Oxygen Uptake Prediction in National Population Surveys
AID - 10.1101/2022.09.30.22280471
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.09.30.22280471
4099 - http://medrxiv.org/content/early/2022/10/04/2022.09.30.22280471.short
4100 - http://medrxiv.org/content/early/2022/10/04/2022.09.30.22280471.full
AB - Background: Maximal oxygen uptake (VO2 max), an indicator of cardiorespiratory fitness (CRF), requires exercise testing and, as a result, is rarely ascertained in large-scale population-based studies. Non-exercise algorithms are cost-effective methods to estimate VO2 max, but the existing models have limitations in generalizability and predictive power. This study aims to improve the non-exercise algorithms using machine learning (ML) methods and data from U.S. national population surveys.
Methods: We used the 1999-2004 data from the National Health and Nutrition Examination Survey (NHANES), in which a submaximal exercise test produced an estimate of VO2 max. We applied multiple supervised ML algorithms to build two models: a parsimonious model that used variables readily available in clinical practice, and an extended model that additionally included more complex variables from Dual-Energy X-ray Absorptiometry (DEXA) and standard laboratory tests. We used Shapley additive explanation (SHAP) to interpret the new model and identify the key predictors. For comparison, existing non-exercise algorithms were applied unmodified to the testing set.
Results: Among the 5,668 NHANES participants included in the final study population, the mean age was 32.5 years and 49.9% were women. Light Gradient Boosting Machine (LightGBM) had the best performance across multiple types of supervised ML algorithms.
Compared with the best existing non-exercise algorithms that could be applied in NHANES, the parsimonious LightGBM model (RMSE: 8.51 ml/kg/min [95% CI: 7.73-9.33]) and the extended model (RMSE: 8.26 ml/kg/min [95% CI: 7.44-9.09]) significantly reduced the error by 15% and 12%, respectively (P<0.01 for both).
Conclusion: Our non-exercise ML model provides a more accurate prediction of VO2 max for NHANES participants than existing non-exercise algorithms.
What is Known: Although cardiorespiratory fitness is recognized as an important marker of cardiovascular health, it is not routinely measured because of the time and resources required to perform exercise tests. Non-exercise algorithms are cost-effective alternatives to estimate cardiorespiratory fitness, but the existing models are restricted in generalizability and predictive power.
What the Study Adds: We improve non-exercise algorithms for cardiorespiratory fitness prediction using advanced ML methods and a more comprehensive and representative data source from U.S. national population surveys. Additional health factors associated with cardiorespiratory fitness are identified. Nationally representative estimates of cardiorespiratory fitness in the U.S. over the past 20 years are generated.
Competing Interest Statement: In the past three years, Harlan Krumholz received expenses and/or personal fees from UnitedHealth, Element Science, Aetna, Reality Labs, Tesseract/4Catalyst, F-Prime, the Siegfried and Jensen Law Firm, Arnold and Porter Law Firm, and Martin/Baughman Law Firm. He is a co-founder of Refactor Health and HugoHealth, and is associated with contracts, through Yale New Haven Hospital, from the Centers for Medicare & Medicaid Services and through Yale University from Johnson & Johnson. Bobak Mortazavi received expenses and/or personal fees from HugoHealth, as a consultant. Dr.
Khera receives support from the National Heart, Lung, and Blood Institute of the National Institutes of Health under award 1K23HL153775, and is a founder of Evidence2Health, a precision health and digital health analytics platform. The other co-authors report no potential competing interests.
Funding Statement: This study did not receive any funding.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study used (or will use) ONLY openly available human data that were originally located at: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All data produced are available online at https://wwwn.cdc.gov/nchs/nhanes/Default.aspx
Abbreviations:
CRF: Cardiorespiratory fitness
VO2max: Maximal oxygen uptake
CPX: Cardiopulmonary exercise testing
ML: Machine learning
NHANES: National Health and Nutrition Examination Survey
STROBE: Strengthening the Reporting of Observational Studies in Epidemiology
COVID-19: Coronavirus disease 2019
MEC: Mobile Examination Center
KNN: K-Nearest Neighbors
LASSO: Least Absolute Shrinkage and Selection Operator
SVR: Support Vector Regression
RF: Random Forest
GBDT: Gradient Boosting Decision Tree
XGBoost: Extreme Gradient Boosting
LightGBM: Light Gradient Boosting Machine
SHAP: Shapley additive explanation
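
The Results compare models by root mean squared error (RMSE) with a 95% confidence interval and a percentage reduction in error. The following is a minimal sketch of how such quantities could be computed, not the authors' code. The record does not say how the intervals were obtained, so the percentile bootstrap here is an assumption, the function names are hypothetical, and the numbers are invented toy values rather than NHANES data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_rmse, new_rmse):
    """Fractional error reduction of new_rmse relative to baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


def bootstrap_rmse_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample participants with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high


# Toy values in ml/kg/min, invented for illustration only.
measured = [38.0, 42.0, 35.0, 47.0, 40.0, 33.0]
baseline_pred = [30.0, 50.0, 45.0, 39.0, 48.0, 41.0]  # stand-in for an existing algorithm
model_pred = [41.0, 38.0, 39.0, 44.0, 36.0, 37.0]     # stand-in for a new model

baseline_err = rmse(measured, baseline_pred)
model_err = rmse(measured, model_pred)
reduction = relative_reduction(baseline_err, model_err)
ci_low, ci_high = bootstrap_rmse_ci(measured, model_pred)

print(f"baseline RMSE={baseline_err:.2f}, model RMSE={model_err:.2f}")
print(f"relative reduction={reduction:.1%}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

A fixed seed keeps the interval reproducible. With a test set of realistic size, the resample count and the interval method would need to be chosen to match the study's actual protocol.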