Abstract
Background Primary knee osteoarthritis (KOA) is a heterogeneous disease with clinical and molecular contributors. Biofluids contain microRNAs and metabolites that can be measured by omic technologies. Deep learning captures complex non-linear associations within multimodal data but, to date, has not been used for multi-omic-based endotyping of KOA patients. We developed a novel multimodal deep learning framework for clustering of multi-omic data from three subject-matched biofluids to identify distinct KOA endotypes and classify one-year post-total knee arthroplasty (TKA) pain/function responses.
Materials and Methods In 414 KOA patients, subject-matched plasma, synovial fluid and urine were analyzed by microRNA sequencing or metabolomics. To integrate 4 high-dimensional datasets comprising metabolites from plasma (n=151 features) and microRNAs from plasma (n=421), synovial fluid (n=930), and urine (n=1225), a multimodal deep learning variational autoencoder architecture with K-means clustering was employed. Features influencing cluster assignment were identified and pathway analyses were conducted. An integrative machine learning framework combining 4 molecular domains and a clinical domain was then used to classify WOMAC pain/function responses post-TKA within each cluster.
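As a rough illustration of the clustering step only, the sketch below runs K-means on per-subject latent vectors. It assumes the latent vectors have already been produced by an encoder (the variational autoencoder itself is not shown); the 2-dimensional synthetic data, the function names, and the farthest-first initialization are illustrative choices and are not taken from the study or its code repository.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Synthetic stand-in for latent embeddings: 3 well-separated groups of 20 subjects.
rng = random.Random(0)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
latent = [
    [rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
    for cx, cy in centers
    for _ in range(20)
]

labels, centroids = kmeans(latent, k=3)
```

In a real analysis the latent dimension, the number of clusters, and the initialization would be chosen and validated on the data rather than fixed as here.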
Findings Multimodal deep learning-based clustering of subjects across 4 domains yielded 3 distinct patient clusters. Feature signatures comprising microRNAs and metabolites across biofluids included 30, 16, and 24 features associated with Clusters 1-3, respectively. Pathway analyses revealed distinct pathways associated with each cluster. Integration of 4 multi-omic domains along with clinical data improved response classification performance, with Cluster 3 achieving AUC=0·879 for subject pain response classification and Cluster 2 reaching AUC=0·808 for subject function response classification, surpassing individual domain classifications by 12% and 15%, respectively.
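For readers less familiar with the AUC metric reported above, the sketch below computes it from its rank-based definition: the probability that a randomly chosen responder receives a higher classifier score than a randomly chosen non-responder, with ties counted as one half. The labels and scores are made-up values for illustration, not study data, and the function name `roc_auc` is a choice of this sketch.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    y_true: iterable of 0/1 labels (1 = responder in this illustration).
    scores: classifier scores, higher meaning more likely to be a responder.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 responders and 3 non-responders.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from non-responders.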
Interpretation We have developed a deep learning-based multimodal clustering model capable of integrating complex multi-fluid, multi-omic data to assist in endotyping KOA patients and classifying their outcome responses to TKA.
Funding Canada Research Chairs Program, Tony and Shari Fell Chair, Campaign to Cure Arthritis, University Health Network Foundation.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Funding for this project was provided by the Canada Research Chairs Program (MK), Tony and Shari Fell Platinum Chair in Arthritis Research (MK), Campaign to Cure Arthritis, University Health Network Foundation. AVP is supported by the Arthritis Society Canada STAR Award-20-0000000012 and YRR is supported by the J. Bernard Gosevitz Chair in Arthritis Research at University Health Network. Computational analysis was supported in part by funding from the Natural Sciences and Engineering Research Council of Canada (NSERC RGPIN-2024-04314), the Canada Foundation for Innovation (CFI #225404, #30865), and Ontario Research Funds (RDI #34876, RE010-020). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Research Ethics Board of the University Health Network, Toronto, ON gave approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
De-identified primary microRNA sequencing datasets are available from the Gene Expression Omnibus under accession number GSE222979. Software code and the dataset of processed miRNA counts, metabolite concentrations, and demographic, anthropometric and clinical questionnaire responses used in this study are available on GitHub.
https://github.com/divya031090/DeepLearning_KOA
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE222979