Abstract
In the context of precision medicine, multi-omics data integration provides a comprehensive understanding of underlying biological processes and is critical for disease diagnosis and biomarker discovery. A commonly used strategy is early integration, which concatenates multiple dimensionally reduced omics matrices and is popular for its simplicity and ease of implementation. However, this approach is seriously limited by information loss and by the lack of interaction among latent features. Herein, a novel multi-omics early integration framework (IE-MOIF) based on information enhancement and image representation learning is presented to address these challenges. IE-MOIF employs the self-attention mechanism to capture the intrinsic correlations among omics features, enabling it to significantly outperform existing state-of-the-art methods for multi-omics data integration. Moreover, visualizing the attention embeddings and identifying potential biomarkers offer interpretable insights into the prediction results. All source code and models for IE-MOIF are freely available at https://github.com/idrblab/IE-MOIF.
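To make the two ideas in the abstract concrete (concatenation-based early integration and self-attention over omics features), here is a minimal NumPy sketch under stated assumptions. It is not the IE-MOIF implementation. It assumes PCA as the dimensionality-reduction step and a single-head scaled dot-product self-attention with random, untrained weights. The function names (`pca_reduce`, `early_integrate`, `embed_features`, `self_attention`), the per-feature token embedding, and the synthetic omics matrices are all hypothetical choices made for illustration.

```python
import numpy as np
from typing import Sequence


def pca_reduce(x: np.ndarray, k: int) -> np.ndarray:
    """Project samples (rows) onto the top-k principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def early_integrate(omics: Sequence[np.ndarray], k: int) -> np.ndarray:
    """Early integration: reduce each omics matrix separately, then concatenate."""
    return np.concatenate([pca_reduce(x, k) for x in omics], axis=1)


def embed_features(x: np.ndarray, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical tokenizer: each scalar feature j becomes x_j * w_j + b_j."""
    n_features = x.shape[1]
    w = rng.standard_normal((n_features, d_model))
    b = rng.standard_normal((n_features, d_model))
    return x[:, :, None] * w[None] + b[None]  # (samples, features, d_model)


def self_attention(tokens: np.ndarray, d_k: int, rng: np.random.Generator):
    """Single-head scaled dot-product self-attention with random (untrained) projections."""
    d_in = tokens.shape[-1]
    w_q, w_k, w_v = (rng.standard_normal((d_in, d_k)) / np.sqrt(d_in) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)  # (samples, features, features)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


# Usage with synthetic data: 20 samples, three hypothetical omics layers.
rng = np.random.default_rng(0)
mrna = rng.standard_normal((20, 100))
meth = rng.standard_normal((20, 80))
mirna = rng.standard_normal((20, 50))

fused = early_integrate([mrna, meth, mirna], k=5)   # (20, 15)
tokens = embed_features(fused, d_model=8, rng=rng)  # (20, 15, 8)
out, attn = self_attention(tokens, d_k=8, rng=rng)  # attn: (20, 15, 15)
print(fused.shape, attn.shape)
```

Plain concatenation (`fused`) places the reduced features side by side without modelling how they relate. The attention step produces, for each sample, a feature-by-feature weight matrix `attn[i]`. In a trained model, such matrices are what one would visualize for interpretability. In this untrained sketch they are random and carry no biological meaning.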
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
National Natural Science Foundation of China (81872798 & U1909208); Natural Science Foundation of Zhejiang Province (LR21H300001)
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The ROSMAP and BRCA datasets can be freely and openly accessed via https://github.com/txWang/MOGONET. The PRAD and LUSC datasets can be freely and openly accessed via https://xenabrowser.net/datapages. The COVID-19 dataset can be freely and openly accessed via https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp (accession=MSV000085703).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present work are contained in the manuscript.
https://github.com/txWang/MOGONET/tree/main/BRCA
https://github.com/txWang/MOGONET/tree/main/ROSMAP