Abstract
Background Dysbiosis of the gut microbial community is associated with the pathogenesis of Crohn's disease (CD) and may serve as a basis for non-invasive diagnosis. We aimed to compare the performance of microbial markers at different biological levels through a multidimensional analysis of the microbial metagenomes of CD.
Methods We collected fecal metagenomic datasets from eight cohorts comprising a total of 870 CD patients and 548 healthy controls. Microbial alterations in CD patients were assessed at the species, gene and single nucleotide variant (SNV) levels, and diagnostic models were then constructed using an artificial intelligence algorithm.
Results We identified 227 species, 1,047 microbial genes and 21,877 microbial SNVs that differed between CD patients and controls. The species, gene and SNV models achieved average AUCs of 0.97, 0.95 and 0.77, respectively. Notably, the gene model exhibited superior diagnostic capability, achieving average AUCs of 0.89 and 0.91 in internal and external validations, respectively. Moreover, the gene model was specific for CD against other microbiome-related diseases. Further, we found that the phosphotransferase system (PTS) contributed substantially to the diagnostic capability of the gene model. The outstanding performance of the PTS was mainly explained by the genes celB and manY, which showed high predictive power for CD in the metagenomic datasets and were validated in an independent cohort by qRT-PCR analysis.
Conclusions Our global metagenomic analysis reveals multidimensional alterations of the microbial communities in CD and identifies microbial genes as robust diagnostic biomarkers across geographically and culturally distinct cohorts.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by the National Natural Science Foundation of China (82170542 to RZ, 92251307 to RZ, 32200529 to DW, 82000536 to NJ), the National Key Research and Development Program of China (2021YFF0703700/2021YFF0703702 to RZ), and the Guangdong Province "Pearl River Talent Plan" Innovation and Entrepreneurship Team Project (2019ZT08Y464 to LZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of the Shanghai Tenth People's Hospital, Tongji University, Shanghai (No. 20KT863) gave ethical approval for this work and each participant provided informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
# Co-first authors.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Abbreviations
- ABC.PE.P: peptide/nickel transport system permease protein
- agaF: N-acetylgalactosamine PTS system EIIA component
- AKR1A1: alcohol dehydrogenase (NADP+)
- ALDH: aldehyde dehydrogenase (NAD+)
- allA: ureidoglycolate lyase
- AUC: area under the ROC curve
- CD: Crohn's disease
- celB: cellobiose PTS system EIIC component
- CRC: colorectal cancer
- EIIC: enzyme IIC component
- ENA: European Nucleotide Archive
- fliC: flagellin
- FNN: feedforward neural network
- GSEA: gene set enrichment analysis
- IBD: inflammatory bowel disease
- impB: type VI secretion system protein ImpB
- KO: KEGG Orthology
- LC: liver cirrhosis
- LOCO: leave-one-cohort-out
- maeB: malate dehydrogenase (oxaloacetate-decarboxylating) (NADP+)
- manY: mannose PTS system EIIC component
- nirK: nitrite reductase (NO-forming)
- pckA: phosphoenolpyruvate carboxykinase (GTP)
- PD: Parkinson's disease
- PTS: phosphotransferase system
- ReLU: rectified linear unit
- rfbJ: CDP-abequose synthase
- ROC: receiver operating characteristic
- SHAP: SHapley Additive exPlanations
- SNVs: single nucleotide variants
- sucD: succinyl-CoA synthetase alpha subunit
- T2D: type-2 diabetes
- tcpP: toxin coregulated pilus biosynthesis protein P
- tmoC: toluene monooxygenase system ferredoxin subunit
- trbJ: type IV secretion system protein TrbJ
- ttuC: tartrate dehydrogenase/decarboxylase / D-malate dehydrogenase
- UC: ulcerative colitis
- WMS: whole metagenome sequencing