RT Journal Article
SR Electronic
T1 Microbial genes outperform species and SNVs as diagnostic markers for Crohn’s disease on multicohort fecal metagenomes empowered by artificial intelligence
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.02.09.23285672
DO 10.1101/2023.02.09.23285672
A1 Gao, Sheng
A1 Gao, Xiang
A1 Zhu, Ruixin
A1 Wu, Dingfeng
A1 Feng, Zhongsheng
A1 Jiao, Na
A1 Sun, Ruicong
A1 Gao, Wenxing
A1 He, Qing
A1 Liu, Zhanju
A1 Zhu, Lixin
YR 2023
UL http://medrxiv.org/content/early/2023/02/10/2023.02.09.23285672.abstract
AB Background: Dysbiosis of the gut microbial community is associated with the pathogenesis of Crohn’s disease (CD) and may serve as a promising non-invasive diagnostic tool. We aimed to compare the performance of microbial markers at different biological levels by conducting a multidimensional analysis of the microbial metagenomes of CD. Methods: We collected fecal metagenomic datasets generated from eight cohorts that together include 870 CD patients and 548 healthy controls. The microbial alterations in CD patients were assessed at the species, gene and SNV levels, and diagnostic models were then constructed using an artificial intelligence algorithm. Results: A total of 227 species, 1047 microbial genes and 21877 microbial SNVs were identified that differed between CD patients and controls. The species, gene and SNV models achieved average AUCs of 0.97, 0.95 and 0.77, respectively. Notably, the gene model exhibited superior diagnostic capability, achieving average AUCs of 0.89 and 0.91 in internal and external validations, respectively. Moreover, the gene model was specific for CD against other microbiome-related diseases. Further, we found that the phosphotransferase system (PTS) contributed substantially to the diagnostic capability of the gene model.
The outstanding performance of PTS was mainly explained by the genes celB and manY, which demonstrated high predictive power for CD in the metagenomic datasets and were validated in an independent cohort by qRT-PCR analysis. Conclusions: Our global metagenomic analysis unravels the multidimensional alterations of the microbial communities in CD, and identifies microbial genes as robust diagnostic biomarkers across geographically and culturally distinct cohorts. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study was funded by the National Natural Science Foundation of China (82170542 to RZ, 92251307 to RZ, 32200529 to DW, 82000536 to NJ), the National Key Research and Development Program of China (2021YFF0703700/2021YFF0703702 to RZ), and the Guangdong Province "Pearl River Talent Plan" Innovation and Entrepreneurship Team Project (2019ZT08Y464 to LZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Ethics committee/IRB of the Shanghai Tenth People's Hospital, Tongji University, Shanghai (No.
20KT863) gave ethical approval for this work and each participant provided informed consent. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors.
Abbreviations: ABC.PE.P, peptide/nickel transport system permease protein; agaF, N-acetylgalactosamine PTS system EIIA component; AKR1A1, alcohol dehydrogenase (NADP+); ALDH, aldehyde dehydrogenase (NAD+); allA, ureidoglycolate lyase; AUC, area under the ROC curve; CD, Crohn’s disease; celB, cellobiose PTS system EIIC component; CRC, colorectal cancer; EIIC, enzyme IIC component; ENA, European Nucleotide Archive; fliC, flagellin; FNN, feedforward neural network; GSEA, gene set enrichment analysis; IBD, inflammatory bowel disease; impB, type VI secretion system protein ImpB; KO, KEGG Orthology; LC, liver cirrhosis; LOCO, leave-one-cohort-out; maeB, malate dehydrogenase (oxaloacetate-decarboxylating) (NADP+); manY, mannose PTS system EIIC component; nirK, nitrite reductase (NO-forming); pckA, phosphoenolpyruvate carboxykinase (GTP); PD, Parkinson’s disease; PTS, phosphotransferase system; ReLU, rectified linear unit; rfbJ, CDP-abequose synthase; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; SNVs, single nucleotide variants; sucD, succinyl-CoA synthetase alpha subunit; T2D, type-2 diabetes; tcpP, toxin coregulated pilus biosynthesis protein P; tmoC, toluene monooxygenase system ferredoxin subunit; trbJ, type IV secretion system protein TrbJ; ttuC, tartrate dehydrogenase/decarboxylase / D-malate dehydrogenase; UC, ulcerative colitis; WMS, whole metagenome sequencing.