ABSTRACT
Objective We aimed to investigate the relation of blood bacterial DNA load and profile with intestinal adenoma (IA) and colorectal cancer (CRC).
Design We performed 16S rRNA gene analysis of blood from 100 incident, histologically confirmed CRC cases, 100 IA patients and 100 healthy subjects, matched to cases by centre, sex and age. The association with bacterial load was analysed using multiple conditional logistic regression. Differences in bacterial abundance between groups were estimated after normalization based on the negative binomial distribution. Random Forest was applied to predict group assignment.
Results We found an overrepresentation of blood 16S rRNA gene copies in colon cancer as compared to tumor-free controls (IA and healthy subjects). The odds ratio of colon cancer for the highest versus the lowest three quintiles of gene copies was 2.62 (95% confidence interval=1.22-5.65). No difference was found for rectal cancer or IA. Among subjects with high 16S rRNA gene copy levels, community diversity was higher in colon cancers than in controls. CRC cases had an enrichment of Peptostreptococcaceae and Acetobacteraceae and a reduced abundance of Bacteroidaceae, Lachnospiraceae and Ruminococcaceae. The selected variables discriminated CRC from control and IA subjects with an accuracy of 0.70.
Conclusion Colon cancer patients had a higher bacterial DNA load and a different bacterial profile as compared to healthy subjects, IA and rectal cancers, indicating a higher passage of bacteria from the gastrointestinal tract to the bloodstream. Further studies are needed to confirm this result and to exploit it in new non-invasive techniques for the early diagnosis of CRC.
INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer worldwide and ranks second in terms of mortality [1]. Although mortality trends in Europe have been favourable over the last decades, with rates of 15.4/100 000 in men and 8.6/100 000 in women in 2020, CRC mortality is still increasing in most eastern European countries [2, 3, 4].
CRC derives from a sequential accumulation of genetic alterations that drives the transition from normal mucosa to pre-malignant lesions, with progression to intestinal adenoma (IA) and invasive CRC [5]. Inflammation and immunity are inextricably linked to all phases of CRC development [6]. Numerous studies identified chronic intestinal inflammation as a risk factor for CRC, as also supported by the increased incidence of this tumour in patients with inflammatory bowel disease [7, 8]. IA and CRC have also been associated with increased circulating inflammation [9, 10, 11, 12, 13] and, more recently, with dysfunction of the gut mucosal barrier [14]. The community structure of the intestinal microbial ecosystem influences the risk of IA and CRC [14, 15, 16, 17]. There is evidence that Fusobacterium spp. are increased in the gut mucosal microbiota of CRC patients [15, 16, 17]; Bacteroides fragilis and the genus Porphyromonas have also been associated with an increased risk of CRC [16, 17], while an increase in members of the genus Escherichia was associated with a higher IA risk [17]. Numerous studies have analysed the fecal microbiome, with results that were not always consistent [15, 18, 19, 20], but recent meta-analyses including geographically and technically different shotgun metagenomic studies showed higher fecal microbiota richness in CRC cases than in controls [21] and identified a set of bacterial taxa significantly enriched in CRC cases [21, 22, 23].
Increasing evidence points to the presence of bacterial DNA in blood [24, 25]. Microbial signatures have been reported for gut dysbiosis among diabetic subjects and for liver fibrosis in obese patients [26, 27]. Microbiome analysis of blood has also been proposed as a tool to discriminate between cancer patients and healthy subjects [28]. In particular, circulating bacterial factors may differ in patients with IA and/or CRC, in whom epithelial barrier dysfunction can increase intestinal permeability, plausibly resulting in greater bacterial translocation from the gastrointestinal tract to the bloodstream. Some differences in the relative abundance of bacterial DNA in plasma between healthy, IA and CRC subjects have been reported in a case-control study from China including 57 participants, but differences in total bacterial load have not yet been analysed [29].
In this study, we aimed to compare the blood bacterial DNA load and taxonomic profile between CRC, IA and healthy controls, using data from a study designed ad hoc.
MATERIALS AND METHODS
We conducted an observational study between May 2017 and November 2019 in the metropolitan area of Milan, Italy. Recruitment centres included two general hospitals in Milan: the Digestive and Interventional Endoscopy Unit, Azienda Socio Sanitaria Territoriale (ASST) Grande Ospedale Metropolitano Niguarda, the coordinating centre, and the Gastroenterology and Endoscopy Unit, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ca’ Granda Ospedale Maggiore Policlinico. Both hospitals hosted a colonoscopy screening referral centre of the CRC screening program managed by the Health Protection Agency.
CRC cases were enrolled together with patients with non-cancerous adenomatous polyps (IA) and healthy controls, frequency-matched to cases by centre, age (± 5 years) and sex. Trained interviewers recruited study participants among eligible outpatients or inpatients scheduled for a colonoscopy, including patients referred by the CRC screening program. Exclusion criteria were: 1) colonoscopy in the last 5 days; 2) reported previous cancers; 3) chronic inflammatory bowel diseases; 4) liver or kidney failure (creatinine ≥1.7 mg/dl, dialysis); 5) NYHA class III or IV heart failure; 6) primary or secondary immunodeficiency; 7) recent hospitalization (within 1 month) for immune, inflammatory or autoimmune diseases, or for bacterial/viral infections; 8) blood transfusions in the previous year; 9) celiac disease and a relevant diet modification during the last month. IA and control subjects with a previous colonoscopy/sigmoidoscopy with endoscopic resection of a colonic lesion were also excluded.
A total of 620 patients were contacted by the trained interviewers. Of these, around 25% did not meet the eligibility criteria and fewer than 2% refused to participate in the study. Furthermore, 49 subjects were excluded because of inaccuracies in the enrolment procedures and 42 because of previous cancers or other ineligible conditions discovered after further data checks. From the remaining 347 patients, the final sample after matching included 300 subjects: 100 CRC cases, 100 IA patients and 100 healthy controls.
Colonoscopy and histological reports were reviewed by two pathologists, who classified subjects as CRC cases (recording clinical characteristics, e.g. stage and lymph node involvement), IA patients (recording major features, e.g. morphology and size) or healthy controls.
Cases included 100 incident, histologically confirmed CRCs: 62 men and 38 women (mean age 67, range 31–85 years). Of these, 21 were in the right colon (International Classification of Diseases, 10th Revision, ICD-10, C18.0, C18.2, C18.3), 12 in the transverse colon, splenic flexure or descending colon (ICD-10, C18.4, C18.5, C18.6), 17 in the sigmoid colon (ICD-10, C18.7) and 50 in the rectum, including the rectosigmoid junction (ICD-10, C20, C19.9).
One hundred IA patients (mean age: 66, range 34–84 years) and 100 healthy controls (mean age: 66, range 26–85 years) were included.
The study protocol was reviewed and approved by the ethics committees of the hospitals involved in data recruitment: ASST Grande Ospedale Metropolitano Niguarda (No. 477-112016) and Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico (No. 742-2017).
Interview
After written consent, a face-to-face interview was performed. The questionnaire included information on socio-demographics, smoking habits, physical activity, anthropometric measures, occupational exposures, medical history, selected drug and supplement use, family history of cancer, sleeping habits and dental care. A food frequency questionnaire (FFQ), based on a reproducible and valid Italian FFQ [30, 31], was used to assess patients’ usual past diet.
Blood collection
Blood samples were collected before colonoscopy to avoid possible bacterial contamination after insertion of the colonoscope and to keep the same setting for each participant. A 7 ml aliquot of blood was collected in an EDTA tube and a 3 ml aliquot in a plain tube (without anticoagulant). Three 1 ml microvials from the EDTA tube were immediately stored at −80 °C for the microbiome analysis. The remaining blood was processed, centrifuged and then stored at −80 °C. At the end of recruitment, blood samples were sent to Vaiomer SAS, Labège, France, for the microbiome analysis. To prevent differences between groups arising from experimental biases, the operators were blinded to group assignment and all samples were analysed in the same experiment, with the same reagent batches and operator, in order to keep the signal-to-noise ratio optimal and to reduce technical variability.
DNA extraction, qPCR experiments and sequencing of 16S rRNA gene amplicons
Bacterial DNA quantification and sequencing were performed by Vaiomer SAS using an optimized blood-specific technique [24, 32, 33]. DNA was extracted from 0.25 ml of whole blood and eluted in a final volume of 50 μl. Real-time polymerase chain reaction (PCR) amplification was performed using the panbacterial primers EUBF 5’-TCCTACGGGAGGCAGCAGT-3’ and EUBR 5’-GGACTACCAGGGTATCTAATCCTGTT-3’ [34], which target the V3-V4 hypervariable regions of the bacterial 16S rRNA gene with 100% specificity (i.e., no eukaryotic, mitochondrial or archaeal DNA is targeted) and high sensitivity (the 16S rRNA genes of more than 95% of the bacteria in the Ribosomal Database Project database are amplified). The abundance of the 16S rRNA gene in blood samples was measured by qPCR in triplicate and normalized using a plasmid-based standard range. Results were reported as the number of 16S rRNA gene copies per µl of blood. DNA from whole blood was also used for 16S rRNA gene taxonomic profiling with Illumina MiSeq technology, using the 2 x 300 paired-end MiSeq kit V3. Samples 20056, 10251, 10248 and 20086 (from 2 IA and 2 control subjects) were excluded from the diversity analyses because they did not reach the threshold of 5,000 reads.
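The conversion from triplicate qPCR measurements to copies per µl of blood can be sketched as follows. This is a minimal illustration, not Vaiomer's procedure: it assumes a log-linear standard curve (Ct against log10 copies) fitted on the plasmid dilutions, averages the triplicate Ct values, and uses a hypothetical template volume of 5 µl of eluate per reaction. Only the 0.25 ml blood input and the 50 μl eluate come from the text.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept on plasmid standards."""
    x_bar, y_bar = mean(log10_copies), mean(ct_values)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def copies_per_ul_blood(ct_triplicate, slope, intercept,
                        template_ul=5.0, eluate_ul=50.0, blood_ul=250.0):
    """Convert triplicate Ct values into 16S rRNA gene copies per µl of blood.

    template_ul is a HYPOTHETICAL eluate volume per qPCR reaction;
    eluate_ul (50 µl) and blood_ul (0.25 ml = 250 µl) follow the text.
    """
    ct = mean(ct_triplicate)                      # average the technical replicates
    copies_in_reaction = 10 ** ((ct - intercept) / slope)
    copies_per_ul_eluate = copies_in_reaction / template_ul
    # All copies in the eluate were extracted from blood_ul of whole blood.
    return copies_per_ul_eluate * eluate_ul / blood_ul


# Idealised standard curve: slope -3.32 corresponds to ~100% amplification efficiency.
standards = [2, 3, 4, 5, 6]
cts = [38 - 3.32 * x for x in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(round(copies_per_ul_blood([24.72, 24.70, 24.74], slope, intercept)))  # 400
```

Averaging Ct values rather than back-calculated quantities is one of two common choices; either is plausible and the text does not specify which was used.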
Sequences were then analysed with the Vaiomer bioinformatic pipeline to determine bacterial community profiles. Briefly, after demultiplexing of the bar-coded Illumina paired reads, single-read sequences were trimmed (R1 and R2 reads to 290 and 240 bases, respectively) and paired for each sample independently into longer fragments; non-specific amplicons were removed and the remaining sequences were clustered into operational taxonomic units (OTUs) using FROGS v1.4.0 [35] with default parameters. OTUs were clustered based on 97% sequence similarity in two steps with the swarm algorithm v2.1.6: the first step used an aggregation distance of 1 and the second an aggregation distance of 3. Taxonomic assignment was performed with Blast+ v2.2.30 against the Silva 132 Parc database. OTUs with a relative abundance lower than 0.005% of the whole dataset of reads were removed. All reads are publicly available in the European Nucleotide Archive (ENA) under accession number PRJEB46474.
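The final abundance filter and the 5,000-read threshold applied to samples can be sketched in a few lines. This is an illustrative sketch, not the FROGS implementation: it assumes the OTU table is a plain dict of read counts (a hypothetical layout) and computes both filters on the unfiltered table; the text does not specify the order in which the two filters were applied.

```python
def filter_low_abundance_otus(otu_table, min_fraction=0.00005):
    """Keep OTUs holding at least 0.005% (= 0.00005) of all reads in the dataset.

    otu_table maps OTU id -> {sample id -> read count} (hypothetical layout).
    """
    total_reads = sum(sum(per_sample.values()) for per_sample in otu_table.values())
    threshold = min_fraction * total_reads
    return {otu: per_sample for otu, per_sample in otu_table.items()
            if sum(per_sample.values()) >= threshold}


def samples_below_depth(otu_table, min_reads=5000):
    """Return the samples whose total read count is below min_reads."""
    depth = {}
    for per_sample in otu_table.values():
        for sample, reads in per_sample.items():
            depth[sample] = depth.get(sample, 0) + reads
    return sorted(sample for sample, d in depth.items() if d < min_reads)


table = {
    "OTU_1": {"A": 100_000, "B": 4_000},
    "OTU_2": {"A": 3, "B": 2},          # 5 reads, below 0.005% of 104,005 reads
}
print(sorted(filter_low_abundance_otus(table)))  # ['OTU_1']
print(samples_below_depth(table))                # ['B']
```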
Bacterial DNA contamination assessment
To assess potential bacterial DNA contamination from the environment and reagents, several negative controls were included in the analyses (Supplementary Methods). This analysis showed that background noise and contamination of blood samples did not affect the results of this study.
Statistical analyses
Two-tailed Wilcoxon signed-rank tests and Friedman tests were used to compare 16S rRNA gene copies between groups. Odds ratios (OR) of CRC and the corresponding 95% confidence intervals (CI) were estimated, as compared to control and IA subjects, through logistic regression models conditioned on the matching variables. The number of 16S rRNA gene copies was included in the models both categorically, as quintiles based on the distribution among control and IA subjects, and as a continuous variable, with the measurement unit set to the difference between the upper cut-points of the 4th and 1st quintiles (4328 copies). Tests for trend were based on the likelihood ratio test between models with and without a linear term. Multinomial logistic regression was used to estimate separate ORs for colon and rectal cancer and to test for heterogeneity between the two sites.
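The categorisation of 16S rRNA gene copies and the scaling of the continuous OR can be illustrated as follows. This sketch rests on stated assumptions: quintile cut-points are computed on the reference (control/IA) values only with Python's `statistics.quantiles` (default "exclusive" method, which may differ from the software used by the authors); a value equal to a cut-point is assigned to the upper category, as implied by the "≥9707 copies" and "<7618 copies" boundaries in the Results; and the conditional logistic regression fit itself is not shown. If β is the log-OR per single gene copy, the OR for an increment of Δ copies is exp(βΔ), with Δ = 4328 in this study.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def quintile_cutpoints(reference_values):
    """Four cut-points splitting the reference (control/IA) distribution into quintiles."""
    return quantiles(reference_values, n=5)


def quintile_of(value, cutpoints):
    """Quintile category 1-5; a value equal to a cut-point goes to the upper quintile."""
    return bisect_right(cutpoints, value) + 1


def continuous_unit(cutpoints):
    """Difference between the upper cut-points of the 4th and 1st quintiles."""
    return cutpoints[3] - cutpoints[0]


def or_for_increment(log_or_per_copy, increment):
    """Rescale a per-copy log odds ratio into an OR for a given increment of copies."""
    return math.exp(log_or_per_copy * increment)


# Per-copy coefficient implied by the reported OR of 2.02 per 4328 copies.
beta = math.log(2.02) / 4328
print(round(or_for_increment(beta, 4328), 2))  # 2.02
```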
Alpha- and beta-diversity indices, as well as other taxonomic variables, were computed for 296 subjects (4 samples were excluded for the technical reasons described above). To assess sample diversity in terms of richness and evenness, several alpha-diversity indices (Observed, Chao1, Shannon, Simpson and InvSimpson) were calculated with the R phyloseq package v1.14.0. Two-tailed Mann-Whitney tests were used to assess differences in alpha-diversity between groups.
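The five alpha-diversity indices can be written out explicitly. The sketch below computes them for one sample's vector of counts; it is a plain-Python illustration, not the phyloseq code. Chao1 uses the bias-corrected form S_obs + F1(F1 - 1)/(2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons, and Simpson is reported as 1 - Σp², the Gini-Simpson convention; small numerical differences from the R implementation are possible.

```python
import math


def alpha_diversity(counts):
    """Observed, Chao1, Shannon, Simpson (1 - sum p^2) and InvSimpson for one sample."""
    present = [c for c in counts if c > 0]
    n = sum(present)
    props = [c / n for c in present]
    observed = len(present)
    f1 = sum(1 for c in present if c == 1)   # singletons
    f2 = sum(1 for c in present if c == 2)   # doubletons
    chao1 = observed + f1 * (f1 - 1) / (2 * (f2 + 1))
    shannon = -sum(p * math.log(p) for p in props)
    sum_p2 = sum(p * p for p in props)
    return {"Observed": observed, "Chao1": chao1, "Shannon": shannon,
            "Simpson": 1 - sum_p2, "InvSimpson": 1 / sum_p2}


print(alpha_diversity([10, 5, 1, 1, 2, 0]))
```

Observed and Chao1 capture richness only, whereas Shannon, Simpson and InvSimpson also weight evenness, which is why the indices can disagree on the same samples.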
To estimate the beta-diversity, Permutational Multivariate Analysis of Variance Using Distance Matrices (PERMANOVA) was applied to UniFrac distances, and Principal Coordinates Analysis (PCoA) was used to visualize possible differences between groups.
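PERMANOVA compares among-group and within-group sums of squared distances through a pseudo-F statistic and obtains a p-value by permuting group labels. The sketch below implements the one-way test on any precomputed distance matrix. Computing UniFrac distances requires a phylogenetic tree and is not shown; the toy data, the 999 permutations and the random seed are illustrative choices, not those of the study.

```python
import random


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=1):
    """Pseudo-F and permutation p-value (the observed statistic is counted once)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Two well-separated groups of one-dimensional toy "samples" with Euclidean distances.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
labels = ["control"] * 6 + ["case"] * 6
dist = [[abs(p - q) for q in points] for p in points]
f_stat, p_value = permanova(dist, labels)
print(f_stat, p_value)
```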
Differences in bacterial taxa and OTUs were evaluated with the Welch test after DESeq2 normalization of the data, based on the negative binomial distribution (R package “DESeq2” v1.26.0).
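DESeq2's default normalization divides each sample's counts by a size factor: the median, across taxa, of the ratio between the sample's count and that taxon's geometric mean across samples. The sketch below reproduces only this median-of-ratios step and the Welch t statistic with its Welch-Satterthwaite degrees of freedom; it is not DESeq2 itself. Because the geometric mean is undefined for taxa with zero counts, this plain version skips such taxa. With sparse microbiome tables DESeq2 also offers alternative estimators (for example `type = "poscounts"` in `estimateSizeFactors`), and the text does not state which variant was used.

```python
import math
from statistics import mean, median, variance


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of taxa rows x sample columns."""
    complete = [row for row in counts if all(c > 0 for c in row)]  # skip taxa with zeros
    log_geo_means = [mean(math.log(c) for c in row) for row in complete]
    n_samples = len(counts[0])
    return [
        math.exp(median([math.log(row[j]) - lg for row, lg in zip(complete, log_geo_means)]))
        for j in range(n_samples)
    ]


def normalize(counts):
    """Divide every count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Sample 2 was sequenced twice as deeply as sample 1; normalization removes the difference.
counts = [[10, 20], [30, 60], [5, 10]]
print([round(f, 4) for f in size_factors(counts)])  # [0.7071, 1.4142]
print(welch_t([1, 2, 3], [4, 5, 6]))
```

A p-value would then be read from a t distribution with `df` degrees of freedom and adjusted for multiple testing.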
For each statistical analysis, p-values were adjusted post hoc with the Benjamini-Hochberg correction, when appropriate.
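The Benjamini-Hochberg step-up procedure multiplies the i-th smallest of m p-values by m/i and then enforces monotonicity from the largest p-value downwards, capping values at 1. The sketch below returns adjusted p-values in their original order, mirroring the behaviour of R's `p.adjust(method = "BH")`; it is illustrative and not the authors' code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0                                   # also caps adjusted values at 1
    for rank in range(m, 0, -1):                        # walk down from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```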
Random Forest (R packages “randomForest”, “caret” and “Boruta”) was used to assess whether a set of variables could discriminate the group to which each sample belonged. For the supervised methods, to reduce the background noise due to the different library sizes of the sequenced samples, the data were normalized by multiplying the relative abundance of each taxon by the 16S rRNA gene abundance (determined by qPCR) in each blood sample.
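The normalization used before the supervised analyses, relative abundance multiplied by the qPCR-measured 16S rRNA gene load, amounts to an estimate of absolute taxon abundance per µl of blood. A minimal sketch, assuming per-sample read counts stored in a dict keyed by taxon (a hypothetical layout); the read counts and loads in the example are invented for illustration, and training the Random Forest itself is not shown.

```python
def absolute_abundance(read_counts, copies_per_ul):
    """Scale one sample's relative taxon abundances by its 16S rRNA gene copies per µl."""
    total_reads = sum(read_counts.values())
    return {taxon: reads / total_reads * copies_per_ul
            for taxon, reads in read_counts.items()}


def feature_matrix(samples, taxa):
    """Rows of absolute abundances per sample (0.0 for taxa absent from a sample).

    samples maps sample id -> (read_counts, copies_per_ul); hypothetical layout.
    """
    rows = {}
    for sample_id, (reads, copies) in samples.items():
        abundance = absolute_abundance(reads, copies)
        rows[sample_id] = [abundance.get(taxon, 0.0) for taxon in taxa]
    return rows


example = {"S1": ({"Pseudomonadaceae": 30, "Micrococcaceae": 70}, 1000.0),
           "S2": ({"Pseudomonadaceae": 50, "Micrococcaceae": 50}, 9000.0)}
print(feature_matrix(example, ["Pseudomonadaceae", "Micrococcaceae", "Moraxellaceae"]))
```

Unlike relative abundances, these values no longer sum to a constant per sample, so a sample with a higher bacterial load contributes proportionally larger feature values.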
RESULTS
Table 1 gives the distribution of 100 healthy controls, 100 IA patients and 100 CRC cases according to sex, age, study centre and education. By design, the three groups had the same sex and centre distributions and were similar in terms of age. Cases tended to be less educated than IA subjects and controls, although the difference was not significant (χ2 test p=0.18).
16S rRNA gene copies
We found an overall mean of 7867 16S rRNA gene copies per µl of blood, with a mean of 7628 among controls, 7586 among IA and 8387 among CRC subjects (9145 in colon and 7629 in rectal cancers), with no significant differences between the three groups (p for heterogeneity=0.482) (Supplementary Figure 1). Since the 16S rRNA gene copy distribution was very similar in control and IA subjects (p for heterogeneity=0.95), we grouped them as a reference group and compared their 16S rRNA gene copy distribution with that of CRC, colon and rectal cancer cases. We found no heterogeneity between CRC and control/IA groups (p=0.336), whereas significant differences emerged between colon cancer and control/IA (p=0.025; Figure 1).
Table 2 shows the distribution of control/IA subjects and of CRC, colon and rectal cancer cases, with the ORs and corresponding 95% CIs, according to quintiles of 16S rRNA gene copies, as well as the continuous OR for an increment of 4328 gene copies. We found a direct association between 16S rRNA gene copies and colon cancer. Subjects in the highest quintile of gene copies (≥9707 copies) had an OR of 2.62 (95%CI=1.22-5.65) for colon cancer as compared to those in the lowest three quintiles (<7618 copies). The association increased significantly above the fourth quintile (p for trend=0.013) and became stronger at higher levels. The OR was 7.22 (95%CI=2.18-23.9) above the 90th centile (>11 265 copies) and 17.08 (95%CI=3.36-86.87) above the 95th centile (>13 000 copies), as compared to the lowest three quintiles. In addition, the continuous OR indicated an approximately two-fold increased risk for an increment of 4328 copies (OR=2.02; 95%CI=1.26-3.25).
In contrast, no association was found between 16S rRNA gene copies and rectal cancer. The OR for the highest versus the lowest three quintiles was 0.81 (95%CI=0.32-2.03) and the continuous OR was 0.86 (95%CI=0.51-1.42). The heterogeneity between colon and rectal cancer was significant both across quintiles (p=0.021) and for the continuous variable (p=0.037).
The OR of CRC for the highest versus the lowest three quintiles was 1.59 (95%CI=0.89-2.82) and the continuous OR was 1.39 (95%CI=1.00-1.92).
When the association with 16S rRNA gene copies was further examined by colon subsite, the positive association appeared more pronounced for the right colon, with an OR for the highest versus the lowest three quintiles of 10.75 (95%CI=2.16-53.42), as compared to 1.25 (95%CI=0.46-3.38) for other colon sites, although the heterogeneity was not significant (p=0.123). Among the 21 right colon cancers, 11 (52%) were in the highest quintile of gene copies, whereas among the 29 cancers of other colon subsites, 8 (28%) were in the highest quintile, as compared to 40 of 200 (20%) control/IA subjects. In particular, among 11 cancers of the ascending colon, 7 (64%) were in the highest quintile (Table 3).
Alpha and beta diversity
No differences between groups were found in terms of α-diversity indices (Supplementary Table 1). When we restricted the analyses to subjects in the two highest quintiles of 16S rRNA gene copies, colon cancer cases showed higher diversity than controls in terms of the Observed and Chao1 indices, both for genera (median of 32 vs 28, p=0.054, and median of 49 vs 40.6, p=0.059, respectively) and for OTUs (median of 40 vs 34, p=0.039, and median of 71.1 vs 53.4, p=0.067), and higher diversity than rectal cancer cases in terms of Observed genera and OTUs (median of 32 vs 29, p=0.023, and median of 40 vs 37, p=0.029) (Table 4, Figure 2). Colon cancers also tended to have higher Observed genera than IA (median of 32 vs 28, p=0.071), and when we compared the Observed genera index in colon cancer cases versus control/IA subjects combined, the p value for heterogeneity decreased to 0.035.
Concerning beta diversity, no differences between groups were found overall or among subjects in the two highest quintiles of 16S rRNA gene copies. However, when we further restricted the analyses to the highest quintile, we observed differences between control/IA, colon and rectal cancer patients that were significant for Weighted UniFrac (p=0.026) and Generalized UniFrac (p=0.031) and borderline for Unweighted UniFrac (p=0.051) (Figure 3 – A, B, C). Post-hoc pairwise comparisons showed a trend between control/IA and colon cancers for Weighted UniFrac (p=0.073) (Figure 3 – D).
Taxonomic profiling of blood bacterial DNA between groups
We detected a total of 1081 OTUs that were taxonomically classified into 15 phyla, 34 classes, 87 orders, 164 families and 325 genera.
Pseudomonadaceae, Micrococcaceae, Burkholderiaceae, Caulobacteraceae, Moraxellaceae and Flavobacteriaceae were the six most represented families, which together accounted for more than 50% of all reads assigned to bacterial taxa (Supplementary Figure 2).
The means of DESeq2-normalized data and the adjusted p values from the Welch tests on all taxonomic levels and OTUs are shown in Supplementary Figure 3 for the pairwise comparisons (CRC versus control, CRC versus IA and IA versus control) and in Figure 4 for CRC versus control/IA. Several taxa differed between the groups. In particular, CRC samples were characterized by an increase in sequencing reads assigned to the bacterial families Peptostreptococcaceae and Acetobacteraceae, together with a lower representation of the families Bacteroidaceae, Lachnospiraceae and Ruminococcaceae (Figure 4).
Using the Random Forest supervised method, we identified a set of variables that predicted CRC versus control/IA status with an accuracy of 0.70 (sensitivity=0.45; specificity=0.87), and a second model that discriminated colon from rectal cancer with an accuracy of 0.77 (sensitivity=0.71; specificity=0.82) (Figure 5). In the first model, the families Acetobacteraceae, Peptostreptococcaceae and Oligoflexaceae, the genus Melittangium, and OTUs belonging to the genera Acinetobacter, Pelomonas, Novosphingobium and Pajaroellobacter were the most important variables for discriminating the CRC and control/IA groups (Figure 5-A). The biplot in Figure 5-A shows a separation between CRC and control/IA subjects, driven by a higher dispersion of CRC cases. In the second model, the families Peptostreptococcaceae, Streptococcaceae and Ruminococcaceae, the genera Arthrobacter, Clostridium sensu stricto and Kocuria, and OTUs assigned to Legionella, Kocuria and Lepisosteus oculatus were the most important variables for discriminating colon from rectal cancer; other variables that increased the accuracy of the model included 16S rRNA gene copies, the phylum Proteobacteria, the order Rhizobiales and the genus Bacteroides (Figure 5 – B).
When the Random Forest algorithm was applied to samples in the highest quintile of 16S rRNA gene copies, the accuracy in discriminating the CRC and control/IA groups increased to 0.79.
DISCUSSION
This study shows that colon cancer patients have an overrepresentation of bacterial DNA in blood as compared to tumor-free controls, including IA patients and healthy subjects. This association appeared stronger for cancers of the right colon, whereas no difference in bacterial load was found for rectal cancer. At high levels of 16S rRNA gene copies (≥7618 copies per µl, i.e. the two highest quintiles), colon cancers had increased community diversity but did not differ from controls in community evenness.
To our knowledge, no other study has conducted an ad hoc data collection to systematically investigate the blood microbiota in relation to CRC and/or IA. In a Chinese study, circulating bacterial DNA of 25 CRC, 10 IA and 22 healthy subjects was analysed with whole-genome sequencing of plasma samples [29]; the relative abundance of Flavobacterium DNA was reduced in CRC and IA (<1%) as compared to controls (9.4%), whereas the DNA abundance of the genus Ruminococcus was 10-fold higher in CRC (0.2%) than in controls (0.02%). In the same study, several attempts to identify bacterial biomarkers of CRC or IA with Random Forest algorithms were reported, yielding a set of 28 species as important features to discriminate between the CRC/IA group and controls, but results were based on small samples (23 and 34 subjects in the discovery and validation cohorts, respectively). Messaritakis et al. used PCR amplification of genomic DNA from blood to compare 397 patients with adjuvant or metastatic CRC with 32 healthy controls in terms of the presence of 3 microbial genes [36]. Significantly higher rates of the glutamine synthase gene of Bacteroides fragilis and of the 5.8S rRNA gene of Candida albicans were observed in CRC patients (p < 0.001), especially in metastatic disease, suggesting a prognostic value of detecting microbial translocation in blood. No association was found for Escherichia coli.
In our data, CRC patients had an enrichment of Peptostreptococcaceae and Acetobacteraceae and a reduction of Bacteroidaceae, Lachnospiraceae and Ruminococcaceae. The latter families are among the most represented in the fecal and intestinal microbiota [37], while Peptostreptococcaceae and Acetobacteraceae are less represented in the human intestine. These two families were found to be more abundant in patients with chronic kidney disease (CKD) than in healthy subjects in a case-control study on the blood microbiome including 20 CKD cases and 20 controls [38]. In a study involving 99 CRC cases and 103 controls, a high abundance of Lachnospiraceae was negatively correlated with colonic colonisation by oral bacteria, including oral pathogens associated with CRC, suggesting a protective role of Lachnospiraceae, potentially influenced by Western dietary patterns [39]. Ruminococcaceae abundance was found to be lower in CRC tissue than in tumour-adjacent biopsies and stool samples from the same cases in a study including 294 subjects [40]. Moreover, members of the families Lachnospiraceae and Ruminococcaceae have been recognized as the most active members of the human colonic microbiota [41], able to efficiently convert fibres into butyrate [42], a bacterial metabolite shown to regulate regulatory T (Treg) lymphocyte priming and proposed to protect against CRC [43, 44].
Gut microbiota, inflammation and nutrition play an important role in intestinal permeability, which may influence the risk of CRC [45, 46]. Bacteria have been shown to interact directly with immune system cells and to affect multiple host functions [47, 48], but it is unclear which comes first among local inflammation, intestinal permeability and changes in the resident microbiota [49, 50]. In this context, it has also been shown that bacteria can disseminate to the liver through disruption of the gut vascular barrier in colorectal cancer with hepatic metastasis [51].
Our data support the hypothesis of greater bacterial translocation from the gastrointestinal tract to the bloodstream in colon cancer, especially right colon cancer, but not in rectal cancer. In line with this, several risk factors and biological features differ by CRC site. Physical activity, antibiotic use and family history of CRC were relevant for colon but not for rectal cancer [52, 53, 54, 55], and associations with some dietary components were stronger for cancers of some colon subsites [56, 57]. Moreover, the proximal and distal colon have different embryological origins, which also result in distinct vascular supplies [58]. Differences in underlying genetics, including gene expression and immunological activity, have been highlighted [59], with a decreasing gradient of immune cells from the proximal colon to the distal colon and rectum [58]. Gut microbiota and mechanisms of carcinogenesis also vary along the lower intestinal tract [54, 60, 61]. Consistent with our hypothesis of differential translocation, an important role may also be played by the mucus layers, which in mice were found to vary along the colon in the O-glycosylation of Muc2 and in their regulation of host-microbiota symbiosis [62].
Recent meta-analyses of the faecal microbiome suggested universal, validated predictive taxonomic and functional microbiome signatures of CRC, revealing potential mechanisms behind intestinal carcinogenesis and laying the basis for non-invasive CRC diagnosis through metagenomic analysis of the faecal microbiome [21, 22, 23]. CRC screening programs can suffer from the limited sensitivity and specificity of the tests used and, in some countries, from low adherence to screening recommendations, mainly due to refusal of faecal tests and colonoscopy. Innovative, non-invasive diagnostic tests could therefore support CRC control [63, 64]. Various blood-based tests have been proposed, including techniques detecting Streptococcus bovis [65] or measuring antibody levels against Fusobacterium nucleatum [66] in blood, and, more recently, the use of the blood microbiome profile has been suggested [28].
Our data showed various taxa associated with CRC and identified sets of taxa and OTUs that predicted CRC versus control/IA status, and colon versus rectal cancer, with accuracies of 0.70 and 0.77, respectively. When restricting the analyses to subjects with high levels of 16S rRNA gene copies, we obtained more accurate models, but we could not perform separate analyses for colon cancer only, and these results should be interpreted with caution given the small numbers. Nonetheless, these findings can serve as a basis to conceive new non-invasive techniques for the early diagnosis of CRC based on bacterial DNA circulating in peripheral blood. In particular, they may be relevant for the detection of right colon cancers, which often have a subtle presentation and a more advanced stage at diagnosis, partly because the right colon is more difficult to explore than the rectum and distal colon [67].
One of the strengths of this study is the ad hoc data collection, based on a newly developed standardized protocol that was fully adhered to by the recruitment centres. CRC cases and the corresponding IA and control subjects were comparable in terms of setting, since they derived from the same catchment area and recruitment procedures. Moreover, interviewers and investigators were blinded to group assignment, as data collection was performed before endoscopy and diagnosis. Most cases were detected at the first CRC-diagnosing colonoscopy, allowing us to recruit truly incident cases, with a minimal time between recruitment and cancer diagnosis and with clinical data available from the very beginning of the diagnostic process. In addition, the presence of healthy controls allowed a clean comparison with CRC and IA, and the inclusion of IA allowed us to investigate an important phase of the adenoma-carcinoma sequence. We were also able to account for study centre, sex and age, controlling for possible confounding effects of these covariates on the 16S rRNA gene copy results.
In conclusion, our data confirm the presence of bacterial DNA in the blood of healthy adults and indicate that colon cancer patients have a higher bacterial DNA load and a different bacterial profile as compared to healthy, IA and rectal cancer subjects, indicating a higher passage of bacteria from the gastrointestinal tract to the bloodstream in colon cancer. Further studies are needed to confirm this result and possibly exploit it for the development of innovative techniques for early colon cancer diagnosis.
Data Availability
The epidemiological data that support the findings of this study are available upon reasonable request from the corresponding author (MR). Raw metagenomic reads are publicly available in the European Nucleotide Archive (ENA) with the accession number: PRJEB46474.
Funding
This work was supported by the Italian Foundation for Cancer Research (AIRC) (My First AIRC grant no. 17070).
Contributors
MM and RP contributed equally to this paper.
Conception and design: MR, SG, CLV, MP, RP, AA, MC, MauV and MM. Analysis of data: MR, GG and SG. Interpretation of data: MR, SG, CLV, RP, AA, MM, MC, GG and RB. Drafting the manuscript: MR, SG, GG, CLV, MS and RB. All authors contributed to data collection, and critical revision and final approval of the manuscript.
Competing interests
The authors declare that there are no conflicts of interest.
Acknowledgments
The authors would like to express their most sincere gratitude and appreciation to all participants and collaborators of this study, without whose effort this work would not have been feasible. Special thanks go to Margherita Cozzi, Elena Tansi, Cinzia Della Noce, Rosa Restieri and Nadia Zaretti for their valuable involvement in this study. We thank all the nursing staff at the Digestive and Interventional Endoscopy Unit, ASST Grande Ospedale Metropolitano Niguarda, Milan, and at the Gastroenterology and Endoscopy Unit, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, and Valentina Taverniti, Guido Basilisco, Luca Elli, Francesca Ferretti, Gian Lorenzo Scacchi, Gian Eugenio Tontini and Giuseppe Torgano for their contribution to data collection. We also thank Annalisa Pascarella for help with the graphic preparation and Patrizia Riso for participating in the proof of concept of the project.