Abstract
Complex polymicrobial communities inhabit the lungs of individuals with cystic fibrosis (CF) and contribute to the decline in lung function. However, the severity of lung disease and its progression in CF patients are highly variable and imperfectly predicted by host clinical factors at baseline, CFTR mutations in the host genome, or sputum polymicrobial community variation. The opportunistic pathogen Pseudomonas aeruginosa (Pa) dominates airway infections in the majority of CF adults. Here we hypothesized that genetic variation within Pa populations would be predictive of lung disease severity. To quantify Pa genetic variation within whole CF sputum samples, we used deep amplicon sequencing on a newly developed custom Ion AmpliSeq panel of 209 Pa genes previously associated with host pathoadaptation and pathogenesis of CF infection. We trained machine learning models using Pa single nucleotide variants (SNVs), clinical data, and microbiome diversity data to classify lung disease severity at the time of sputum sampling, and to predict future lung function decline over five years in a cohort of 54 adult CF patients with chronic Pa infection. The models using Pa SNVs alone classified baseline lung disease with good sensitivity and specificity, with an area under the receiver operating characteristic curve (AUROC) of 0.87. While the models were less predictive of future lung function decline, they still achieved an AUROC of 0.74. The addition of clinical data to the models, but not microbiome community data, yielded modest improvements (baseline lung function: AUROC=0.92; lung function decline: AUROC=0.79), highlighting the predictive value of the AmpliSeq data. Together, our work provides a proof-of-principle that Pa genetic variation in sputum is strongly associated with baseline lung disease, moderately predicts future lung function decline, and provides insight into the pathobiology of Pa’s effect on CF.
Importance
Cystic fibrosis (CF) is among the most common life-limiting inherited disorders, caused by mutations in the CF transmembrane conductance regulator (CFTR) gene. CF causes progressive damage to the lungs, the major cause of morbidity and mortality in CF patients. However, the rate of lung function decline is highly variable across CF patients, and cannot be fully explained using existing biomarkers in the human genome or patient co-morbidities. Pseudomonas aeruginosa (Pa) is known to evolve and adapt within chronic CF infections. We hypothesized that within-patient Pa diversity could affect lung disease severity. In a CF cohort study, we demonstrate the utility of machine learning tools for predictive modeling of baseline lung function and subsequent decline in CF patients using deep within-patient Pa amplicon sequencing. Our findings show the potential of these models to identify high-risk CF patients based on Pa diversity within the lung.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The project was supported by funding from CIHR (PJT-148827 to DN) and a Vertex Research Innovation Award (DN), and salary support from the Cystic Fibrosis Canada Research Fellowship (Award ID 558850 to JD), the Leopoldina Foundation (German National Academy of Sciences Leopoldina, Award ID LPDS 2017-17), the Reseau en Sante respiratoire (IL), and the Fonds de Recherche en Sante Quebec (IL, DN). MMS and BJS were supported by a Genome Canada and Genome Quebec Bioinformatics and Computational Biology grant.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study was carried out with approval from the Research Ethics Boards of the University of Calgary (15-0854) and McGill University Health Centre (15-623).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All amplicon sequencing data generated in this project are deposited in NCBI GenBank under BioProject PRJNA763719.