Abstract
Cancer diagnosis using cell-free DNA (cfDNA) has the potential to improve treatment and survival but has several technical limitations. Here, we show that tumor-associated mutations create neomers, DNA sequences 13-17 nucleotides in length that are predominantly absent from the genomes of healthy individuals, and that these neomers can accurately detect cancer, including at early stages, and distinguish subtypes and features. Using a neomer-based classifier, we show that we can distinguish twenty-one different tumor types with higher accuracy than state-of-the-art methods. Refinement of this classifier using a handcrafted set of k-mers identified additional cancer features with greater precision. Generation and analysis of 451 cfDNA whole-genome sequences demonstrate that neomers can precisely detect lung and ovarian cancer with areas under the curve (AUC) of 0.93 and 0.89, respectively. In particular, for early stages, we show that neomers can detect lung cancer with an AUC of 0.94 and ovarian cancer, which lacks an early detection test, with an AUC of 0.93. Finally, testing over 9,000 sequences with either promoter or massively parallel reporter assays, we show that neomers can identify cancer-associated mutations that alter regulatory activity. Combined, our results identify a novel, sensitive, specific and simple diagnostic tool that can also identify novel cancer-associated mutations in gene regulatory elements.
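To illustrate the neomer concept described above, the following is a minimal Python sketch, not the authors' pipeline. It enumerates the k-mers that overlap a single-nucleotide substitution and keeps those absent from a background k-mer set. The function names (`kmers`, `mutation_neomers`), the toy reference sequence, and the use of k = 4 are all assumptions made for readability; the abstract specifies neomers of 13-17 nucleotides that are absent from the genomes of healthy individuals, which would require a far larger background set than the toy one used here.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mutation_neomers(reference: str, pos: int, alt: str, k: int,
                     background: Set[str]) -> Set[str]:
    """Return k-mers overlapping a substitution that are absent from background.

    reference  : reference sequence (hypothetical toy input)
    pos        : 0-based position of the substituted base
    alt        : the single alternate base
    background : k-mers considered present in healthy genomes (assumption:
                 here built from the toy reference only)
    """
    if len(alt) != 1 or reference[pos] == alt:
        raise ValueError("alt must be a single base that differs from the reference")
    mutated = reference[:pos] + alt + reference[pos + 1:]
    # Every k-mer overlapping pos lies inside this window of at most 2k-1 bases.
    start = max(0, pos - k + 1)
    end = min(len(mutated), pos + k)
    return kmers(mutated[start:end], k) - background


# Toy example: k = 4 is used only to keep the output readable.
reference = "ACGTACGTTAGC"
background = kmers(reference, 4)
candidates = mutation_neomers(reference, pos=5, alt="T", k=4, background=background)
print(sorted(candidates))
```

In this toy setting, a k-mer counts as a candidate only if it appears in the mutated sequence and not in the background set. How the background is actually defined and how candidates feed into a classifier are beyond what the abstract states.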
Competing Interest Statement
I.G.S., O.Y.B., I.M., M.H. and N.A. are co-founders of Neomer Diagnostics and have filed patent applications covering embodiments and concepts disclosed in the manuscript.
Funding Statement
This work was supported in part by the Benioff Initiative for Prostate Cancer Research, the UCSF Catalyst award, the UCSF Innovations Ventures Philanthropy Fund and National Human Genome Research Institute grant number UM1HG011966 (N.A.). M.H. was supported by core funding from the Wellcome Trust and core funding from the Evergrande Center.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All data for this study are deidentified and publicly available. This includes TCGA data, for which publicly available mutation calls were obtained at https://portal.gdc.cancer.gov/. Permission to use the data from https://ega-archive.org/studies/EGAS00001003206 was obtained from the Data Access Committee after contacting Dr Ellen Heitzer, whose study was approved by the Ethics Committee of the Medical University of Graz (approval numbers 21-228 ex 09/10 [prostate cancer] and 29-272 ex 16/17 [high-resolution analysis of plasma DNA]) and conducted according to the Declaration of Helsinki; written informed consent was obtained from all patients and healthy probands.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
We have added the following to the revision: 1) generation and analysis of 451 cell-free DNA whole-genome sequences using a neomer classifier; 2) massively parallel reporter assays for over 9,000 sequences showing that neomers affect regulatory activity.
Data Availability
For the TCGA data, mutation calls are publicly available at https://portal.gdc.cancer.gov/. Permission to use the data from https://ega-archive.org/studies/EGAS00001003206 was obtained from the Data Access Committee (DAC) after contacting Dr Ellen Heitzer. Whole-genome sequencing (WGS) data were submitted to dbGaP. MPRA sequencing data were deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA917083.