Abstract
Many drug discovery projects are started, but few progress fully through clinical trials to approval. Previous work has shown that human genetic support for the therapeutic hypothesis increases the chance of trial progression. Here, we applied natural language processing to classify the free-text reasons given for 28,842 clinical trials that stopped before reaching their endpoints. We then evaluated these stop-reason classes in light of the underlying evidence for the therapeutic hypothesis and the properties of the drug target. We show that trials are more likely to stop due to lack of efficacy in the absence of strong genetic evidence from human populations or genetically modified animal models. Furthermore, trials are more likely to stop for safety reasons if the drug target gene is highly constrained in human populations and is not selectively expressed. These results support the growing use of human genetics to evaluate targets for drug discovery programmes.
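Statements such as "more likely to stop due to lack of efficacy" are typically quantified as an odds ratio (OR) from a 2×2 table. The following is a minimal, generic sketch of that arithmetic with a 95% Wald confidence interval. It is not the authors' analysis code, and their statistical procedure may differ. The function name `odds_ratio`, the row/column convention and the example counts are hypothetical and chosen only for illustration; they are not results from this study.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (OR, lower, upper) for a 2x2 table using a Wald interval on log(OR).

    Hypothetical layout (illustration only):
      rows    = exposed (e.g. target lacks genetic support) / unexposed
      columns = outcome (e.g. stopped for lack of efficacy) / no outcome
      a = exposed with outcome,   b = exposed without outcome
      c = unexposed with outcome, d = unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        # The Wald interval is undefined with an empty cell; a continuity
        # correction or an exact method would be needed instead.
        raise ValueError("zero cell count: use a continuity correction or an exact test")
    or_value = (a * d) / (b * c)
    # Standard error of log(OR): sqrt of the sum of reciprocal cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from this work.
    print(odds_ratio(30, 70, 15, 85))
```

An OR above 1 means the outcome has higher odds in the exposed row. An interval that excludes 1 is conventionally read as a significant association at the chosen level.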
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was funded in part by a Wellcome Trust grant [Grant number 206194]. For the purpose of Open Access, the authors have applied a CC-BY public copyright licence to any author-accepted manuscript version arising from this submission.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data and methods in the present work are available online as described in the respective sections of the manuscript.
Abbreviations
- BERT: Bidirectional Encoder Representations from Transformers
- ClinGen: Clinical Genome Resource
- ClinVar: Clinically relevant Variants
- COSMIC: Catalogue of Somatic Mutations in Cancer
- gnomAD: Genome Aggregation Database
- GWAS: Genome-Wide Association Studies
- IntOGen: Integrative OncoGenomics
- LOEUF: Loss-Of-Function Observed/Expected Upper Bound Fraction
- LOF: Loss-Of-Function
- NLP: Natural Language Processing
- OR: Odds Ratio
- PI: Principal Investigator
- UniProt: Universal Protein Resource