Abstract
Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining duration of infection classify samples into recent and non-recent and are unable to give more granular TSI estimates. These binary classifications have a limited recency time window of several months, therefore requiring large sample sizes, and cannot assess the cumulative impact of an intervention. We developed a Random Forest Regression model, HIV-phyloTSI, that combines measures of within-host diversity and divergence to generate TSI estimates from viral deep-sequencing data, with no need for additional variables. HIV-phyloTSI provides a continuous measure of TSI up to 9 years, with a mean absolute error of less than 12 months overall and less than 5 months for infections with a TSI of up to a year. It performed equally well for all major HIV subtypes based on data from African and European cohorts. We demonstrate how HIV-phyloTSI can be used for incidence estimates on a population level.
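The accuracy figures quoted above are mean absolute errors (MAE), reported overall and for infections with a TSI of up to a year. As a minimal sketch of how such a stratified MAE could be computed, the snippet below uses made-up pairs of known and estimated TSI in years. The values, the helper name `mean_absolute_error`, and the handling of the one-year cut-off are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Sequence


def mean_absolute_error(true_tsi: Sequence[float], predicted_tsi: Sequence[float]) -> float:
    """Average of |true - predicted| over paired TSI values (same unit in, same unit out)."""
    if len(true_tsi) != len(predicted_tsi) or len(true_tsi) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_tsi, predicted_tsi)) / len(true_tsi)


# Hypothetical values for illustration only (years); not taken from the study.
true_years = [0.25, 0.5, 1.0, 3.0, 6.0]
pred_years = [0.5, 0.75, 0.5, 4.0, 4.5]

# MAE across all samples.
overall_mae_years = mean_absolute_error(true_years, pred_years)

# MAE restricted to samples whose known TSI is at most one year.
recent_pairs = [(t, p) for t, p in zip(true_years, pred_years) if t <= 1.0]
recent_mae_years = mean_absolute_error(
    [t for t, _ in recent_pairs],
    [p for _, p in recent_pairs],
)

# Report in months, the unit used in the abstract.
print(f"overall MAE: {overall_mae_years * 12:.1f} months")
print(f"MAE for TSI <= 1 year: {recent_mae_years * 12:.1f} months")
```

Stratifying by the known TSI rather than the estimated TSI keeps each subset defined independently of the model's own errors.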
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
PANGEA is funded by the Bill & Melinda Gates Foundation (consecutive grants OPP1084362 and OPP1175094). BEEHIVE was funded by a European Research Council Advanced Grant (PBDR-339251). HPTN 071 is sponsored by the National Institute of Allergy and Infectious Diseases (NIAID) under Cooperative Agreements UM1-AI068619, UM1-AI068617, and UM1-AI068613, with funding from the US President's Emergency Plan for AIDS Relief (PEPFAR). Additional funding is provided by the International Initiative for Impact Evaluation (3ie) with support from the Bill & Melinda Gates Foundation, as well as by the Division of Intramural Research, NIAID, the National Institute on Drug Abuse (NIDA), and the National Institute of Mental Health (NIMH), all part of NIH. HPTN 071-02 Phylogenetics is sponsored by NIAID, NIMH, and the Bill & Melinda Gates Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAID, NIMH, NIDA, PEPFAR, 3ie, or the Bill & Melinda Gates Foundation. The computational aspects of this research were supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z and the NIHR Oxford BRC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All participants gave informed written consent, and ethics approvals were granted to the institutions that generated the data by in-country institutional review boards. In-country review board approvals were obtained from the following countries: Uganda for the Rakai Community Cohort study, the MRC-UVRI, the BEEHIVE collaboration and the UW cohorts; Zambia for the HPTN 071-02 PopART Phylogenetics study; Botswana, Kenya, South Africa and Tanzania for the UW cohorts; and Belgium, Finland, France, Germany, the Netherlands, Sweden, Switzerland and the United Kingdom for the BEEHIVE cohorts. The design of the African studies was discussed and agreed with community advisory boards. The BEEHIVE study, which only accessed fully anonymised data, was approved by the ethics panel of the European Research Council. The University of Washington International Clinical Research Studies (ICRC) program was approved by IRB committees of the following institutions: University of Cape Town, Research Ethics Committee; University of the Witwatersrand, HREC; Kenya Medical Research Institute; University of California San Francisco; Moi University; Indiana University; Kenyatta National Hospital; University of Washington; Kilimanjaro Christian Medical University College; London School of Hygiene and Tropical Medicine; Uganda National Council for Science and Technology (UNCST); Botswana Harvard Partnership (Republic of Botswana Ministry of Health, and Harvard School of Public Health). The Rakai Community Cohort study was approved by the Uganda Virus Research Institute Scientific Research and Ethics Committee; the Uganda National Council for Science and Technology; and the Western Institutional Review Board. Approval for the HPTN 071 (PopART) trial and the HPTN 071-02 (PopART) Phylogenetics Study was granted by ethics committees at the London School of Hygiene and Tropical Medicine, University of Zambia, and Stellenbosch University, South Africa.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Model, example data, and code to obtain estimates from input data are freely available at https://github.com/BDI-pathogens/HIV-phyloTSI to facilitate application of HIV-phyloTSI to external datasets. Sequencing data and metadata are available by application to the PANGEA consortium, https://www.pangea-hiv.org.
Abbreviations used
- FRR: false recency rate
- LRTT: longest root-to-tip distance
- MAA: Multi-Assay Algorithms (for measuring TSI)
- MAE: mean absolute error
- MAF: minor allele frequency
- MAF12c: MAF in the 1st and 2nd codon positions
- MAF3c: MAF in the 3rd codon position
- MRCA: most recent common ancestor
- NGS: next-generation sequencing
- OLS: ordinary least squares (regression)
- PrEP: pre-exposure prophylaxis
- TSI: time since infection (with HIV)
- UWP: University of Washington International Clinical Research Studies (ICRC) program datasets
- RAK: Rakai Community Cohort Study dataset
- BEE: BEEHIVE Consortium dataset
- MRC: MRC-UVRI Uganda dataset