Author Response:
Reviewer #1:
Weaknesses:
For me, most of the weaknesses of this manuscript are related to the cluster detection:
- There is no consensus in the field on the definition of transmission clusters. However, the rationale for taking the union (rather than the intersection) of two different methods (HIV-TRACE and Cluster Picker) was not clear to me.
- HIV-TRACE defines clusters based on pairwise genetic distances, whereas Cluster Picker identifies clusters using pairwise genetic distance guided by a phylogenetic tree (and node support / bootstrap values). Given the underlying sample size, and given that the phylogeny had already been constructed, the rationale for the purely distance-based criterion of HIV-TRACE was not clear to me.
We thank the reviewer for these comments and are happy to provide additional results that motivate our decision to use the union of clusters detected with HIV-TRACE and Cluster Picker to estimate HIV transmissions within and between demographic sub-groups in the Botswana – Ya Tsie trial population. The primary motivation was that a filtering step was needed to avoid spending time and computational resources on sequences that were too distantly related, before applying the “gold standard” Phyloscanner to detect transmission pairs (directed, where possible). Combining clustering algorithms with a distance threshold achieved this filtering. Because we shared what we take to be the reviewer’s concerns about relying on either algorithm alone, we sought to maximize the number of transmission pairs that Phyloscanner could identify among participants in the Botswana – Ya Tsie trial by using the union of clusters detected with HIV-TRACE and Cluster Picker. This also served as a sensitivity analysis, allowing us to evaluate the extent to which the observed clustering patterns were specific to a single algorithm.
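To make the union-versus-intersection distinction concrete, the short sketch below compares the two rules on toy cluster memberships. The sequence IDs and cluster contents are hypothetical placeholders, not trial data, and the helper `clustered_ids` is an illustrative name, not part of either tool; in practice the memberships would be read from the HIV-TRACE and Cluster Picker outputs.

```python
def clustered_ids(clusters):
    """Return the set of sequence IDs that belong to any cluster."""
    return set().union(*clusters)


# Hypothetical clusters: one larger HIV-TRACE cluster, several small
# Cluster Picker pairs (toy values for illustration only).
hivtrace_clusters = [{"s01", "s02", "s03", "s04"}]
clusterpicker_clusters = [{"s03", "s04"}, {"s05", "s06"}, {"s07", "s08"}]

ht = clustered_ids(hivtrace_clusters)
cp = clustered_ids(clusterpicker_clusters)

union = ht | cp                # sequences passed on under the union rule
intersection = ht & cp         # sequences passed on under an intersection rule
only_hivtrace = ht - cp        # lost by intersection, found only by HIV-TRACE
only_clusterpicker = cp - ht   # lost by intersection, found only by Cluster Picker

print(sorted(union))
print(sorted(intersection))
```

Under the union rule every sequence flagged by either algorithm reaches the Phyloscanner step; under the intersection rule, the sequences found by only one algorithm are dropped before any pair can be evaluated.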
Furthermore, a previous study by Rose and colleagues (PMID: 27824249) comparing the number and size of clusters identified with the HIV-TRACE and Cluster Picker algorithms found that HIV-TRACE generally identified fewer but larger clusters, whereas Cluster Picker identified more numerous clusters that were mostly small two-person clusters (please see Figure 3B from Rose and colleagues (PMID: 27824249), reproduced below). This suggested that HIV-TRACE would be helpful in detecting potentially larger transmission chains, and that Cluster Picker would be valuable in revealing potential transmission events between pairs of individuals.
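One plausible contributor to larger clusters under a purely distance-based rule is transitive chaining: if pairs are linked whenever their distance falls below a threshold and clusters are the connected components of the resulting graph, sequences can join one cluster through intermediates even when the endpoints are far apart. The sketch below illustrates this with a simplified single-linkage procedure; it is not a reimplementation of HIV-TRACE or Cluster Picker. The distances, the threshold of 0.015, and the use of "at or below" as the linking rule are assumptions chosen for illustration.

```python
from itertools import combinations


def threshold_clusters(ids, dist, threshold):
    """Single-linkage clusters: connected components of the graph that links
    every pair whose genetic distance is at or below `threshold`."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root of x's component, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        d = dist.get((a, b), dist.get((b, a)))
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    # Singletons are not reported as clusters.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical pairwise distances (substitutions per site).
ids = ["A", "B", "C", "D", "E"]
dist = {
    ("A", "B"): 0.010, ("B", "C"): 0.012, ("C", "D"): 0.014,
    ("A", "C"): 0.020, ("B", "D"): 0.025, ("A", "D"): 0.040,
    ("A", "E"): 0.080, ("B", "E"): 0.080, ("C", "E"): 0.080, ("D", "E"): 0.080,
}

clusters = threshold_clusters(ids, dist, threshold=0.015)
print(clusters)
```

Here A and D end up in the same four-member cluster even though their own distance (0.040) is well above the threshold, because each neighbouring step in the chain A–B–C–D is below it. A tree-guided method that also requires node support would not necessarily group these sequences together.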

Of the 236 genetic clusters detected with the two algorithms, 19 full or partial clusters (comprising 41 sequences) included members that were detected only with HIV-TRACE, and 122 full or partial clusters (comprising 242 sequences) were unique to Cluster Picker. Moreover, of the 82 directed male-female transmission pairs inferred from the sample, five (n = 5) came from genetic clusters unique to HIV-TRACE, compared with 27 (n = 27) from clusters unique to Cluster Picker. Of the five transmission events unique to HIV-TRACE clusters, three were transmissions from control communities into intervention communities. By contrast, four of the 27 transmission events unique to Cluster Picker clusters were transmissions from control communities into intervention communities.
In summary, estimates of HIV transmissions in the trial population based only on the full overlap (intersection) of clusters detected with HIV-TRACE and Cluster Picker would have excluded 32 of the 82 male-female pairs used for the primary analysis.
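The figure of 32 excluded pairs follows directly from the counts reported above; the small calculation below spells it out, together with the number of control-to-intervention transmission events that an intersection-only analysis would also have lost. It only restates the reported numbers and assumes, as stated in the text, that pairs from clusters unique to either algorithm are exactly those an intersection rule would drop.

```python
total_pairs = 82               # directed male-female pairs in the primary analysis
unique_to_hivtrace = 5         # pairs from clusters found only by HIV-TRACE
unique_to_clusterpicker = 27   # pairs from clusters found only by Cluster Picker

excluded = unique_to_hivtrace + unique_to_clusterpicker
retained = total_pairs - excluded

# Control-to-intervention events among the excluded pairs (3 of 5 and 4 of 27).
control_to_intervention_lost = 3 + 4

print(excluded, retained, control_to_intervention_lost)
```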
- For a phylogeny of this size it is feasible to calculate real bootstrap values instead of using (in my experience more liberal) Shimodaira-Hasegawa support values.
We value the reviewer’s suggestion and agree that standard bootstrap values would be ideal. However, the likely benefit of computing these bootstrap values and then repeating the entire analysis (inferring transmission pairs with Phyloscanner and estimating transmission flows) would be minimal. As noted above, permissiveness in a filtering step is a virtue, because it avoids discarding pairs of interest, provided that it does not impose an infeasible computational burden, which it did not here.
- Supplementary Note 2.5 describes how the linkage and direction-of-transmission score threshold of 57% was chosen. However, the finding that almost half of the probable source-recipient pairs selected with this threshold were same-sex, and had to be excluded from the analysis, calls the reliability of the threshold into question.
We apologize for the lack of clarity in our description, and would kindly ask the reviewer to note that the threshold by itself cannot distinguish a direct pair from a female-female pair separated by a single male intermediate; rather, it is designed to distinguish direct male-female pairs from male-female pairs separated by several intermediates. Once again, the threshold was meant to be a filter that allowed us to run Phyloscanner on a feasible number of sequences, and so, appropriately, it lets through some pairs (such as these same-sex pairs) that are rejected at later steps in the pipeline. Finally, please note that, in line with the reviewer’s suggestions, the content of all former Supplementary Notes is now presented in the Methods section.
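The two-stage logic described in this reply (a permissive score threshold, followed by a later step that removes same-sex pairs) can be sketched as follows. The `CandidatePair` record, its field names, the example scores, and the use of "greater than or equal to" for the 57% cut-off are all hypothetical and chosen for illustration; this is not the Phyloscanner output format or our exact pipeline code.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 0.57  # linkage and direction score cut-off (>= assumed)


@dataclass
class CandidatePair:
    source: str
    recipient: str
    source_sex: str     # "M" or "F"
    recipient_sex: str  # "M" or "F"
    score: float        # linkage and direction score in [0, 1]


def passes_threshold(pair):
    """Stage 1: permissive filter on the linkage/direction score."""
    return pair.score >= SCORE_THRESHOLD


def is_male_female(pair):
    """Stage 2: keep only pairs of opposite sex."""
    return pair.source_sex != pair.recipient_sex


# Hypothetical candidate pairs (toy values, not study data).
pairs = [
    CandidatePair("p1", "p2", "M", "F", 0.81),
    CandidatePair("p3", "p4", "F", "F", 0.66),  # e.g. linked via one male intermediate
    CandidatePair("p5", "p6", "F", "M", 0.59),
    CandidatePair("p7", "p8", "M", "F", 0.40),
]

stage1 = [p for p in pairs if passes_threshold(p)]
stage2 = [p for p in stage1 if is_male_female(p)]

print([(p.source, p.recipient) for p in stage1])
print([(p.source, p.recipient) for p in stage2])
```

The female-female pair clears the score threshold, as expected for two women linked through a single male intermediate, and is then removed by the sex filter; the threshold alone was never intended to reject such pairs.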