Abstract
Background Analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequence data from household infections should aid its detailed epidemiological understanding. Using viral genomic sequence data, we investigated household SARS-CoV-2 transmission and evolution in coastal Kenya households.
Methods We conducted a case-ascertained cohort study between December 2020 and February 2022 whereby 573 members of 158 households were prospectively monitored for SARS-CoV-2 infection. Households were invited to participate if a member tested SARS-CoV-2 positive or was a contact of a confirmed case. Follow-up visits collected a nasopharyngeal/oropharyngeal (NP/OP) swab on days 1, 4 and 7 for RT-PCR diagnosis. If any of these were positive, further swabs were collected on days 10, 14, 21 and 28. Positive samples with an RT-PCR cycle threshold of <33.0 were subjected to whole genome sequencing followed by phylogenetic analysis. Ancestral state reconstruction was used to determine if multiple viruses had entered households.
Results Of 2,091 NP/OP swabs that were collected, 375 (17.9%) tested SARS-CoV-2 positive. Viral genome sequences (>80% coverage) were obtained from 208 (55%) positive samples collected from 61 study households. These genomes fell within 11 Pango lineages and four variants of concern (Alpha, Beta, Delta and Omicron). We estimated 163 putative transmission events involving members of the sequenced households, 40 (25%) of which were intra-household transmission events while 123 (75%) were infections that likely occurred outside the households. Multiple virus introductions (up to five) were observed in 28 (47%) households within the 1-month follow-up period.
Conclusions We show that a considerable proportion of SARS-CoV-2 infections in coastal Kenya occurred outside the household setting. Multiple virus introductions frequently occurred into households within the same infection wave, in contrast to observations from high-income settings, where a single introduction appears to be the norm. Our findings suggest that control of SARS-CoV-2 transmission by household member isolation may be impractical in this setting.
Introduction
Households are a fundamental unit of social structure and a frequent locale of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission 1,2. The household secondary attack rate for SARS-CoV-2 has been estimated to be about 21.1% (95% CI: 17.4-24.8%), with considerable heterogeneity observed over geographic regions and time periods 3-6. Improved understanding of SARS-CoV-2 household transmission dynamics, including how often the virus is transmitted within a household compared with acquisition from outside the household, may help refine local control measures. However, to date, such data are limited for sub-Saharan Africa.
SARS-CoV-2 genomic analysis has played a key role throughout the coronavirus disease 2019 (COVID-19) pandemic in elucidating its transmission dynamics 7-11. Genomic analysis has helped uncover multiple virus introductions into close living environments, e.g., hospitals 12,13, prisons 14, cruise ships 15, long-term care facilities 16, and learning institutions 17, and has also uncovered superspreading events 13,18. It is, however, still unclear whether analysing SARS-CoV-2 genomic data from household clusters can delineate transmission chains 19,20. Unlike many RNA viruses, SARS-CoV-2 replication is believed to be under some level of proof-reading 21, limiting its substitution rate (9.90 × 10⁻⁴ substitutions/site/year; 95% Bayesian credible interval: 6.29 × 10⁻⁴ to 1.35 × 10⁻³) 22. A previous genomic analysis of a family infection cluster in Ireland found only a limited number of mutations between family members testing positive 20.
In the present study, we sought to document SARS-CoV-2 transmission patterns within households in coastal Kenya by analysis of infections identified in a case-ascertained cohort during the local waves of infection. By September 2022, Kenya had experienced six major waves of SARS-CoV-2 infections 23. The current study coincided with national waves three, four, and five, a period during which Alpha (B.1.1.7), Delta (B.1.617.2) and Omicron (B.1.1.529) variants of concern (VOC) predominated, respectively 24. We undertook detailed genomic analysis to identify independent SARS-CoV-2 introductions into households during clustered infections, and to understand the frequency of infection spread within households in coastal Kenya.
Methods
Study design and recruitment
We conducted a case-ascertained study in coastal Kenya, where new households were recruited via five local health facilities or County Department of Health rapid response teams (RRTs). Households were defined as dwellings or groups of dwellings that share the same kitchen or cooking space. Many of the recruited households were from within the Kilifi Health and Demographic Surveillance System (KHDSS) area located in Kilifi, coastal Kenya 25. To be enrolled, a household needed to have at least two occupants, be accessible by road, and have permission granted by the household head. In the initial study period, only households whose members were contacts of confirmed cases within 2-5 days were recruited, but due to slow enrolment this was revised to include households with confirmed cases. A household was excluded if, at the time of recruitment, two or more members had already developed COVID-19 symptoms (e.g., fever, sore throat, cough), a member had been hospitalized due to COVID-19, or the household had been enrolled in a trial of a therapeutic COVID-19 product.
Follow-up
During each household visit, a nasopharyngeal and/or oropharyngeal (NP/OP) swab was obtained for real-time RT-PCR testing. The study had two follow-up arms: “reduced follow-up” and “intense follow-up”. Households in the “reduced follow-up” arm were those where all members tested SARS-CoV-2 negative on days 1, 4 and 7; follow-up of these households was then discontinued. The “intense follow-up” arm was activated when a household member tested positive on day 1, 4, or 7, and the household was sampled again on days 10, 14, 21 and 28. Data on baseline household and demographic characteristics were collected by the study team at enrolment. During all household visits, data on the presence of acute respiratory illness (ARI) symptoms (e.g., fever, cough, runny nose, sore throat, headache) were collected.
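The allocation rule described above can be sketched in a few lines of Python. This is an illustration only and not the study's data-management code; the function name, the arm labels as strings, and the input format are assumptions.

```python
# Sampling days are taken from the Follow-up description; all names are illustrative.
INITIAL_DAYS = (1, 4, 7)
EXTENDED_DAYS = (10, 14, 21, 28)


def follow_up_schedule(results_by_day):
    """Return (arm, sampling days) for one household.

    results_by_day maps an initial visit day (1, 4 or 7) to True when any
    household member tested RT-PCR positive at that visit.
    """
    any_positive = any(results_by_day.get(day, False) for day in INITIAL_DAYS)
    if any_positive:
        # A positive on day 1, 4 or 7 triggers the additional visits.
        return "intense", INITIAL_DAYS + EXTENDED_DAYS
    return "reduced", INITIAL_DAYS


print(follow_up_schedule({1: False, 4: True, 7: False}))
print(follow_up_schedule({1: False, 4: False, 7: False}))
```

Under this sketch, a household with any positive result in the first week receives seven visits in total, and a household that stays negative receives three.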
Laboratory procedures
SARS-CoV-2 diagnosis
SARS-CoV-2 testing of study samples was undertaken alongside samples collected in six coastal counties of Kenya as part of national COVID-19 testing, as previously described 26. Four different viral RNA extraction kits were deployed in combination with five different RT-PCR kits/protocols, namely, Da An Gene Co. detection Kit, European Virus Archive-Global (EVAg) E gene protocol, Standard M Kit, Sansure Biotech Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic Real-time RT-PCR kit 26. Positives were determined using the kit/protocol-defined cycle thresholds (Ct). For kits that targeted multiple SARS-CoV-2 genomic regions, the sample Ct was calculated as the average of the individual target Cts.
Genome sequencing
We aimed to whole genome sequence all the RT-PCR positive samples with a cycle threshold of < 33.0. Viral RNA was re-extracted from the specimens using QIAamp viral RNA mini-Kit following the manufacturer’s instructions and converted to cDNA using Lunascript kit with ARTIC protocol primers 27. Genome amplification was conducted using Q5 PCR kit and ARTIC protocol primers (initially v3 and then v4). Sequencing libraries were prepared using Oxford Nanopore Technologies (ONT) ligation sequencing kit SQK-LSK109 and the ONT Native Barcoding Expansion kit as described in the ARTIC protocol 27. Sequencing was performed on Oxford Nanopore Technologies’ MinION or GridION devices using R9.4.1 flow cells.
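A minimal sketch of how a per-sample Ct could be averaged across targets and compared with the sequencing cut-off is given below. It is illustrative only: the function names are hypothetical, and ignoring targets that did not amplify (represented as None) is an assumption that the text does not specify.

```python
from statistics import mean

SEQUENCING_CT_CUTOFF = 33.0  # from the text: samples with Ct < 33.0 were sequenced


def sample_ct(target_cts):
    """Average the Ct values of the genomic targets that amplified.

    Assumption: targets with no amplification are passed as None and skipped.
    """
    detected = [ct for ct in target_cts if ct is not None]
    if not detected:
        return None
    return mean(detected)


def eligible_for_sequencing(target_cts, cutoff=SEQUENCING_CT_CUTOFF):
    """True when the averaged Ct is strictly below the sequencing cut-off."""
    ct = sample_ct(target_cts)
    return ct is not None and ct < cutoff


print(sample_ct([30.0, 32.0]), eligible_for_sequencing([30.0, 32.0]))
print(eligible_for_sequencing([34.5, 35.1]))
```

Positivity itself was defined by each kit's own thresholds, so this sketch covers only the selection of positive samples for sequencing.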
Bioinformatic analysis
Genome assembly and lineage assignment
The raw sequencing reads (FAST5) were base-called and demultiplexed using ONT’s Guppy v3.5-4.2. The resultant files (FASTQ) were assembled into consensus genomes using the reference-based approach of the ARTIC bioinformatics pipeline (https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html; last accessed 2022-09-17). Only nucleotides with a read depth of more than 20× were included in the consensus sequence. High-quality genomes were assigned Pango lineages using pangolin v4.0.5, PUSHER-v1.3, scorpio v0.3.16 and constellation v0.1.6 28,29.
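The depth filter and the genome coverage criterion can be illustrated with a simplified Python sketch. This is not the ARTIC pipeline's own masking code; the function names are hypothetical, and defining coverage as the fraction of positions with an unambiguous A, C, G or T call is an assumption.

```python
MIN_DEPTH = 20        # the text keeps positions with read depth above 20x
MIN_COVERAGE = 0.80   # genomes with more than 80% coverage were analysed


def mask_low_depth(bases, depths, min_depth=MIN_DEPTH):
    """Replace bases whose read depth is not above min_depth with 'N'."""
    if len(bases) != len(depths):
        raise ValueError("bases and depths must have the same length")
    return "".join(b if d > min_depth else "N" for b, d in zip(bases, depths))


def genome_coverage(consensus):
    """Fraction of positions carrying an unambiguous A, C, G or T call."""
    if not consensus:
        return 0.0
    called = sum(1 for b in consensus.upper() if b in "ACGT")
    return called / len(consensus)


# Toy ten-base example; real consensus genomes are roughly 30 kb long.
consensus = mask_low_depth("ACGTACGTAC", [50, 45, 5, 60, 21, 20, 80, 90, 33, 41])
print(consensus, genome_coverage(consensus), genome_coverage(consensus) > MIN_COVERAGE)
```

In the toy example two positions fall at or below the depth threshold and are masked, so the sequence sits exactly at 80% coverage and would not pass a strict "more than 80%" filter.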
Phylogenetic analysis
Multiple sequence alignments were generated using the Nextalign v.1.10.1 reference-based aligner within the Nextclade tool v0.14.2 30. Alignments were visualized using a custom Python script and the “snipit” tool (https://github.com/aineniamh/snipit; last accessed 2022-05-20). Pairwise distances were calculated using pairsnp.py (https://github.com/gtonkinhill/pairsnp/; last accessed 2022-05-20). Phylogenetic relationships between all recovered genomes and between viruses classified under the same VOC were inferred using maximum likelihood (ML) methods in IQTREE v2.1.3 under the general time reversible (GTR) substitution model. To provide phylogenetic context to the household study genomes, we included contemporaneous genomes from the six coastal Kenya counties (Mombasa, Kilifi, Kwale, Taita Taveta, Tana River and Lamu) that were sequenced as part of the national SARS-CoV-2 genomic surveillance. The phylogenetic trees were combined with metadata and visualized with the R package “ggtree” v2.4.2.
Virus introductions
The number of independent virus introductions into the households was inferred using two approaches: (i) comparing the observed nucleotide differences between pairs of household genomes with the number of mutations expected over the time interval between the two sampling dates, and (ii) using ancestral state reconstruction (ASR) to count the transitions into a household 10.
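Approach (i) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the substitution rate is the one quoted in the Introduction, the genome length of 29,903 nt is that of the Wuhan-Hu-1 reference, the Poisson tail probability is an added heuristic that the text does not describe, and all function names are hypothetical.

```python
import math
from datetime import date

SUBS_PER_SITE_PER_YEAR = 9.90e-4  # rate quoted in the Introduction
GENOME_LENGTH = 29_903            # assumption: length of the Wuhan-Hu-1 reference


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both genomes have a called base that differs."""
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def expected_substitutions(date_a, date_b):
    """Expected substitutions per genome over the interval between two sampling dates."""
    years = abs((date_b - date_a).days) / 365.25
    return SUBS_PER_SITE_PER_YEAR * GENOME_LENGTH * years


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); a heuristic, not the authors' rule."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k) for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# Hypothetical pair of household genomes sampled 28 days apart.
expected = expected_substitutions(date(2021, 3, 1), date(2021, 3, 29))
observed = snp_distance("ACGTACGTNA", "ACGAACGTTA")
print(observed, round(expected, 2), poisson_tail(6, expected))
```

With these inputs the expected number of substitutions over one month is between two and three per genome, in line with the roughly two substitutions per genome per month cited in the Results. A pair of household genomes that differs by many more sites than expected for its sampling interval is then a candidate for separate introductions.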
Statistical analysis
Summary statistics (mean, median and standard deviation, as appropriate) were computed for key demographic characteristics. Infection prevalence was expressed as proportions, and groups were compared using appropriate statistical tests (e.g., chi-square or Fisher’s exact). All statistical analyses were performed in R.
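For readers unfamiliar with the test, a two-by-two chi-square comparison can be sketched in plain Python (the study itself used R, so this is a swapped-in illustration). The counts below are not taken from the study dataset: they are back-calculated from the percentages of participants reporting at least one ARI symptom given in the Results (63.2% of 171 positive and 22.6% of 402 negative participants) and should be treated as approximate.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the upper-tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Approximate, back-calculated counts (an assumption, see the lead-in):
# rows = SARS-CoV-2 positive / negative, columns = with / without an ARI symptom.
chi2, p_value = chi_square_2x2(108, 63, 91, 311)
print(round(chi2, 1), p_value < 0.001)
```

The closed form for the p-value holds only for one degree of freedom, which is the case for any two-by-two table.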
Ethical consideration
The study protocol was reviewed and approved by both the Scientific and Ethics Review Unit (SERU) at Kenya Medical Research Institute (KEMRI), Nairobi, Kenya (SERU protocol # 4077) and the University of Warwick Biomedical and Scientific Research Ethics Committee, Coventry, United Kingdom (REF: BSREC 150/19-20 AM01). Prior to data and sample collection, written informed consent was obtained from all participants aged 18 years or older, while for participants aged less than 18 years consent was obtained from their parents or legal guardians. Assent was also sought from adolescents (11-17 years of age).
Role of funding source
The funders of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report.
Results
Baseline characteristics
Of 2,091 nasopharyngeal/oropharyngeal (NP/OP) swabs collected from 573 participants from 158 households between 10th December 2020 and 22nd February 2022, 375 (17.9%) samples tested SARS-CoV-2 positive (Fig. 1). The positives arose from 171 infected participants in 80 households, with the temporal distribution shown in S1 Fig.
The positive cases had a median age of 27 years (IQR: 13.0-46.0; S1 Table), and 104 (60.8%) were female. Compared to participants who remained SARS-CoV-2 negative during the follow-up period, positive cases were more likely to report at least one ARI symptom (63.2% vs 22.6%; p < 0.001). The bulk of household recruitments coincided with national waves 3 and 4 (Fig. 2A & B), with only one household recruited during wave 2. The Kenyan government COVID-19 countermeasures during the study period fluctuated, as depicted by the Oxford stringency index (Fig. 2C) 31.
Genomic sequencing and lineage/VOC classification
We recovered near-complete genomes (over 80% coverage) from 208 samples (55.4% of positive samples) collected from 111 participants in 61 households (Fig. 2D). The samples that failed sequencing (n = 167) had either high Ct values on re-extraction (>33.0; S2 Fig.) or yielded poor-quality PCR products during library preparation. The recovered genomes were classified into Pango lineage B.1 (n = 11), Alpha variant of concern (VOC; n = 70), Beta VOC (n = 22), Delta VOC (n = 86) and Omicron VOC (n = 19). Within the Delta VOC, five Pango lineages were identified, namely, B.1.617.2 (n = 16), AY.16 (n = 5), AY.46 (n = 3), AY.116 (n = 58) and AY.122 (n = 4), while within the Omicron VOC three Pango lineages were identified, namely, BA.1.1 (n = 14), BA.1.1.1 (n = 4) and BA.1.9 (n = 1). A summary of the distribution of the Pango lineages identified across the households and sequenced cases, and of their history, is presented in S2 Fig and S2 Table.
Phylogenetic clustering of the household study genomes
To investigate the genetic diversity in the household study genome sequences, we reconstructed an ML phylogeny that included background co-circulating viruses from coastal Kenya (n = 2,382). As expected, the genome sequences clustered by VOC and Pango lineage (S3 Fig). Notably, lineage B.1 sequences were found in multiple branches of the phylogeny, including some at the base of branches leading to Beta and Delta VOCs. To assess the genetic relatedness of the recovered genomes within and between households, we reconstructed VOC-specific phylogenies with tips coloured by the household of sampling (Fig. 3). Here we observed both intra- and inter-household clustering. For a few households, tips corresponding to genomes from the same household fell in distinct clades, already indicating multiple introductions into the same household (Fig. 3).
Estimating the number of introductions into the households
SARS-CoV-2 has been reported to have an evolutionary rate of ∼2 substitutions per genome per month. A heterogeneous distribution of the pairwise nucleotide differences between specimens identified in the same household was observed (S4 Fig.). More than two nucleotide differences were seen in 17 households, implying multiple introductions. We investigated the potential number of virus introductions into the households using the ASR approach performed along the dated ML phylogeny. A total of 113 virus introductions were predicted into the 61 households from which we recovered sequence data. We classified the introduction events by origin as “non-household” events (those from populations that are not part of the household study) or “household” events (those from recruited households), and found that most introductions came from non-household populations (75.2% vs. 24.8%; Fig. 4A). Overall, we estimated that a single introduction occurred for 33 households (54%), two introductions for 15 households (25%), three introductions for six households (10%), four introductions for three households (5%), and five introductions for four households (7%) (Fig. 4B).
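Once ancestral states have been reconstructed, counting introductions amounts to counting branches along which the inferred location changes into a household. The sketch below shows that counting step on a toy tree; it is illustrative only, it does not perform the ASR itself, and the tree, the state labels and the function name are hypothetical.

```python
from collections import Counter


def count_introductions(edges, node_state):
    """Count parent-to-child state changes that end in each household.

    edges is a list of (parent, child) node names; node_state maps every node
    to its reconstructed location, either a household ID or "non-household".
    """
    introductions = Counter()
    sources = Counter()
    for parent, child in edges:
        origin, destination = node_state[parent], node_state[child]
        if destination != "non-household" and origin != destination:
            introductions[destination] += 1
            # Origin is either the background population or another study household.
            sources["non-household" if origin == "non-household" else "household"] += 1
    return introductions, sources


# Toy tree with hypothetical states for internal nodes (n*) and tips (t*).
edges = [("root", "n1"), ("root", "n2"), ("n1", "t1"), ("n1", "t2"),
         ("n1", "t3"), ("n2", "t4"), ("n2", "t5")]
node_state = {"root": "non-household", "n1": "HH01", "n2": "non-household",
              "t1": "HH01", "t2": "HH01", "t3": "HH02", "t4": "HH01",
              "t5": "non-household"}
introductions, sources = count_introductions(edges, node_state)
print(dict(introductions), dict(sources))
```

In the toy tree, household HH01 receives two introductions from the background population and HH02 receives one from another study household, mirroring the "non-household" versus "household" classification used above.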
Discussion
We provide evidence of frequent multiple SARS-CoV-2 introductions into rural coastal Kenyan households, a finding that was unexpected. The conventional view has been that households with concurrently infected members acquired the infection from one index case. This assumption has been repeatedly supported by genomic studies, for instance a Dutch study following 85 households, where phylogenetic analysis showed a single introduction into all study households 4. However, in this study, only about half of infected households (54%) had a single introduction.
A variety of factors may explain the differences in household virus introduction patterns between our study and previous observations. First, in our setting, multiple families may live in one compound and eat together in one kitchen. Second, the larger household size increases the chances of multiple viruses being introduced, especially at the height of epidemic waves. Third, informal jobs dominate in this setting, so effective contacts outside the household may offer as much opportunity for transmission as contacts within it.
Our study followed up participants for a period of up to 1 month with serial sample collection, and recovered genomes were analysed in the context of contemporaneous locally circulating diversity in coastal Kenya 32. Despite observing minimal nucleotide variation between samples from members of the same household infection clusters, when we incorporated sampling dates through the ASR analysis we were able to partially reconstruct potential within-household transmission events. This allowed identification of multiple introductions of closely related viruses into households, observation of within-household transmission and detection of potential short-interval reinfections.
Few studies have examined SARS-CoV-2 household transmission dynamics within Africa 33-35, and these have produced diverse findings. In rural Egypt, a 6-month study reported a secondary attack rate (SAR) of 89.8% 33; in South Africa, a 13-month study reported a 25% infection rate among vulnerable household contacts 34; and in Madagascar, a SAR of 38.8% (CI: 19.5-57.2) 36 was reported. None of these studies included genome analysis to confirm that the inferred household transmission clusters were epidemiologically linked within the household.
The Kenyan government countermeasures in place during the study period may have had an impact on the way SARS-CoV-2 spread within the study households. In June 2020, the Kenyan government announced guidelines for home-based care for asymptomatic or mildly symptomatic patients without co-morbidities. Kenya started immunizing its population in March 2021, but coverage was low (<10%) during the study period, and it is unlikely that it affected transmission during our study. The stringency index in the country fluctuated from 35% to 75% during the study period. However, we did not detect variation in the pattern of introductions over time, which could suggest that the various restrictions had minimal impact at the household level. Firm conclusions on this aspect would likely require more advanced investigations.
This study has some limitations. First, our sampling interval, especially after week two, may have missed persons who were positive for less than the 7-day sample collection interval. High-density sampling has previously been associated with a higher attack rate 4. Second, several positive NP/OP samples (44.5%) failed to sequence or had large gaps due to PCR amplicon drop-offs. This missing data reduced the overall phylogenetic signal available for establishing who infected whom or the directionality of transmission. Third, we cannot rule out that a few of the sequence changes could be sequencing or assembly artifacts. Fourth, the case-ascertained study design we used had the drawback that, by the time of the first sample collection, multiple positive cases had already occurred in households. Most of the index cases were recruited following presentation to a health facility with ARI. This complicated our effort to fully work out who infected whom in the household. To overcome this challenge, future studies should observe members before entry of the virus into households and co-analyse genomic data with other relevant epidemiological data 37. Fifth, we did not examine intra-patient minority variants, which may, with caveats, provide insights into potential transmission linkages through shared intra-host variation 38.
In conclusion, our study highlights the importance of examining genomic data for accurate estimation and interpretation of SARS-CoV-2 household epidemiological parameters in these settings. We identified an unusually high number of independent virus introductions into households in coastal Kenya during clustered infections. Our findings suggest that control of SARS-CoV-2 transmission by household member isolation alone may not stop community transmission in this setting.
Data Availability
The consensus genome sequences obtained in this study that passed our quality control filters have been submitted to GISAID database (accession numbers available in appendix pages of the supplementary material). The code for the analyses presented in this manuscript is available from the corresponding author upon request. For more detailed information beyond the metadata used in the paper, there is a process of managed access requiring submission of a request form for consideration by our Data Governance Committee (http://kemri-wellcome.org/about-us/#ChildVerticalTab_15).
Authors’ contributions
The project was conceived and designed by CNA, KEG, JUN, MC, and DJN; Laboratory processing of specimens was conducted by JUN, LIO, NM, AWL, KSM, LN, JMM, MWM, EMO, and TOM; Management and analysis of data were handled by AWL, CNA, NM, SD and GG. CNA wrote the first draft; MP, MC, LIO, SD, PB, and DJN critically reviewed the manuscript to produce the final draft.
List of members of the COVID-19 Testing Team at KWTRP
Agnes Mutiso, Alfred Mwanzu, Angela Karani, Bonface M. Gichuki, Boniface Kaaria, Brian Bartilol, Brian Tawa, Calleb Odundo, Caroline Ngetsa, Clement Lewa, Daisy Mugo, David Amadi, David Ireri, Debra Riako, Domtila Kimani, Edwin Machanja, Elijah Gicheru, Elisha Omer, Faith Gambo, Horace Gumba, Isaac Musungu, James Chemweno, Janet Thoya, Jedida Mwacharo, John Gitonga, Johnstone Makale, Justine Getonto, Kelly Ominde, Kelvias Keter, Lydia Nyamako, Margaret Nunah, Martin Mutunga, Metrine Tendwa, Moses Mosobo, Nelson Ouma, Nicole Achieng, Patience Kiyuka, Perpetual Wanjiku, Peter Mwaura, Rita Warui, Robinson Cheruiyot, Salim Mwarumba, Shaban Mwangi, Shadrack Mutua, Susan Njuguna, Victor Osoti, Wesley Cheruiyot, Wilfred Nyamu, Wilson Gumbi and Yiakon Sein.
Acknowledgements
We thank (a) the members of the Kilifi County rapid response team who worked with our field study team in collecting the samples analysed here; (b) the members of the COVID-19 KWTRP Testing Team who undertook real-time RT-PCR processing of the samples received at KWTRP to identify positives (see the list of members of the COVID-19 Testing Team at KWTRP).
This work was supported by the National Institute for Health and Care Research (NIHR) (project reference 17/63/82) using UK aid from the UK Government to support global health research, the UK Foreign, Commonwealth and Development Office and Wellcome Trust (grant# 220985). Members of the COVID-19 Testing Team at KWTRP are supported by multiple funding sources including UNITAID (BOHEMIA study grant received by Dr Marta Maia), EDCTP (Senior Fellowship and Research and Innovation Action (RIA) grants received by Dr Francis Ndungu), and GAVI (PCIVS grant received by Prof. Anthony Scott). Dr Simon Dellicour acknowledges support from the Fonds National de la Recherche Scientifique (F.R.S.-FNRS, Belgium; grant n°F.4515.22), from the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek-Vlaanderen, FWO, Belgium; grant n°G098321N), and from the European Union Horizon 2020 project MOOD (grant agreement n°874850). The views expressed in this publication are those of the author(s) and not necessarily those of NIHR, the Department of Health and Social Care, the Foreign, Commonwealth and Development Office, Wellcome Trust or the UK government.