Abstract
Background Approximately half of patients discharged following COVID-19 related hospitalisation are reported to suffer from persisting respiratory symptoms. We assess the prevalence of long term radiological and functional pulmonary sequelae in survivors from COVID-19 and other viral pneumonia in published literature.
Methods We performed a systematic review and meta-analysis of all original studies in adults admitted to hospital with SARS-CoV-2, SARS-CoV, MERS-CoV, or Influenza pneumonia and followed up within 12 months of discharge. Searches were run on MEDLINE and Embase, with the last update on 1st March 2021. Primary outcomes were presence of 1) radiologic sequelae on CT scans; 2) restrictive impairment; 3) reduced diffusing capacity for carbon monoxide (DLCO). This review is registered on PROSPERO, CRD42020183139.
Results Sixty studies were included for qualitative synthesis, of which 41 were suitable for meta-analysis. On follow up CT scans, the overall estimated proportion was 0·56 (95% CI 0·44 to 0·66, I2=94·44%) for inflammatory changes, and 0·40 (95% CI 0·29 to 0·52, I2=95·19%) for fibrotic findings. In SARS-CoV-2 specifically, proportions were estimated at 0·43 (95% CI 0·32 to 0·56, I2=94·60%) and 0·30 (95% CI 0·19 to 0·43, I2=94·89%) for inflammatory and fibrotic findings, respectively. The overall proportion for restrictive impairment was 0·19 (95% CI 0·12 to 0·27, I2=94·46%), and DLCO reduction was estimated at 0·45 (95% CI 0·38 to 0·52, I2=90·10%). Elevated radiological and functional estimates persisted across follow-up times. Confidence in the estimates was deemed very low, as studies were largely observational without control groups; heterogeneity in estimates was high but was not clearly attributable to between-study differences in severity or design.
Conclusion Although estimates of prevalence are likely limited by differences in case mix and initial severity, a substantial proportion of radiological and functional sequelae are observed following viral pneumonitis, including COVID-19. This highlights the importance of vigilant radiological and functional follow up.
Funding National Institute for Health Research (NIHR)
Introduction
COVID-19, the disease caused by SARS-CoV-2, was declared a global pandemic by the World Health Organization (WHO) on 11th March 2020.1 Since then, over 118 million individuals have been infected, with approximately 2.6 million deaths (March 2021).2 Whilst there has been substantial mortality, the vast majority of people have survived the acute infection, with many experiencing long term symptoms, so-called Long COVID.3,4 Emerging data suggest that approximately half of COVID-19 survivors experience ongoing breathlessness in the months following infection.3,5,6 Chronic breathlessness can suggest the development of pulmonary fibrosis, a potentially life limiting disease with substantial morbidity. Fibrotic lung disease has been reported following previous coronavirus infections such as Severe Acute Respiratory Syndrome, caused by SARS-CoV in 2002, and the Middle East respiratory syndrome, caused by MERS-CoV in 2012.7-9 Similarly, Influenza viruses, including those leading to epidemics such as H1N1 in 1918 and 2009, H2N2 in 1957, H3N2 in 1968, H5N1 in 2005, and H7N9 in 2013, have also been proposed to promote the development of pulmonary fibrosis, although systematic evidence is lacking.10,11
Pulmonary fibrosis is characterised by the development of fibrotic tissue in the alveolar parenchyma and can occur after lung injury. In some cases, such as Acute Respiratory Distress Syndrome (ARDS), the insult may be clear and resulting fibrosis does not progress after the acute phase, whilst in other cases the trigger is less apparent but the fibrosis is progressive. Although viral agents are considered important insults with scientific rationale to implicate their role in disease pathogenesis, empirical evidence that suggests they can promote progressive pulmonary fibrosis is limited.12,13
Given the exceptional rate of COVID-19 spread and the longer-term impact on survivors’ quality of life, particularly breathlessness, we undertook a systematic review and meta-analysis to assess the prevalence and characteristics of radiological and functional sequelae following viral pneumonia. We focus on respiratory viruses that have been associated with severe viral pneumonitis during previous epidemics.
Methods
Search strategy and selection criteria
The systematic review and meta-analysis were conducted in accordance with a protocol registered with the International Prospective Register of Systematic Reviews (PROSPERO) on 30th April 2020 (registration number CRD42020183139). The review has been reported following PRISMA guidelines.14
All original studies and research letters reporting outcomes in hospitalized adult patients (aged >18) with presumed or confirmed viral infection by SARS-CoV-2, SARS-CoV, MERS-CoV or Influenza viruses were considered eligible for inclusion. No language criteria were applied. Pre-prints, commentaries, expert opinions, editorials, conference abstracts, and non-original studies were excluded.
The pre-specified primary outcomes within 12 months of hospitalisation were: 1) presence of radiologic sequelae at follow-up CT scans; 2) presence of restrictive lung function impairment; 3) presence of reduced diffusing capacity for carbon monoxide (DLCO). Radiological sequelae were defined as inflammatory (ground-glass opacification, consolidation) or fibrotic (reticulation, lung architectural distortion, interlobular septal thickening, traction bronchiectasis, honeycombing). Restrictive lung impairment was defined as a total lung capacity (TLC) <80% predicted value or forced vital capacity (FVC) < 80% predicted value with normal-to-high FEV1/FVC ratio. Reduced DLCO was defined as percent predicted DLCO < 80%.
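The functional outcome definitions above amount to simple threshold checks. The following is a minimal sketch rather than the extraction code used in the review: the function and parameter names are hypothetical, and it assumes that the FEV1/FVC condition qualifies only the FVC-based criterion, which the wording leaves open.

```python
from typing import Optional

# Cut-off stated in the text: values below 80% of predicted count as reduced.
PCT_PREDICTED_CUTOFF = 80.0


def has_restrictive_impairment(
    tlc_pct: Optional[float] = None,
    fvc_pct: Optional[float] = None,
    fev1_fvc_normal_or_high: bool = False,
) -> bool:
    """Return True if TLC < 80% predicted, or FVC < 80% predicted
    together with a normal-to-high FEV1/FVC ratio (hypothetical helper)."""
    if tlc_pct is not None and tlc_pct < PCT_PREDICTED_CUTOFF:
        return True
    # Assumption: the ratio condition applies to the FVC criterion only.
    if fvc_pct is not None and fvc_pct < PCT_PREDICTED_CUTOFF:
        return fev1_fvc_normal_or_high
    return False


def has_reduced_dlco(dlco_pct: float) -> bool:
    """Return True if percent predicted DLCO is below 80%."""
    return dlco_pct < PCT_PREDICTED_CUTOFF
```

Under this reading, a reduced FVC with a low FEV1/FVC ratio is not counted as restrictive, since that pattern is more suggestive of obstruction.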
Studies were identified by searching MEDLINE (1946 to latest), Embase (1974 to latest), and Google Scholar. In addition, hand searches were conducted of the reference lists of eligible primary studies and relevant review articles. Searches were last updated on 1st March 2021. Searches were carried out using patient-related, treatment-related, and outcomes-related terms (Supplementary Figure 1). Two reviewers (LF, FK) screened the records by titles and abstracts, followed by full-text review. Disagreements between reviewers were resolved by consensus, with unresolved conflicts determined by a third reviewer (IS). Non-English language records were screened by appropriate native speakers (WC).
Data analysis
Data from the selected articles were extracted independently by reviewers using a pre-defined proforma and mutually confirmed (LF, SM, WC). Case reports and case series with fewer than ten cases were excluded from quantitative synthesis owing to the inherent risk of selection bias. Extracted data included study design, viral agent and methods of diagnosis, participant demographics (age, gender, smoking status), ventilatory requirements, and CT and lung function findings at both baseline and follow-up. Baseline investigations were defined as those performed during hospitalization, and follow-up as those obtained after discharge; baseline data were only extracted where studies reported follow-up. If more than one follow-up visit was reported, the latest examination within 12 months from discharge was extracted. For quantitative synthesis, follow up visits performed within 4 weeks of discharge were categorised as 1 month; subsequent timepoints within three months were categorised as 3 months, and similarly for 6 and 12 months.
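The follow-up rules in this paragraph can be illustrated with a short sketch. Only the 4-week boundary is stated explicitly in the text; the day cut-offs used here for the 3-, 6- and 12-month categories, and both function names, are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Assumed day cut-offs: the text gives "4 weeks" for the first category;
# the 3-, 6- and 12-month boundaries are illustrative conversions to days.
CATEGORY_UPPER_BOUNDS_DAYS = [(28, 1), (91, 3), (182, 6), (365, 12)]


def followup_category(days_since_discharge: int) -> Optional[int]:
    """Map a follow-up visit to the 1-, 3-, 6- or 12-month category.

    Returns None for visits before discharge or beyond 12 months."""
    if days_since_discharge < 0:
        return None
    for upper_bound, category_months in CATEGORY_UPPER_BOUNDS_DAYS:
        if days_since_discharge <= upper_bound:
            return category_months
    return None


def latest_visit_within_12_months(visit_days: Iterable[int]) -> Optional[int]:
    """Pick the latest visit (in days after discharge) within 12 months."""
    eligible = [d for d in visit_days if 0 <= d <= 365]
    return max(eligible) if eligible else None
```

For a hypothetical study reporting visits at 30, 95 and 400 days, the visit at 95 days would be extracted and, under these assumed cut-offs, assigned to the 6-month category.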
All selected studies were included in the narrative or qualitative synthesis, with summary tables for study characteristics. Where available, CT findings and pulmonary function tests (PFTs) were extracted or calculated for quantitative synthesis. Where data were not reported in the text, we contacted corresponding authors and estimated values from figures (Plot Digitizer; Free Software Foundation). Absolute values of the number of people meeting outcome criteria and the number of people with exam results available were extracted as numerator and denominator, respectively. Meta-analyses of proportions were performed where sufficient studies reported data; studies were excluded where descriptive proportions could not be extracted. Analyses were performed exclusively on observational, descriptive data, and no separation according to prospective study design was applied. Separate analyses were performed according to the type of radiological (inflammatory, fibrotic) or physiological (restrictive impairment, reduced DLCO) change and subsequently by viral agent (SARS-CoV-2, SARS-CoV, MERS-CoV, Influenza), with summary estimates also provided by follow up time (1 month, 3 months, 6 months, 12 months). Quantitative synthesis and random-effects meta-analysis were performed in Stata SE16 (TX: StataCorp LLC) using the metaprop command, which computes 95% confidence intervals based on the binomial distribution and applies the Freeman-Tukey double arcsine transformation to support inclusion of observations of 0% and 100%.15 Heterogeneity was assessed with I2; we report all estimates regardless of statistical heterogeneity. Where subgroups contained three or fewer studies, summary estimates and heterogeneity could not be computed. Significant heterogeneity between subgroup estimates was defined as a p-value ≤0.05.
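To make the pooling step concrete, the sketch below applies the Freeman-Tukey double arcsine transformation to each study's numerator and denominator, pools the transformed values, and computes I2. It is an illustration under stated assumptions, not a reimplementation of metaprop: it assumes a DerSimonian-Laird between-study variance estimator (the text specifies only a random-effects model), a normal-approximation interval on the transformed scale, and the simple back-transformation sin²(t/2). It does not compute the binomial-based study-level intervals mentioned above. All function names and the example counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def freeman_tukey(events: int, total: int) -> Tuple[float, float]:
    """Double arcsine transform of events/total and its approximate variance."""
    t = (math.asin(math.sqrt(events / (total + 1)))
         + math.asin(math.sqrt((events + 1) / (total + 1))))
    return t, 1.0 / (total + 0.5)


def back_transform(t: float) -> float:
    """Simple inverse of the transform (assumed here; other inversions exist)."""
    t = min(max(t, 0.0), math.pi)  # keep within the valid range of the transform
    return math.sin(t / 2.0) ** 2


def pool_proportions(studies: List[Tuple[int, int]]) -> Dict[str, float]:
    """Random-effects pooled proportion from (events, total) pairs."""
    transformed = [freeman_tukey(x, n) for x, n in studies]
    df = len(transformed) - 1

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1.0 / v for _, v in transformed]
    sum_w = sum(w)
    fixed = sum(wi * t for wi, (t, _) in zip(w, transformed)) / sum_w
    q = sum(wi * (t - fixed) ** 2 for wi, (t, _) in zip(w, transformed))

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate on the transformed scale.
    w_re = [1.0 / (v + tau2) for _, v in transformed]
    pooled_t = sum(wi * t for wi, (t, _) in zip(w_re, transformed)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "proportion": back_transform(pooled_t),
        "ci_low": back_transform(pooled_t - 1.96 * se),
        "ci_high": back_transform(pooled_t + 1.96 * se),
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the review.
    example = [(12, 40), (30, 55), (8, 60), (45, 70)]
    print(pool_proportions(example))
```

Because the transformation is defined at 0 and at the full denominator, studies in which none or all of the participants met the outcome can be pooled without a continuity correction.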
The risk of bias in individual studies was assessed by two authors independently using appropriate assessment tools available from the CLARITY Group at McMaster University,16 through criteria specific for study design. As none of the included studies had an unexposed control group, tools were adapted. For cohort studies we assessed exposure, the outcomes of interest, prognostic factors, interventions, adequacy of follow-up, and co-interventions. Randomised controlled trials were included if they reported our pre-specified outcomes and were evaluated for adequacy of follow up, selective reporting, and other possible causes of risks of bias. Any disagreements were resolved by consensus, or by decision of a third reviewer if necessary. All studies were included, regardless of their risk of bias score.
The quality of the evidence was evaluated using the GRADE guidance.17 Retrospective observational studies were considered weak but could be upgraded, whilst randomised controlled trials were deemed to be strong and could be downgraded. Analytical and publication risk of bias, inconsistency, indirectness, and imprecision were assessed. An overall judgement of ‘high’, ‘moderate’, ‘low’, or ‘very low’ was provided for the quality of the cumulative evidence for review outcomes.
Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
A total of 5549 records were identified from databases and bibliography searches. After title and abstract screening, 88 unique full-text manuscripts were assessed for eligibility, and 60 were included for qualitative synthesis (54 in English, 6 in Chinese). A total of 41 studies were included in the quantitative synthesis (Figure 1). Among the manuscripts included, 25 reported infections by SARS-CoV-2;5,18-41 18 by SARS-CoV;8,9,42-57 1 by MERS-CoV;58 16 by Influenza (11 subtype H1N1, 1 subtype H5N1, 1 subtype H3N2, 2 subtype H7N9 and 1 study both H1N1 and H7N9).59-74 All studies were observational in design and included case reports and case series, with the exception of a single randomised controlled trial.55 Individual studies’ characteristics are presented in Table 1.
Risk of bias assessment identified a number of limitations and possible causes of bias. Although the majority of the studies listed the diagnostic tests used to assess the viral infections, 15 did not specify whether any serological or molecular testing was performed, referring to national guidelines at the time the study was conducted.8,21,23,34,35,38,42-45,50,54,55,60,65 Inclusion and exclusion criteria differed among studies, indicating that the severity of patients enrolled and possible interventions administered were inconsistent, which represents a possible selection bias. Few studies reported the presence of previous respiratory diseases or mechanical ventilation as exclusion criteria;21,36 others were restricted to symptomatic patients or performed follow-up CT only where there was a clinical indication, such as abnormalities on the chest X-ray (CXR) or a reduced DLCO.9,22,31,47 In each study, we can be confident in the assessment of the proportion of people with the outcomes of interest (Supplementary Tables 1-2, Supplementary Figure 2).
A total of 43 studies described thoracic CT findings, and 28 were included in meta-analysis of radiological outcomes. Causes of exclusion are listed in Figure 1. The median follow-up time was 3 months (range 1-12). Within 12 months from discharge, the overall estimated proportion of chest CT inflammatory findings was 0·56 (95% CI 0·45 to 0·66, I2=94·44%) on a total of 1727 CT scans, whilst fibrotic findings had an estimated prevalence of 0·40 (95% CI 0·29 to 0·52, I2=95·08%) assessed on 1625 exams. Severe heterogeneity was observed overall, within summary estimates by viral agent, and between viral agents (Figure 2). When stratified by viral agent, the proportion of patients with inflammatory sequelae was 0·43 (95% CI 0·32 to 0·56, I2=94·60%), 0·81 (95% CI 0·58 to 0·97, I2=91·84%), and 0·61 (95% CI 0·27 to 0·90, I2=93·29%) following SARS-CoV-2, SARS-CoV and Influenza infections, respectively. Estimates of fibrotic sequelae were 0·30 (95% CI 0·19 to 0·43, I2=94·89%), 0·66 (95% CI 0·43 to 0·86, I2=92·83%), and 0·27 (95% CI 0·15 to 0·40, I2=57·06%, p=0·07) following SARS-CoV-2, SARS-CoV and Influenza infections, respectively (Figure 2).
A further subgroup analysis according to reported follow up time was performed for SARS-CoV-2 specifically (Figure 3). Inflammatory changes were observed in 0·93 of patients at baseline (95% CI 0·87 to 0·97, I2=90·92%), falling to 0·54 (95% CI 0·45 to 0·63, I2=51·68%, p=0·10) within one month of discharge, and 0·35 (95% CI 0·15 to 0·58) and 0·49 (95% CI 0·28 to 0·70) at three months and six months, respectively. Significant heterogeneity was observed between subgroup estimates of follow up time (p<0·0001). In contrast, fibrotic estimates were 0·27 (95% CI 0·07 to 0·53, I2=98·17%) at baseline, with similar proportions observed at one, three and six months of follow up and no significant heterogeneity between subgroup estimates (p=0·536). In follow up time subgroup analysis of other viral agents, a similar temporal evolution according to radiological subtype was observed in a limited number of Influenza studies, whilst high proportions of both inflammatory and fibrotic sequelae were reported across follow up times in the majority of SARS-CoV studies (Supplementary Figure 3).
Lung function sequelae were described in a total of 45 papers, with 25 reaching criteria for inclusion in quantitative synthesis, and a total sample of 2202 for restrictive impairment and 2185 for DLCO reduction. Follow-up lung function tests were performed at a median of 3 months after discharge (range 1-12). The overall estimated proportion of individuals with restrictive impairment during follow-up was 0·19 (95% CI 0·12 to 0·27, I2=94·46%), which was similar across viral agent subgroups, with estimated proportions of 0·21 (95% CI 0·12 to 0·33, I2=95·59%) for SARS-CoV-2, 0·15 (95% CI 0·07 to 0·24, I2=88·5%) for SARS-CoV, and 0·15 (95% CI 0·04 to 0·42) for Influenza (Figure 4). Heterogeneity between viral agent subgroups was not significant (p=0·206). The overall proportion of individuals with a reduction in DLCO during follow-up was estimated at 0·45 (95% CI 0·38 to 0·52, I2=90·10%), with the SARS-CoV-2 estimate at 0·45 (95% CI 0·34 to 0·56, I2=92·45%), similar for SARS-CoV at 0·41 (95% CI 0·30 to 0·52, I2=86·8%) and higher in Influenza (0·58, 95% CI 0·26 to 0·87), although the heterogeneity between viral agent subgroups was not significant (p=0·403). One study reported lung function data in MERS-CoV follow-up, which showed similar proportions to other viral agents. Heterogeneity in the overall estimate and within subgroup summary estimates was high for both restrictive impairment and reduced DLCO.
In subgroup analysis by follow up time, the majority of restrictive impairment observed within one and three months had resolved by 12 months (0·07, 95% CI 0·04 to 0·11), with significant heterogeneity observed between subgroup estimates of follow up time (p=0.008). Similarly, the proportion of patients with a reduced DLCO appeared to reduce over follow up time, with significant heterogeneity between subgroups (p=0.017), but still remained elevated at twelve months (0·39, 95% CI 0·24 to 0·55, I2=89·88%) (Figure 5).
As subgroup estimates of viral agent and timing of follow up showed high levels of heterogeneity we additionally explored whether patient severity or study design features (where reported) were significant sources (Supplementary Figures 4-11). In overall analysis of all viral agents, we observed no significant between-group heterogeneity according to prospective study design for any outcome tested. Inclusion of ventilated patients at study-level suggested larger estimates in more severe cohorts but significant heterogeneity was only observed for estimates of restrictive impairment (p=0.006), with similar findings where people with ARDS were included but heterogeneity between groups did not reach significance. In SARS-CoV-2 studies, estimates were frequently larger in prospective study designs compared to retrospective, with significant heterogeneity between groups for estimates of reduced DLCO (p=0.003). Estimates of outcomes were frequently larger in SARS-CoV-2 studies reporting on ventilated patients or people with ARDS, with significant between group heterogeneity observed for fibrotic findings (p=0.034 and p<0.001, respectively). Similarly, estimates of proportion in SARS-CoV-2 were frequently larger in cohorts with a median age of 50 or over, or over 50% male. Limited study numbers precluded further stratification beyond viral agent and follow up time.
Based on the GRADE framework, we report the confidence in estimates as very low for all outcomes. All studies included in the quantitative synthesis had an observational design and moderate risk of bias as possible confounding factors were not extensively assessed and could not be modelled in estimates of proportion. Inconsistency between studies was considered high due to the considerable statistical heterogeneity (I2>90%) and wide confidence intervals of the summary estimates. No causes of indirectness were detected since all study subjects had confirmed viral pneumonia, although severity and eligibility criteria were inconsistent. We judged the risk of imprecision as moderate, due to the possible influence of sample size on proportion. Risk of publication bias was evaluated based on timing of publication from the epidemic outbreak, and was deemed moderate as most papers were published within 12 months from their outbreak, which may influence research integrity and reproducibility, while 13 manuscripts were published after 12 months or more (Supplementary Table 3).
Discussion
To our knowledge this is the first systematic review and meta-analysis investigating the prevalence of radiological and functional consequences post-hospitalisation for viral pneumonitis, particularly for that caused by SARS-CoV-2, SARS-CoV, MERS-CoV, and Influenza. Heterogeneity in summary estimates was frequently considerable and therefore results should be interpreted with caution. Nevertheless, this study demonstrates that a considerable proportion of patients discharged following viral pneumonitis, who received a follow up CT scan or underwent lung function tests, had evidence consistent with lung parenchymal abnormalities in all viral agent subgroups. Furthermore, radiological and functional characteristics of pulmonary fibrosis remained elevated over increasing follow-up time. We demonstrate that parenchymal lung damage by viral insult may be common and has the potential to explain a substantial proportion of Long COVID related breathlessness.
A high proportion of inflammatory findings such as ground glass opacities and consolidation were observed at baseline in COVID-19 and Influenza, consistent with the radiological signs commonly described in the literature for viral pneumonitis;75,76 however, these inflammatory consequences of viral infections tended to reduce over the course of follow-up. Although features of pulmonary fibrosis were present less frequently, observed in approximately 20% of tested individuals with COVID-19 or Influenza, fibrotic sequelae were still observed in a similar proportion of people across follow up times, suggesting that the pulmonary fibrosis associated with viral pneumonitis does not resolve substantially in the first year following infection. Moreover, radiological and functional sequelae have also been described up to five years after Influenza infections,10,71,77 and up to fifteen years after SARS-CoV.8,78,79
The proportion of functional tests performed indicated approximately one quarter of participants had restrictive impairment in the acute phase of recovery, however considerably more had impaired DLCO regardless of viral aetiology. Subsequent time points demonstrated improved restrictive impairment, although the proportion of tests reporting a reduced DLCO remained high, affecting approximately 40% of patients 12 months following infection. In individuals with SARS-CoV-2, it has been shown that TLC and DLCO impairment are significantly different between categories of infection severity,21,23,24,34,80 with similar findings reported in SARS-CoV.46,52
There are a number of limitations associated with this review. As our search strategy focused on follow-up, the number of included articles that reported baseline findings was limited, particularly regarding SARS-CoV infection, whilst contemporary SARS-CoV-2 papers had limited follow-up length. Furthermore, estimates of proportion are based on the number of tests performed, and not patients infected, which could be affected by selection bias. Similarly, estimates represent people hospitalised with infection, which may not reflect prevalence in non-hospitalised cases. Caution is required in interpreting overall and summary estimates as heterogeneity in estimated proportions was frequently considerable, which was not completely attributable to the study-level features evaluated. It is likely that variability in case mix and severity within studies contributes to the heterogeneity between them, which may be addressed by individual patient data approaches. We defined radiological sequelae extracted from studies regarded to be attributable to inflammatory and fibrotic responses, however these were not always reported specifically or exclusively, and there are limitations of our predefined radiological outcomes. Ground-glass opacities do not necessarily reflect inflammation and could also reflect retractile fibrosis during follow up. Unfortunately, it is not possible to discriminate between the two without histopathology. Internationally standardised approaches to reporting of post-COVID radiological change would support patient management and epidemiological study.
We have demonstrated the presence of substantial radiological and functional sequelae following viral pneumonias in the published literature, comprising over 1600 CT scans and over 2000 lung function tests. These parenchymal sequelae of viral infection are likely to have a considerable clinical impact given the large numbers of people discharged from hospital with COVID-19. Whilst the certainty of the presented estimates is very low, they justify vigilant radiological and functional follow up of individuals hospitalised with viral pneumonia.
Data Availability
Data are publicly available in the published manuscript and can be shared upon reasonable request.
Footnotes
PROSPERO registration number: CRD42020183139