Abstract
With 15% of severe cases among hospitalized patients1, the SARS-CoV-2 pandemic has put tremendous pressure on Intensive Care Units, and made the identification of early predictors of future severity a public health priority. We collected clinical and biological data, as well as CT scan images and radiology reports from 1,003 coronavirus-infected patients from two French hospitals. Radiologists’ manual CT annotations were also available. We first identified 11 clinical variables and 3 types of radiologist-reported features significantly associated with prognosis. Next, focusing on the CT images, we trained deep learning models to automatically segment the scans and reproduce radiologists’ annotations. We also built CT image-based deep learning models that predicted future severity better than models based on the radiologists’ scan reports. Finally, we showed that including CT scan features alongside the clinical and biological data yielded more accurate predictions than using clinical and biological data alone. These findings show that CT scans provide insightful early predictors of future severity.
Previous studies have demonstrated that risk factors for severe evolution include demographic variables such as age, comorbidities, and biological variables measured within 2 days of patient admission2–4. Beyond clinical and biological variables, computerized tomography (CT) scans are also potential sources of information: the degree of pulmonary inflammation is associated with clinical symptoms and severity5,6, and the extent of lung abnormality is predictive of severe disease evolution7,8. Here we evaluated to what extent visual or AI-based analysis of CT scans at patient admission added information about future severe disease evolution once clinical and biological data had been taken into account.
A total of 1,003 patients from Kremlin-Bicêtre (KB, Paris, France) and Gustave Roussy (IGR, Villejuif, France) were enrolled in the study. Clinical, biological, and CT scan images and reports were collected at hospital admission. Additionally, 292 CT scans were later annotated manually by radiologists (see supplementary materials). Summary statistics for the clinical, biological, and CT scan data are provided in Figure 1.
Coronavirus progression is evaluated by the World Health Organization on a 1 to 10 scale, with severe scores of 5 or more corresponding to an oxygen flow rate of 15 L/min or higher, or the need for mechanical ventilation, or patient death9. We first evaluated how clinical and biological variables measured at admission were associated with future severe progression (score of 5 or more). These variables were available for 989 individuals, and we computed the severity odds ratios for each individual variable at each hospital center (Figure 1). When combining association results from the two centers, we found 11 variables significantly associated with severity (P < 0.05/63 to account for testing 63 variables, Figure 1): age (Odds Ratio [OR] KB 1.66 (1.41–1.96), OR IGR 1.32 (0.90–1.93), PStouffer = 5.75e-10), sex (OR KB 1.95 (1.41–2.69), OR IGR 1.04 (0.50–2.15), PStouffer = 6.10e-05), hypertension (OR KB 1.84 (1.35–2.51), OR IGR 1.09 (0.50–2.36), PStouffer = 1.15e-04), chronic kidney disease (OR KB 2.51 (1.62–3.69), OR IGR 16.29 (1.89–140.12), PStouffer = 6.66e-06), respiratory rate (OR KB 1.34 (1.13–1.59), OR IGR 3.37 (1.28–8.86), PStouffer = 2.10e-04), oxygen saturation (OR KB 0.38 (0.31–0.47), OR IGR 0.35 (0.20–0.63), PStouffer = 2.79e-21), diastolic pressure (OR KB 0.70 (0.53–0.83), OR IGR 0.76 (0.51–1.11), PStouffer = 1.35e-05), CRP (OR KB 1.47 (1.25–1.72), OR IGR 1.50 (1.04–2.16), PStouffer = 4.13e-07), LDH (OR KB 2.05 (1.65–2.54), OR IGR 2.53 (1.42–4.53), PStouffer = 4.38e-12), polynuclear neutrophil (OR KB 1.36 (1.13–1.60), OR IGR 1.15 (0.81–1.64), PStouffer = 1.25e-04), and urea (OR KB 1.70 (1.43–2.01), OR IGR 2.13 (1.33–2.42), PStouffer = 9.49e-11). This confirms the prognostic value reported in the literature for these 11 clinical and biological markers2,4,10–14.
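The combination of per-center associations with Stouffer's method, followed by a Bonferroni-corrected threshold, can be sketched as follows. This is a minimal illustration under assumptions that are ours and not stated in the study: z-scores are approximated from the reported odds ratios and 95% confidence intervals, both centers receive equal weight, and the function names are hypothetical. Since the weighting actually used is not specified, the sketch is not expected to reproduce the reported PStouffer values exactly.

```python
from math import erfc, log, sqrt


def z_from_odds_ratio(odds_ratio, ci_low, ci_high):
    """Approximate Wald z-score from an odds ratio and its 95% CI."""
    # On the log scale, a 95% CI spans 2 * 1.96 standard errors.
    standard_error = (log(ci_high) - log(ci_low)) / (2 * 1.96)
    return log(odds_ratio) / standard_error


def stouffer(z_scores, weights=None):
    """Combine signed z-scores; return (combined z, two-sided p-value)."""
    if weights is None:
        weights = [1.0] * len(z_scores)  # assumption: equal weights
    combined = sum(w * z for w, z in zip(weights, z_scores))
    combined /= sqrt(sum(w * w for w in weights))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate
    # for very small p-values.
    return combined, erfc(abs(combined) / sqrt(2))


# CRP, using the odds ratios reported above for KB and IGR.
z_kb = z_from_odds_ratio(1.47, 1.25, 1.72)
z_igr = z_from_odds_ratio(1.50, 1.04, 2.16)
z_crp, p_crp = stouffer([z_kb, z_igr])

bonferroni_threshold = 0.05 / 63  # 63 variables were tested
crp_is_significant = p_crp < bonferroni_threshold
```

Passing `weights` (for example the square root of each center's sample size) gives the weighted variant of the method.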
We then assessed the predictive value of features from admission radiology reports, and found three significant features: (i) extent of disease (OR KB 2.37 (1.97–2.86), OR IGR 1.64 (1.12–2.38), PStouffer = 8.50e-21) and (ii) crazy paving (OR KB 2.50 (1.82–3.44), OR IGR 2.28 (1.07–4.88), PStouffer = 3.10e-09), both associated with greater severity, and (iii) peripheral topography, associated with lesser severity (OR KB 0.54 (0.39–0.74), OR IGR 0.61 (0.26–1.42), PStouffer = 9.47e-05). This confirms the reported negative impact of disease extent7,15,16. We hypothesize that peripheral topography has a positive impact on prognosis because peripheral lesions could be less extended.
We next trained a deep neural network called AI-segment (Supp Figure 1) to segment radiological patterns and provide automatic quantification18,19 of their volume, expressed as a percentage of the full lung volume. These patterns included the three distinguishable features that appear as disease severity progresses17: ground glass opacity or GGO, crazy paving, and finally consolidation. AI-segment was trained on 161 patients from KB and evaluated on 132 patients from IGR, of whom 14 were fully annotated and 118 partially annotated. The mean absolute error in volume prediction for the fully annotated scans was 6.94% for GGO, 1.01% for consolidation, and 7.21% for healthy lung (no crazy paving was present in these scans). On the larger cohort of partially annotated scans, the accuracy with respect to the radiologist score was 78% for GGO, 67% for crazy paving, and 74% for consolidation (for a 1% detection threshold on the AI-segment result, Supp Table 1). AI-segment also accurately quantified the disease extent (Supp Figure 3). AI-segment visual results were also consistent with radiologist observations (see Figure 2 for three representative cases). We lastly evaluated to what extent AI-segment provided finer information about future severity than radiologists’ scan reports. Using predicted volumes from AI-segment, we found that GGO (OR KB 1.8 (1.5–2.16), OR IGR 1.7 (1.18–2.43), PStouffer = 3.45e-11), crazy paving (OR KB 1.57 (1.26–1.97), OR IGR 1.38 (0.95–1.99), PStouffer = 7.27e-05), consolidation (OR KB 1.86 (1.53–2.25), OR IGR 1.87 (1.26–2.77), PStouffer = 1.43e-11), and extent of disease (OR KB 2.14 (1.77–2.6), OR IGR 1.87 (1.28–2.73), PStouffer = 3.13e-16) were all associated with severity (accounting for multiple testing). This confirms that automatic estimation of lesion volumes can add more precise measures of future severity to the radiologists’ scan reports (Supp Table 2)8.
We next evaluated the prognostic value of CT scans alone through three different models. The first model, called report, included variables from the radiological report only. The second, AI-segment, was based on the automatic lesion volumes measured by the segmentation network. The third, called AI-severity, used a weakly supervised approach with no radiologist-provided annotations (Supp Figure 2)20. All three models were trained on 646 KB patients, tested on 150 KB validation patients, and validated on the independent IGR dataset of 137 patients (Figure 3). On the validation set from KB hospital, report was outperformed by AI-severity but not by AI-segment (AUCAI-severity = 0.76, AUCAI-segment = 0.68, AUCreport = 0.72). On the independent IGR validation set, both AI-segment and AI-severity outperformed the report model (AUCAI-severity = 0.70, AUCAI-segment = 0.68, AUCreport = 0.66). Our follow-up analyses revealed that the predictive performance of AI-severity was strong in part because the internal representation of the neural network captures clinical features from the lung CTs, such as age, on top of the known COVID-19 radiology features (see interpretability of AI-severity in Supp Material).
Lastly, we evaluated whether CT scans have prognostic value beyond what can be inferred from clinical and biological characteristics alone. We therefore compared the performance of trimodal CT scan / clinical / biological models to bimodal clinical / biological models. We compared model performance for three outcomes: our initial WHO-defined high severity outcome of “oxygen flow rate of 15 L/min or higher, or need for mechanical ventilation, or death”, as well as two other outcomes studied in the literature, “death or ICU admission”, and “death”. We built a trimodal version of report, AI-segment, and AI-severity, adding clinical and biological information to the original CT scan-based models by implementing a greedy search approach to include optimal variables (Supp Figure 4). All three trimodal models performed consistently better than the bimodal biological/clinical model (Figure 3 and Supp Table 3), whether it be trimodal report, AI-segment, or AI-severity (mean AUC increase of 0.02–0.03). They also outperformed clinical/biological models from the literature (Colombi et al. model7 and MIT COVID analytics model). Of note, the fact that the models trained with patients from the KB hospital performed well when evaluated on the IGR hospital is evidence of their robustness, especially since these two hospitals receive patients with very different comorbidities (85% of cancer patients at IGR and 7% at KB). Taken together, these consistent results confirm the added prognostic value of CT scans. Importantly, while trimodal AI-severity generally outperformed trimodal report across all outcomes, and trimodal AI-segment sometimes outperformed report, the AUC difference was always modest (max increase of 0.03 for AI-severity vs report, and max increase of 0.02 for AI-segment vs report), showing that the incorporation of CT scan analyses, no matter what the method, is the strongest performance booster.
Therefore, beyond AI modeling, our study shows that a composite scoring system integrating selected radiological measurements with key clinical and biological variables provides accurate predictions and can rapidly become a reference scoring approach for severity prediction.
Our retrospective study conducted on two French hospitals shows that future disease severity markers are present within routine CT scans performed at admission, and these can be identified and quantified via AI-based scoring, providing useful and interpretable elements for prognosis.
Data Availability
Data are not publicly available and are securely stored on a certified server.
Author Contributions
N.L., S.A., E.C., P.H., R.M., N.L., P.T., E.B., M.S., A.S., F.C., S.J., M.S., I.B., J.D., JC.P., H.T., E.P., G.W., T.C., F.B., MF.B., and M.B. conceived the idea of this paper.
N.L., S.A., E.C., H.G., P.H., M.D., S.S., O.M., MP.T., JP.L., R.M., N.L., P.T., E.B., G.G., C.B., S.J., F.G., N.T., Y.L., T.D., K.G., A.N., M.T., S.V., M.S., I.B., Y.B., E.P., M.A., J.D., F.B., A.G., J.D., JC.P., H.T., E.P., G.W., T.C., F.B., MF.B., and M.B. participated in the acquisition and processing of data.
N.L., S.A., E.C., P.H., R.M., N.L., P.T., E.B., S.J., M.S., P.J., I.B., J.D., JC.P., H.T., E.P., G.W., T.C., MF.B., and M.B. implemented the analysis.
N.L., S.A., E.C., P.H., R.M., N.L., P.T., E.B., S.J., M.S., I.B., J.D., JC.P., H.T., E.P., G.W., T.C., MF.B., and M.B. contributed to the writing of the manuscript.
Competing Interests statement
The authors declare the following competing interests:
Employment: Michael Blum, Paul Herent, Rémy Dubois, Nicolas Loiseau, Paul Trichelair, Etienne Bendjebbar, Simon Jégou, Meriem Sefta, Paul Jehanno, Fabien Brulport, Olivier Dehaene, Jean-Baptiste Schiratti, Kathryn Schutte, Elodie Pronier, Jocelyn Dachary, and Adrian Gonzalez are employed by Owkin.
Co-founders of Owkin Inc.: Thomas Clozel, Gilles Wainrib.
Supplementary material
Description of the retrospective study
Data were collected at two French hospitals (Kremlin-Bicêtre Hospital (KB), APHP, Paris, and Gustave Roussy Hospital (IGR), Villejuif). CT scans, clinical, and biological data were collected in the first 2 days after hospital admission.
This study received the approval of both hospitals' ethics committees, and we submitted a declaration to the National Commission of Data Processing and Liberties (CNIL, N° INDS MR5413020420) in order to be registered in the medical studies database and to comply with the requirements of the General Data Protection Regulation (GDPR, RGPD in French). An information letter was also sent to all patients included in the study.
Inclusion criteria were (1) date of admission at hospital (from the 12th of February to the 20th of March at Kremlin-Bicêtre and from the 2nd of March to the 24th of April at Institut Gustave Roussy) and (2) a positive diagnosis of COVID-19. Patients were considered positive either because of a positive RT-PCR (real-time fluorescence polymerase chain reaction) based on nasal or lower respiratory tract specimens or, for RT-PCR-negative patients, because of a CT scan with a typical appearance of COVID-19 as defined by the ACR criteria1. Children and pregnant women were excluded from the study.
The clinical and laboratory data were obtained from detailed medical records, cleaned and formatted retrospectively by 10 radiologists with 3 to 20 years of experience (5 radiologists at IGR and 5 at KB). Data from the clinical examination include: sex, age, body weight and height, body mass index, heart rate, body temperature, oxygen saturation, blood pressure, respiratory rate, and a list of symptoms including cough, sputum, chest pain, muscle pain, abdominal pain or diarrhoea, and dyspnea. Health and medical history data include presence or absence of comorbidities (systemic hypertension, diabetes mellitus, asthma, heart disease, emphysema, immunodeficiency) and smoker status. Laboratory data include alanine, conjugated bilirubin, total bilirubin, creatine kinase, CRP, ferritin, haemoglobin, LDH, leucocytes, lymphocyte, monocyte, platelet, polynuclear neutrophil, and urea.
Chest computed tomography (CT) imaging
CT scan acquisition
Three different models of CT scanners were used: two General Electric CT scanners (Discovery CT750 HD and Optima 660; GE Medical Systems, Milwaukee, USA) and a Siemens CT scanner (SOMATOM Drive; Siemens Medical Solutions, Forchheim). All the patients were scanned in a supine position during breath-holding at full inspiration. The acquisition and reconstruction parameters were 120 kV tube voltage with automatic tube current modulation (100-350 mAs) and 1 mm slice thickness without interslice gap, using filtered back-projection (FBP) reconstruction (SOMATOM Drive) or blended FBP/iterative reconstruction (Discovery or Optima). Axial images with a slice thickness of 1 mm were used for coronal and sagittal reconstructions.
The scans were independently examined by experienced radiologists using a standard workstation in the clinical image archiving and transmission system. All radiologists were informed of the patients' clinical status (suspicion of COVID-19, clinical signs of severity).
Definition of CT Features
COVID-19-associated CT imaging features identified by radiologists were defined following the ACR recommendation1. The term parenchymal opacification is applied to any homogeneous increase in lung density on chest CT. When this parenchymal opacification is dense enough to obscure the vessel margins, airway walls, and other parenchymal structures, it is called consolidation. Ground-glass attenuation is defined as an increase in lung density that is not sufficient to obscure vessels, with preservation of bronchial and vascular margins. The crazy-paving pattern was defined as ground-glass opacification with associated interlobular septal thickening2.
For 959 patients, CT imaging characteristics were evaluated and the following findings were reported: ground glass opacity (rounded / non rounded / absent), consolidation (rounded / non rounded / absent), interlobular septal thickening or “crazy paving” (present / absent), subpleural line, lymph node enlargement, pleural effusion, and pericardial effusion, according to morphological descriptors based on recommendations of the Fleischner Nomenclature Committee2.
The results of the CT were examined in terms of location, distribution, size and type. The location refers to the different lobes and segments involved (lower, middle, or upper). The distribution was described as peripheral (outer third of the lung), central (inner two thirds), or both central and peripheral.
The assessment of the size and extent of lung involvement was based on a visual classification of lung anatomy according to the evaluation criteria established by the French Society of Radiology (SFR)3. The volume of lung affected was graded as absent / minimal (<10%) / moderate (10-25%) / extensive (25-50%) / severe (>50%) / critical (>75%). The grades absent / minimal / moderate / extensive / severe / critical were coded as a quantitative variable with values of 0 / 1 / 2 / 3 / 4 / 5.
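The ordinal coding of disease extent can be illustrated with a short sketch. In the study the grade was assigned visually by radiologists; the function below is a hypothetical helper that only shows how a percentage of affected lung would map to the 0 to 5 code. Two points are assumptions on our part: the severe grade is taken to cover 50-75% (the text gives severe as >50% and critical as >75%), and each boundary value is assigned to the higher grade.

```python
EXTENT_GRADES = ["absent", "minimal", "moderate", "extensive", "severe", "critical"]

# Upper bounds (in % of lung volume) of grades 1 to 4; anything at or
# above the last bound is graded 5 (critical). Boundary handling is an
# assumption, not taken from the SFR criteria.
UPPER_BOUNDS = [10, 25, 50, 75]


def extent_grade(percent_affected):
    """Map a percentage of affected lung to the 0-5 ordinal code."""
    if not 0 <= percent_affected <= 100:
        raise ValueError("percent_affected must lie between 0 and 100")
    if percent_affected == 0:
        return 0  # absent
    for grade, bound in enumerate(UPPER_BOUNDS, start=1):
        if percent_affected < bound:
            return grade
    return 5  # critical


example_grade = extent_grade(30)            # 25-50% band
example_label = EXTENT_GRADES[example_grade]
```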
Automatic extraction from radiological report
Radiological features from radiological reports were automatically extracted using Optical Character Recognition and regular expression functions.
Annotation scenario of CT scans by radiologists in order to train the AI-Volumetry model
Two radiologists (4 and 9 years of experience) examined and annotated 292 anonymized chest scans independently and without access to the patients' clinical data or COVID-19 PCR results. All CT images were viewed with lung window parameters (width, 1500 HU; level, −550 HU) using the SPYD software developed by Owkin. Regions of interest were annotated by the radiologists in four distinct classes: healthy pulmonary parenchyma, ground glass opacity, consolidation, and crazy paving. One AI and imaging PhD student provided full 3D annotation of the four classes on 22 anonymized chest scans using the 3D Slicer software.
The presence of organomegaly was also recorded, as a binary class. When multiple CT images were available for a single patient, the scan to analyze was selected using the SPYD software.
Machine learning models
Models for segmentation of CT scans (AI-segment)
In the proposed pipeline for lesion segmentation from CT scans, called AI-segment, we deployed 3 segmentation networks: 3D ResNet504, 2.5D U-Net, and 2D U-Net4. These are three powerful convolutional neural networks that have achieved state-of-the-art performance in numerous medical image segmentation tasks. U-Net consists of convolution, max pooling, ReLU activation, concatenation, and up-sampling layers organized in three sections: contraction, bottleneck, and expansion. ResNet contains convolution, max pooling, batch normalization, and ReLU layers that are grouped in multiple bottleneck blocks.
All models were trained on CT scans provided by Kremlin-Bicêtre (KB) and evaluated on annotated CT scans from Institut Gustave Roussy (IGR). The dataset was divided into two categories: Fully Annotated Scans (FAS), composed of 22 scans (8 from KB and 14 from IGR), and Partially Annotated Scans (PAS), composed of 271 scans (153 from KB and 118 from IGR).
2D U-Net was trained for left/right lung segmentation while 3D ResNet50 and 2.5D U-Net were used for lesion segmentation. 3D ResNet50 was trained on the 8 KB FAS. We used Stochastic Gradient Descent for parameter optimization, with an initial learning rate of 0.1 decayed by a factor of 0.1 every 20 epochs. The network was trained for a total of 100 epochs. For the 2.5D U-Net, the Adam optimization algorithm was used with learning rate, weight decay, gradient clipping, and learning rate decay parameters set respectively to 1e-3, 1e-8, 1e-1, and 0.1 (applied at epochs 90 and 150) for 300 epochs. While the validation set remained the same as for the 3D ResNet50, the 153 KB PAS were added to the 8 KB FAS in the training set. PAS were only added to the 2.5D U-Net training set because their annotations are incomplete (on average 16 slices are annotated per PAS) and therefore do not satisfy the volumetric requirements of the 3D ResNet50 input. Finally, for the left/right lung segmentation, the 2D U-Net was trained on the 8 KB FAS. As for the 2.5D U-Net, the Adam optimization algorithm was used with learning rate, weight decay, gradient clipping, and learning rate decay parameters set respectively to 1e-3, 1e-8, 1e-1, and 0.1 (applied at epoch 70) over 104 epochs. Both the 2.5D U-Net and the 2D U-Net used affine transformation and contrast change for data augmentation, while the 3D ResNet50 used affine transformation, contrast change, thin plate splines, and flipping. The 3D ResNet50 and the 2.5D U-Net were trained by minimizing the cross-entropy loss, and the 2D U-Net by minimizing the binary cross-entropy loss. All training was performed on NVIDIA Tesla V100 GPUs using the PyTorch framework. During the validation phase, ensemble inference6 was performed on all the available scans.
Models for severity classification based on CT scans (AI-severity)
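The learning rate schedules described for the three networks can be written out as plain functions. This is a sketch rather than the training code: epochs are assumed to be zero-indexed, each decay is assumed to take effect at the start of the stated epoch, and the function names are hypothetical.

```python
def step_decay_lr(epoch, initial_lr=0.1, factor=0.1, step=20):
    """SGD schedule of the 3D ResNet50: multiply by `factor` every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


def milestone_lr(epoch, initial_lr=1e-3, factor=0.1, milestones=(90, 150)):
    """Adam schedule of the U-Nets: multiply by `factor` at each milestone epoch."""
    milestones_passed = sum(1 for m in milestones if epoch >= m)
    return initial_lr * factor ** milestones_passed


# Learning rate at the first epoch of each decay stage.
resnet_schedule = [step_decay_lr(e) for e in range(0, 100, 20)]         # 100 epochs
unet_25d_schedule = [milestone_lr(e) for e in (0, 90, 150)]             # 300 epochs
unet_2d_schedule = [milestone_lr(e, milestones=(70,)) for e in (0, 70)]  # 104 epochs
```

In PyTorch, schedules of this shape are usually expressed with `torch.optim.lr_scheduler.StepLR` and `MultiStepLR`; whether the original training code used them is not stated.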
The AI-severity model is defined as an ensemble of four sub-models, as illustrated in Supp Fig 2. Each of these sub-models is designed to predict the disease severity from CT scans. Since they do not require expert annotations at the slice level, these sub-models fall within the scope of weakly supervised learning. The preprocessing of the data consisted of resizing the CT scans to 10 mm pixel spacing along the vertical axis and obtaining a segmentation of the lungs using a pre-trained U-Net algorithm7. Each sub-model is composed of two blocks: a deep neural network called the feature extractor and a logistic regression. CT scans may contain confounding objects such as catheters (EKG monitoring, oxygenation tubing…) that are easily detectable in a CT and can bias the prediction of severity (i.e. the model may predict the presence of a technical device associated with severity instead of the radiological features associated with severity). To ensure that these biases did not affect the features, the lung segmentation mask was applied before the features were extracted. As a result, only the lungs were visible to the feature extractor.
Two of the sub-models used an EfficientNet-B08 pre-trained on the ImageNet public database as feature extractor, while the other two used a ResNet509 pre-trained with MoCo v210 on one million CT scan slices from both Deep Lesion11 and LIDC12. Each of these networks provides an embedding of the slices of the input CT scans into a lower-dimensional feature space (1280 dimensions for EfficientNet-B0 and 2048 for ResNet50 with MoCo v2). A windowing step, which selects specific ranges of intensities, was also applied to the CT scans before feature extraction. For the two sub-models based on EfficientNet-B0, the image intensities were clipped to the (−1000 HU, 200 HU) and (−1000 HU, 600 HU) ranges, respectively. For one of the remaining two sub-models (based on ResNet50 with MoCo v2), the (−1350 HU, 150 HU) range was used, whereas for the last one a combination of the following ranges was used: (−1000 HU, 0 HU), (0 HU, 1000 HU) and (−1000 HU, 4000 HU). Finally, for each of these sub-models, a Logistic Regression (with ridge penalty) was used to predict the disease severity from the features averaged over slices. For the ResNet50-based sub-models, a Principal Component Analysis (PCA) with 40 components was used to reduce the dimensionality of the feature space before the Logistic Regression was applied. All the sub-models were equally weighted in the ensemble, and the disease severity predictions of the AI-severity model were obtained by averaging the predictions of the models in the ensemble.
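Two steps of this pipeline, intensity windowing and equal-weight ensembling, are simple enough to sketch. The example is illustrative only: rescaling the clipped intensities to [0, 1] is our assumption (the text states only that intensities were clipped), the input values are made up, and the function names are hypothetical.

```python
def apply_window(hu_values, low, high):
    """Clip Hounsfield-unit intensities to [low, high], then rescale to [0, 1]."""
    span = high - low
    return [(min(max(value, low), high) - low) / span for value in hu_values]


def ensemble_probability(sub_model_probabilities):
    """Equal-weight average of the sub-model severity predictions."""
    return sum(sub_model_probabilities) / len(sub_model_probabilities)


# Made-up intensities passed through the (-1000 HU, 200 HU) window.
windowed = apply_window([-2000, -1000, -400, 200, 900], low=-1000, high=200)

# Made-up outputs of the four sub-models for one patient.
severity_score = ensemble_probability([0.2, 0.4, 0.6, 0.8])
```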
Interpretability of AI-severity
An interpretability study was conducted on AI-severity to get a better understanding of its performance. The correlations between the internal representation of the sub-models (i.e. the input of the logistic regression) and radiological and clinical variables were analyzed. When the target of the logistic regression was replaced by variables from the radiology reports, the AUCs on the KB validation set of 150 patients were 94.1% for disease extent (threshold > 2), 71.4% for crazy paving, 67.1% for consolidation, and 74.8% for GGO, showing that the feature extractors correctly captured part of the radiology signal. More interestingly, it was also possible to correlate internal representations with clinical variables such as age (AUC 85.1% with a threshold of 60 years old), sex (AUC 85.2%) or oxygen saturation (AUC 76.2%, threshold 90%). As a comparison, a logistic regression trained on the radiology report variables only obtains AUC scores of 70.0%, 59.9% and 67.8%, respectively, for these three clinical variables. This gap shows that the internal representations of AI-severity capture clinical information directly from CT scans.
Models for multimodal integration
The models used to predict the outcome from multiple modalities are logistic regressions, trained by 5-fold cross-validation on the training dataset of 646 patients from KB, with folds stratified by age and outcome. Variables that were filled for fewer than 300 patients (conjugated bilirubin and alanine) were not used. For the remaining variables, missing values were replaced by the average over patients of the training set. L2 regularization was applied to the weights of the models. The regularization coefficient was chosen by comparing the cross-validation results obtained with different values, ranging from 0.01 to 100, and the value maximizing the average AUC over the 5 folds was selected. We used pandas and scikit-learn for data manipulation and model fitting13.
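The imputation and model selection rules above can be sketched without any machine learning library. The example below is a simplified stand-in for the pandas and scikit-learn code used in the study: missing values are represented by `None`, the data and the per-fold AUC values are made up, and the function names are hypothetical. The point it illustrates is that the imputation means come from the training patients only and are then reused on any other set.

```python
from statistics import mean


def fit_column_means(train_rows):
    """Per-variable mean over training patients, ignoring missing values."""
    n_columns = len(train_rows[0])
    # Assumes every column has at least one observed value, which holds
    # once sparsely filled variables have been dropped.
    return [
        mean(row[j] for row in train_rows if row[j] is not None)
        for j in range(n_columns)
    ]


def impute(rows, column_means):
    """Replace missing values with the training-set means."""
    return [
        [column_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def select_regularization(fold_aucs_by_coefficient):
    """Return the coefficient with the highest mean AUC across folds."""
    return max(
        fold_aucs_by_coefficient,
        key=lambda coefficient: mean(fold_aucs_by_coefficient[coefficient]),
    )


# Made-up training data: two variables, some values missing.
train = [[60.0, None], [80.0, 100.0], [None, 200.0]]
column_means = fit_column_means(train)
imputed_validation = impute([[None, None]], column_means)

# Made-up cross-validation AUCs for three candidate coefficients.
cv_aucs = {
    0.01: [0.70, 0.71, 0.69, 0.72, 0.70],
    1.0: [0.75, 0.74, 0.76, 0.73, 0.77],
    100.0: [0.72, 0.73, 0.71, 0.74, 0.72],
}
best_coefficient = select_regularization(cv_aucs)
```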
Selection of clinical and biological variables added to the models based on CT scan variables
Clinical and biological variables were selected through a forward feature selection technique (Supp Fig 4). At baseline (left of the figure), a model was trained in cross-validation using only a fixed set of variables. Three initial sets were considered here: radiologist report, AI-Lung, and AI-volumetry (the variable sets underlying the report, AI-severity, and AI-segment models, respectively). The variables encoded in the radiologist report include a presence/absence coding of ground glass opacity (GGO), rounded GGO, crazy paving, consolidation, rounded consolidation, peripheral topography, and inferior predominance, as well as disease extent, which is a semi-automatic assessment of the amount of lesions in the lung. The AI-Lung set includes a single variable, the output of the neural network model that predicts severity, and the AI-volumetry set includes the automatic quantification of the ground glass, consolidation, and crazy paving patterns, and the automatic quantification of disease extent. For comparison, the procedure was also performed starting from an empty set of variables (clinical only).
The added prognostic value of every clinical or biological variable was then assessed separately, by training a new model using this variable in addition to the previous set. The variable resulting in the largest AUC score was added to the selection. This procedure was repeated for 20 iterations. For every initial selection, model performance increased quickly at first (left part of Supp Fig 4), then reached a plateau (right half of the figure), indicating that the variables added after the tenth iteration did not significantly increase the predictive power of the models. Thus, in every case, only the ten best clinical and biological variables were selected.
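The greedy procedure can be sketched as a generic function that takes the scoring step as a callable. In the study that score is the cross-validated AUC of a logistic regression; here it is replaced by a made-up additive toy function so that the example runs on its own, and the variable names and gains carry no relation to the study's results.

```python
def forward_selection(initial_variables, candidates, score, n_iterations=20):
    """Greedily add the candidate that maximizes `score` at each iteration.

    Returns the list of (added variable, score after adding it), in order.
    """
    selected = list(initial_variables)
    remaining = list(candidates)
    history = []
    for _ in range(min(n_iterations, len(remaining))):
        # Evaluate every remaining variable on top of the current selection.
        best = max(remaining, key=lambda variable: score(selected + [variable]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return history


# Toy stand-in for the cross-validated AUC (hypothetical gains).
toy_gain = {"age": 0.05, "ldh": 0.03, "sex": 0.01}


def toy_score(variables):
    return 0.70 + sum(toy_gain.get(variable, 0.0) for variable in variables)


history = forward_selection(["disease_extent"], ["sex", "ldh", "age"], toy_score)
selection_order = [variable for variable, _ in history]
top_ten = selection_order[:10]  # keep only the variables added before the plateau
```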
Training and evaluation of models
To predict severity, models were trained on 646 patients from KB, which included the training set of AI-segment, and evaluated on two distinct evaluation sets, with 150 patients from KB and 137 patients from IGR. The prediction is performed using the logistic regression approach.
We evaluated models that predict severity using the Area Under the Curve (AUC), and differences between AUC values were tested using the DeLong test14.
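For reference, the AUC of a binary severity classifier equals the probability that a randomly chosen severe patient receives a higher score than a randomly chosen non-severe patient, with ties counted as one half. The sketch below computes it directly from that definition on made-up labels and scores; it does not implement the DeLong test, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correctly_ranked = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                correctly_ranked += 1.0
            elif positive_score == negative_score:
                correctly_ranked += 0.5  # ties count for one half
    return correctly_ranked / (len(positives) * len(negatives))


# Made-up example: 1 = severe outcome, 0 = non-severe outcome.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

This pairwise form is quadratic in the number of patients, which is acceptable for evaluation sets of the size used here (150 and 137 patients).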
We evaluated the segmentation model AI-segment using the mean absolute error, defined as the average, over the available fully annotated CT scans in the validation set, of the absolute difference between the ground truth percentage of each lesion type (deduced from annotations) and the estimated one. We also evaluated the detection accuracy per lesion type (GGO, crazy paving, consolidation) with respect to the scores reported by radiologists, defined as the percentage of scans in the validation set for which AI-segment correctly predicted the presence or absence of the lesion. A given lesion type is considered present in the AI-segment result when its estimated volume, averaged over both lungs, is above a certain threshold (here, we report results for thresholds of 1% and 2%).
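Both segmentation metrics can be written in a few lines. The sketch below uses made-up percentages for a single lesion type and hypothetical function names; the strict comparison with the threshold follows the "above a certain threshold" wording.

```python
def mean_absolute_error(true_percentages, predicted_percentages):
    """Average absolute difference between annotated and estimated volumes (%)."""
    differences = [
        abs(true - predicted)
        for true, predicted in zip(true_percentages, predicted_percentages)
    ]
    return sum(differences) / len(differences)


def detection_accuracy(radiologist_present, predicted_percentages, threshold=1.0):
    """Fraction of scans where thresholded presence matches the radiologist."""
    detected = [volume > threshold for volume in predicted_percentages]
    correct = sum(d == r for d, r in zip(detected, radiologist_present))
    return correct / len(radiologist_present)


# Made-up GGO volumes (% of lung) on three fully annotated scans.
ggo_mae = mean_absolute_error([10.0, 20.0, 0.0], [12.0, 15.0, 1.0])

# Made-up presence labels and estimated volumes on four partially annotated scans.
ggo_accuracy = detection_accuracy(
    [True, True, False, False], [5.0, 0.5, 0.2, 3.0], threshold=1.0
)
```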
Benchmark models
We used the clinical and biological variables previously proposed in a multivariate risk score for severity, defined as admission to ICU or death, and retrained a logistic regression model using these variables15. We also considered the MIT Covid Analytics calculator as a risk score for mortality (https://www.covidanalytics.io/mortality_calculator).
Supplementary Figures and Tables
Acknowledgements
We would like to thank J.-Y. Berthou, H. Berry, and Ph. Gesnouin from Inria and B. Schmauch, G. Rouzaud, and R. Patel from Owkin for their support.