RT Journal Article
SR Electronic
T1 An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2021.03.02.21252444
DO 10.1101/2021.03.02.21252444
A1 Cushnan, Dominic
A1 Bennett, Oscar
A1 Berka, Rosalind
A1 Bertolli, Ottavia
A1 Chopra, Ashwin
A1 Dorgham, Samie
A1 Favaro, Alberto
A1 Ganepola, Tara
A1 Halling-Brown, Mark
A1 Imreh, Gergely
A1 Jacob, Joseph
A1 Jefferson, Emily
A1 Lemarchand, François
A1 Schofield, Daniel
A1 Wyatt, Jeremy C
YR 2021
UL http://medrxiv.org/content/early/2021/03/04/2021.03.02.21252444.abstract
AB The National COVID-19 Chest Imaging Database (NCCID) is a centralised database containing chest X-rays, chest Computed Tomography (CT) scans and cardiac Magnetic Resonance Images (MRI) from patients across the UK, jointly established by NHSX, the British Society of Thoracic Imaging (BSTI), Royal Surrey NHS Foundation Trust (RSNFT) and Faculty. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and development of machine learning (ML) technologies that will improve care for patients hospitalised with a severe COVID-19 infection. The NCCID is now accumulating data from 20 NHS Trusts and Health Boards across England and Wales, with a total contribution of approximately 25,000 imaging studies in the training set (at time of writing) and is actively being used as a research tool by several organisations. This paper introduces the training dataset, including a snapshot analysis performed by NHSX covering: the completeness of clinical data, the availability of image data for the various use-cases (diagnosis, prognosis and longitudinal risk) and potential model confounders within the imaging data. The aim is to inform both existing and potential data users of the NCCID’s suitability for developing diagnostic/prognostic models.
In addition, a cohort analysis was performed to measure the representativeness of the NCCID to the wider COVID-19 affected population. Three major aspects were included: geographic, demographic and temporal coverage, revealing good alignment in some categories, e.g., sex, and identifying areas for improvements to data collection methods, particularly with respect to geographic coverage. All analyses and discussions are focused on the implications for building ML tools that will generalise well to the clinical use cases.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Joseph Jacob was supported by a Wellcome Trust Clinical Research Career Development Fellowship (209553/Z/17/Z) and by the NIHR BRC at UCL.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Health Research Authority has given ethical approval for the National COVID-19 Chest Imaging Database (NCCID), and the analysis in this paper was reviewed by the NCCID Data Access Committee. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Data Availability: Access to the dataset can be sought via an application to the National COVID-19 Chest Imaging Database (NCCID) Data Access Committee, as described on the NCCID website: https://nhsx.github.io/covid-chest-imaging-database/