PT - JOURNAL ARTICLE
AU - Cushnan, Dominic
AU - Bennett, Oscar
AU - Berka, Rosalind
AU - Bertolli, Ottavia
AU - Chopra, Ashwin
AU - Dorgham, Samie
AU - Favaro, Alberto
AU - Ganepola, Tara
AU - Halling-Brown, Mark
AU - Imreh, Gergely
AU - Jacob, Joseph
AU - Jefferson, Emily
AU - Lemarchand, François
AU - Schofield, Daniel
AU - Wyatt, Jeremy C
AU - NCCID Collaborative
TI - An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis
AID - 10.1101/2021.03.02.21252444
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.03.02.21252444
4099 - http://medrxiv.org/content/early/2021/03/04/2021.03.02.21252444.short
4100 - http://medrxiv.org/content/early/2021/03/04/2021.03.02.21252444.full
AB - The National COVID-19 Chest Imaging Database (NCCID) is a centralised database containing chest X-rays, chest Computed Tomography (CT) scans and cardiac Magnetic Resonance Images (MRI) from patients across the UK, jointly established by NHSX, the British Society of Thoracic Imaging (BSTI), Royal Surrey NHS Foundation Trust (RSNFT) and Faculty. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and development of machine learning (ML) technologies that will improve care for patients hospitalised with a severe COVID-19 infection. The NCCID is now accumulating data from 20 NHS Trusts and Health Boards across England and Wales, with a total contribution of approximately 25,000 imaging studies in the training set (at time of writing) and is actively being used as a research tool by several organisations. This paper introduces the training dataset, including a snapshot analysis performed by NHSX covering: the completeness of clinical data, the availability of image data for the various use-cases (diagnosis, prognosis and longitudinal risk) and potential model confounders within the imaging data.
The aim is to inform both existing and potential data users of the NCCID’s suitability for developing diagnostic/prognostic models. In addition, a cohort analysis was performed to measure the representativeness of the NCCID to the wider COVID-19 affected population. Three major aspects were included: geographic, demographic and temporal coverage, revealing good alignment in some categories, e.g., sex, and identifying areas for improvements to data collection methods, particularly with respect to geographic coverage. All analyses and discussions are focused on the implications for building ML tools that will generalise well to the clinical use cases.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Joseph Jacob was supported by a Wellcome Trust Clinical Research Career Development Fellowship (209553/Z/17/Z) and by the NIHR BRC at UCL.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Health Research Authority has given ethical approval for the National COVID-19 Chest Imaging Database (NCCID), and the analysis in this paper was reviewed by the NCCID Data Access Committee.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Data Availability: Access to the dataset can be sought via an application to the National COVID-19 Chest Imaging Database (NCCID) Data Access Committee, as described on the NCCID website: https://nhsx.github.io/covid-chest-imaging-database/