RT Journal Article SR Electronic T1 COVID-19 Open Source Data Sets: A Comprehensive Survey JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2020.05.19.20107532 DO 10.1101/2020.05.19.20107532 A1 Shuja, Junaid A1 Alanazi, Eisa A1 Alasmary, Waleed A1 Alashaikh, Abdulaziz YR 2020 UL http://medrxiv.org/content/early/2020/07/13/2020.05.19.20107532.abstract AB In December 2019, a novel coronavirus disease, COVID-19, emerged in the city of Wuhan, China. In early 2020, the virus spread to every continent except Antarctica, causing widespread infections and deaths owing to its contagiousness and the absence of a medically proven treatment. The COVID-19 pandemic has been termed the most consequential global crisis since the World Wars. The first line of defense against the spread of COVID-19 consists of non-pharmaceutical measures such as social distancing and personal hygiene. The pandemic, which has affected billions of lives economically and socially, has motivated the scientific community to develop solutions based on computer-aided digital technologies for the diagnosis, prevention, and estimation of COVID-19. Some of these efforts focus on statistical and Artificial Intelligence-based analysis of the available COVID-19 data. All of these scientific efforts require that the data used for analysis be open source, to promote the extension, validation, and collaboration of the work in the fight against the global pandemic. Our survey is motivated by the open source efforts, which can be broadly categorized as (a) COVID-19 diagnosis from CT scans, X-ray images, and cough sounds, (b) COVID-19 case reporting, transmission estimation, and prognosis from epidemiological, demographic, and mobility data, (c) COVID-19 emotional and sentiment analysis from social media, and (d) knowledge-based discovery and semantic analysis from the collection of scholarly articles covering COVID-19. 
We survey and compare research works in these directions that are accompanied by open source data and code. Future research directions for data-driven COVID-19 research are also discussed. We hope that the article will give the scientific community an impetus to start open source, extensible, and transparent research in the collective fight against the COVID-19 pandemic. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by the Research and Development Office, Ministry of Education, Saudi Arabia. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Center of Innovation and Development in Artificial Intelligence, Umm Al-Qura University, Makkah, Saudi Arabia. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. We provide direct downloadable links for most of the datasets discussed in the manuscript.