Data Availability
A total of 76,553 FASTA genomes and associated sequencing metadata, submitted from January 1, 2019 until August 3, 2020, were downloaded from the GISAID database (https://www.gisaid.org/), with human specified as the source host. The associated sequencing metadata, including major variants per sample, are provided in Supplementary Table 1. Aggregated variants in VCF format for these genomes and the associated consequence predictions are available here: https://usegalaxy.org/u/carlosfarkas/h/sars-cov-2-variants-gisaid-august-03-2020. A further 974 Brazilian FASTA sequences, submitted from January 1, 2019 until September 25, 2020, were downloaded from the GISAID database, with human specified as the source host and South America / Brazil as the location. These FASTA sequences and the associated aggregated variants are available here: https://usegalaxy.org/u/carlosfarkas/h/brazil-genome-sequences-from-gisaid-sept25-2020. FASTA sequences from GISAID genomes with associated patient metadata until September 28, 2020, together with the results of the snpFreq program (containing Deceased-Released SNP associations), are available here: https://usegalaxy.org/u/carlosfarkas/h/gisaid-patient-metadata-sept28-2020. Acknowledgements to all laboratories and consortia involved in generating the GISAID genomes used in this study are listed in Supplementary Table 2. A total of 17,560 sequencing datasets, from December 1, 2019 until July 28, 2020, were downloaded from the Sequence Read Archive repository (SRA, https://www.ncbi.nlm.nih.gov/sars-cov-2/). The associated sequencing run accessions, sequencing metadata and related BioProjects are listed in Supplementary Table 3. The code generated during this study to replicate most of the computational calculations performed in this manuscript is available in the following GitHub repository: https://github.com/cfarkas/SARS-CoV-2-freebayes.