SIFT missense predictions for genomes

Vaser, Robert; Adusumalli, Swarnaseetha; Leng, Sim Ngak; Sikic, Mile; Ng, Pauline C

doi:10.1038/nprot.2015.123

Protocol Update
Published: 03 December 2015

Nature Protocols volume 11, pages 1–9 (2016)

Abstract

The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.
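The two timing figures in the abstract can be turned into rough planning estimates. The following is a minimal sketch, not part of the protocol: it derives an average lookup throughput from the quoted 6.7 million variants in 4 min, and scales the quoted 2.6 s per protein to a whole proteome. The function names and the 20,000-protein proteome size are hypothetical and chosen only for illustration, and the estimate assumes sequences are processed one after another.

```python
# Figures quoted in the abstract.
VARIANTS_ANNOTATED = 6_700_000   # variants retrieved from a precomputed database
ANNOTATION_MINUTES = 4           # time quoted for that retrieval
SECONDS_PER_PROTEIN = 2.6        # SIFT 4G rate when no precomputed database exists


def lookup_rate_per_second(variants: int = VARIANTS_ANNOTATED,
                           minutes: float = ANNOTATION_MINUTES) -> float:
    """Average database-lookup throughput in variants per second."""
    return variants / (minutes * 60)


def compute_hours(n_proteins: int,
                  seconds_per_protein: float = SECONDS_PER_PROTEIN) -> float:
    """Estimated hours to score n_proteins sequences one after another."""
    return n_proteins * seconds_per_protein / 3600


if __name__ == "__main__":
    print(f"{lookup_rate_per_second():,.0f} variants/s")
    # HYPOTHETICAL proteome size, used only to illustrate the scaling.
    print(f"{compute_hours(20_000):.1f} h for 20,000 proteins")
```

Under these assumptions, a lookup against a precomputed database averages roughly 28,000 variants per second, whereas computing predictions from scratch for a proteome of the assumed size would take on the order of 14 hours.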

**Figure 1: Comparison of the SIFT and SIFT 4G algorithms.**

**Figure 2: Workflow for the SIFT 4G annotator.**

**Figure 3: The SIFT 4G annotator graphical user interface.**

**Figure 4: Select the database for the desired organism.**

**Figure 5: View of the SIFT 4G annotator after annotation has been completed.**

Acknowledgements

This work is financed in part by A*STAR and the Croatian Science Foundation (project no. 7353, Algorithms for Genome Sequence Analysis). We thank P.C.N.'s significant other for donating his gaming computer 'for science' and M. Korpar for providing the SW#db library.

Author information

Robert Vaser and Swarnaseetha Adusumalli: These authors contributed equally to this work.

Authors and Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Robert Vaser & Mile Sikic
Computational and Systems Biology Group, Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Swarnaseetha Adusumalli, Sim Ngak Leng & Pauline C Ng
Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore
Mile Sikic

Authors

Robert Vaser
Swarnaseetha Adusumalli
Sim Ngak Leng
Mile Sikic
Pauline C Ng

Contributions

M.S. and P.C.N. conceived the project. R.V. implemented and tested the performance of the SIFT 4G algorithm. S.A. and S.N.L. implemented the SIFT 4G annotator. S.A. and P.C.N. wrote the manuscript.

Corresponding author

Correspondence to Pauline C Ng.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Sensitivity and specificity of SIFT and SIFT 4G.

The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (brown), and lysozyme (blue). SIFT and SIFT 4G’s performances are shown in light-colored and dark-colored bars, respectively. Reproduced under a Creative Commons license from http://sift-dna.org/sift4g/AboutSIFT4G.html.

Supplementary Figure 2 ROC comparison of SIFT and SIFT 4G.

The algorithms were applied to four datasets: HumDiv (red), HumVar (green), LacI (beige), and lysozyme (blue). SIFT’s performance is depicted with dashed lines; SIFT 4G with solid lines. Reproduced under a Creative Commons license from http://sift-dna.org/sift4g/AboutSIFT4G.html.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Tables 1 and 2 (PDF 673 kb)

About this article

Cite this article

Vaser, R., Adusumalli, S., Leng, S. et al. SIFT missense predictions for genomes. Nat Protoc 11, 1–9 (2016). https://doi.org/10.1038/nprot.2015.123

Published: 03 December 2015
Issue Date: January 2016
DOI: https://doi.org/10.1038/nprot.2015.123
