Abstract
Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
This is a preview of subscription content, access via your institution
Access options
Subscribe to this journal
Receive 12 print issues and online access
251,40 € per year
only 20,95 € per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Elias, J.E., Haas, W., Faherty, B.K. & Gygi, S.P. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2, 667–675 (2005).
Chen, Y., Kwon, S.W., Kim, S.C. & Zhao, Y. Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra. J. Proteome Res. 4, 998–1005 (2005).
Eng, J.K., McCormack, A.L. & Yates, J.R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
Moore, R.E., Young, M.K. & Lee, T.D. Qscore: An Algorithm for Evaluating SEQUEST Database Search Results. J. Am. Soc. Mass Spectrom. 13, 378–386 (2002).
Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J. & Gygi, S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2, 43–50 (2003).
Kislinger, T. et al. PRISM, a generic large scale proteomic investigation strategy for mammals. Mol. Cell. Proteomics 2, 96–106 (2003).
Haas, W. et al. Optimization and use of peptide mass measurement accuracy in shotgun proteomics. Mol. Cell Proteomics 7, 1326–1337 (2006).
Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
Olsen, J.V., Ong, S.E. & Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics 3, 608–614 (2004).
Nielsen, M.L., Savitski, M.M. & Zubarev, R.A. Improving protein identification using complementary fragmentation techniques in fourier transform mass spectrometry. Mol. Cell. Proteomics 4, 835–845 (2005).
Resing, K.A. et al. Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics. Anal. Chem. 76, 3556–3568 (2004).
Qian, W.J. et al. Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: the human proteome. J. Proteome Res. 4, 53–62 (2005).
Higdon, R., Hogan, J.M., Van Belle, G. & Kolker, E. Randomized sequence databases for tandem mass spectrometry peptide and protein identification. OMICS 9, 364–379 (2005).
Beausoleil, S.A. et al. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. USA 101, 12130–12135 (2004).
Elias, J.E., Gibbons, F.D., King, O.D., Roth, F.P. & Gygi, S.P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214–219 (2004).
Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).
Sadygov, R.G. & Yates, J.R., III. A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. Anal. Chem. 75, 3792–3798 (2003).
Nesvizhskii, A.I. & Aebersold, R. Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell. Proteomics 4, 1419–1440 (2005).
Beausoleil, S.A., Villen, J., Gerber, S.A., Rush, J. & Gygi, S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).
Everley, P.A. et al. Enhanced analysis of metastatic prostate cancer using stable isotopes and high mass accuracy instrumentation. J. Proteome Res. 5, 1224–1231 (2006).
Kersey, P.J. et al. The International Protein Index: an integrated database for proteomics experiments. Proteomics 4, 1985–1988 (2004).
Acknowledgements
This work was supported in part by US National Institutes of Health (GM67945 and HG00041 to S.P.G.). We thank S. Beausoleil, P. Everley, S. Gerber and W. Haas for continuing and insightful discussions, and Sage-N for implementing our idea of the pseudo-reversed searches on their SEQUEST platform.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
False positive identifications can be estimated by doubling decoy hits from a search against a concatenated target/decoy database. (PDF 416 kb)
Supplementary Fig. 2
The distributions of potential peptide matches is consistent between target and decoy databases. (PDF 22 kb)
Supplementary Fig. 3
Example supporting the necessity for target/decoy competition. (PDF 47 kb)
Supplementary Fig. 4
Relative scores shift to smaller values for less than half of peptide hits when searched against composite target-decoy databases as opposed to separate databases. (PDF 41 kb)
Supplementary Fig. 5
Using decoy hits to guide selection of appropriate selection criteria. (PDF 160 kb)
Supplementary Table 1
Slopes of best-fit lines for precision values shown in Figure 5b. (PDF 14 kb)
Rights and permissions
About this article
Cite this article
Elias, J., Gygi, S. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4, 207–214 (2007). https://doi.org/10.1038/nmeth1019
Published:
Issue Date:
DOI: https://doi.org/10.1038/nmeth1019