Nucleic Acids Research, 2001, Vol. 29, No. 14 2994-3005
© 2001 Oxford University Press

Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements

Alejandro A. Schäffer*, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin and Stephen F. Altschul

National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

PSI-BLAST is an iterative program to search a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This does not include the benefits from four modifications we included in the ‘baseline’ version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence’s amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
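The normalized ROC measure mentioned above can be illustrated with a minimal sketch. It assumes the common ROC_n definition: for each of the first n false positives in a ranked hit list, count the true positives ranked above it, sum those counts, and divide by n times the total number of true positives. The abstract does not state the truncation value n or how short hit lists are handled, so the function name `roc_n`, its parameters, and the padding rule are illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], n: int, total_true: int) -> float:
    """Normalized ROC_n for a ranked hit list (best hit first).

    ranked_labels[i] is True for a true positive and False for a false
    positive. total_true is the number of true positives that exist for
    the query, including any that never appear in the list.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")

    true_seen = 0   # true positives ranked above the current hit
    false_seen = 0  # false positives encountered so far
    area = 0        # sum of true_seen taken at each counted false positive

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list ends before n false positives appear, the
    # missing ones are treated as ranking below every retrieved true positive.
    area += (n - false_seen) * true_seen

    # 0.0 means no true positive outranks any counted false positive;
    # 1.0 means every true positive outranks all of them.
    return area / (n * total_true)


if __name__ == "__main__":
    # Toy ranking: two true positives, a false positive, a true positive,
    # then a second false positive.
    hits = [True, True, False, True, False]
    print(roc_n(hits, n=2, total_true=3))
```

In the toy ranking the first false positive has two true positives above it and the second has three, so the score is 5 / (2 × 3). A score averaged over many queries, as in the 103-query test set, would be the mean of such per-query values under this sketch.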

* To whom correspondence should be addressed. Tel: +1 301 435 5884; Fax: +1 301 480 2918; Email: schaffer@helix.nih.gov

