
Nucleic Acids Research Pages 3986-3990  


Protein sequence similarity searches using patterns as seeds

Zheng Zhang, Alejandro A. Schäffer¹, Webb Miller, Thomas L. Madden², David J. Lipman², Eugene V. Koonin² and Stephen F. Altschul²,*

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA, ¹Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD 21224, USA and ²National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Received May 7, 1998; Revised and Accepted July 8, 1998

ABSTRACT

Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.

INTRODUCTION

In the analysis of a protein or DNA sequence, particular interest often focuses upon a small region, domain or sequence pattern. A natural question is whether there are other related sequences that share the same pattern. The most widely used tools for sequence similarity search allow matching between arbitrary regions of the query and database sequences (1-5). In contrast, many motif-based search methods seek database sequences that match a pre-specified pattern (6-12). If this pattern is too weak, or not specified with sufficient precision, the number of matches may be very large, most being of no biological relevance. On the other hand, an overly-specific pattern may exclude many sequences of interest.

We describe here the pattern-hit initiated BLAST (PHI-BLAST) program, whose hybrid strategy addresses a type of question frequently asked by researchers: namely, is a particular pattern seen in a protein of interest likely to be functionally relevant, or does it occur simply by chance? To address this question, we combine a pattern search with a search for statistically significant sequence similarity. These two approaches were combined previously in a program that explored the output of a BLAST search for conserved patterns (10). PHI-BLAST implements a reverse strategy which is computationally more efficient, and which we believe will be of greater utility. Specifically, the similarity search is restricted to a subset of the sequence database comprised of the sequences that contain the given pattern.

The input to PHI-BLAST consists of a protein or DNA sequence, along with a specific pattern occurring at least once within the sequence. The pattern is currently required to be a sequence of residues or sets of residues, with 'wild cards' and variable spacing allowed; all PROSITE patterns (12), for example, have this form. For each match between an instance of the pattern in the query sequence and an instance in a database sequence, PHI-BLAST constructs a high-scoring local alignment that includes the match. All resulting alignments are sorted by score and evaluated statistically.
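To make the pattern format concrete, the following minimal sketch translates a PROSITE-style pattern into a Python regular expression and lists its occurrences in a sequence. It is an illustration only, not the matching method used by PHI-BLAST; the function names, the toy sequence and the choice of the P-loop pattern `[AG]-x(4)-G-K-[ST]` as an example are assumptions of this sketch, and only the basic PROSITE elements (residues, bracketed sets, excluded sets, `x`, and repeat counts) are handled.

```python
import re

# One PROSITE element: x, a residue, [set] or {excluded set}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # wild card: any residue
            core = "."
        elif core.startswith("{"):           # {AB}: any residue except A or B
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # x(4) or x(2,5): fixed or variable spacing
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def pattern_hits(pattern: str, sequence: str):
    """Return (start, end) for one match per start position, overlaps included."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1), m.end(1)) for m in lookahead.finditer(sequence)]


if __name__ == "__main__":
    query_hits = pattern_hits("[AG]-x(4)-G-K-[ST]", "MMAPSTRGKSLL")  # toy sequence
    print(query_hits)
```

With variable spacers a single start position can admit several matches; this sketch reports only the one the regular expression engine finds first.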

This approach has greatest utility when it is suspected that a few residues comprising a small motif may be crucial for the biological function of interest. Showing that this pattern occurs within an extended and statistically significant alignment of the query sequence with one or more database sequences greatly reduces the likelihood that the pattern is spurious. Conversely, insisting on the presence of the pattern and hence searching a reduced sequence space may aid the detection of subtle similarities that blend into the background noise in a regular BLAST search.

THE PHI-BLAST ALGORITHM

To search for matches to a given pattern, we adapted a method of Baeza-Yates and Gonnet (13) and Wu and Manber (14). This method permits simple patterns to be represented in a single computer word and matches to be found very efficiently. When the pattern is relatively complex, for example consisting of many rigid parts and/or having wide ranges of spacer lengths, our program first searches for the rigid part that is least likely to match by chance alone, and then performs local searches for the remaining pattern elements.
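The idea of holding a simple pattern in a single computer word can be sketched with the classic bit-parallel Shift-And technique. The code below is a minimal illustration for fixed-length patterns made of residue sets and wild cards; it is not the PHI-BLAST source, it does not handle variable spacers or the "rarest rigid part first" strategy, and the function name and input representation are assumptions of this sketch.

```python
def shift_and_search(elements, text):
    """Find all start positions where a fixed-length pattern matches `text`.

    `elements` is a list with one entry per pattern position: a set of
    allowed characters, or None for a wild card. Bit i of `state` is set
    when elements[0..i] match the text ending at the current position.
    """
    m = len(elements)
    # Precompute, for every character in the text, the pattern positions it satisfies.
    masks = {}
    for ch in set(text):
        mask = 0
        for i, allowed in enumerate(elements):
            if allowed is None or ch in allowed:
                mask |= 1 << i
        masks[ch] = mask

    accept = 1 << (m - 1)      # bit marking a complete match
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one position and start a new one.
        state = ((state << 1) | 1) & masks[ch]
        if state & accept:
            hits.append(pos - m + 1)
    return hits


if __name__ == "__main__":
    p_loop = [set("AG"), None, None, None, None, set("G"), set("K"), set("ST")]
    print(shift_and_search(p_loop, "MMAPSTRGKSLL"))
```

Each text character costs one shift, one OR and one AND, which is why short patterns can be scanned across a large database quickly.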

For each instance of the input pattern in a database sequence, paired with an instance in the query, PHI-BLAST attempts to find the optimal local alignment (1,15) containing the aligned patterns. This can be done rigorously by applying dynamic programming (16,17) to the parts of the two sequences preceding and the parts following the pattern. The alignment returned is required to begin at the corner of the path graph, but is permitted to end anywhere within the graph. The difficulty with this approach is that, to guarantee optimality, a very large portion of the path graph needs to be searched, and this requires inordinate time in a database search (18). Accordingly, we have used the gapped extension heuristic described in Altschul et al. (5) and Zhang et al. (18). Basically, path graph cells are considered only if the score of the best alignment leading into them falls no more than X below the best score yet found. For sufficiently large values of the X parameter, this approach almost always returns the optimal local alignment.
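The X-dropoff rule described above can be sketched as a dynamic programming extension that starts at the corner of the path graph, may end anywhere, and abandons any cell whose score falls more than X below the best score found so far. This is a simplified illustration under stated assumptions: it uses a linear gap penalty rather than the affine costs used in the paper, it loops over whole rows instead of tracking a shrinking band, and the function name and scoring callback are hypothetical.

```python
NEG_INF = float("-inf")


def xdrop_extend(a, b, score, gap, x_drop):
    """Best score of an alignment of prefixes of `a` and `b` that starts at
    the corner (0, 0) and may end anywhere, using X-dropoff pruning.

    `score(p, q)` returns the substitution score, `gap` is a positive
    per-residue gap penalty (linear gaps, an assumption of this sketch).
    """
    best = 0
    prev = [NEG_INF] * (len(b) + 1)
    prev[0] = 0
    for j in range(1, len(b) + 1):           # first row: gaps in `a` only
        s = prev[j - 1] - gap
        if s < best - x_drop:
            break
        prev[j] = s

    for i in range(1, len(a) + 1):
        cur = [NEG_INF] * (len(b) + 1)
        alive = False
        for j in range(len(b) + 1):
            s = prev[j] - gap                 # gap in `b`
            if j > 0:
                s = max(s,
                        prev[j - 1] + score(a[i - 1], b[j - 1]),  # aligned pair
                        cur[j - 1] - gap)                         # gap in `a`
            if s >= best - x_drop:            # keep the cell only if within X of best
                cur[j] = s
                alive = True
                best = max(best, s)
        if not alive:                         # every cell in the row was pruned
            break
        prev = cur
    return best


if __name__ == "__main__":
    simple = lambda p, q: 1 if p == q else -1
    print(xdrop_extend("AATTTAAAAAA", "AACCCAAAAAA", simple, gap=2, x_drop=10))
    print(xdrop_extend("AATTTAAAAAA", "AACCCAAAAAA", simple, gap=2, x_drop=1))
```

The two calls illustrate the trade-off: a small X stops at the first dip in score, while a larger X carries the extension through the mismatched stretch to the higher-scoring region beyond it.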

Because PHI-BLAST performs a gapped extension whenever an instance of the input pattern is encountered in the database, reasonable execution times depend upon such instances being relatively rare. Therefore, we allow only patterns that are expected to occur less frequently than once per 5000 database residues. Any pattern that contains four completely specified residues, or three specified residues whose average background frequency is < 5.8%, passes this test. Of course, the more specific the input pattern, the faster PHI-BLAST will run. The frequency with which a pattern will occur within the database can be estimated easily (19) from background amino acid frequencies (20).
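The frequency test can be illustrated with a short calculation. Assuming independent residues, the chance that a fixed-spacing pattern matches at a given position is the product, over its specified positions, of the summed background frequencies of the allowed residues. The sketch below applies the one-per-5000-residues threshold from the text; the uniform 5% frequencies are a placeholder chosen for illustration (not the Robinson and Robinson values), the function names are hypothetical, and variable spacers, which increase the number of possible placements, are ignored.

```python
def pattern_probability(elements, freqs):
    """Probability that a fixed-spacing pattern matches at one position.

    `elements` holds one entry per pattern position: a string of allowed
    residues, or None for a wild card. `freqs` maps residue -> background
    frequency. Residues are assumed to occur independently.
    """
    p = 1.0
    for allowed in elements:
        if allowed is None:
            continue                          # a wild card matches with probability 1
        p *= sum(freqs[r] for r in set(allowed))
    return p


def passes_frequency_test(elements, freqs, max_rate=1.0 / 5000.0):
    """True if the pattern is expected less than once per 5000 residues."""
    return pattern_probability(elements, freqs) < max_rate


# Placeholder background: every residue at 5% (illustrative assumption only).
UNIFORM = {r: 0.05 for r in "ACDEFGHIKLMNPQRSTVWY"}

if __name__ == "__main__":
    print(passes_frequency_test(["G", "K", "S"], UNIFORM))   # three specified residues
    print(passes_frequency_test(["G", None, "K"], UNIFORM))  # only two specified residues
    print(0.058 ** 3 < 1.0 / 5000.0)                         # the 5.8% bound quoted above
```

The last line checks the arithmetic behind the stated rule: three residues at 5.8% each give a product just under 1/5000, and because a product is largest when its factors are equal, any three residues whose average frequency is below 5.8% also pass.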

STATISTICAL ANALYSIS

An alignment A produced by PHI-BLAST may be divided into three parts: the region $A_0$ spanned by the input pattern, and the local alignments $A_1$ and $A_2$ produced to either side of $A_0$ by the gapped extension routine. Either or both of $A_1$ and $A_2$ may be empty. Correspondingly, the score S of the alignment may be divided into the scores $S_0$, $S_1$ and $S_2$. For the purpose of statistical analysis, it is easiest to assume that all alignment regions $A_0$ that satisfy the input pattern are of equal biological plausibility, and therefore to ignore their scores. Accordingly, each alignment produced by PHI-BLAST is ranked by its reduced score $S' = S_1 + S_2$. For a given value x, we wish to estimate how many alignments are expected to have a reduced score $S' > x$ purely by chance.

In general, the input pattern is chosen because it is known to correspond to some feature of biological interest. Therefore, we make no statistical inference from the number of times the pattern is observed to occur within the query sequence ($n_q$) and the database as a whole ($n_d$). We simply record $N = n_q n_d$, the number of distinct pattern pairs that may seed a PHI-BLAST local alignment.

The simplest model of protein sequences is as random strings of amino acids, chosen independently with specific background probabilities for the various possible residues. To estimate the random distribution of $S'$, we start by considering the distribution of the scores $S_1$ and $S_2$ of which it is the sum. Each of these scores can be thought of as the result of the gapped extension routine applied to a pair of random sequences. In the limit of large values for the X-dropoff parameter (5,18), $S_1$ is the score of the optimal local alignment required to start at a particular point P. The much studied Smith-Waterman alignment score (1) is just this constrained local alignment score, maximized over all path graph points P. The distribution of Smith-Waterman scores has been established empirically to follow an extreme value distribution, whose scale or decay parameter $\lambda$ does not change with increasing search space sizes (4,21-24). This implies (25) that the distribution of $S_1$ should have an exponential tail, with decay parameter $\lambda$ equal to that of the extreme value distribution for Smith-Waterman scores. Some simple calculus then yields that for sufficiently large scores x, the distribution of $S' = S_1 + S_2$ has the form $\mathrm{Prob}(S' > x) \approx C(\lambda x + 1)e^{-\lambda x}$ for some constant C. The scores of optimal local alignments constrained to contain distinct pattern pairs may be correlated, but the expected number of alignments attaining a given score is independent of such correlation. Therefore, the expected number of chance alignments produced by PHI-BLAST with reduced score at least x is

$$E(S' > x) \approx C N (\lambda x + 1)\, e^{-\lambda x} \qquad (1)$$
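The factor $(\lambda x + 1)$ in equation 1 can be seen from an idealized calculation. As a sketch only, assume that $S_1$ and $S_2$ are independent and exactly exponential, with $\mathrm{Prob}(S_i > s) = e^{-\lambda s}$ for $s \ge 0$; real scores are discrete and only their tails are exponential, and the constant C absorbs that difference.

```latex
\begin{align*}
\mathrm{Prob}(S_1 + S_2 > x)
  &= \mathrm{Prob}(S_1 > x)
     + \int_0^x \lambda e^{-\lambda s}\,\mathrm{Prob}(S_2 > x - s)\,ds \\
  &= e^{-\lambda x}
     + \int_0^x \lambda e^{-\lambda s}\, e^{-\lambda (x - s)}\,ds \\
  &= e^{-\lambda x} + \lambda x\, e^{-\lambda x}
   = (\lambda x + 1)\, e^{-\lambda x}.
\end{align*}
```

Multiplying by the number of seed pairs N, which is valid for expectations whether or not the alignments are correlated, gives the form of equation 1.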

Tables of $\lambda$ for a variety of amino acid substitution matrices and gap costs have been reported (4), and their validity tested on a large number of protein families (26). The values for $\lambda$ employed here differ slightly from those published previously (4), because we have re-estimated $\lambda$ using larger and therefore more accurate simulations. The parameter C of equation 1 is new and requires its own estimation. Random simulation (data not shown) using the background amino acid frequencies of Robinson and Robinson (20) yields $C \approx 0.6$ for the BLOSUM-62 matrix (27) in conjunction with the complete range of affine gap costs useful for standard protein sequence comparison (4). We will consider the validity of equation 1 after discussing several biological examples.
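Equation 1 is simple to evaluate directly. The sketch below computes the expected number of chance alignments from a reduced score and the number of seed pairs; the default parameters are the values quoted in the Table 1 caption for BLOSUM-62 with gap costs 11 + k, and the function name and example inputs are assumptions made for illustration rather than values from the paper.

```python
import math


def phi_blast_evalue(reduced_score, n_pairs, lam=0.270, c=0.6):
    """Expected number of chance alignments with reduced score >= reduced_score.

    Implements E = C * N * (lambda * x + 1) * exp(-lambda * x), where
    N = n_q * n_d is the number of query/database pattern pairs.
    """
    x = float(reduced_score)
    return c * n_pairs * (lam * x + 1.0) * math.exp(-lam * x)


if __name__ == "__main__":
    n_q, n_d = 1, 2000            # hypothetical pattern counts in query and database
    for score in (20, 40, 60):    # hypothetical reduced scores
        print(score, phi_blast_evalue(score, n_q * n_d))
```

Because the expectation scales linearly with N, a more specific pattern (fewer database hits) lowers the E-value attached to any given reduced score.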

IMPLEMENTATION AND EXAMPLES

To enhance the utility and functionality of a WWW-based version of PHI-BLAST, we have nested it between two other programs. While one may define a pattern based upon specific knowledge concerning the query sequence, a researcher often wishes to search a pattern-database for any well-characterized motifs the query may contain. To streamline this latter approach, we have implemented a program that first searches the PROSITE database (12) with the query; any patterns found may then be used to launch a PHI-BLAST database search. To facilitate more detailed analysis of PHI-BLAST output, we allow it automatically to serve as the basis for constructing a position-specific score matrix for further database searching via the position-specific iterated BLAST (PSI-BLAST) program (5). Like other BLAST family programs, PHI-BLAST incorporates a pre-filter for protein regions of biased amino acid composition (low complexity) that often corrupt database searches (28,29).

PHI-BLAST may detect subtle relationships that escape standard database similarity searches, but this potential depends upon the specification of an amino acid pattern likely to be conserved within the protein family of interest. We discuss four examples involving protein families whose original description depended critically upon detecting relatively weak sequence similarities. In each case, PHI-BLAST reports a subtle but structurally and functionally relevant relationship. The alignments suggesting these relationships are not all statistically significant but, in each database search output ranked by E-value, they appear immediately after the alignments involving clear family members, thereby prompting further analysis. In contrast, any of these similarities reported by gapped BLAST (5) are preceded by a number of alignments with smaller E-values involving unrelated sequences. The four examples discussed below are summarized in Table 1. All searches were performed on the non-redundant (NR) protein sequence database maintained by the NCBI (30).


Table 1. Detection of subtle protein sequence relationships using PHI-BLAST
The reported results are from searches of the NCBI (30) non-redundant protein sequence database (April 9, 1998; 298 842 sequences; 90 087 406 residues). The PHI-BLAST and BLAST algorithms used the BLOSUM-62 substitution matrix (27), in conjunction with penalties of 11 + k for gaps of length k. BLAST E-values were calculated using the statistical parameters $\lambda = 0.270$ and K = 0.047, and applying an edge-effect correction (4). PHI-BLAST E-values were calculated from equation 1, using the statistical parameters $\lambda = 0.270$ and C = 0.6.
ᵃPatterns are described using the one-letter amino acid code. Brackets represent a choice among any of the enclosed amino acids. 'x' represents any amino acid. 'h' represents [ILVMF], a hydrophobic amino acid.

CED4-like cell death regulators

The Caenorhabditis elegans protein CED4 is a regulator of programmed cell death (apoptosis). CED4 contains the classical P-loop motif involved in phosphate binding and found in a great variety of ATPases and GTPases. ATP binding by CED4, and the role of ATP in its function, have been demonstrated (31,32). In a gapped BLAST search of the NR database, CED4 shows statistically significant sequence similarity to only one protein, the human apoptosis regulator Apaf-1, in which the P-loop is conserved (33,34). However, when PHI-BLAST is used, requiring conservation of the P-loop (Table 1), the best hit after Apaf-1, with E-value 0.038, is to a plant disease resistance protein, Arabidopsis thaliana T7N9.18 (35). Further sequence comparison shows that animal apoptosis regulators and putative plant ATPases involved in disease resistance share several conserved motifs, suggesting that they have a common origin and may have similar roles in programmed cell death (L.Aravind, V.M.Dixit and E.V.Koonin, unpublished observations). Before the Apaf-1 sequence became available, this conclusion had been reached through a laborious comparison of CED4 to a large number of different ATPases (32). Because the Apaf-1 sequence is highly similar to homologous plant proteins, the connection between CED4 and the plant proteins can be easily demonstrated by iterative database search (5). Even without Apaf-1, however, PHI-BLAST is able to establish this link immediately.

HS90-type ATPase domains

We used PHI-BLAST to investigate the subtle but structurally validated relationship between the ATPase domains in the MutL DNA repair proteins, type II topoisomerases, histidine kinases and HS90 family proteins (36,37). The output identified a new family of eukaryotic proteins that contain the same type of predicted ATPase domain, but that in standard database searches do not show significant similarity to any known member of the superfamily. A PHI-BLAST search with the Escherichia coli MutL protein (38) as query showed moderate similarity (E-value 0.017) to the C.elegans protein ZC155.3 (39) that was originally described as having 'weak similarity to Bovine synaptocanalin I'. Subsequent database searches with this worm protein sequence as query revealed homologs in humans (KIAA0136) (40) and plants (41,42), whereas a PHI-BLAST search also showed convincing similarity to MutL family members (best E-value $6 \times 10^{-5}$). Elucidation of the function of this new family of eukaryotic ATP-utilizing enzymes will be of considerable interest; the synaptocanalin domain apparently was fused to the worm protein by exon misassembly.

Archaeal tRNA nucleotidyltransferases

The archaeal tRNA nucleotidyltransferases (Cca) are a distinct family of nucleic acid polymerases (43) that in standard database searches do not have detectable similarity to any proteins other than orthologs from other archaeal species. However, they do contain a conserved motif, with two aspartate residues, that resembles the catalytic sites of many other polymerases (44). When this pattern (Table 1) is specified in a PHI-BLAST search with Methanococcus jannaschii Cca (45) as query, the top hit outside the archaeal Cca family itself, with E-value 0.061, is to hypothetical protein AF0299 from Archaeoglobus fulgidus (46), which belongs to a previously described archaeal family of predicted nucleotidyltransferases (47); the third hit (E-value 0.13) is to an experimentally characterized streptomycin 3″-adenylyltransferase from Enterococcus faecalis (48).


Table 2. Accuracy of PHI-BLAST statistics
PHI-BLAST searches were performed on shuffled and reversed versions of the NR database, using the query sequences and associated patterns of Table 1, as well as the same alignment scoring system and statistical parameters $\lambda$ and C. A, CED4-like cell death regulators; B, HS90-type ATPase domains; C, archaeal tRNA nucleotidyltransferases; D, archaeal homologs of DnaG-type DNA primases.

Archaeal homologs of DnaG-type DNA primases

Archaeal homologs of bacterial DNA primases, e.g. M.jannaschii protein MJ1206 (45), contain a motif typical of helicases (47), but do not show significant similarity to these proteins in standard BLAST searches. Using M.jannaschii MJ1206 and the helicase motif as query, the first non-trivial PHI-BLAST hit, with E-value 0.54, is to the well known helicase Neisseria gonorrhoeae UvrB (49). The relevance of the helicase motif in the archaeal primase homologs is supported by an extended alignment with the UvrB helicase (L.Aravind, D.D.Leipe and E.V.Koonin, unpublished observations). The similarities uncovered in this example are undetectable with standard database search techniques.

PERFORMANCE EVALUATION

To test the accuracy of the PHI-BLAST statistics given by equation 1, we used each of the examples above to search 'random databases' constructed from NR by shuffling or reversing each sequence. For each query, the lowest recorded E-value, and the number of alignments found with E-value < 10, are given in Table 2. For the shuffled database, the geometric mean of the observed numbers of sequences with E-value < 10 is 10.0, and no single case diverges from this value by more than a factor of 2.5. This might be expected, as the values of $\lambda$ and C used in equation 1 were calculated employing a random protein model in which all amino acids occur independently. Perhaps surprisingly, Table 2 suggests that under an alternative random protein model, based upon reversed real sequences, these statistics are slightly conservative.
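The two random-database constructions and the summary statistic used here are easy to reproduce in outline. The following sketch shuffles or reverses each sequence of a small in-memory database and computes a geometric mean of hit counts; the function names, the toy sequences and the example counts are placeholders invented for illustration, not data from Table 2.

```python
import math
import random


def shuffle_sequence(seq, rng):
    """Randomly permute the residues, preserving length and composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def reverse_sequence(seq):
    """Reverse the residues, preserving composition and local structure."""
    return seq[::-1]


def random_databases(sequences, seed=0):
    """Build shuffled and reversed versions of a list of sequences."""
    rng = random.Random(seed)      # fixed seed so the shuffled database is reproducible
    shuffled = [shuffle_sequence(s, rng) for s in sequences]
    reversed_db = [reverse_sequence(s) for s in sequences]
    return shuffled, reversed_db


def geometric_mean(counts):
    """Geometric mean of positive counts."""
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


if __name__ == "__main__":
    toy_db = ["MKTAYIAKQR", "GSHMLEDPVA"]          # made-up sequences
    shuffled, reversed_db = random_databases(toy_db)
    print(shuffled, reversed_db)
    print(geometric_mean([5, 20]))                  # made-up hit counts
```

If equation 1 is accurate, a search of such a database should report about 10 alignments with E-value below 10, which is the comparison the geometric mean summarizes.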

To compare the speed of PHI-BLAST to that of a standard gapped BLAST program (5) we timed both for searches of each of the four examples above against the NR database. Analysis of the results (Table 3) suggests that on the computer system used, ~8 s of each PHI-BLAST run were required to scan the database for pattern hits and for system overhead; the remainder was spent on constructing gapped extensions for all pattern hits found. Clearly, the number of hits generated by the input pattern is a key determinant of PHI-BLAST's speed. For relatively informative patterns PHI-BLAST is very fast, requiring not much more time than that needed to search for pattern hits. For relatively weak patterns, PHI-BLAST expends most of its effort extending hits, and can require time comparable to that for gapped BLAST.


Table 3. Execution speed of PHI-BLAST
The four examples of Table 1 were used to search the NR database using PHI-BLAST, and BLASTP version 2.0.4. Both programs employed the same substitution and gap costs, and the same X-dropoff parameter. This timing experiment was run on one 168 MHz UltraSparc processor of a Sun Ultra Enterprise 4000/5000 server with 768 Mbytes of RAM. This machine runs the operating system Solaris, version 2.6, which is an implementation of UNIX. We used the current Sun C compiler, with the -O option for optimization, to compile both programs. The times given are the sum of the user and system times reported by the time command, and are for the better of two identical runs. A-D, as in Table 2.

CONCLUSION

As illustrated by the biological examples discussed above, PHI-BLAST helps both to ascertain the biological relevance of patterns detected within protein sequences, and in some cases to detect subtle similarities that escape a regular BLAST search. We note, however, that PHI-BLAST was specifically designed to combine pattern search with the search for statistically significant sequence similarity, rather than to maximize search sensitivity. Thus in general one should not expect PHI-BLAST, which by its nature is a single-pass search method, to be more sensitive than PSI-BLAST (5). Furthermore, within proteins, residues that are absolutely conserved during evolution constitute a small minority, and even specifying a restricted set of possibilities for a given residue position often excludes many members of a protein family. PHI-BLAST therefore is not the ideal tool for completely delineating a class of related proteins. However, by greatly restricting the size of the search space, PHI-BLAST can allow the similarities of some distant homologs to rise above the background noise that would otherwise obscure them. Such findings can be used subsequently for more extensive family analysis using PSI-BLAST (5) or other tools.

We have developed PHI-BLAST for protein-protein comparison, but plan to extend its applicability. A version that translates a DNA database in all six reading frames for comparison to a protein query would be particularly valuable, and a DNA-DNA comparison version should also find use. We also plan to extend PHI-BLAST so that it may use generalized affine gap costs (50) in place of the traditional affine gap costs (51-54) currently permitted.

Note

Source code for PHI-BLAST is available by anonymous ftp from the machine ncbi.nlm.nih.gov, within the directory 'blast', and the program may be run from NCBI's web site at http://www.ncbi.nlm.nih.gov/

ACKNOWLEDGEMENTS

Z.Z. and W.M. are supported by grant LM05110 from the National Library of Medicine. We thank Dr L. Aravind for helpful discussions.

REFERENCES

1. Smith,T.F. and Waterman,M.S. (1981) J. Mol. Biol., 147, 195-197.

2. Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444-2448.

3. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.

4. Altschul,S.F. and Gish,W. (1996) Methods Enzymol., 266, 460-480.

5. Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

6. Myers,E.W. and Miller,W. (1989) Bull. Math. Biol., 51, 5-37.

7. Smith,R.F. and Smith,T.F. (1990) Proc. Natl Acad. Sci. USA, 87, 118-122.

8. Staden,R. (1990) Methods Enzymol., 183, 193-211.

9. Mehldau,G. and Myers,G. (1993) Comp. Appl. Biosci., 9, 299-314.

10. Tatusov,R.L. and Koonin,E.V. (1994) Comp. Appl. Biosci., 10, 457-459.

11. Ogiwara,A., Uchiyama,I., Takagi,T. and Kanehisa,M. (1996) Protein Sci., 5, 1991-1999.

12. Bairoch,A., Bucher,P. and Hofmann,K. (1997) Nucleic Acids Res., 25, 217-221.

13. Baeza-Yates,R. and Gonnet,G. (1992) Commun. Assoc. Comp. Mach., 35, 74-82.

14. Wu,S. and Manber,U. (1992) Commun. Assoc. Comp. Mach., 35, 83-91.

15. Sellers,P.H. (1984) Bull. Math. Biol., 46, 501-514.

16. Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48, 443-453.

17. Sankoff,D. (1972) Proc. Natl Acad. Sci. USA, 69, 4-6.

18. Zhang,Z., Berman,P. and Miller,W. (1998) J. Comput. Biol., 5, 197-210.

19. Staden,R. (1989) Comp. Appl. Biosci., 5, 89-96.

20. Robinson,A.B. and Robinson,L.R. (1991) Proc. Natl Acad. Sci. USA, 88, 8880-8884.

21. Smith,T.F., Waterman,M.S. and Burks,C. (1985) Nucleic Acids Res., 13, 645-656.

22. Collins,J.F., Coulson,A.F.W. and Lyall,A. (1988) Comp. Appl. Biosci., 4, 67-71.

23. Mott,R. (1992) Bull. Math. Biol., 54, 59-75.

24. Waterman,M.S. and Vingron,M. (1994) Stat. Sci., 9, 367-381.

25. Gumbel,E.J. (1958) Statistics of Extremes. Columbia University Press, New York, NY.

26. Pearson,W.R. (1998) J. Mol. Biol., 276, 71-84.

27. Henikoff,S. and Henikoff,J.G. (1992) Proc. Natl Acad. Sci. USA, 89, 10915-10919.

28. Wootton,J.C. and Federhen,S. (1993) Comp. Chem., 17, 149-163.

29. Altschul,S.F., Boguski,M.S., Gish,W. and Wootton,J.C. (1994) Nature Genet., 6, 119-129.

30. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J. and Ouellette,B.F. (1998) Nucleic Acids Res., 26, 1-7.

31. Seshagiri,S. and Miller,L.K. (1997) Curr. Biol., 7, 455-460.

32. Chinnaiyan,A.M., Chaudhary,D., O'Rourke,K., Koonin,E.V. and Dixit,V.M. (1997) Nature, 388, 728-729.

33. Zou,H., Henzel,W.J., Liu,X., Lutschg,A. and Wang,X. (1997) Cell, 90, 405-413.

34. Li,P., Nijhawan,D., Budihardjo,I., Srinivasula,S.M., Ahmad,M., Alnemri,E.S. and Wang,X. (1997) Cell, 91, 479-489.

35. Buehler,E., Dewar,K., Feng,J., Kim,C., Li,Y., Shinn,P., Sun,H., Conway,A., Conway,A., Kurtz,D., et al. (1997) GenBank accession no. 2213598.

36. Bergerat,A., de Massy,B., Gadelle,D., Varoutas,P.C., Nicolas,A. and Forterre,P. (1997) Nature, 386, 414-417.

37. Mushegian,A.R., Bassett,D.E., Jr, Boguski,M.S., Bork,P. and Koonin,E.V. (1997) Proc. Natl Acad. Sci. USA, 94, 5831-5836.

38. Tsui,H.T., Mandavilli,B.S. and Winkler,M.E. (1992) Nucleic Acids Res., 20, 2379.

39. Wilson,R., Ainscough,R., Anderson,K., Baynes,C., Berks,M., Bonfield,J., Burton,J., Connell,M., Copsey,T., Cooper,J., et al. (1994) Nature, 368, 32-38.

40. Nagase,T., Seki,N., Tanaka,A., Ishikawa,K. and Nomura,N. (1995) DNA Res., 2, 167-174.

41. Bevan,M., Hilbert,H., Braun,M., Holzer,E., Brandt,A., Duesterhoeft,A., Hoheisel,J., Jesse,T., Heijnen,L., Vos,P., et al. (1998) GenBank accession no. 2961386.

42. Bevan,M., Hilbert,H., Braun,M., Holzer,E., Brandt,A., Duesterhoeft,A., Hoheisel,J., Jesse,T., Heijnen,L., Vos,P., et al. (1998) GenBank accession no. 2961387.

43. Yue,D., Maizels,N. and Weiner,A.M. (1996) RNA, 2, 895-908.

44. Dracheva,S., Koonin,E.V. and Crute,J. (1995) J. Biol. Chem., 270, 14148-14153.

45. Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., et al. (1996) Science, 273, 1058-1073.

46. Klenk,H.P., Clayton,R.A., Tomb,J., White,O., Nelson,K.E., Ketchum,K.A., Dodson,R.J., Gwinn,M., Hickey,E.K., Peterson,J.D., et al. (1997) Nature, 390, 364-370.

47. Koonin,E.V., Mushegian,A.R., Galperin,M.Y. and Walker,D.R. (1997) Mol. Microbiol., 25, 619-637.

48. LeBlanc,D.J., Lee,L.N. and Inamine,J.M. (1991) Antimicrob. Agents Chemother., 35, 1804-1810.

49. Black,C.G., Fyfe,J.A. and Davies,J.K. (1995) J. Bacteriol., 177, 1952-1958.

50. Altschul,S.F. (1998) Proteins, 32, 88-96.

51. Gotoh,O. (1982) J. Mol. Biol., 162, 705-708.

52. Fitch,W.M. and Smith,T.F. (1983) Proc. Natl Acad. Sci. USA, 80, 1382-1386.

53. Altschul,S.F. and Erickson,B.W. (1986) Bull. Math. Biol., 48, 603-616.

54. Myers,E.W. and Miller,W. (1988) Comp. Appl. Biosci., 4, 11-17.


*To whom correspondence should be addressed. Tel: +1 301 496 2475; Fax: +1 301 480 9241; Email: altschul@ncbi.nlm.nih.gov


Copyright©Oxford University Press, 1998.

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
A. Andreeva and H. Tidow
A novel CHHC Zn-finger domain found in spliceosomal proteins and tRNA modifying enzymes
Bioinformatics, October 15, 2008; 24(20): 2277 - 2280.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
M. E. Ortiz-Soto, M. Rivera, E. Rudino-Pinera, C. Olvera, and A. Lopez-Munguia
Selected mutations in Bacillus subtilis levansucrase semi-conserved regions affecting its biochemical properties
Protein Eng. Des. Sel., October 1, 2008; 21(10): 589 - 595.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
N. Shu, T. Zhou, and S. Hovmoller
Prediction of zinc-binding sites in proteins from sequence
Bioinformatics, March 15, 2008; 24(6): 775 - 782.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D25 - D30.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Heger, S. Mallick, C. Wilton, and L. Holm
The global trace graph, a novel paradigm for searching protein sequence databases
Bioinformatics, September 15, 2007; 23(18): 2361 - 2367.
[Abstract] [Full Text] [PDF]


Home page
Eukaryot CellHome page
D. W. Brown, R. A. E. Butchko, M. Busman, and R. H. Proctor
The Fusarium verticillioides FUM Gene Cluster Encodes a Zn(II)2Cys6 Protein That Affects FUM Gene Expression and Fumonisin Production
Eukaryot. Cell, July 1, 2007; 6(7): 1210 - 1218.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. S. Papadopoulos and R. Agarwala
COBALT: constraint-based alignment tool for multiple protein sequences
Bioinformatics, May 1, 2007; 23(9): 1073 - 1079.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D21 - D25.


L. Hao, J. Klein, and M. Nei
Heterogeneous but conserved natural killer receptor gene complexes in four major orders of mammals
PNAS, February 28, 2006; 103(9): 3192 - 3197.


C. B. Thomas and R. I. Gumport
Dimerization of the bacterial RsrI N6-adenine DNA methyltransferase
Nucleic Acids Res., February 6, 2006; 34(3): 806 - 815.


D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 1, 2006; 34(suppl_1): D16 - D20.


G. Pugalenthi, A. Bhaduri, and R. Sowdhamini
iMOTdb--a comprehensive collection of spatially interacting motifs in proteins
Nucleic Acids Res., January 1, 2006; 34(suppl_1): D285 - D286.


T. Bickel, L. Lehle, M. Schwarz, M. Aebi, and C. A. Jakob
Biosynthesis of Lipid-linked Oligosaccharides in Saccharomyces cerevisiae: Alg13p and Alg14p Form a Complex Required for the Formation of GlcNAc2-PP-dolichol
J. Biol. Chem., October 14, 2005; 280(41): 34500 - 34506.


S. Chakrabarti, A. P. Anand, N. Bhardwaj, G. Pugalenthi, and R. Sowdhamini
SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs
Nucleic Acids Res., July 1, 2005; 33(suppl_2): W274 - W276.


M. G. Bowden, W. Chen, J. Singvall, Y. Xu, S. J. Peacock, V. Valtulina, P. Speziale, and M. Hook
Identification and preliminary characterization of cell-wall-anchored proteins of Staphylococcus epidermidis
Microbiology, May 1, 2005; 151(5): 1453 - 1464.


S. G. Conticello, C. J. F. Thomas, S. K. Petersen-Mahrt, and M. S. Neuberger
Evolution of the AID/APOBEC Family of Polynucleotide (Deoxy)cytidine Deaminases
Mol. Biol. Evol., February 1, 2005; 22(2): 367 - 377.


D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 1, 2005; 33(suppl_1): D34 - D38.


G. Pugalenthi, A. Bhaduri, and R. Sowdhamini
GenDiS: Genomic Distribution of protein structural domain Superfamilies
Nucleic Acids Res., January 1, 2005; 33(suppl_1): D252 - D255.


M. B. Lobocka, D. J. Rose, G. Plunkett III, M. Rusin, A. Samojedny, H. Lehnherr, M. B. Yarmolinsky, and F. R. Blattner
Genome of Bacteriophage P1
J. Bacteriol., November 1, 2004; 186(21): 7032 - 7068.


I. Alam, A. Dress, M. Rehmsmeier, and G. Fuellen
Comparative homology agreement search: An effective combination of homology-search methods
PNAS, September 21, 2004; 101(38): 13814 - 13819.


S. McGinnis and T. L. Madden
BLAST: at the core of a powerful and diverse set of sequence analysis tools
Nucleic Acids Res., July 1, 2004; 32(suppl_2): W20 - W25.


Q.-l. Wang, S. Chen, N. Esumi, P. K. Swain, H. S. Haines, G. Peng, B. M. Melia, I. McIntosh, J. R. Heckenlively, S. G. Jacobson, et al.
QRX, a novel homeobox gene, modulates photoreceptor gene expression
Hum. Mol. Genet., May 15, 2004; 13(10): 1025 - 1040.


D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank: update
Nucleic Acids Res., January 1, 2004; 32(suppl_1): D23 - D26.


A. Bhaduri and R. Sowdhamini
A genome-wide survey of human tyrosine phosphatases
Protein Eng. Des. Sel., December 1, 2003; 16(12): 881 - 888.


L. Papazisi, T. S. Gorton, G. Kutish, P. F. Markham, G. F. Browning, D. K. Nguyen, S. Swartzell, A. Madan, G. Mahairas, and S. J. Geary
The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain Rlow
Microbiology, September 1, 2003; 149(9): 2307 - 2316.


K. M. McGinnis, S. G. Thomas, J. D. Soule, L. C. Strader, J. M. Zale, T.-p. Sun, and C. M. Steber
The Arabidopsis SLEEPY1 Gene Encodes a Putative F-Box Subunit of an SCF E3 Ubiquitin Ligase
PLANT CELL, May 1, 2003; 15(5): 1120 - 1130.


G. K-W. Kong, G. Polekhina, W. J. McKinstry, M. W. Parker, B. Dragani, A. Aceto, D. Paludi, D. R. Principe, B. Mannervik, and G. Stenberg
Contribution of Glycine 146 to a Conserved Folding Module Affecting Stability and Refolding of Human Glutathione Transferase P1-1
J. Biol. Chem., January 3, 2003; 278(2): 1291 - 1302.


D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 1, 2003; 31(1): 23 - 27.


H. van der Wel, H. R. Morris, M. Panico, T. Paxton, A. Dell, L. Kaplan, and C. M. West
Molecular Cloning and Expression of a UDP-N-acetylglucosamine (GlcNAc):Hydroxyproline Polypeptide GlcNAc-transferase That Modifies Skp1 in the Cytoplasm of Dictyostelium
J. Biol. Chem., November 22, 2002; 277(48): 46328 - 46337.


Y.-H. Feng, Y. Sun, and J. G. Douglas
Gβγ-independent constitutive association of Gαs with SHP-1 and angiotensin II receptor AT2 is essential in AT2-mediated ITIM-independent activation of SHP-1
PNAS, September 17, 2002; 99(19): 12049 - 12054.


H.-S. Lee, M.-S. Kim, H.-S. Cho, J.-I. Kim, T.-J. Kim, J.-H. Choi, C. Park, H.-S. Lee, B.-H. Oh, and K.-H. Park
Cyclomaltodextrinase, Neopullulanase, and Maltogenic Amylase Are Nearly Indistinguishable from Each Other
J. Biol. Chem., June 7, 2002; 277(24): 21891 - 21897.


D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, B. A. Rapp, and D. L. Wheeler
GenBank
Nucleic Acids Res., January 1, 2002; 30(1): 17 - 20.


I. Tatsuno, M. Horie, H. Abe, T. Miki, K. Makino, H. Shinagawa, H. Taguchi, S. Kamiya, T. Hayashi, and C. Sasakawa
toxB Gene on pO157 of Enterohemorrhagic Escherichia coli O157:H7 Is Required for Full Epithelial Cell Adherence Phenotype