BLAST: at the core of a powerful and diverse set of sequence analysis tools
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
* To whom correspondence should be addressed. Tel: +1 301 435 5945; Email: mcginnis{at}ncbi.nlm.nih.gov
Received February 20, 2004; Revised April 2, 2004; Accepted April 14, 2004
| ABSTRACT |
|---|
Basic Local Alignment Search Tool (BLAST) is one of the most heavily used sequence analysis tools available in the public domain. There is now a wide choice of BLAST algorithms that can be used to search many different sequence databases via the BLAST web pages (http://www.ncbi.nlm.nih.gov/BLAST/). All the algorithm–database combinations can be executed with default parameters or with customized settings, and the results can be viewed in a variety of ways. A new online resource, the BLAST Program Selection Guide, has been created to assist in the definition of search strategies. This article discusses optimal search strategies and highlights some BLAST features that can make your searches more powerful.
| INTRODUCTION |
|---|
Basic Local Alignment Search Tool (BLAST) is a sequence similarity search program that can be used via a web interface or as a stand-alone tool (1,2). There are several types of BLAST to compare all combinations of nucleotide or protein queries with nucleotide or protein databases. BLAST is a heuristic that finds short matches between two sequences and attempts to start alignments from these hot spots. In addition to performing alignments, BLAST provides statistical information to help decipher the biological significance of the alignment; this is the expect value, or false-positive rate.
The BLAST server at the National Center for Biotechnology Information (NCBI) now has a diverse set of features that can add power to your BLAST searching. The BLAST homepage (http://www.ncbi.nlm.nih.gov/BLAST/) lists the varieties of BLAST searches by type: Nucleotide, Protein, Translated and Genomes. Table 1 documents the default parameters for each link. In the online version of this table (http://www.ncbi.nlm.nih.gov/blast/link_params.html), each cell of the top row and leftmost column is hyperlinked to a description of that column or row.
The Program Selection Guide can assist in the selection of search type and databases (http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml). The default nucleotide database used is nt, i.e. GenBank without the high-throughput, patent, genomic or sequence tagged site (STS) sequences (see http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml#NUC_ for more details). The default protein database is nr: a non-redundant set of all the non-patent sequences; i.e. sequences that are exactly the same over their entire length are merged into one database entry, although information about the sequences that make up the entry is preserved (see http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml#PROT_DB for more details).
When the query is submitted, either as a sequence in FASTA format or as a sequence identifier, e.g. GenBank accession.version, the search is sent to the BLAST server and a Request Identifier (RID) is returned. The query and results are stored in a structured format for up to 24 h after an RID is issued. The RID identifies the query and allows the results to be viewed in several formats, which include the familiar BLAST report, a simplified hit table, XML and ASN.1 [(3) and http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.610]. The number of outstanding jobs from one IP address is taken into account when queuing requests, as described at http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.shtml#Queuetime, so that one user does not monopolize the entire service. Infrequently, a user will be blocked for an excessive number of queries; typically, this is a user who has become overly enthusiastic with a Perl script rather than a deliberate denial-of-service attack. Blocking is performed on a case-by-case basis. A user scripting queries to the BLAST server is advised to wait 3 s between queries; not to poll for results for any given RID more often than every 2 min; and to run all scripts between the hours of 9 p.m. and 5 a.m. EST USA. Users anticipating a large volume of searches (several hundred or more) may wish to email blast-help{at}ncbi.nlm.nih.gov for assistance in formulating their queries and advice on submitting searches.
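The scripting guidelines above can be expressed as a small pacing helper. The following is a minimal sketch rather than NCBI code: the function names and constants are hypothetical, no network requests are made, and the datetime passed in is assumed to be expressed in US Eastern time already.

```python
from datetime import datetime, time as dtime

SUBMIT_GAP_S = 3    # advised wait between successive query submissions
POLL_GAP_S = 120    # advised minimum interval between polls of one RID


def in_off_peak(now_est):
    """True if now_est (assumed to be US Eastern time) falls in 9 p.m.-5 a.m."""
    t = now_est.time()
    return t >= dtime(21, 0) or t < dtime(5, 0)


def submission_offsets(n_queries, gap=SUBMIT_GAP_S):
    """Seconds after the start of a run at which each query may be submitted."""
    return [i * gap for i in range(n_queries)]


def seconds_until_next_poll(last_poll, now, gap=POLL_GAP_S):
    """How long a script should still sleep before polling the same RID again."""
    return max(0.0, last_poll + gap - now)
```

A script would call `in_off_peak` before starting a batch, sleep according to `submission_offsets` between submissions, and consult `seconds_until_next_poll` before each check of an RID.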
| SEARCH STRATEGIES |
|---|
Using the default settings for a BLAST search is a sensible approach because they should give the best all-round results. Moving beyond the default settings by changing the type of search and the search parameters requires a strategic approach. Generally, there is a tradeoff between speed and sensitivity and a user should try to use the fastest set of parameters sensitive enough for the task at hand. Use of overly sensitive settings, especially with long queries or very large databases, can mean an excessive wait time for the user or that the job will exceed the CPU resource limit on the server and only an error message will be returned to the user. The BLAST web page encourages optimal parameter setting by offering a number of links for specific purposes, described in Table 1. If the goal is identification of a sequence or an intra-organism comparison, then it is best to use a fast and stringent search. Otherwise, it might be necessary to use more sensitive settings which normally come at a cost in terms of time taken to run the search. In this section we discuss the items in Table 1 under Nucleotide and Protein. We discuss other sections of the table as appropriate in the rest of this article.
The speed and sensitivity of nucleotide–nucleotide searches vary most dramatically with the word-size and the type of gapped extension. The fastest program is megaBLAST, which defaults to a large word-size (an exact match of 28 bases is required to initiate an extension) and a greedy gapped extension algorithm (4) that has no gap existence cost but merely a gap extension cost, making it ideal for comparing similar sequences (e.g. from the same organism). More sensitive is a search with discontiguous megaBLAST, which uses discontiguous word-matches (not all bases in a word are required to match) to initiate extensions (5,6). It also uses a non-greedy extension, an option that is appropriate for comparisons up to 80% identity. Roughly as sensitive but generally slower (especially for longer queries) is the standard BLASTN, which uses an 11-base contiguous word to initiate extensions. Very slow is the option for short nucleotide searches. This option is intended only for very short sequences that contain little information and might otherwise not find any hits. Using this option with a query longer than 50 bases will probably exceed the server's CPU resource limit.
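The difference between contiguous and discontiguous word-matches can be illustrated with a short sketch. This is not the BLAST implementation: `word_hits`, `contiguous_template` and the 8-position template below are hypothetical, chosen only to show how a template of required ('1') and ignored ('0') positions lets a seed survive a single mismatch that defeats an exact word of the same span.

```python
from collections import defaultdict


def word_hits(query, subject, template):
    """Return (query_pos, subject_pos) pairs whose words agree at every '1' in template."""
    keep = [k for k, c in enumerate(template) if c == "1"]
    span = len(template)

    def key(seq, pos):
        # Read only the positions that the template requires to match.
        return "".join(seq[pos + k] for k in keep)

    # Index every word of the subject once, then look up each query word.
    index = defaultdict(list)
    for j in range(len(subject) - span + 1):
        index[key(subject, j)].append(j)

    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(key(query, i), []):
            hits.append((i, j))
    return hits


def contiguous_template(word_size):
    """A template in which every position must match (an exact word)."""
    return "1" * word_size


query = "ACGTTGCA"
variant = "GGACGATGCAGG"  # contains the query with one substitution

exact = word_hits(query, variant, contiguous_template(8))  # no seed found
spaced = word_hits(query, variant, "11101111")             # seed found at (0, 2)
```

A larger contiguous word-size gives fewer seeds and therefore fewer extensions to evaluate, which is the speed side of the tradeoff described above.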
The fastest way to identify the function of a protein is to perform a CDD search (7), which uses a database of motifs to characterize conserved domains in a protein sequence. This normally takes just a few seconds, and a CDD search is actually performed for every protein–protein search by default. The standard protein–protein search option provides good all-round search parameters. Use of the PSI-BLAST page allows the user to initiate an iterative search (2) that produces a position-specific scoring matrix for further searches. PHI-BLAST searches a database looking only for alignments that include a specified pattern (8). The short sequence option makes use of the PAM30 matrix, which is recommended for short sequences that contain little information (9).
Another consideration is which dataset to search; a database consisting of well-curated sequences will return database matches that are more accurately annotated and contain fewer sequencing errors or less vector contamination. Another, more subtle issue concerns the expect value for the matches found. The expect value indicates the validity of the match: the smaller the expect value, the more likely the match is good and represents real similarity rather than a chance match (see http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html for more details). The expect value scales roughly with the size of the database; therefore, if it is a database in which 90% of the sequences are not of interest, e.g. they are from the wrong species, then the expect value of all hits is increased by a factor of 10, i.e. the false-positive rate will be higher.
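The factor of 10 follows from the usual relationship between expect value, bit score and search space, E = m × n × 2^(−S'), where m is the query length, n the database length and S' the bit score. The sketch below assumes that simplified form (edge-effect and effective-length corrections are ignored), and the query length, bit score and database sizes are illustrative values only.

```python
def expect_value(bit_score, query_len, db_len):
    """Simplified expect value E = m * n * 2**(-S') for a bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same alignment (same bit score, same query) against two database sizes:
# a full database and a subset containing only the 10% of interest.
full_db = expect_value(bit_score=40.0, query_len=300, db_len=10_000_000_000)
subset_db = expect_value(bit_score=40.0, query_len=300, db_len=1_000_000_000)

ratio = full_db / subset_db  # ten-fold larger E in the ten-fold larger database
```

The same match is therefore reported as ten times more likely to be a chance hit when nine-tenths of the database searched is irrelevant to the question.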
| SEARCH AN ENTIRE GENOME |
|---|
Traditionally, users have chosen nt or nr for their searches. Often this is no longer the optimal choice. nt contained ~10 billion bases as of February 2004, an increase of ~20% from February 2003; nr contained ~540 million residues as of February 2004, an increase of ~25% from a year earlier (C. Camacho, personal communication). Recent announcements (see http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowSection&rid=coffeebrk.chapter.622 and references therein) indicate that this trend will continue. The human genome database contains ~3 billion bases; for users interested only in human sequences, searching the human genome rather than nt is about three times as efficient. The protein subset of the human genome is also only ~12 million residues. Nucleotide–nucleotide searches on the human genome BLAST page include filtering for human repeats using a mini-BLAST search (T. Madden, personal communication) alongside low-complexity filtering by default. The default expect value cutoff on the genome page is set to a more conservative 0.01, rather than the default 10, under the assumption that the search of a genome is very targeted. Results for the search of an mRNA for the MEN1 gene [GenBank accession number U93236 (10)] against the human genome are presented in Figure 1.
| LIMIT BY ENTREZ QUERY |
|---|
The sequence data at NCBI are divided into separate databases for BLAST searching [e.g. expressed sequence tags (ESTs), Trace Archives and whole genomes]. Further restriction can be applied using the Limit by Entrez Query option on the BLAST search pages. For example, the search can be limited to a specific organism or to taxonomic groups, such as Mammals or Archaea. Any query in the Entrez search format (http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers) can be used to restrict the search.
Consider the sequence from GenPept accession.version AAH04246.1, described as the human homolog of an Escherichia coli DNA mismatch repair protein. When searching against the nr database with no restriction by organism or other criteria and using the default display limit of 100 database sequences, no hits to E.coli are found. However, if the search is limited to E.coli by selecting it from the pull-down list of organisms, there are 47 matches to E.coli. The top-scoring match is a 343-residue alignment between the query and a sequence entitled mismatch repair protein (GenPept accession number AAM82372) with an expect value of 1.0e-50. To limit the results still further, use an additional Entrez query. For example, to search only E.coli sequences from the Reference Sequence collection [(11) and http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.681], use the Entrez query
[image: Entrez query]
| USE THE PROGRAM SELECTION GUIDE |
|---|
The number of BLAST programs and databases now available can make choosing a search strategy a daunting task. To address this, a new tool called the Program Selection Guide (http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml) has been designed to assist users. It has been organized on three basic characteristics: (i) the nature of the query sequence, (ii) the purpose of the search and (iii) the dataset intended as the target of the search. An example of how to use the Guide with a nucleotide query is shown in Figure 2.
| USE megaBLAST FOR MULTIPLE QUERIES |
|---|
A common task in high-throughput sequencing projects is to group nucleotide sequences of related function together. A reasonable approach is to first find the very obvious similarities with a fast algorithm (using a nucleotide–nucleotide comparison with a large word-size), and then to use more sensitive algorithms on the sequences that did not have strong matches in the earlier step (e.g. using a smaller word-size or a translating search). As discussed above, megaBLAST was created specifically for the task of efficiently looking for very similar sequences. megaBLAST scans the database once for a large number of queries, making the search very fast. As an example, the 200 Cyprinus carpio expressed sequence tag sequences from Savan and Sakai (12) (GenBank accession numbers AU183343–AU183542) were downloaded from the NCBI website and concatenated into one FASTA file; then the file was uploaded into the megaBLAST page using Browse/Load query file from disk. The expect value was changed from its default of 10 to 1.0e-5, and the database was left as nt. By default, megaBLAST returns output in the form of a hit table [(3) and http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.610], which lists the sequence identifiers for each sequence and the start and stop positions for each alignment, as well as the score, expect value and percentage identity matched. The hit table format can also be downloaded in plain text to be analyzed locally. The normal BLAST report, with full alignments and sequence, can also be viewed using the RID. On a run in January 2004, 93 of the 200 queries had at least one match to the nt database. This compares well with the analysis by Savan and Sakai (12), who used nucleotide–nucleotide and translating searches to find strong similarities to 129 of the query sequences.
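Counting how many queries had at least one match is a typical local analysis of a downloaded hit table. The sketch below assumes the common tab-separated layout with twelve columns (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, expect value, bit score); the function name and the two sample rows are hypothetical and are not taken from the run described above.

```python
def queries_with_hits(hit_table_text, max_expect=1e-5):
    """Return the set of query identifiers that have a hit at or below max_expect."""
    matched = set()
    for line in hit_table_text.splitlines():
        # Skip blank lines and comment lines that some outputs include.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id = fields[0]
        expect = float(fields[10])  # assumed position of the expect value column
        if expect <= max_expect:
            matched.add(query_id)
    return matched


# Made-up rows for illustration only.
sample = (
    "# hypothetical hit table\n"
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t380\n"
    "query2\tsubjB\t85.00\t50\t7\t0\t1\t50\t10\t59\t0.5\t30\n"
)
strong = queries_with_hits(sample)
```

Queries absent from the returned set are the candidates for a second pass with a more sensitive search, as suggested at the start of this section.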
| EXPLORE ALL THE HYPERLINKS IN THE BLAST OUTPUT |
|---|
Additional information on the sequences found by BLAST has traditionally been found through links to GenBank or GenPept from the sequence identifiers of the hits. From the GenBank record, it is possible to navigate to other resources on the same sequence; however, this usually involves several steps (or mouse clicks). The LinkOut icons on the BLAST report provide a shortcut to collections of related information, which can be a powerful tool in itself. For example, when a protein–protein comparison of the E.coli mismatch repair protein found earlier (GenPept accession number AAM82372) is performed against nr but limited to Mus musculus, some very strong hits have uninformative definition lines (Figure 3). Selecting LinkOut is immediately helpful: GenPept accession.version AAH40784.1 is described as a mutS homolog 3, Msh3, and DDBJ GenPept accession.version BAB27085.1 is described as a mutS homolog 6, with the gene symbol Msh6.
Links to structural information (S) and UniGene (U) may also be found on a BLAST report.
| ALTERNATIVE VIEWS OF BLAST RESULTS |
|---|
Rather than looking at BLAST results as a series of pair-wise alignments, it is sometimes useful to view the query lined up against a number of retrieved database sequences. This can be particularly useful for finding or observing conserved motifs. The query-anchored alignments provide this view (Figure 4). A dinucleotide-binding motif at positions 11–21 of Rab Escort Protein is characterized by bulky hydrophobic residues followed by a glycine-rich loop (13). The pattern of the conserved motif becomes much clearer when the query-anchored alignment view is selected.
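The idea behind a query-anchored view can be sketched in a few lines: each pairwise alignment is re-drawn in the coordinates of the query, so that conserved columns line up vertically. This is a simplified illustration and not the BLAST formatter: the function name and input tuple layout are hypothetical, identities are drawn as dots, and residues inserted relative to the query are simply omitted.

```python
def query_anchored(query, alignments):
    """Stack pairwise alignments under the query, one row per hit.

    alignments: list of (name, q_start, q_aln, s_aln), where q_start is the
    0-based query position of the first aligned query residue and q_aln/s_aln
    are the gapped query and subject strings of one pairwise alignment.
    """
    rows = [("Query", query)]
    for name, q_start, q_aln, s_aln in alignments:
        row = [" "] * len(query)
        pos = q_start
        for q_char, s_char in zip(q_aln, s_aln):
            if q_char == "-":
                # Insertion relative to the query: has no query column here.
                continue
            row[pos] = "." if s_char == q_char else s_char
            pos += 1
        rows.append((name, "".join(row)))
    return rows


view = query_anchored(
    "MKVGAGLSG",
    [("hit1", 2, "VGAGL", "VGSGL"), ("hit2", 0, "MK-VG", "MKAVG")],
)
for name, text in view:
    print(f"{name:<6}{text}")
```

In this form a conserved motif shows up as a vertical run of dots, with only the differing residues spelled out.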
| IMPLEMENTATION DETAILS |
|---|
Searches sent to the BLAST server are handled by a sophisticated system that makes use of a farm of mostly two-CPU machines running Linux; there are currently about 200 CPUs available, double the number used 2 years ago. For a given query the system splits the database into a number of chunks (typically 10–20) and spreads the calculations across multiple back-end machines. This system also tracks which database chunk has most recently been searched on a given back-end (and is probably still in memory) so it can send another search against the same chunk. The system stores queries, results and various statistics in a pair of machines running Microsoft SQL Server 2000, which can also generate reports on the current state of the system. For example, it is possible to track the number of failed requests organized by any number of criteria such as the database searched, the program used and the back-end machine, allowing quick diagnosis of BLAST database corruption, issues with a certain part of the algorithm and problems with an individual back-end machine, respectively.
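The chunking and cache-affinity behaviour described above can be sketched as follows. This is an illustrative toy scheduler, not the NCBI system: the function names, the round-robin split and the least-loaded fallback are all assumptions made for the example.

```python
def split_database(sequences, n_chunks):
    """Split a list of database sequences into n_chunks round-robin chunks."""
    return [sequences[i::n_chunks] for i in range(n_chunks)]


def assign_chunks(chunk_ids, backends, last_chunk):
    """Map each chunk to a back-end, preferring one that searched it most recently.

    last_chunk: dict of back-end name -> id of the chunk it searched last
    (and so probably still holds in memory).
    """
    load = {b: 0 for b in backends}
    warm = {}
    for backend in backends:
        chunk = last_chunk.get(backend)
        if chunk is not None:
            warm.setdefault(chunk, []).append(backend)

    plan = {}
    # First pass: chunks with a warm back-end go back to that back-end.
    for chunk in chunk_ids:
        if chunk in warm:
            chosen = min(warm[chunk], key=lambda b: load[b])
            plan[chunk] = chosen
            load[chosen] += 1
    # Second pass: remaining chunks go to the least-loaded back-end.
    for chunk in chunk_ids:
        if chunk not in plan:
            chosen = min(backends, key=lambda b: load[b])
            plan[chunk] = chosen
            load[chosen] += 1
    return plan


plan = assign_chunks([0, 1, 2], ["a", "b", "c"], {"a": 2, "b": 0})
```

Here chunks 0 and 2 return to the machines that searched them last, and chunk 1 goes to the idle machine.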
| FUTURE DIRECTIONS |
|---|
The number of BLAST queries sent to the server continues to increase, growing from about 100 000 per weekday at the beginning of 2002 to about 140 000 per weekday in early 2004. As described above, the BLAST databases also continue to grow. In order to keep pace with this growth, the computing power of the BLAST website will probably double over the course of the next year or two. A new BLAST report formatter is currently being written and is now available on the website. Currently this formatter can present regions masked by filtering as lowercase letters or in different colors. Another enhancement, still at the discussion stage, is on-the-fly title generation for alignments involving very long database sequences that might code for many different genes and typically have uninformative (generic) definition lines. Automatically generated information about the genes or coding regions in the area covered by such an alignment could be presented to the user. Improvements in web navigability will attempt to steer users to the appropriate settings or link for a given need. This may include links for specialized purposes, e.g. search only mRNAs or a specific taxonomic node.
| Notes |
|---|
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated.
| REFERENCES |
|---|
1. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
2. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
3. Madden,T.L. (2002) The BLAST sequence analysis tool. In McEntyre,J. (ed.), The NCBI Handbook [Internet]. National Library of Medicine (US), National Center for Biotechnology Information, Bethesda, MD.
4. Zhang,Z., Schwartz,S., Wagner,L. and Miller,W. (2000) A greedy algorithm for aligning DNA sequences. J. Comput. Biol., 7, 203–214.
5. Ma,B., Tromp,J. and Li,M. (2002) PatternHunter: faster and more sensitive homology search. Bioinformatics, 18, 440–445.
6. Buehler,J. (2001) Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17, 419–428.
7. Marchler-Bauer,A., Panchenko,A.R., Shoemaker,B.A., Thiessen,P.A., Geer,L.Y. and Bryant,S.H. (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res., 30, 281–283.
8. Zheng,Z., Schaffer,A.A., Miller,W., Madden,T.L., Lipman,D.J., Koonin,E.V. and Altschul,S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res., 26, 3986–3990.
9. Altschul,S.F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol., 219, 555–565.
10. Chandrasekharappa,S.C., Guru,S.C., Manickam,P., Olufemi,S.E., Collins,F.S., Emmert-Buck,M.R., Debelenko,L.V., Zhuang,Z., Lubensky,I.A., Liotta,L.A. et al. (1997) Positional cloning of the gene for multiple endocrine neoplasia-type 1. Science, 276, 404–407.
11. Pruitt,K.D., Tatusova,T. and Ostell,J. (2002) The Reference Sequence (RefSeq) project. In McEntyre,J. (ed.), The NCBI Handbook [Internet]. National Library of Medicine (US), National Center for Biotechnology Information, Bethesda, MD.
12. Savan,R. and Sakai,M. (2002) Analysis of expressed sequence tags (EST) obtained from common carp, Cyprinus carpio L., head kidney cells after stimulation by two mitogens, lipopolysaccharide and concanavalin-A. Comp. Biochem. Physiol. B Biochem. Mol. Biol., 131, 71–82.
13. Koonin,E.V. (1996) Human choroideremia protein contains a FAD-binding domain. Nature Genet., 12, 237–239.