Nucleic Acids Research, 2001, Vol. 29, No. 12 2607-2618
© 2001 Oxford University Press
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions
1School of Biology and 2School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA and 3Gene Probe, Inc., 883 Heritage Place, Atlanta, GA 30033-4103, USA
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene starts within an iterative hidden Markov model-based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream of a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess significant variability, the functional and evolutionary connections of which are discussed.
* To whom correspondence should be addressed at: School of Biology, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA. Tel: +1 404 894 8432; Fax: +1 404 894 0519; Email: mark{at}amber.gatech.edu
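The difficulty named in the abstract can be illustrated with a small sketch: a single open reading frame usually contains several in-frame codons that could serve as the translation start, and sequence alone does not say which one is real. The Python below is not part of GeneMarkS; the function name `candidate_starts` and the choice of ATG, GTG and TTG as candidate start codons are assumptions made for illustration only. It lists every candidate start between a given stop codon and the nearest upstream in-frame stop.

```python
# Illustrative sketch only; not the GeneMarkS algorithm.
# Assumption: ATG, GTG and TTG are treated as possible start codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_pos):
    """Return 0-based positions of in-frame candidate start codons
    upstream of the stop codon that begins at stop_pos.

    The scan walks upstream codon by codon and ends at the previous
    in-frame stop codon or at the beginning of the sequence.
    """
    seq = seq.upper()
    if seq[stop_pos:stop_pos + 3] not in STOP_CODONS:
        raise ValueError("stop_pos does not point at a stop codon")

    starts = []
    pos = stop_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # reached the upstream boundary of the ORF
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return sorted(starts)


if __name__ == "__main__":
    # Toy sequence with three in-frame candidate starts for one ORF.
    toy = "CC" + "TAA" + "ATG" + "AAA" + "GTG" + "CCC" + "TTG" + "GGG" + "TAA"
    print(candidate_starts(toy, stop_pos=23))
```

Each returned position is an equally valid start on syntactic grounds, which is why a method of the kind described in the abstract needs additional evidence, such as models of the regulatory sites near the start, to choose among them.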