A family of gene prediction programs developed at the Georgia Institute of Technology, Atlanta, Georgia, USA.
What's New: A new algorithm, GeneMark-ET, an automatically trained eukaryotic gene finder that uses mapped RNA-Seq reads, was published in NAR.
Supported by NIH

Gene Prediction in Bacteria, Archaea, Metagenomes and Meta-transcriptomes
A novel genomic sequence can be analyzed either by GeneMark.hmm with Heuristic models or by the self-training program GeneMarkS (if the sequence is longer than 50 kb). For many species, model parameters are already available, so you can use the GeneMark.hmm page. In all three options, a combination of GeneMark.hmm-P* and GeneMark-P* is run. Metagenomic sequences can be analyzed by MetaGeneMark (2010), a program with high accuracy and speed (see downloads).
Gene Prediction in Eukaryotes
For a novel genome, we suggest installing and running the self-training program GeneMark-ES* (2005) locally (see downloads). Note that GeneMark-ES* has a special, popular mode for analyzing fungal genomes (2008). For some species, model parameters are already available, and gene prediction can be done via the web site by a combination of GeneMark.hmm-E* and GeneMark-E*.
Gene Prediction in Transcripts
Training parameters and predicting genes for a large collection of transcripts can be done with a special version of GeneMarkS. A special version of GeneMark.hmm with Heuristic models can be used for a single transcript.
Gene Prediction in Viruses, Phages and Plasmids
A novel virus, phage or plasmid can be analyzed either by GeneMark.hmm with Heuristic models (if the sequence is shorter than 50 kb) or by the self-training program GeneMarkS*. Both options run GeneMark.hmm and GeneMark in parallel.

Software from the GeneMark line is part of the genome annotation pipelines at NCBI, JGI and the Broad Institute, as well as the following software packages:
  • QUAST : quality assessment tool for genome assemblies
    -- using GeneMarkS
  • MetAMOS : a modular and open source metagenomic assembly and analysis pipeline
    -- using MetaGeneMark
  • MAKER2 : an annotation pipeline and genome-database management tool for second-generation genome projects
    -- using GeneMark-ES
What the programs do:

The GeneMark-P and GeneMark-E algorithms determine protein-coding potentials: the a posteriori probabilities that a DNA fragment is protein-coding in each of the six possible reading frames. The six-frame GeneMark graph shows how the protein-coding potential is distributed along the sequence.
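
As a rough illustration of what an a posteriori probability per reading frame means, the following Python sketch applies Bayes' rule to a set of competing hypotheses about one DNA fragment. It is not GeneMark code. The hypothesis labels, the extra "noncoding" alternative, the function names and all numbers are assumptions made for this example; the likelihoods would in practice come from statistical sequence models that are not shown here.

```python
import math

# Hypothetical labels (an assumption of this sketch): six coding reading
# frames, three per strand, plus a non-coding alternative.
HYPOTHESES = ["+1", "+2", "+3", "-1", "-2", "-3", "noncoding"]


def posterior_probabilities(log_likelihoods, priors):
    """Apply Bayes' rule to competing hypotheses about one DNA fragment.

    log_likelihoods: dict mapping hypothesis -> log P(fragment | hypothesis)
    priors:          dict mapping hypothesis -> P(hypothesis), all > 0
    Returns a dict mapping hypothesis -> P(hypothesis | fragment).
    """
    # Work in log space and subtract the maximum before exponentiating,
    # so that very small likelihoods do not underflow to zero.
    log_joint = {h: log_likelihoods[h] + math.log(priors[h])
                 for h in log_likelihoods}
    shift = max(log_joint.values())
    weights = {h: math.exp(v - shift) for h, v in log_joint.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def coding_potential(posteriors):
    """Keep only the six frame-specific posteriors (drop 'noncoding')."""
    return {h: p for h, p in posteriors.items() if h != "noncoding"}
```

A possible use, with invented log-likelihood values:

```python
priors = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
log_likelihoods = {h: -120.0 for h in HYPOTHESES}
log_likelihoods["+2"] = -112.0  # made-up value favouring frame +2
potentials = coding_potential(posterior_probabilities(log_likelihoods, priors))
best_frame = max(potentials, key=potentials.get)  # "+2" for these inputs
```

Repeating such a calculation in a window that slides along the sequence would give one curve per frame, which is the kind of picture a six-frame graph presents.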

The GeneMark.hmm-P and GeneMark.hmm-E algorithms predict prokaryotic (P) and eukaryotic (E) genes in a sequence as a whole. The algorithms use hidden Markov models that reflect the "grammar" of gene organization. The GeneMark.hmm (P and E) algorithms identify the most likely parse of the whole DNA sequence into protein-coding genes (with introns in the E case) and intergenic regions.
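
The idea of finding the most likely parse of a whole sequence can be sketched with the standard Viterbi algorithm on a toy two-state hidden Markov model. This is only an illustration and not the GeneMark.hmm model: the two states, the single-nucleotide emissions and every probability below are invented for the example, and real gene finders use far richer state structures.

```python
import math

# Toy model; all states and probabilities are assumptions of this sketch.
STATES = ["coding", "intergenic"]
START = {"coding": 0.5, "intergenic": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "intergenic": 0.1},
    "intergenic": {"coding": 0.1, "intergenic": 0.9},
}
EMIT = {
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intergenic": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(sequence, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for a non-empty DNA string."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(start[s]) + math.log(emit[s][sequence[0]])
            for s in states}
    backpointers = []  # one dict per position after the first
    for symbol in sequence[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Choose the predecessor that maximises the path score into s.
            prev = max(states, key=lambda p: best[p] + math.log(trans[p][s]))
            scores[s] = (best[prev] + math.log(trans[prev][s])
                         + math.log(emit[s][symbol]))
            pointers[s] = prev
        best = scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("AAAAAAGGGGGGGG")` with these toy parameters labels the six A positions as intergenic and the eight G positions as coding. The label of each position depends on the whole sequence rather than on a local window, which is the sense in which the parse treats the sequence as a whole.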

* "P" stands for "Prokaryotic"
  "E" stands for "Eukaryotic"
  "S" stands for "Self-training"

For more information see Background and Publications.

Borodovsky Group
