GeneMark
A family of gene prediction programs developed at
Georgia Institute of Technology, Atlanta, Georgia, USA.
What's New:
Gene identification in novel eukaryotic genomes by self-training algorithm: GeneMark.hmm-ES
Supported by NIH

Gene Prediction in Bacteria, Archaea and Metagenomes
For bacterial and archaeal gene prediction you can use the parallel combination of GeneMark-P and GeneMark.hmm-P. For a novel genome you can use either the Heuristic models option (if the sequence is shorter than 200 kb) or the self-training program GeneMarkS (aka GeneMark.hmm-PS).
Gene Prediction in Eukaryotes
For eukaryotic gene prediction you can use the parallel combination of GeneMark-E and GeneMark.hmm-E. For a novel genome (one whose name is not in the list of available models) you can run the self-training program GeneMark.hmm-ES (only 10 MB of sequence is needed for training).
Gene Prediction in Viruses, Phages and Plasmids
For gene prediction in novel viruses, phages and plasmids you can use either the Heuristic approach (if the sequence is shorter than 50 kb) or the self-training program GeneMarkS (aka GeneMark.hmm-PS). Both options will run the parallel combination of GeneMark and GeneMark.hmm.
Gene Prediction in EST and cDNA
To analyze ESTs and cDNAs you can use GeneMark-E.

What the programs do:

The GeneMark-P and GeneMark-E programs determine the protein-coding potential of a DNA sequence (within a sliding window) by using species-specific parameters of the Markov models of coding and non-coding regions. This approach allows delineating local variations of coding potential; the GeneMark graph therefore shows details of the protein-coding potential distribution along a sequence.
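The sliding-window idea can be illustrated with a minimal Python sketch. It is not the GeneMark implementation: it assumes plain first-order Markov chains (no codon periodicity), scores each window by a simple log-odds ratio, and uses hypothetical function names and toy training strings in place of species-specific parameters.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev).

    Toy stand-in for species-specific parameters (an assumption of this sketch).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model

def log_odds(window, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over the transitions in one window."""
    return sum(
        math.log(coding[prev][nxt] / noncoding[prev][nxt])
        for prev, nxt in zip(window, window[1:])
    )

def coding_potential(seq, coding, noncoding, window=12, step=3):
    """Return (window_start, score) pairs along the sequence."""
    return [
        (start, log_odds(seq[start:start + window], coding, noncoding))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy models: "coding" is GC-rich, "non-coding" is AT-rich (illustrative only).
coding_model = train_markov(["GC" * 20])
noncoding_model = train_markov(["AT" * 20])
profile = coding_potential("AT" * 6 + "GC" * 6, coding_model, noncoding_model)
for start, score in profile:
    print(start, round(score, 2))
```

Plotting the score against the window start gives a curve in the spirit of the GeneMark graph: positive values mark windows that look more like the coding model, negative values mark windows that look more like the non-coding model.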

GeneMark is documented as the most accurate prokaryotic gene finder. The GeneMark.hmm-P and GeneMark.hmm-E programs predict genes and intergenic regions in a sequence as a whole. They use Hidden Markov models that reflect the "grammar" of gene organization. The GeneMark.hmm (P and E) programs identify the most likely parse of the whole DNA sequence into protein-coding genes (with possible introns) and intergenic regions.
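Finding the most likely parse of a sequence under a Hidden Markov model is commonly done with the Viterbi algorithm. The Python sketch below shows that idea on a deliberately tiny model with two states, "intergenic" and "coding". It is an illustration under stated assumptions, not the GeneMark.hmm model: the states, the probabilities and the function names are hypothetical, and there are no introns, strands, start/stop codons or length distributions.

```python
import math
from itertools import groupby

STATES = ("intergenic", "coding")

def viterbi(seq, start_p, trans_p, emit_p):
    """Return the most likely state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(STATES, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            row[s] = (scores[-1][best_prev] + math.log(trans_p[best_prev][s])
                      + math.log(emit_p[s][ch]))
            ptr[s] = best_prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    out, pos = [], 0
    for state, run in groupby(path):
        n = len(list(run))
        out.append((state, pos, pos + n))
        pos += n
    return out

# Hypothetical toy parameters: coding emits mostly G/C, intergenic mostly A/T.
START = {"intergenic": 0.5, "coding": 0.5}
TRANS = {"intergenic": {"intergenic": 0.9, "coding": 0.1},
         "coding": {"intergenic": 0.1, "coding": 0.9}}
EMIT = {"intergenic": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "coding": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

dna = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
print(segments(viterbi(dna, START, TRANS, EMIT)))
```

Unlike a sliding-window score, the parse labels every position of the sequence at once, so region boundaries come out of a single global optimization rather than from thresholding a local curve.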

For more information see Background and References.

Powered by IBM
Borodovsky Group
