|
Gene Prediction in Bacteria, Archaea and Metagenomes
|
|
For bacterial and archaeal gene prediction you can use the parallel combination of
GeneMark-P and GeneMark.hmm-P.
For a novel genome you can use either the Heuristic models option (if the sequence
is shorter than 200 kb) or the self-training program GeneMarkS (aka GeneMark.hmm-PS).
|
|
Gene Prediction in Eukaryotes
|
|
For eukaryotic gene prediction you can use the parallel combination of
GeneMark-E and GeneMark.hmm-E.
For a novel genome (one whose name is not in the list of available models) you can run
the self-training program GeneMark.hmm-ES (only 10 Mb of sequence is needed for training).
|
|
Gene Prediction in Viruses, Phages and Plasmids
|
|
For gene prediction in novel viruses, phages and plasmids you can use either the
Heuristic approach (if the sequence is shorter than 50 kb) or the self-training
program GeneMarkS (aka GeneMark.hmm-PS). Both options run the
parallel combination of GeneMark and GeneMark.hmm.
|
|
Gene Prediction in ESTs and cDNAs
|
|
To analyze ESTs and cDNAs you can use GeneMark-E.
|
|
|
What the programs do:
|
|
The GeneMark-P and GeneMark-E programs determine the protein-coding potential
of a DNA sequence (within a sliding window) using species-specific parameters
of the Markov models of coding and non-coding regions. This approach allows
local variations in coding potential to be delineated, so the GeneMark graph
shows in detail how protein-coding potential is distributed along a sequence.
GeneMark is documented as the most accurate prokaryotic gene finder.
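
The sliding-window idea can be illustrated with a small sketch. This is not the GeneMark code: it assumes plain first-order Markov chains for both region types, toy training strings, and hypothetical function names (train_markov, coding_potential), whereas the actual programs rely on species-specific models estimated from real genomes. For each window it compares the likelihood of the window under the "coding" and "non-coding" chains and converts the log-odds into a probability.

```python
import math

ALPHABET = "ACGT"

def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition log-probabilities log P(b | a)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return model

def log_likelihood(seq, model):
    """Sum of transition log-probabilities (the initial base is ignored)."""
    return sum(model[a][b] for a, b in zip(seq, seq[1:]))

def coding_potential(seq, coding, noncoding, window=12, step=3, prior=0.5):
    """Return (window start, P(coding | window)) for each sliding window."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        log_odds = log_likelihood(w, coding) - log_likelihood(w, noncoding)
        log_odds += math.log(prior / (1.0 - prior))
        # Clamp so that math.exp cannot overflow on very long windows.
        log_odds = max(-500.0, min(500.0, log_odds))
        profile.append((start, 1.0 / (1.0 + math.exp(-log_odds))))
    return profile

# Toy training data (assumption): GC-rich "coding", AT-rich "non-coding".
coding_model = train_markov(["GCGGCGCCGGCG" * 5])
noncoding_model = train_markov(["ATTAATATTTAA" * 5])

profile = coding_potential("ATATTAATATAT" + "GCGGCGCCGGCG",
                           coding_model, noncoding_model)
for start, p in profile:
    print(start, round(p, 3))
```

Plotting the second column against the first gives a curve in the spirit of the GeneMark graph: low over the AT-rich stretch and high over the GC-rich one in this toy case.
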
The GeneMark.hmm-P and GeneMark.hmm-E programs predict genes and intergenic
regions in a sequence as a whole. They use hidden Markov models that reflect
the "grammar" of gene organization. The GeneMark.hmm (P and E) programs identify
the maximum likelihood parse of the whole DNA sequence into protein-coding genes
(with possible introns) and intergenic regions.
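
Parsing a whole sequence into labeled regions can be sketched with the Viterbi algorithm on a toy two-state HMM. Everything here is an assumption made for illustration: the two states, the transition and emission probabilities, and the function names are hypothetical, and the model has no notion of codons, strands, or introns, all of which a real gene "grammar" has to capture.

```python
import math

STATES = ("intergenic", "coding")

def viterbi(seq, start_p, trans_p, emit_p):
    """Return the most likely state path for seq under a simple HMM."""
    # Log space avoids numerical underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]])
               for s in STATES}]
    back = []
    for base in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            row[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                      + math.log(emit_p[s][base]))
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (state, start, end) half-open intervals."""
    out, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            out.append((path[start], start, i))
            start = i
    return out

# Toy parameters (assumption): "coding" favors G/C, "intergenic" favors A/T.
start_p = {"intergenic": 0.9, "coding": 0.1}
trans_p = {"intergenic": {"intergenic": 0.9, "coding": 0.1},
           "coding": {"intergenic": 0.1, "coding": 0.9}}
emit_p = {"intergenic": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
          "coding": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

seq = "ATATATAT" + "GCGCGCGCGC" + "TATATA"
parse = segments(viterbi(seq, start_p, trans_p, emit_p))
for state, begin, end in parse:
    print(state, begin, end)
```

Unlike the sliding-window score, the result is a single consistent labeling of the entire sequence, which is the sense in which the parse treats the sequence "as a whole".
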
For more information see Background and References.
|