Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes
Novel genomic sequences can be analyzed either by the self-training program GeneMarkS
(sequences longer than 50 kb) or by GeneMark.hmm with Heuristic models.
For many species, pre-trained model parameters are ready and available through the GeneMark.hmm
page. Metagenomic sequences can be analyzed by MetaGeneMark, a program optimized for speed.
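
The selection rule described in this section can be sketched as a small Python helper. This is only an illustration: the function name, its parameters, and the reading of "50 kb" as 50,000 bp are assumptions, and the text does not say how a sequence of exactly 50 kb should be handled (here it falls to the Heuristic models). The helper returns a program name as a string; it does not run any GeneMark software, and it ignores the case where pre-trained parameters already exist for the species.

```python
# Hypothetical helper: maps sequence properties to the program named in the text.
# The threshold is the "50 kb" figure from the text, assumed to mean 50,000 bp.
SELF_TRAINING_MIN_BP = 50_000


def suggest_prokaryotic_tool(length_bp: int, metagenomic: bool = False) -> str:
    """Return the name of the program the text suggests for a novel sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be a positive number of base pairs")
    if metagenomic:
        # Metagenomic sequences go to the speed-optimized program.
        return "MetaGeneMark"
    if length_bp > SELF_TRAINING_MIN_BP:
        # Long enough for self-training.
        return "GeneMarkS"
    # Assumption: a sequence of exactly 50 kb is treated as "not longer than 50 kb".
    return "GeneMark.hmm with Heuristic models"
```

For example, `suggest_prokaryotic_tool(4_600_000)` selects the self-training program, while `suggest_prokaryotic_tool(12_000)` falls back to the Heuristic models.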
Gene Prediction in Eukaryotes
Novel genomes can be analyzed by the program GeneMark-ES utilizing unsupervised training.
Note that GeneMark-ES has a special mode for analyzing fungal genomes.
Recently, we have developed a semi-supervised version of GeneMark-ES, called GeneMark-ET, that uses RNA-Seq reads to improve training.
For several species, pre-trained model parameters are ready and available through the GeneMark.hmm page.
Gene Prediction in Transcripts
Sets of assembled eukaryotic transcripts can be analyzed by the modified GeneMarkS algorithm
(the set should be large enough to permit self-training).
A single transcript can be analyzed by a special version of GeneMark.hmm with Heuristic models.
A new advanced algorithm, GeneMarkS-T, was developed recently (manuscript sent to publisher);
the GeneMarkS-T software (beta version) is available for download.
Gene Prediction in Viruses, Phages and Plasmids
Sequences of viruses, phages or plasmids can be analyzed either by GeneMark.hmm with Heuristic models
(if the sequence is shorter than 50 kb) or by the self-training program GeneMarkS.
All the software programs mentioned here are available for download and local installation.
Software of the GeneMark line is part of the genome annotation pipelines at NCBI, JGI and the Broad Institute, as well as the following software packages:
QUAST: a quality assessment tool for genome assemblies -- using GeneMarkS
MetAMOS: a modular and open source metagenomic assembly and analysis pipeline -- using MetaGeneMark
MAKER2: a eukaryotic genome annotation pipeline -- using GeneMark-ES (along with SNAP and AUGUSTUS)
BRAKER1: an RNA-Seq based eukaryotic genome annotation pipeline -- using GeneMark-ET and AUGUSTUS