IPSSP Web Service Instructions

IPSSP predicts the secondary structure of an amino acid sequence.
It does NOT employ any homology information (such as profiles of multiple alignments): IPSSP is a single-sequence prediction method.

Input Sequence

Title
A title used for the results. This is an optional field. If the sequence (either pasted in the box or loaded from a file) is in FASTA format, the FASTA title line will be used as the title. Otherwise, expect a title based on the date, such as "Fri Oct 1 12:34:42 EDT 2005".

Sequence Text
The input text should include the sequences to be predicted. Its format should be:

Protein id
Amino acid sequence
Protein id
Amino acid sequence
....

There should be no blank lines between the lines, and each amino acid sequence should be entered on a single line.
Maximum allowed amino acid sequence length: 10,000 residues.
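
The format rules above can be checked before submission. The following is a minimal Python sketch, not part of the IPSSP service: the function name parse_input and the constant MAX_LENGTH are hypothetical, and it assumes that "no breaks" means no blank lines between records.

```python
MAX_LENGTH = 10_000  # maximum sequence length stated in the instructions


def parse_input(text):
    """Split input text into (protein_id, sequence) pairs.

    Assumes alternating lines: a protein id, then its sequence on one line.
    """
    lines = text.splitlines()
    # Tolerate trailing empty lines, but reject blank lines elsewhere.
    while lines and not lines[-1].strip():
        lines.pop()
    if any(not line.strip() for line in lines):
        raise ValueError("blank lines are not allowed between records")
    if len(lines) % 2 != 0:
        raise ValueError("each protein id must be followed by one sequence line")

    records = []
    for i in range(0, len(lines), 2):
        protein_id = lines[i].strip()
        sequence = lines[i + 1].strip()
        if len(sequence) > MAX_LENGTH:
            raise ValueError(f"{protein_id}: sequence exceeds {MAX_LENGTH} residues")
        records.append((protein_id, sequence))
    return records
```

A sequence that was wrapped over several lines would fail the pairing check, which is the most likely formatting mistake when pasting from a FASTA file.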

Sequence File upload
As an alternative to pasting, a sequence can be loaded from a file on your system. The same rules as in the "Sequence Text" section apply.

Training Dataset

Two datasets are available to estimate the model parameters. The first one is the PDB_SELECT 2005 by Hobohm and Sander, in which no pair of sequences has sequence identity greater than 25% (for alignments of length 80 or more residues). The procedure used to generate the PDB_SELECT list is described in [1-2]. The second set was downloaded from the EVA server FTP site as of 09/2004 [3]. In this set, the percentage identity between any pair of sequences does not exceed the length-dependent threshold S (for instance, S = 19.5 for sequences longer than 450 amino acids) [4].

Conversion Rules to Reduce 8 States to 3

The following conversion rules are available for the training set:

(i) EHL mapping: H, G, I to H; E, B to E; S, T, ' ' to L,

(ii) PSIPRED's mapping: H, G to H; E, B to E; I, S, T, ' ' to L,

(iii) CK mapping: H to H; E to E; G, I, B, S, T, ' ' to L.
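
The three mappings can be written as a small lookup. This is an illustrative Python sketch, not the service's own code: the names REDUCTION_RULES and reduce_states are hypothetical, and the sketch assumes that any symbol not listed as helix or strand (including unexpected characters) is treated as loop.

```python
# For each rule, the eight-state symbols that map to H and to E.
# Every other symbol (S, T, ' ', and the rest) maps to L.
REDUCTION_RULES = {
    "EHL": {"H": "HGI", "E": "EB"},
    "PSIPRED": {"H": "HG", "E": "EB"},
    "CK": {"H": "H", "E": "E"},
}


def reduce_states(eight_state, rule="EHL"):
    """Convert an eight-state string to three states (H, E, L)."""
    groups = REDUCTION_RULES[rule]
    reduced = []
    for symbol in eight_state:
        if symbol in groups["H"]:
            reduced.append("H")
        elif symbol in groups["E"]:
            reduced.append("E")
        else:
            reduced.append("L")
    return "".join(reduced)
```

For the eight symbols in the order "HGIEBST ", the EHL rule gives "HHHEELLL", PSIPRED's rule gives "HHLEELLL", and the CK rule gives "HLLELLLL".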

Length Adjustments

For each conversion rule, it is possible to apply the length adjustments proposed by Frishman and Argos [5], in which short helices and strands are converted to loops so that the minimum helix length is Hmin = 5 and the minimum strand length is Emin = 3.
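
One way to picture the adjustment is as a single pass over runs of identical states. The Python sketch below is an assumption about how such a pass could look, not the exact procedure of [5]; the function name adjust_lengths and its parameters are hypothetical.

```python
from itertools import groupby


def adjust_lengths(three_state, h_min=5, e_min=3):
    """Turn helix runs shorter than h_min and strand runs shorter than e_min into loop."""
    minimum = {"H": h_min, "E": e_min}
    adjusted = []
    for state, run in groupby(three_state):
        length = len(list(run))
        # Loop runs have no minimum, so minimum.get returns 0 for them.
        if length < minimum.get(state, 0):
            state = "L"
        adjusted.append(state * length)
    return "".join(adjusted)
```

With the default thresholds, "HHHHLEELHHHHHEEE" becomes "LLLLLLLLHHHHHEEE": the four-residue helix and the two-residue strand are converted to loop, while the five-residue helix and the three-residue strand are kept.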

References

[1] U. Hobohm, M. Scharf, R. Schneider, C. Sander: Selection of a representative set of structures from the Brookhaven Protein Data Bank, Protein Science 1 (1992), 409-417.
[2] U. Hobohm and C. Sander: Enlarged representative set of protein structures, Protein Science 3 (1994), 522-524.
[3] EVA server FTP site: http://cubic.bioc.columbia.edu/eva/doc/ftp.html.
[4] B. Rost: Twilight zone of protein sequence alignments, Protein Engineering 12 (1999), 85-94.
[5] D. Frishman and P. Argos: Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence, Protein Engineering 9 (1996), 133-142.

Output Options

Email address

Email address to which the prediction results are sent. When an address is given, the results are sent by email instead of being displayed on the web page, which is useful for long sequences.

The output file format:

Protein id
Amino acid sequence
Secondary Structure Sequence Result

Example:

>Input Sequence 1
MKAIFVLNAQHDEAVDANSLAEAKVLANRELDKYGVSDYYKNLINNAKTVEGVKALIDEILAALP
LLEEEEELLLLLLLLLLLLHHHHHHHHHHHHHHLLLLHHHHHHHHHLLLHHHHHHHHHHHHHLLL
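
A results file in this three-line layout can be read back with a few lines of Python. This is a hedged sketch rather than an official parser: the function name parse_results is hypothetical, and it assumes that blank lines can be skipped and that the structure line has one character per residue, as in the example above.

```python
def parse_results(text):
    """Read (protein_id, sequence, structure) triples from IPSSP-style output."""
    # Assumption: blank lines carry no information and can be dropped.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per protein: id, sequence, structure")

    results = []
    for i in range(0, len(lines), 3):
        protein_id, sequence, structure = lines[i:i + 3]
        # Assumption: one predicted state per residue.
        if len(sequence) != len(structure):
            raise ValueError(f"{protein_id}: sequence and structure lengths differ")
        results.append((protein_id, sequence, structure))
    return results
```

The protein id is returned unchanged, including a leading ">" when the title came from a FASTA header.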

Web pages maintained by GeneMark administrator, genemark-admin@amber.biology.gatech.edu.
Please send any suggestions for improvements or problems to the web page maintainer.