IPSSP predicts the secondary structure of an amino acid sequence.
It does NOT use any homology information (such as profiles derived from multiple alignments). IPSSP is a single-sequence prediction method.
Input Sequence
Title
A title used for the results. This is an optional field. If the
sequence (either pasted in the box or loaded from a file) is
in FASTA format, the FASTA title line will be used as the title.
Otherwise, expect a title based on the date, such as
"Fri Oct 1 12:34:42 EDT 2005".
Sequence Text
The input text should contain the sequences to be predicted, in the following format:
Protein id
Amino acid sequence
Protein id
Amino acid sequence
....
There should be no blank lines between the lines, and each amino acid sequence should be entered on a single line.
Maximum allowed amino acid sequence length: 10,000 residues.
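As an illustration of the format above, the following is a minimal sketch of how such input could be checked before submission. It is not the server's own parser; the function name parse_sequence_text and the error messages are hypothetical, and the blank-line rule is interpreted as "no empty lines anywhere in the text".

```python
MAX_LENGTH = 10_000  # maximum sequence length stated in this section


def parse_sequence_text(text):
    """Parse alternating 'Protein id' / 'Amino acid sequence' lines.

    Hypothetical helper: returns a list of (protein_id, sequence) tuples
    and raises ValueError if the layout rules above are violated.
    """
    lines = text.strip().splitlines()
    if any(not line.strip() for line in lines):
        raise ValueError("blank lines are not allowed between entries")
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of lines: protein id, then sequence")

    records = []
    for i in range(0, len(lines), 2):
        protein_id = lines[i].strip()
        sequence = lines[i + 1].strip()
        if len(sequence) > MAX_LENGTH:
            raise ValueError(f"{protein_id}: sequence exceeds {MAX_LENGTH} residues")
        records.append((protein_id, sequence))
    return records
```

For example, parse_sequence_text("prot1\nACDEFG\nprot2\nMKV\n") yields two records, one per protein id.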
Sequence File Upload
As an alternative to pasting, a sequence can be loaded from a file on
your system. The same rules as mentioned in the "Sequence Text" section
will apply.
Training Dataset
Two datasets are available for estimating the model parameters. The first is the PDB_SELECT 2005 list by Hobohm and
Sander, in which no pair of sequences has sequence identity greater than 25% (for alignments of 80 or more residues). The procedure used to generate the PDB_SELECT
list is described in [1-2]. The second set was downloaded from the EVA server FTP site in September 2004 [3]. The proteins in this set satisfy the
condition that the percentage identity between any pair of sequences does not exceed a length-dependent threshold S (for instance, S = 19.5 for sequences longer than
450 amino acids) [4].
Conversion Rules to Reduce 8 States to 3
The following conversion rules are available for the training set:
(i) EHL mapping: H, G, I to H; E, B to E; S, T, ' ' to L,
(ii) PSIPRED's mapping: H, G to H; E, B to E; I, S, T, ' ' to L,
(iii) CK mapping: H to H; E to E; G, I, B, S, T, ' ' to L.
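The three conversion rules above can be written as lookup tables. The sketch below is only an illustration of the mappings as listed; the names MAPPINGS and reduce_states are hypothetical, and the input is assumed to be a string of 8-state labels in which a space stands for the unassigned state.

```python
# 8-state to 3-state lookup tables, transcribed from rules (i)-(iii) above.
MAPPINGS = {
    "EHL": {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
            "S": "L", "T": "L", " ": "L"},
    "PSIPRED": {"H": "H", "G": "H", "E": "E", "B": "E",
                "I": "L", "S": "L", "T": "L", " ": "L"},
    "CK": {"H": "H", "E": "E", "G": "L", "I": "L", "B": "L",
           "S": "L", "T": "L", " ": "L"},
}


def reduce_states(states8, rule="EHL"):
    """Convert an 8-state string to 3 states (H, E, L) using the named rule."""
    table = MAPPINGS[rule]
    return "".join(table[s] for s in states8)


print(reduce_states("HGIEBST ", "EHL"))      # HHHEELLL
print(reduce_states("HGIEBST ", "PSIPRED"))  # HHLEELLL
print(reduce_states("HGIEBST ", "CK"))       # HLLELLLL
```

The rules differ only in how G, I and B are treated, which is why the same 8-state string yields three different 3-state strings.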
Length Adjustments
For each conversion rule, it is possible to apply the length adjustments proposed by Frishman and Argos [5], in which short helices and strands are converted to loops so
that the minimum helix length is Hmin = 5 and the minimum strand length is Emin = 3.
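A minimal sketch of such an adjustment is given below. It assumes a single pass over a 3-state string in which every helix run shorter than Hmin and every strand run shorter than Emin is relabeled as loop; the exact procedure in [5] may differ in detail, and the function name adjust_lengths is hypothetical.

```python
from itertools import groupby


def adjust_lengths(states3, h_min=5, e_min=3):
    """Relabel helix runs shorter than h_min and strand runs shorter than e_min as L."""
    min_len = {"H": h_min, "E": e_min}
    out = []
    for state, run in groupby(states3):
        n = len(list(run))
        # Loops have no minimum length, so they are never relabeled.
        if n < min_len.get(state, 0):
            state = "L"
        out.append(state * n)
    return "".join(out)


print(adjust_lengths("LHHHHLEELLHHHHHEEEL"))  # LLLLLLLLLLHHHHHEEEL
```

In the example, the 4-residue helix and the 2-residue strand become loops, while the 5-residue helix and the 3-residue strand are kept.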
References
[1] U. Hobohm, M. Scharf, R. Schneider, C. Sander: Selection of a representative set of structures from the Brookhaven Protein Data Bank, Protein Science 1 (1992), 409-417.
[2] U. Hobohm and C. Sander: Enlarged representative set of protein structures, Protein Science 3 (1994), 522-524.
[3] EVA server FTP site: http://cubic.bioc.columbia.edu/eva/doc/ftp.html.
[4] B. Rost: Twilight zone of protein sequence alignments, Protein Engineering 12 (1999), 85-94.
[5] D. Frishman and P. Argos: Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence, Protein Engineering 9 (1996), 133-142.