Poster Abstract

Finding Sequencing Errors on DNA Coding Regions Based on Frame Predictions from a Combination of Statistical and Neural Network Methods.

Artemis G. Hatzigeorgiou and Martin Reczko

Synaptic Ltd., Aristotelous 313, 13671 Acharnai, Athens, Greece.

Sequencing errors in DNA sequences may be found using a very reliable frame prediction.
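
As an illustration only, the following minimal Python sketch shows how a frame prediction could point to a sequencing error. An inserted or deleted base shifts the reading frame of everything downstream, so a persistent change in the predicted frame marks a candidate error position. The function name `frame_change_points`, its input format (one predicted frame 0, 1 or 2 per window, expressed relative to the start of the sequence) and the `min_run` parameter are assumptions made for this sketch and are not taken from the method described here.

```python
def frame_change_points(window_starts, predicted_frames, min_run=2):
    """Return (position, old_frame, new_frame) for each persistent frame change.

    window_starts    -- start coordinate of each window (hypothetical input)
    predicted_frames -- predicted frame (0, 1 or 2) for each window
    min_run          -- number of consecutive windows that must agree on the
                        new frame before the change is reported
    """
    changes = []
    if not predicted_frames:
        return changes
    current = predicted_frames[0]
    for i in range(1, len(predicted_frames)):
        frame = predicted_frames[i]
        if frame == current:
            continue
        run = predicted_frames[i:i + min_run]
        # Ignore isolated mispredictions: require min_run agreeing windows.
        if len(run) == min_run and all(f == frame for f in run):
            changes.append((window_starts[i], current, frame))
            current = frame
    return changes
```

Requiring several agreeing windows is one simple way to keep isolated mispredictions from being reported as errors; with a per-window accuracy below 100%, some such smoothing would probably be needed in practice.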

In searching for the best possible frame prediction on coding sequences, we build on the results of Fickett & Tung (1992)[1] and Farber, Lapedes and Sirotkin (1992)[2]. Different neural network (NN) architectures, and hybrid systems that combine statistical methods with neural networks, are applied to the training and test data of Fickett & Tung. The networks are trained with several novel training methods; on novel patterns, most of these algorithms seem to generalize better than traditional backpropagation. The training algorithms considered are Quickprop, Resilient Backpropagation (Rprop) and Scaled Conjugate Gradients; Time Delay Neural Networks (TDNN) are considered as an additional architecture. Combining statistical methods with neural networks leads on average to better prediction performance than the previously used combination of the same statistical methods with Penrose Discriminant Analysis. To predict frameshifts, neural networks are trained on patterns from coding regions in frame versus patterns from coding regions not in frame. Excluding regions that contain a stop codon from the negative patterns improves prediction performance. Current results for the recognition of the coding frame on an independent test set show a sensitivity and a specificity of 89% each for 54 bp windows. This performance level is already sufficient to detect frame errors in the database. All training algorithms used in this study are implemented with the public domain software tool 'Stuttgart Neural Network Simulator' (SNNS)[3].
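
The pattern construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the sliding step of one base, the use of the standard stop codons TAA, TAG and TGA, and the definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) are all assumptions of the sketch. Only the 54 bp window length and the exclusion of stop-codon-containing negative patterns come from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard genetic code


def has_stop_codon(window):
    """True if the window, read in triplets from its first base, has a stop codon."""
    return any(window[i:i + 3] in STOP_CODONS
               for i in range(0, len(window) - 2, 3))


def make_patterns(cds, window=54):
    """Split all windows of a coding sequence into in-frame and out-of-frame patterns.

    cds is assumed to start at the first base of a codon. Windows that start
    on a codon boundary are positive patterns; windows that start at offset 1
    or 2 are negative patterns, unless they contain a stop codon.
    """
    positives, negatives = [], []
    for start in range(len(cds) - window + 1):
        segment = cds[start:start + window]
        if start % 3 == 0:
            positives.append(segment)
        elif not has_stop_codon(segment):
            negatives.append(segment)
    return positives, negatives


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = in frame)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

A short usage example on a toy sequence (not real data):

```python
toy_cds = "ATGGCTGAA" * 8          # 72 bases, 24 codons
pos, neg = make_patterns(toy_cds)  # offset-2 windows contain TGA and are dropped
print(len(pos), len(neg))
```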

[1] Fickett J.W., Tung C.-S. (1992) "Assessment of Protein Coding Measures." Nucleic Acids Res., vol. 20, pp. 6441-6450.

[2] Farber R., Lapedes A., Sirotkin K. (1992) "Determination of Eukaryotic Protein Coding Regions Using Neural Networks and Information Theory." J. Mol. Biol., vol. 224, pp. 471-479.

[3] Zell A., Mamier G., Vogt A., Mache N., Huebner R., Doering S., Herrmann K.U., Soyez T., Schmalzl M., Sommer T., Hatzigeorgiou A., Posselt D., Schreiner T., Kett B., Clemente G., Reczko M., Riedmiller M., Seemann M., Ritt M., DeCoster J., Biedermann J., Danz J., Wehrfritz C., Werner R., Berthold M. (1995) SNNS (Stuttgart Neural Network Simulator) User Manual, Version 4.0, Report No. 6/95 (p. 279), University of Stuttgart, Institute for Parallel and Distributed High Performance Systems. http://www.informatik.uni-stuttgart.de/ipvr/bv/projekte/snns/snns.html