Some papers about structural alphabets



Definition of Protein Blocks and local prediction

de Brevern A.G., Etchebest C., and Hazout S. (2000), Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, Proteins: Structure, Function, and Genetics, 41(3), pp. 271-287.
Using an unsupervised cluster analyser, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence between successive blocks is explicitly taken into account. A Bayesian approach based on the relationship between protein blocks and amino acid propensities is used for prediction and yields a success rate close to 35%. Dividing the sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. The prediction accuracy exceeds 75% when the four most probable protein blocks are kept at each site of the protein.
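The Bayesian prediction step in this abstract can be illustrated with a minimal sketch. It assumes a naive-Bayes model in which the residues of a sequence window are independent given the block, with position-specific amino acid probabilities estimated beforehand from a training set; the posterior of each block is then proportional to its prior times the product of these probabilities. The window length, the array layout and the names `pb_posterior` and `top_k_blocks` are illustrative assumptions, not the authors' code, and the sketch leaves out the dependence between successive blocks and the "sequence families" described in the abstract.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_BLOCKS = 16  # size of the Protein Block alphabet


def pb_posterior(window, log_prior, log_emission):
    """Posterior P(block | window) for every block.

    Assumption (naive Bayes): residues in the window are independent given the block.
    window       : string of length L (e.g. a hypothetical 15-residue window)
    log_prior    : array (N_BLOCKS,), log P(block)
    log_emission : array (N_BLOCKS, L, 20), log P(amino acid | block, position)
    """
    idx = [AA_INDEX[aa] for aa in window]
    positions = np.arange(len(window))
    # Pick log P(residue at j | block, j) for every block and position, then sum
    # over positions: one log-likelihood per block, shape (N_BLOCKS,)
    log_joint = log_prior + log_emission[:, positions, idx].sum(axis=1)
    # Subtract the maximum before exponentiating to avoid underflow
    log_joint = log_joint - log_joint.max()
    post = np.exp(log_joint)
    return post / post.sum()


def top_k_blocks(posterior, k=4):
    """Indices of the k most probable blocks, best first."""
    return [int(i) for i in np.argsort(posterior)[::-1][:k]]
```

Keeping the four highest-posterior blocks with `top_k_blocks(post, k=4)` mirrors the "first four predicted protein blocks" criterion quoted in the abstract. Working in log space avoids numerical underflow when many small probabilities are multiplied.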



Use of Protein Blocks to understand the sequence - structure relationship

de Brevern A.G., and Hazout S. (2000), Hybrid Protein Model (HPM): a method to compact protein 3D-structures information and physicochemical properties, IEEE Computer Society: Proceedings of the 7th Symposium on String Processing and Information Retrieval, 1, pp. 49-54.
Deriving the 3D structure of a protein from its 1D sequence is one of the main challenges of structural biology. A structural alphabet has previously been defined with an unsupervised classifier, using the dihedral angles of the protein backbone as structural information. Its 16 Protein Blocks (PBs), the basic elements of the alphabet, allow a correct approximation of 3D structure. Local prediction was previously assessed with a Bayesian approach, which showed that sequence information strongly influences the local fold but that the prediction remains coarse (prediction rate of 40.7% with one PB, 75.8% with the four most probable PBs). The Hybrid Protein Model presented in this study learns both the sequence and the structure of proteins. The analysis along the hybrid protein made it possible to assess more precisely the spatial location of certain types of amino acid residues within secondary structures and their flanking regions. This study leads to a fuzzy model of the dependence between sequence and structure.
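The abstract notes that the alphabet was defined from backbone dihedral angles. One way to encode a structure with such an alphabet is to describe each fragment by a vector of dihedral angles and assign it to the block whose reference vector is closest, measuring the distance as a root mean square deviation on angles with periodic wrap-around. The sketch below assumes this scheme. The eight-angle ordering, the names `assign_block` and `TOY_REFERENCES`, and both reference vectors are hypothetical toy values, not the published Protein Block definitions.

```python
import numpy as np

# Hypothetical reference angles (degrees), assumed ordering:
# psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2).
# These are toy values, NOT the published Protein Block references.
TOY_REFERENCES = {
    "toy_helix": [-45, -60, -45, -60, -45, -60, -45, -60],
    "toy_strand": [130, -120, 130, -120, 130, -120, 130, -120],
}


def angular_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (np.asarray(a, dtype=float) - np.asarray(b, dtype=float) + 180.0) % 360.0 - 180.0


def rmsda(angles, reference):
    """Root mean square deviation on angular values, in degrees."""
    d = angular_diff(angles, reference)
    return float(np.sqrt(np.mean(d ** 2)))


def assign_block(angles, references):
    """Return (label, distance) of the reference block closest to the fragment."""
    scores = {label: rmsda(angles, ref) for label, ref in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Wrapping the differences matters because 179° and -179° are only 2° apart; a plain subtraction would treat them as 358° apart.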



Protein Blocks and similar local structures

de Brevern A.G., and Hazout S. (2001), Compacting local protein folds with a Hybrid Protein, Theoretical Chemistry Accounts, 106(1/2), pp. 36-47.
The "Hybrid Protein Model" (HPM) is a fuzzy model for compacting local protein structures. It learns a non-redundant database encoded in a previously defined structural alphabet of 16 protein blocks (PBs). The hybrid protein is a series of probability distributions over the PBs. Training is an iterative, unsupervised process: for each fold to be learned, it finds the most similar pattern in the hybrid protein and modifies that pattern slightly. At the end of training, each position of the hybrid protein corresponds to a set of similar local structures; superimposing these local structures yields an average root mean square deviation (RMSD) of 3.14 Å. The significant amino acid characteristics associated with these local structures are also determined. The use of this model is illustrated by finding the most similar folds between two cytochromes P450.
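The training loop described above (find the most similar pattern, then modify it slightly) can be sketched as a simple online update. This sketch assumes that similarity is the log-likelihood of a PB-encoded fragment under the hybrid protein's distributions, and that "modifying slightly" means moving each matched distribution a fixed fraction towards the observed block. The learning-rate schedule and any neighbourhood update used in the actual HPM are not described in the abstract and are left out; `best_match`, `train_step` and `rate` are hypothetical names.

```python
import numpy as np

N_BLOCKS = 16  # size of the Protein Block alphabet


def best_match(hybrid, fragment):
    """Start index in the hybrid protein where the PB fragment is most likely.

    hybrid   : array (n_positions, N_BLOCKS); each row is a probability distribution
    fragment : sequence of block indices (0..15); must not be longer than the hybrid
    """
    w = len(fragment)
    log_h = np.log(hybrid + 1e-12)  # small epsilon avoids log(0)
    # Log-likelihood of the fragment aligned at every possible start position
    scores = [log_h[s + np.arange(w), fragment].sum()
              for s in range(hybrid.shape[0] - w + 1)]
    return int(np.argmax(scores))


def train_step(hybrid, fragment, rate=0.1):
    """Move the best-matching window of the hybrid protein towards the fragment.

    Updates `hybrid` in place and returns the start index that was updated.
    """
    start = best_match(hybrid, fragment)
    for offset, block in enumerate(fragment):
        target = np.zeros(N_BLOCKS)
        target[block] = 1.0
        row = hybrid[start + offset]
        # A convex combination keeps each row a valid probability distribution
        hybrid[start + offset] = (1.0 - rate) * row + rate * target
    return start
```

Repeating `train_step` over every fragment of a PB-encoded database makes frequently seen local folds accumulate at the same hybrid positions, which is the compaction effect the abstract describes.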



Use of a structural alphabet for predicting the loops

Camproux A.C., de Brevern A.G., Hazout S., and Tufféry P. (2001), Exploring the use of a structural alphabet for a structural prediction of protein loops, Theoretical Chemistry Accounts, 106(1/2), pp. 28-35.
The prediction of loop conformations is one of the challenging problems of homology modeling, owing to the large sequence variability of these parts of protein structures. In the present study, we introduce a search procedure that operates in a structural alphabet space, deduced from a hidden Markov model, to simplify the structural information. It uses a Bayesian criterion to predict, from the amino acid sequence of a loop region, its corresponding word in the structural alphabet. Results show that our approach ranks 30% of the target words first and 50% within the five best scores. Our approach can also decide whether to accept or reject a prediction. With this option, it ranks 57% of the target words first and 67% within the five best scores, while accepting 16% of learned words and rejecting 93% of unknown words.
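The word search, the accept/reject option and the ranking statistics in this abstract can be illustrated with a small sketch. The scoring rule (a log prior for each candidate word plus per-letter amino acid log-probabilities, with one alphabet letter per residue), the fixed threshold on the best score, and all function names are hypothetical simplifications; the abstract does not specify the exact Bayesian criterion or acceptance rule. `top_k_rate` shows how figures such as "30% ranked first, 50% within the five best scores" are computed from the ranks of the target words.

```python
import math


def word_log_score(loop_seq, word, log_prior, log_emit):
    """Hypothetical score: log P(word) + sum_i log P(residue_i | letter_i).

    Simplification: one alphabet letter per loop residue.
    Words of the wrong length or without a prior get -inf.
    """
    if len(word) != len(loop_seq) or word not in log_prior:
        return -math.inf
    return log_prior[word] + sum(
        log_emit[letter][aa] for letter, aa in zip(word, loop_seq))


def rank_candidates(loop_seq, candidates, log_prior, log_emit):
    """(score, word) pairs sorted from best to worst score."""
    scored = [(word_log_score(loop_seq, w, log_prior, log_emit), w)
              for w in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def rank_of(target, ranked):
    """1-based rank of the target word in a ranked list."""
    return [w for _, w in ranked].index(target) + 1


def accept(ranked, threshold):
    """Accept the prediction only if the best score reaches the threshold."""
    return ranked[0][0] >= threshold


def top_k_rate(ranks, k):
    """Fraction of target words ranked within the k best scores."""
    return sum(r <= k for r in ranks) / len(ranks)
```

Raising the threshold trades coverage for reliability: fewer predictions are accepted, but those that are accepted tend to rank the target word higher, which matches the pattern of the reported figures.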



Structural alphabets: a review

de Brevern A.G., Camproux A.C., Hazout S., Etchebest C., and Tufféry P. (2001), Protein structural alphabets: beyond the secondary structure description, Recent Adv. in Prot. Eng., in press.
The considerable growth of protein structure databases makes it possible to go beyond the classical secondary structure description of proteins. Although it still faces numerous problems, the definition of structural alphabets is an emerging concept in protein structure analysis. It attempts to classify objectively the whole set of conformations occurring in protein structures, described as small overlapping fragments. It is expected to lead to a better understanding of protein architecture and to open new opportunities for protein structure prediction.



For more information about this work, e-mail debrevern@urbb.jussieu.fr.

De BREVERN Alexandre
Genomic & Molecular Bioinformatics Team of Professor Serge Hazout
INSERM Unit U436
Statistical and Mathematical Modelling in Biology and Medicine
Université Paris 7
2, place Jussieu case 7113
75251 Paris Cedex 05
When sending an e-mail, please use the subject: Protein Blocks.