INSERM
LocPred

3D protein structures are classically described as a succession of secondary structures: the periodic α-helix and β-sheet, and the coils (everything else). However, this approach leaves 50% of the structures undescribed.

A structural alphabet (de Brevern et al., 2001 [abstract]) is a set (or library) of small prototypes that approximate every part of protein structures. These prototypes form a limited number of recurrent structural elements of proteins. The associations between these structural "letters" are governed by logical rules and form the words of protein structures. The applications are numerous, ranging from simplifying the protein backbone conformation with reasonable accuracy to more ambitious prediction approaches.

PBs

Figure 1. Representation of the 16 Protein Blocks.

Protein Blocks (PBs) are small prototypes approximating the protein backbone (de Brevern et al., 2000 [abstract]). We used a non-redundant protein structure databank and translated the 3D structures into vectors of dihedral angles. This information was used by an unsupervised classifier to generate a series of small prototypes, each 5 residues long, that best describe all the dihedral vectors of the databank. The series of 16 PBs was chosen as a good compromise between 3D approximation and prediction. The prediction method is Bayesian; it was improved with a new approach called sequence family. Moreover, a confidence index called Number of EQuivalent, or Neq, allowed the design of two strategies: (i) a global strategy (a variable number of Protein Blocks is given at each site for a given prediction rate) and (ii) a local strategy (only residues with a correct Neq are kept for a given prediction rate).
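As an illustration of how a confidence index of this kind can drive the two strategies, here is a minimal Python sketch. It assumes that Neq is the exponential of the Shannon entropy of the 16 predicted PB probabilities at a site, so that Neq ranges from 1 (one PB clearly preferred) to 16 (all PBs equally likely); this formula is not given on this page. The function names, the letters a to p used as PB labels, the choice to keep round(Neq) blocks per site, and the threshold value are all assumptions made for the example, not LocPred's actual implementation.

```python
import math

# Hypothetical labels for the 16 Protein Blocks (assumption for this sketch).
PB_LABELS = "abcdefghijklmnop"


def neq(probs):
    """Assumed definition: exp of the Shannon entropy of the PB probabilities."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        if p > 0:
            q = p / total          # normalise so the values sum to 1
            entropy -= q * math.log(q)
    return math.exp(entropy)


def global_strategy(probs, labels=PB_LABELS):
    """Propose a variable number of PBs at one site.

    Assumption: the number of PBs kept is round(Neq), at least 1.
    """
    n_keep = max(1, round(neq(probs)))
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return [label for label, _ in ranked[:n_keep]]


def local_strategy(site_probs, neq_max, labels=PB_LABELS):
    """Keep only the sites whose Neq is at most neq_max (hypothetical threshold).

    Returns a dict {site index: most probable PB} for the retained sites.
    """
    kept = {}
    for i, probs in enumerate(site_probs):
        if neq(probs) <= neq_max:
            best = max(range(len(probs)), key=probs.__getitem__)
            kept[i] = labels[best]
    return kept


# Toy probability vectors (invented for the example, not real predictions).
confident_site = [1.0] + [0.0] * 15       # a single PB carries all the weight
two_way_site = [0.5, 0.5] + [0.0] * 14    # two PBs are equally likely
flat_site = [1.0 / 16] * 16               # no preference at all

sites = [confident_site, two_way_site, flat_site]
kept_sites = local_strategy(sites, neq_max=2.5)
```

With these toy vectors, the global strategy proposes one block for the confident site and two for the two-way site, while the local strategy discards the flat site, where the prediction carries no information.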

The most frequent successions of 5 PBs, called Structural Words (SWs), have been examined. The selection defines 72 SWs that give a good structural approximation of fragments 9 Cα long. Combining most of the SWs in a protein network covers more than 90% of the protein residues of a non-redundant protein structure databank. Interestingly, more than 80% of the coils are included in the network. The structural stability of the protein network was examined for every part and shows, locally, only one type of fold. The amino acid composition was analyzed, and a new type of relationship between local folds and amino acid distribution is shown. The results show that the 3D structures of the protein databank may be easily described through a combination of sub-graphs included in the network (de Brevern et al., 2002 [abstract]).
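Counting successions of 5 PBs can be sketched as a sliding window over structures already encoded as strings of PB letters. The short Python example below is only an illustration: the function names and the toy PB strings are invented, and it assumes that consecutive PBs are shifted by one residue, which is what makes 5 overlapping PBs of 5 residues span 5 + 5 - 1 = 9 Cα.

```python
from collections import Counter


def count_structural_words(pb_sequences, word_len=5):
    """Count every run of word_len consecutive PBs across all sequences."""
    counts = Counter()
    for seq in pb_sequences:
        for start in range(len(seq) - word_len + 1):
            counts[seq[start:start + word_len]] += 1
    return counts


def residues_spanned(word_len=5, pb_len=5):
    """Residues covered by word_len PBs of pb_len residues, each shifted by one."""
    return pb_len + word_len - 1


# Toy PB-encoded structures (hypothetical, not taken from a real databank).
toy_sequences = ["mmmmmmmnopacd", "mmmmmmm"]

word_counts = count_structural_words(toy_sequences)
most_frequent_word, occurrences = word_counts.most_common(1)[0]
span = residues_spanned()
```

On a real databank, the most frequent entries of `word_counts` would be the candidates from which a selection such as the 72 SWs is made; the selection criteria themselves are not described on this page.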

This structural alphabet was used to compact a structural databank with a new clustering approach called the Hybrid Protein Model (de Brevern & Hazout, 2001 [abstract]). It differs from classical clustering in that the clusters are not independent: they overlap and so create continuity. The methodology has since been improved (de Brevern & Hazout, 2003 [abstract]; Benros et al. [abstract]).

It has shown its potential in the analysis and description of the relationship between structure and sequence in globular proteins (de Brevern & Hazout, 2000 [abstract]). Another approach, using Hidden Markov Models, is being developed in our laboratory (Camproux et al., 2001 [abstract]).

       

window sequence

The first screen of LocPred is a classical window in which to enter the protein sequence. In the following sections, we look at the different options and analyses.



The help files are divided into 5 distinct sections, followed by references and an annex.

  • Part A : The sequence, the choices for different types of prediction and available formats of results.
  • Part B : The different types of results.
  • Part C : The "Global" Neq and rasmol script.
  • Part D : The "Local" Neq and the two types of strategies (local and global).
  • Part E : Conclusion.
  • References.
  • Annex.
Last modif : 11 March 2004