SA-Search

A program to search for 3D fragments similar to (fragments of) a query structure.

Introduction

This program compresses the 3D structures of proteins into strings of letters from a structural alphabet, so that classical amino acid sequence methods can be transposed to the search for 3D similarities in proteins. The score matrix quantifying the similarities between the letters of the structural alphabet has been derived from the probabilities of observing each letter at each position when encoding series of protein structures. An example of a possible scoring matrix is reported here.
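As an illustration of how a classical sequence method carries over to structural-alphabet strings, here is a minimal sketch of a Smith-Waterman local alignment score. It is not the algorithm SA-Search itself uses, which is not detailed here. The function names, the toy scoring function (+2 for identical letters, -1 otherwise) and the linear gap penalty are all assumptions made for the example; a real search would use a score matrix derived as described above.

```python
def local_alignment_score(query, target, score, gap=-2):
    """Return the best Smith-Waterman local alignment score.

    query, target: strings of structural-alphabet letters.
    score: function (letter, letter) -> number (hypothetical here).
    gap: linear gap penalty (an assumed value, not from SA-Search).
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP table
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(q, t),  # align q with t
                prev[j] + gap,              # gap in the target
                curr[j - 1] + gap,          # gap in the query
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(a, b):
    # Hypothetical scores standing in for a real structural-alphabet matrix.
    return 2 if a == b else -1


# A 4-letter fragment found exactly inside a longer encoded structure.
print(local_alignment_score("ABCD", "XXABCDXX", toy_score))  # 8
```

Because only two rows of the table are kept, memory grows with the length of the target only, which suits scanning one short query against many encoded structures.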

SA-Search does not currently perform as well as true 3D similarity search methods, but it provides a very fast way of mining collections of structures, especially for medium-range fragments, where true 3D similarity search methods are confronted with a large number of comparisons.
Also, in our experience, the differences between true 3D similarity search methods and SA-Search often come from the lengths of the structural alignments, more than from the structures identified as "hits".

Some other search services are listed here.

How to use SA-Search?

SA-Search expects two inputs: a "query" and an "against", that is, the structure or collection of structures against which the query will be searched.

Both the "query" and the "against" can be specified in several formats, which are selected using the radio buttons.
Make sure that each entry is given in the format of the selected radio button.

Parameters of the search:

The search parameters mostly control the depth of the search.

Results:

The results are returned either in the NBRF/PIR format (since it allows the location of the matches in the structures to be specified, which might be of interest if you want to get the superimposed structures) or in a raw format (one match per line). Note that the returned residue numbers currently start from 1 for the first residue of the chain.

For each match, we return the PDB and SCOP IDs, the name of the compound, the amino acid and HMM-SA sequence identities, the alpha-carbon best-fit RMS deviation, and the aligned sequences. For fragmented PDB files, the fragments are numbered from 1, and the PDB ID will be of the form: PDBid+PDBChn+_+Fragment number (1qmnA_1 for example).
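When post-processing results, the fragment identifier can be split back into its parts. The following is a minimal sketch, assuming a 4-character PDB id, a single-character chain identifier and a decimal fragment number, as in the 1qmnA_1 example. The names MatchId and parse_fragment_id are hypothetical and not part of SA-Search.

```python
import re
from typing import NamedTuple


class MatchId(NamedTuple):
    pdb_id: str    # 4-character PDB id, e.g. "1qmn"
    chain: str     # single-character chain identifier, e.g. "A"
    fragment: int  # fragment number, counted from 1


# Assumed layout: PDBid (4 chars) + chain (1 char) + "_" + fragment number.
_FRAGMENT_ID = re.compile(r"([0-9][A-Za-z0-9]{3})([A-Za-z0-9])_([0-9]+)")


def parse_fragment_id(identifier):
    """Split an identifier such as '1qmnA_1' into its three parts."""
    match = _FRAGMENT_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a fragment identifier: {identifier!r}")
    pdb_id, chain, fragment = match.groups()
    number = int(fragment)
    if number < 1:
        # Fragments are numbered from 1, so 0 cannot be valid.
        raise ValueError(f"fragment numbers start from 1: {identifier!r}")
    return MatchId(pdb_id, chain, number)


print(parse_fragment_id("1qmnA_1"))
```

Identifiers that do not follow this layout raise a ValueError rather than being guessed at, so entries from non-fragmented files would need separate handling.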