3D modeling method
Enter input sequence
You can enter a single-chain protein sequence in FASTA format, with or without the FASTA header line: either a header line followed by the sequence, or the bare amino-acid sequence alone.
If the sequence contains a FASTA header line, it must start with “>”, followed by the sequence name. We automatically remove spaces and line breaks. We currently cannot handle non-canonical amino acids or multi-chain targets.
Please limit the length of the sequence to 400 amino acids.
If you would like to obtain predictions for longer proteins, please contact us.
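The input rules above can be summarized as a small validation routine. The following Python sketch only illustrates those rules under our own assumptions; it is not the server's code. The name `normalize_sequence` is hypothetical, the 20 canonical one-letter codes are taken as the accepted alphabet, and upper-casing lowercase input is an assumption.

```python
import re

# The 20 canonical amino acids (one-letter codes).
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 400  # length limit stated above


def normalize_sequence(text: str) -> tuple[str, str]:
    """Return (name, sequence) for FASTA input with or without a header line.

    Raises ValueError for multi-chain input, non-canonical residues,
    empty input, or sequences longer than MAX_LENGTH.
    """
    lines = text.strip().splitlines()
    name = ""
    if lines and lines[0].startswith(">"):
        name = lines[0][1:].strip()  # the sequence name follows ">"
        lines = lines[1:]
    if any(line.lstrip().startswith(">") for line in lines):
        raise ValueError("more than one sequence: multi-chain targets are not supported")
    # Remove spaces and line breaks; upper-casing is an assumption of this sketch.
    seq = re.sub(r"\s+", "", "".join(lines)).upper()
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - CANONICAL)
    if bad:
        raise ValueError("non-canonical residues: " + ", ".join(bad))
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"{len(seq)} residues exceeds the limit of {MAX_LENGTH}")
    return name, seq
```

For example, `normalize_sequence(">test\nMKV LA\nGG")` returns `("test", "MKVLAGG")`.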
Upload contact restraints
You can upload your own contact restraints to be used for the structure prediction. Please follow these steps:
- Store your contact restraints in a single text file.
- Make sure to follow the format described below.
- Upload the contacts file when you submit the job.
We accept uploaded contact restraints in the following file format (according to the CASP11 file format definition):
TARGET T9999
REMARK Predictor remarks
1 8 0 8 0.720
1 10 0 8 0.715 # <- i=1 j=10: indices of residues (integers),
31 38 0 8 0.710
10 20 0 8 0.690 # <- d1=0 d2=8: the range of Cb-Cb distance
30 37 0 8 0.678 # predicted for the residue pair (i,j)
11 29 0 8 0.673
1 9 0 8 0.63 # <- p=0.63: probability of the residues i=1 and j=9
21 37 0 8 0.502 # being in contact (in descending order)
8 15 0 8 0.401
3 14 0 8 0.400
5 15 0 8 0.307
7 14 0 8 0.30
END
The format consists of a header followed by residue-residue contacts defined by 5 columns:
i j d1 d2 p
- indices i and j of the two residues in contact are provided such that i < j, i.e. only half of the contact map is supplied.
- the numbers d1 and d2 indicate the distance limits defining a contact. A pair of residues is defined to be in contact when the distance between their C-beta atoms (C-alpha in the case of glycine) is less than 8 Ångström. Therefore, we only accept d1=0 and d2=8.
- the real number p indicates the probability / confidence of the two residues being in contact and lies in the range 0.0–1.0. Contacts are listed in order of decreasing probability p.
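To illustrate the contact definition, the following Python sketch derives contacts from residue coordinates. The residue representation (a dict with a three-letter `name` and `CA`/`CB` coordinate tuples) and the function names are hypothetical; only the rule itself (C-beta, or C-alpha for glycine, closer than 8 Ångström, reported once per pair with i < j) comes from the description above.

```python
import math

Coord = tuple[float, float, float]


def representative_atom(residue: dict) -> Coord:
    """C-beta coordinates, or C-alpha for glycine (which has no C-beta)."""
    return residue["CA"] if residue["name"] == "GLY" else residue["CB"]


def contacts(residues: list[dict], cutoff: float = 8.0) -> list[tuple[int, int]]:
    """Return 1-based pairs (i, j), i < j, whose representative atoms are closer than cutoff."""
    pairs = []
    for a in range(len(residues)):
        for b in range(a + 1, len(residues)):  # only one half of the contact map
            d = math.dist(representative_atom(residues[a]), representative_atom(residues[b]))
            if d < cutoff:  # strictly less than the cutoff (8 Angstrom by default)
                pairs.append((a + 1, b + 1))
    return pairs
```

A pair at exactly 8 Ångström is not a contact under this rule, because the comparison is strict.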
Please note that we only parse contact restraint lines that follow this format. Any other lines (remarks, comments, wrongly defined contact restraints, other distance ranges) will be ignored.
After parsing, we report the number of accepted contact restraints and list the rejected contacts with an appropriate error message.
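A parser applying these rules might look like the following Python sketch. It is not the server's parser: `Contact` and `parse_contacts` are hypothetical names, and treating any line whose first token is an integer as a candidate contact line (accepted or rejected with a reason), while header, REMARK and END lines are skipped silently, is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    i: int
    j: int
    d1: float
    d2: float
    p: float


def parse_contacts(text: str) -> tuple[list[Contact], list[tuple[str, str]]]:
    """Split a contact file into accepted contacts and (line, reason) rejections.

    Header lines (TARGET, REMARK, ...), END, blank and comment-only lines
    are skipped silently.
    """
    accepted: list[Contact] = []
    rejected: list[tuple[str, str]] = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        # Assumption: a line is a contact line if it starts with an integer.
        if not fields or not fields[0].lstrip("-").isdigit():
            continue
        if len(fields) != 5:
            rejected.append((raw, "expected 5 columns: i j d1 d2 p"))
            continue
        try:
            i, j = int(fields[0]), int(fields[1])
            d1, d2, p = (float(x) for x in fields[2:])
        except ValueError:
            rejected.append((raw, "non-numeric column"))
            continue
        if not 1 <= i < j:
            rejected.append((raw, "indices must satisfy 1 <= i < j"))
        elif (d1, d2) != (0.0, 8.0):
            rejected.append((raw, "only d1=0 and d2=8 are accepted"))
        elif not 0.0 <= p <= 1.0:
            rejected.append((raw, "p must lie in the range 0.0-1.0"))
        else:
            accepted.append(Contact(i, j, d1, d2, p))
    # Order by decreasing probability; the stable sort keeps ties in input order.
    accepted.sort(key=lambda c: c.p, reverse=True)
    return accepted, rejected
```

`len(accepted)` corresponds to the number of accepted restraints, and each entry of `rejected` pairs the offending line with its error message. Sorting by p is a choice of this sketch, so that input listed out of order still ends up in decreasing-probability order.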
We explain how we integrate the accepted contact restraints in our FAQ.
3D modeling method
RBO Aleph automatically decides which 3D modeling method is most appropriate for the given sequence if “Automatic” (the default) is chosen as the method. After predicting the contacts for the given target sequence, our server runs template retrieval. Based on its outcome, RBO Aleph performs either:
- template-based modeling if our template retrieval detects a suitable template or
- ab initio modeling if no suitable template was found.
RBO Aleph implements a machine learning approach to select suitable templates, which are retrieved from several threading algorithms.
RBO Aleph also allows you to select one of these two modeling methods explicitly. Note that forcing template-based modeling may result in a poor prediction if the template found matches the target poorly.
RBO Aleph performs domain boundary prediction by default when ‘automatic’ modeling is selected. We recommend keeping this setting for input sequences with more than 250 amino acids, because the quality of ab initio predictions deteriorates for longer sequences. If RBO Aleph detects multiple domains, predicting each domain separately is much easier than predicting the whole sequence at once.
However, when using the ‘automatic’ modeling method, you may deactivate “Domain splitting” by clearing its check box to predict the whole input sequence at once. If you force ‘ab initio’ or ‘template-based modeling’ instead of ‘automatic’, domain prediction cannot be used in the current implementation. This will be changed soon.
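The method and domain-splitting rules described above can be condensed into a small decision function. This Python sketch is hypothetical: `Method`, `plan_run` and the `suitable_template` argument (standing in for the outcome of template retrieval and selection) are names we introduce for illustration, and the sketch encodes only the rules stated in this section, not the server's internals.

```python
from enum import Enum
from typing import Optional


class Method(Enum):
    AUTOMATIC = "automatic"
    TEMPLATE_BASED = "template-based modeling"
    AB_INITIO = "ab initio"


def plan_run(requested: Method, suitable_template: Optional[str],
             domain_splitting: bool = True) -> tuple[Method, bool]:
    """Return (method that is run, whether domain splitting is applied).

    `suitable_template` is None when template retrieval found no suitable template.
    """
    # Domain splitting is only available with the automatic method.
    split = domain_splitting and requested is Method.AUTOMATIC
    if requested is not Method.AUTOMATIC:
        # An explicit choice is honored even if the template matches poorly.
        return requested, split
    if suitable_template is not None:
        return Method.TEMPLATE_BASED, split
    return Method.AB_INITIO, split
```

For instance, `plan_run(Method.AUTOMATIC, None)` yields `(Method.AB_INITIO, True)`, while forcing `Method.AB_INITIO` always disables domain splitting.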
RBO Aleph performs domain splitting:
- based on selected templates if uncovered regions longer than 50 residues occur, or
- if no template is found, based on a consensus of state-of-the-art sequence-based domain prediction methods. If we cannot detect a domain boundary with sufficient confidence, we do not split the target sequence at all.
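The template-based criterion can be illustrated with a short Python sketch that finds uncovered regions longer than 50 residues. The function name and the representation of template coverage as 1-based, inclusive (start, end) intervals are our assumptions; how the server derives the final domain boundaries from such regions is not described here.

```python
def uncovered_regions(length: int, covered: list[tuple[int, int]],
                      min_len: int = 50) -> list[tuple[int, int]]:
    """Return uncovered regions (1-based, inclusive) longer than min_len residues."""
    is_covered = [False] * (length + 1)  # index 0 unused
    for start, end in covered:
        for k in range(max(start, 1), min(end, length) + 1):
            is_covered[k] = True
    regions: list[tuple[int, int]] = []
    start = None
    # Walk one position past the end so that a trailing gap is closed too.
    for k in range(1, length + 2):
        in_gap = k <= length and not is_covered[k]
        if in_gap and start is None:
            start = k
        elif not in_gap and start is not None:
            if k - start > min_len:  # the gap spans residues start..k-1
                regions.append((start, k - 1))
            start = None
    return regions
```

For a 200-residue target whose templates cover residues 1–80 and 150–200, the sketch reports the 69-residue region (81, 149); a gap of exactly 50 residues is not reported, since the criterion is "longer than 50".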
If the target is split into domains, we run a domain assembly protocol to build a prediction for the whole target.