File formats
Results file
Contacts file
Template alignment file
Results file
We output the atom coordinates of our five best predictions (Top 5) in the widely used PDB format.
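PDB ATOM records use fixed column positions, so per-residue coordinates can be read without a dedicated library. The following is a minimal Python sketch, not part of RBO Aleph. It assumes one chain and one model per text (the exact layout of the Top 5 files, one file per model or MODEL/ENDMDL blocks, is not specified here). The function name `read_cb_coords` is hypothetical. It selects the C-beta atom, or C-alpha for glycine, which matches the contact definition used in the contacts file.

```python
def read_cb_coords(pdb_text):
    """Map residue number -> (x, y, z) of C-beta (C-alpha for glycine).

    Assumption: a single chain and a single model; a file containing
    several MODEL/ENDMDL blocks would have to be split first.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, END, REMARK, ...
        atom_name = line[12:16].strip()  # columns 13-16
        res_name = line[17:20].strip()   # columns 18-20
        res_seq = int(line[22:26])       # columns 23-26
        wanted = "CA" if res_name == "GLY" else "CB"
        if atom_name == wanted:
            # Columns 31-38, 39-46, 47-54 hold x, y, z in Angstrom.
            # Alternate locations are not handled: the last one wins.
            coords[res_seq] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return coords
```

The fixed-width slicing follows the standard PDB column layout. Splitting on whitespace would break when adjacent fields run together, for example with large negative coordinates.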
Contacts file
We output the predicted contact restraints in the following format (following the CASP11 file format definition):
TARGET T9999
REMARK Predictor remarks
1 8 0 8 0.720
1 10 0 8 0.715   # <- i=1 j=10: indices of residues (integers),
31 38 0 8 0.710
10 20 0 8 0.690  # <- d1=0 d2=8: the range of Cb-Cb distance
30 37 0 8 0.678  #    predicted for the residue pair (i,j)
11 29 0 8 0.673
1 9 0 8 0.63     # <- p=0.63: probability of the residues i=1 and j=9
21 37 0 8 0.502  #    being in contact (in descending order)
8 15 0 8 0.401
3 14 0 8 0.400
5 15 0 8 0.307
7 14 0 8 0.30
END
The format consists of a header followed by one residue-residue contact per line, each defined by five columns:
i j d1 d2 p
- the indices i and j of the two residues in contact are given such that i < j, i.e. only one half of the symmetric contact map is supplied.
- the numbers d1 and d2 indicate the distance limits defining a contact. A pair of residues is defined to be in contact when the distance between their C-beta atoms (C-alpha in the case of glycine) is less than 8 Å. Therefore, typically d1=0 and d2=8.
- the real number p indicates the probability (confidence) of the two residues being in contact and lies between 0.0 and 1.0. Contacts are listed in order of decreasing probability p.
- any pair NOT listed is predicted as not in contact and had NO influence on RBO Aleph's structure prediction steps. Note that RBO Aleph used only predicted contacts (and, if supplied, user-defined contacts) to guide ab initio modeling.
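The column layout above is simple enough to read with a few lines of code. The following minimal Python sketch is not an official RBO Aleph or CASP tool. The names `Contact`, `parse_contacts` and `predicted_pairs` are hypothetical. As assumptions, it treats any line whose first field is not an integer (such as TARGET, REMARK, END) as a header or footer record, and it strips `#` annotations like the explanatory ones in the example. It also enforces the rules stated above: i < j, p between 0 and 1, and decreasing order of p.

```python
from typing import List, NamedTuple, Set, Tuple


class Contact(NamedTuple):
    i: int      # first residue index, 1-based
    j: int      # second residue index, always i < j
    d1: float   # lower distance limit in Angstrom (typically 0)
    d2: float   # upper distance limit in Angstrom (typically 8)
    p: float    # contact probability in [0, 1]


def parse_contacts(text: str) -> List[Contact]:
    contacts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' annotations
        if not line:
            continue
        fields = line.split()
        if not fields[0].isdigit():
            continue  # assumed header/footer record (TARGET, REMARK, END, ...)
        i, j = int(fields[0]), int(fields[1])
        d1, d2, p = float(fields[2]), float(fields[3]), float(fields[4])
        if not i < j:
            raise ValueError(f"expected i < j, got {i} {j}")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        contacts.append(Contact(i, j, d1, d2, p))
    # The format lists contacts by decreasing probability (ties allowed).
    if any(a.p < b.p for a, b in zip(contacts, contacts[1:])):
        raise ValueError("contacts are not sorted by decreasing p")
    return contacts


def predicted_pairs(contacts: List[Contact], min_p: float = 0.0) -> Set[Tuple[int, int]]:
    """Pairs (i, j) with p >= min_p; unlisted pairs count as non-contacts."""
    return {(c.i, c.j) for c in contacts if c.p >= min_p}
```

Because only half of the contact map is stored, a lookup for an arbitrary pair (a, b) should first order it as (min(a, b), max(a, b)).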
Please note that the contacts file provided with the results contains only the contacts predicted by EPC-map.
Template alignment file
If RBO Aleph used templates to predict the structure of the protein, the downloadable ZIP archive of the prediction results also contains the alignments of the templates to the input sequence in FASTA format, e.g.:
>HIV_gag_polyprotein_bc667 MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQTGSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEE---QNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQGQMVHQAISPRTLNAWVKVVE-E-KA-F-SPEV-IPMFSAL-S-E-G-ATPQDL-NTMLNT---V--GGHQAAMQM--LKETINE------E---AAEWDRVHPVHA-GPI-AP---GQMREPRGS-D-IAGTTS--TLQEQI-GWMT-NNPPI-PVGEI----YKRWIILGLNKIVR-MYSP-TSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALG-P----AAT-LEEMMTACQGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEGHQMKDCTERQANFLGKIWPSYKGRPGNFLQSRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLRSLFGNDPSSQ >1l6n_A -GARASVLSGGELDKWEKIRLRPGGKKQYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQTGSEELRSLYNTIAVLYCVHQRIDVKDTKEALDKIEEE---QNKSKKKAQQAAADTGNNSQVSQNYPIVQNLQGQMVHQAISPRTLNAWVKVVE-E-KA-F-SPEV-IPMFSAL-S-E-G-ATPQDL-NTMLNT---V--GGHQAAMQM--LKETINE------E---AAEWDRLHPVHA-GPI-AP---GQMREPRGS-D-IAGTTS--TLQEQI-GWMT-HNPPI-PVGEI----YKRWIILGLNKIVR-MYSP-TSILHH----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >2gol_A -----SVLSGGELDKWEKIRLRPGGKKQYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQTGSEELRSLYNTIAVLYCVHQRIDVKDTKEALDKIEEE-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >3h47_A 
---------------------------------------------------------------------------------------------------------------------------------------PIVQN---QMVHQCISPRTLNAWVKVVE-E-KA-F-SPEV-IPMFSAL-S-C-G-ATPQDL-NTMLNT---V--GGHQAAMQM--LKETINE------E---AAEWDRLHPVH----I-AP---GQMREPRGS-D-IAGTTS--TLQEQI-GWMT-HNPPI-PVGEI----YKRWIILGLNKIVR-MYSP-TSILDIRQGPKEPFRDYVDRFYKTLRAE------------TLLVQNANPDCKTILKALG-P----GAT-LEEMMTACQ----------------------------------------------------------------------------------------------------------------------------------------------------
Each entry consists of a FASTA header line starting with ">" followed by a line containing the amino acid sequence of the protein. The first two lines refer to the input protein; the remaining entries show the alignment of each template's amino acid sequence with respect to the input sequence. Dashes ("-") denote gaps in the alignment. We report the alignments of the ten highest-ranked templates. Please contact us if you need the alignments of all templates used.
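To use such an alignment, for example to see which input residues a template covers, you can read the entries and walk the aligned columns in parallel. The following is a minimal Python sketch, not part of RBO Aleph; the function names are hypothetical. It accepts sequences split over several lines, although the files described above use one line per sequence. Template positions are counted from 1 along the template's own residues in the aligned string. This is an assumption: they are not PDB residue numbers, which the file does not provide.

```python
def read_fasta_alignment(text):
    """Return (header, aligned_sequence) pairs in file order."""
    entries = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def query_to_template_map(query_aln, template_aln):
    """Map 1-based query residue numbers to 1-based template positions.

    Columns in which either sequence has a gap ('-') are left unmapped.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    qi = ti = 0  # residue counters, advanced only on non-gap characters
    for q, t in zip(query_aln, template_aln):
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
        if q != "-" and t != "-":
            mapping[qi] = ti
    return mapping
```

The first entry is taken as the input sequence and each later entry as a template. The number of mapped positions divided by the input sequence length gives a simple coverage measure for that template.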