What is RBO Aleph?
RBO Aleph is a server for contact prediction and structure prediction of proteins. It is primarily designed to exploit novel information to guide conformational search: predicted residue-residue contacts from evolutionary and physicochemical information (EPC-map) and promising regions in the energy landscape identified by model-based search (MBS). Our main target is ab initio modeling, when sequence and structure homologs are absent. To make RBO Aleph applicable to a wide range of protein targets, it also performs template-based modeling based on state-of-the-art algorithms if a suitable template is found.
When could RBO Aleph be useful for you?
The most challenging targets in protein structure prediction have no structural homolog and hence rely on an effective search for the native structure in an enormous search space. For such targets in particular, RBO Aleph’s two key features are most valuable:
First, we exploit novel information in the form of predicted contact restraints obtained with EPC-map in template-free ab initio structure prediction. Contact information describes sequence-separated residue pairs that are in spatial proximity in the native structure. Additional energy terms use these contact restraints to guide search towards relevant regions in the conformational space by smoothing the energy landscape.
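To make the notion of a contact concrete, here is a minimal Python sketch that lists residue pairs that are close in space but far apart in sequence. The 8 Å distance cutoff, the minimum sequence separation of 6, and the function name `native_contacts` are illustrative assumptions, not settings taken from RBO Aleph or EPC-map.

```python
import math

def native_contacts(coords, max_dist=8.0, min_seq_sep=6):
    """Return (i, j) index pairs that are close in space but distant in sequence.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    max_dist and min_seq_sep are assumed defaults for illustration only.
    """
    contacts = []
    for i in range(len(coords)):
        # Only consider partners at least min_seq_sep positions further along the chain.
        for j in range(i + min_seq_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_dist:
                contacts.append((i, j))
    return contacts

# Toy hairpin: two antiparallel strands 5 Å apart, 3.8 Å between neighbours.
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0),
           (11.4, 5, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
print(native_contacts(hairpin))
```

Pairs that are adjacent along the chain are always close in space, so the sequence-separation filter keeps only contacts that carry information about the fold.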
Second, we perform ab initio structure prediction with model-based search (MBS), a highly effective search strategy that extracts relevant information from observations made during search and uses it as further guidance. Model-based search identifies funnels in the energy landscape using an all-atom energy function and incrementally increases the focus on regions in the conformational space that are most likely to contain the native structure.
How does RBO Aleph work?
Given the sequence of a target, RBO Aleph first predicts its contacts with EPC-map. EPC-map is a novel framework that integrates evolutionary information from multiple sequence alignments obtained with GREMLIN [Kamisetty 2013] and physicochemical information derived from structure decoys. We employ a novel machine learning system to capture relevant properties of the physicochemical environment of a contact, encoded in a graph-based representation. This allows us to differentiate native from non-native contacts in decoys. By combining these two sources of information, EPC-map delivers accurate contact predictions even in the absence of a large number of sequence homologs.
We exploit information from predicted contacts at two stages in our pipeline: to re-rank retrieved templates in template-based modeling and to guide conformational space search in template-free modeling.
In template-based modeling, RBO Aleph uses a machine-learning-based classifier to re-rank templates, which are derived from several threading algorithms [Hildebrand 2009, Peng 2011, Yang 2011, Wu 2007]. The classifier is trained on features such as compatibility between predicted contacts and templates, template alignment quality, and consensus of templates. The contact-based features characterize the ratio of satisfied contacts in each template, while taking into account the contact confidence estimated by EPC-map.
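One plausible reading of such a contact-based feature is a confidence-weighted fraction of predicted contacts that a template satisfies. The sketch below illustrates that idea only; the weighting scheme, the function name `satisfied_contact_ratio`, and the data layout are assumptions and not the features actually used by the RBO Aleph classifier.

```python
def satisfied_contact_ratio(predicted, template_contacts):
    """Confidence-weighted fraction of predicted contacts present in a template.

    predicted: dict mapping (i, j) residue pairs to a confidence in [0, 1]
               (hypothetical stand-in for EPC-map output).
    template_contacts: set of (i, j) pairs that are in contact in the template.
    """
    total = sum(predicted.values())
    if total == 0:
        return 0.0  # no usable predictions, so no evidence for or against the template
    satisfied = sum(conf for pair, conf in predicted.items()
                    if pair in template_contacts)
    return satisfied / total

predicted = {(0, 7): 0.9, (1, 8): 0.6, (2, 12): 0.5}
template = {(0, 7), (2, 12)}
print(round(satisfied_contact_ratio(predicted, template), 3))
```

Weighting by confidence means that a template violating a low-confidence contact is penalized less than one violating a contact the predictor is sure about.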
In template-free modeling, we obtain distance restraints from the contact information that smooth the energy landscape, thereby effectively guiding search towards biologically relevant regions in the conformational space.
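As an illustration of how a distance restraint can enter an energy function, the following sketch adds a flat-bottom quadratic penalty for each predicted contact whose residues are further apart than a cutoff. The functional form, the 8 Å cutoff, and the name `restraint_energy` are assumptions chosen for simplicity; the text above does not specify the form RBO Aleph uses.

```python
import math

def restraint_energy(coords, restraints, max_dist=8.0):
    """Sum of confidence-weighted penalties for violated contact restraints.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    restraints: dict mapping (i, j) residue pairs to a weight (e.g. confidence).
    max_dist: assumed distance below which a restraint counts as satisfied.
    """
    energy = 0.0
    for (i, j), weight in restraints.items():
        excess = math.dist(coords[i], coords[j]) - max_dist
        if excess > 0:
            # Penalty grows smoothly with the violation, pulling the pair together.
            energy += weight * excess ** 2
    return energy

coords = [(0, 0, 0), (10, 0, 0), (5, 0, 0)]
restraints = {(0, 1): 0.5, (0, 2): 1.0}
print(restraint_energy(coords, restraints))
```

Such a term would be added to the physical energy, so conformations that satisfy more of the predicted contacts score lower and the search is biased towards them.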
After contact prediction, RBO Aleph runs template retrieval. If suitable templates are detected, our server automatically performs template-based modeling [Sali 1993]. Otherwise, RBO Aleph runs ab initio modeling with model-based search.
RBO Aleph splits the target sequence into domains
- if it contains parts longer than 50 residues that remain uncovered by the selected templates, or
- if no suitable template was found and the target sequence has more than 250 residues.
Because the prediction quality of ab initio modeling quickly deteriorates for longer targets, RBO Aleph integrates state-of-the-art sequence-based domain predictors [Sim 2005, Cheng 2006]. If the consensus built from the domain predictions results in a domain boundary with high confidence, we split the target accordingly.
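The two conditions for considering a domain split can be sketched as follows. The thresholds of 50 and 250 residues come from the description above; the function names, the representation of template coverage as a set of residue indices, and the omission of the domain-predictor consensus step are simplifying assumptions.

```python
def longest_uncovered_run(seq_len, covered):
    """Length of the longest stretch of residues not covered by any template."""
    longest = run = 0
    for pos in range(seq_len):
        run = 0 if pos in covered else run + 1
        longest = max(longest, run)
    return longest

def splitting_considered(seq_len, covered, template_found,
                         min_uncovered=50, max_ab_initio_len=250):
    """Decide whether domain splitting should be attempted for a target.

    covered: set of residue indices aligned to the selected templates
             (hypothetical representation; ignored when no template was found).
    """
    if template_found:
        # Condition 1: an uncovered part longer than 50 residues.
        return longest_uncovered_run(seq_len, covered) > min_uncovered
    # Condition 2: no template and more than 250 residues.
    return seq_len > max_ab_initio_len

print(splitting_considered(120, set(range(60)), template_found=True))
print(splitting_considered(200, set(), template_found=False))
```

In this sketch a positive answer only means that splitting is attempted; the actual split would still depend on the domain predictors agreeing on a boundary with high confidence.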
If the target was split into domains, we use a domain assembly protocol based on RosettaDock [Rohl 2004] to generate a continuous structure prediction for the whole target.
Finally, we estimate the prediction error per residue [McGuffin 2008] and select the top 5 models based on a combination of a specialized energy function [Sippl 1993] and Rosetta full-atom energy [Rohl 2004].
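The final selection step can be pictured as ranking candidate models by a combined score. How the two energies are combined is not stated here, so the sketch below assumes an unweighted sum of z-scores; that choice, the function names, and the tuple layout are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values; return zeros if they are all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def select_top_models(models, n=5):
    """Pick the n models with the lowest combined standardized energy.

    models: list of (name, knowledge_based_energy, full_atom_energy) tuples,
            where lower energies are better.
    """
    z_kb = z_scores([m[1] for m in models])
    z_fa = z_scores([m[2] for m in models])
    # Assumed combination: plain sum of the two z-scores, lowest first.
    ranked = sorted(zip(models, z_kb, z_fa), key=lambda t: t[1] + t[2])
    return [model[0] for model, _, _ in ranked[:n]]

candidates = [(f"m{i}", -1.0 * i, -2.0 * i) for i in range(7)]
print(select_top_models(candidates))
```

Standardizing first keeps one energy term from dominating the ranking merely because it is reported on a larger numeric scale.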
Performance of RBO Aleph
Our fully automated server RBO Aleph participated in the recent CASP11 (Critical Assessment of protein Structure Prediction). Out of 44 servers in the FM (Free Modeling) category, which comprises the most difficult targets without detectable sequence or structure homologs, RBO Aleph ranked No. 1 by the average z-score (>0) and No. 3 by the sum of z-scores (>-2) of the assessors' formula.
Our contact prediction method, EPC-map, ranked 1st for medium+long-range contacts and 7th for long-range contacts in the categories of FM and TBM/FM (Template-Based Modeling) targets.
RBO Aleph is also continuously evaluated in CAMEO.
- M. Mabrouk, I. Putz, T. Werner, M. Schneider, M. Neeb, P. Bartels and O. Brock (2015). “RBO Aleph: Leveraging Novel Information Sources for Protein Structure Prediction”. Nucleic Acids Research, doi:10.1093/nar/gkv357. [PDF]
Ab initio protein structure prediction with Model-Based Search (MBS)
- TJ Brunette and O. Brock (2008). “Guiding conformation space search with an all atom energy potential”. Proteins 73, 958–972. [PDF]
- TJ Brunette and O. Brock (2005). “Improving protein structure prediction with model-based search”. Bioinformatics 21(1): i66-i74. [PDF]
Contact prediction with EPC-map
- M. Schneider and O. Brock (2014). “Combining Physicochemical and Evolutionary Information for Protein Contact Prediction”. PLoS ONE 9(10): e108438. [PDF]