- Homology modeling, also known as comparative modeling, predicts protein structures based on sequence homology with known structures.
- It is based on the principle that “if two proteins share a high enough sequence similarity, they are likely to have very similar three-dimensional structures.”
- It hence relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence.
- Thus, if one of the two proteins has a known structure, that structure can be transferred to the protein of unknown structure with a high level of confidence.
Working of Homology Modeling
Homology modeling predicts the three-dimensional structure of a given protein sequence (target) based on its alignment to one or more known protein structures (templates). If similarity between the target sequence and the template sequence is detected, structural similarity can be assumed. In general, a sequence identity of at least about 30% is required to generate a useful model.
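The identity threshold can be illustrated with a minimal sketch. It assumes the target and template have already been aligned and are given as equal-length strings with `-` marking gaps. The function names and the choice to divide by the number of gap-free columns are assumptions made for illustration; other conventions for the denominator exist.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for tgt, tpl in zip(aligned_target, aligned_template):
        if tgt == "-" or tpl == "-":
            continue  # skip columns that contain a gap
        aligned += 1
        if tgt == tpl:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def is_usable_template(aligned_target: str, aligned_template: str,
                       threshold: float = 30.0) -> bool:
    """Apply the rule-of-thumb identity cutoff (30% by default)."""
    return percent_identity(aligned_target, aligned_template) >= threshold
```

For example, `is_usable_template("AAAA", "AKKK")` is `False`, because only one of the four aligned positions is identical (25%).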
Steps in Homology Modeling
The overall homology modeling procedure consists of six steps.
- The first step is template selection, which involves the identification of homologous sequences in the protein structure database to be used as templates for modeling.
- It most commonly relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST, but may also employ other approaches such as PSI-BLAST and protein threading.
- The second step is the alignment of the target and template sequences.
- Once the structure with the highest sequence similarity is identified as a template, the full-length sequences of the template and target proteins need to be realigned using refined alignment algorithms to obtain optimal alignment.
- The best available multiple alignment programs, such as Praline and T-Coffee, should be used for this purpose, followed by manual refinement to improve alignment quality.
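To make the idea of an optimal target-template alignment concrete, here is a minimal sketch of Needleman-Wunsch global pairwise alignment. This is not the method used by Praline or T-Coffee, which are multiple alignment programs. The scoring values (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions; practical tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(seq_a: str, seq_b: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (aligned_a, aligned_b, score) for a global pairwise alignment."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Calling `needleman_wunsch("ACDEFG", "ACDFG")` aligns the shorter sequence with a single gap opposite `E`. The gapped strings it returns are the kind of input that the later model-building steps work from.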
- The third step is to build a framework structure for the target protein consisting of main chain atoms.
- Once optimal alignment is achieved, coordinates of the corresponding residues of the template proteins can be simply copied onto the target protein.
- If the two aligned residues are identical, coordinates of the side chain atoms are copied along with the main chain atoms. If the two residues differ, only the backbone atoms can be copied.
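The copying rule for this step can be sketched as follows. The data layout is a simplifying assumption: each template residue is a dictionary mapping atom names to `(x, y, z)` tuples, listed in sequence order, and the alignment is given as two equal-length gapped strings. The function name `build_framework` is hypothetical and does not come from any modeling package.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def build_framework(aligned_target, aligned_template, template_residues):
    """Copy template coordinates onto the target according to the alignment.

    Identical residue pairs keep all atoms; differing pairs keep only the
    backbone; target residues aligned to a gap get no coordinates yet.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    model = []
    t_idx = 0  # index of the next unused template residue
    for tgt, tpl in zip(aligned_target, aligned_template):
        if tpl == "-":
            # Insertion in the target: nothing to copy, left for loop modeling.
            model.append({"residue": tgt, "atoms": {}})
            continue
        atoms = template_residues[t_idx]
        t_idx += 1
        if tgt == "-":
            continue  # deletion in the target: this template residue is skipped
        if tgt == tpl:
            copied = dict(atoms)  # identical residue: main chain and side chain
        else:
            copied = {name: xyz for name, xyz in atoms.items()
                      if name in BACKBONE_ATOMS}  # different residue: backbone only
        model.append({"residue": tgt, "atoms": copied})
    return model
```

Residues returned with an empty `atoms` dictionary, and side chains dropped for non-identical residues, are the parts that remain to be built in the next step.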
- The fourth step of model building includes the addition and optimization of side chain atoms and loops.
- In the sequence alignment used for modeling, insertions and deletions often produce gaps. The gapped regions cannot be modeled directly from the template, creating “holes” in the model.
- Closing the gaps requires loop modeling, which is a very difficult problem.
- Currently, there are two main techniques used to approach the problem: the database searching method and the ab initio method.
- Once main chain atoms are built, the positions of side chains that are not modeled must be determined.
- Modeling side chain geometry is very important in evaluating protein-ligand interactions at active sites and protein-protein interactions at the contact interface.
- Most modeling packages incorporate a side chain refinement function. A specialized side-chain modeling program with reasonably good performance is SCWRL (sidechain placement with a rotamer library), a UNIX program that places side chains on a backbone template according to preferences in a backbone-dependent rotamer library.
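Before any loop modeling can start, the regions that need it have to be located. The sketch below scans a gapped target-template alignment and reports where the "holes" described in this step occur. The function name and return format are assumptions for illustration. It only finds the regions; it does not perform database searching or ab initio loop building.

```python
def find_alignment_gaps(aligned_target, aligned_template):
    """Locate target regions that cannot be copied directly from the template.

    Returns (insertions, deletions):
      insertions: (first, last) 0-based target residue indices, inclusive,
                  for runs of target residues aligned to template gaps.
      deletions:  target indices i where template residues were dropped, so
                  the chain must be reconnected between residues i-1 and i.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    insertions, deletions = [], []
    pos = 0            # index of the next target residue
    ins_start = None   # start of the insertion run currently open, if any
    in_deletion = False
    for tgt, tpl in zip(aligned_target, aligned_template):
        if tgt == "-":
            if ins_start is not None:
                insertions.append((ins_start, pos - 1))
                ins_start = None
            if not in_deletion:
                deletions.append(pos)
                in_deletion = True
            continue
        in_deletion = False
        if tpl == "-":
            if ins_start is None:
                ins_start = pos
        elif ins_start is not None:
            insertions.append((ins_start, pos - 1))
            ins_start = None
        pos += 1
    if ins_start is not None:
        insertions.append((ins_start, pos - 1))
    return insertions, deletions
```

For the alignment `ACDWWEFG-HK` over `ACD--EFGYHK`, the two `W` residues (target indices 3 and 4) form an insertion with no template coordinates, and a deletion breaks the chain just before target index 8.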
- The fifth step is to refine and optimize the entire model according to energy criteria.
- In this step, the raw homology model is checked for structural irregularities such as unfavorable bond angles, bond lengths, or close atomic contacts.
- If structural irregularities are found, they can be corrected by applying an energy minimization procedure to the entire model.
- Another frequently used structure refinement procedure is molecular dynamics simulation. GROMOS (www.igc.ethz.ch/gromos/) is a UNIX program for molecular dynamics simulation.
- The final step involves evaluating the overall quality of the model obtained.
- The final homology model has to be evaluated to make sure that its structural features are consistent with physicochemical rules.
- If necessary, alignment and model building are repeated until a satisfactory result is obtained.
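As one small illustration of a physicochemical check, the sketch below inspects the alpha-carbon (CA) trace of a model. Consecutive CA atoms joined by a trans peptide bond lie roughly 3.8 Å apart, so large deviations suggest a broken or distorted chain, and non-adjacent CA atoms that sit very close together suggest a steric clash. The tolerance and clash cutoff are illustrative assumptions, not values taken from any validation program, and a real assessment covers far more than the CA trace.

```python
import math


def ca_geometry_report(ca_coords, ideal=3.8, tolerance=0.2, clash_cutoff=3.0):
    """Flag suspicious CA-CA distances in a model.

    ca_coords: list of (x, y, z) tuples in Angstroms, one per residue, in order.
    Returns (bad_bonds, clashes), each a list of (i, j, distance) tuples.
    """
    bad_bonds = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - ideal) > tolerance:  # consecutive residues too near or too far
            bad_bonds.append((i, i + 1, round(d, 2)))

    clashes = []
    for i in range(len(ca_coords)):
        for j in range(i + 2, len(ca_coords)):  # skip directly bonded neighbours
            d = math.dist(ca_coords[i], ca_coords[j])
            if d < clash_cutoff:
                clashes.append((i, j, round(d, 2)))
    return bad_bonds, clashes
```

If either list is non-empty, the flagged residues point to regions where the alignment or the model building may need to be revisited.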
Uses of Homology Modeling
- Protein modeling provides a solid basis for:
  - Rational design of proteins with increased stability or novel functions
  - Analysis of protein function, interactions, and antigenic behavior
  - Structure-based drug design
- Because it is difficult and time-consuming to obtain experimental structures from methods such as X-ray crystallography and protein NMR for every protein of interest, homology modeling can provide useful structural models for generating hypotheses about a protein’s function and directing further experimental work.