BLAST: Definition, 4 Types, Scores, E-Value, Uses

The Basic Local Alignment Search Tool (BLAST) is an algorithm and program that finds regions of local similarity between sequences. It does so by comparing primary biological sequence information: the amino acids of proteins or the nucleotides of DNA and RNA.

  • It is a sequence-similarity search program that can quickly search a sequence database for matches to a query sequence.
  • Different sub-types of BLAST cover every combination of nucleotide or protein query sequence against a nucleotide or protein database.
  • For each alignment between the query and a reference or database sequence, BLAST reports an “Expect value” (E-value), which indicates the statistical significance of that alignment.
  • BLAST is one of the most popular bioinformatics tools and is often used for searching data in bulk.
  • The legacy BLAST command-line applications (blastall and blastpgp) were made available to the public in late 1997 as part of the NCBI toolkit; they have since been succeeded by the BLAST+ applications.
  • BLAST runs on a number of platforms, including Linux, various flavors of UNIX (including Mac OS X), and Microsoft Windows.
  • The initial 1997 BLAST tool lacked various features, but the present-day BLAST is able to handle databases with more than two billion letters.
  • It can be used to observe the functional and evolutionary relationship between sequences as well as help identify members of a gene family.
Basic Local Alignment Search Tool (BLAST). Image Source: NCBI.

Four Types of BLAST

BLASTn (Nucleotide BLAST)

This tool compares one or more nucleotide sequences to reference sequences or to a database of nucleotide sequences. It is commonly used when determining the evolutionary relationships among different organisms.

To run it, open the NCBI BLAST website and choose the Nucleotide BLAST option. Enter the accession numbers of the query sequences in the query sequence box. To compare specific sequences with each other rather than search a database, tick the box labeled “Align two or more sequences” below the query sequence box and enter the reference sequences as the subject. Finally, click the BLAST button, leaving the other settings at their defaults.
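The same pairwise comparison can also be scripted with the standalone BLAST+ `blastn` program instead of the web form. The sketch below only assembles the command line. It assumes BLAST+ is installed, and `query.fasta` and `reference.fasta` are hypothetical file names chosen for illustration.

```python
import shlex


def build_blastn_command(query_fasta, subject_fasta, evalue=10.0):
    """Assemble a blastn command that aligns a query file against a subject file."""
    return [
        "blastn",
        "-query", query_fasta,      # sequences to search with
        "-subject", subject_fasta,  # sequences to align against (no database needed)
        "-evalue", str(evalue),     # keep hits with an E-value at or below this
        "-outfmt", "6",             # tab-separated output, one hit per line
    ]


# Hypothetical input files, used only to show the resulting command.
cmd = build_blastn_command("query.fasta", "reference.fasta")
print(shlex.join(cmd))

# To actually run the search (requires BLAST+ on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Using `-subject` mirrors the “Align two or more sequences” option; replacing it with `-db` and a database name would search a formatted database instead.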

BLASTx (translated nucleotide sequence searched against protein sequence)

BLASTx translates a nucleotide query sequence in all six reading frames and compares the resulting protein sequences against a protein sequence database. It is particularly useful when the reading frame of the query is unknown, or when the query contains errors that may lead to frameshifts or other coding errors, because it provides a combined statistical significance for hits to different frames. With a newly determined sequence, BLASTx is often the first analysis that is performed.
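To make the idea of six reading frames concrete, here is a minimal Python sketch of a six-frame translation using the standard genetic code. It illustrates the concept only and is not BLAST's own implementation; the function names are made up for this example.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate a DNA sequence in three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Each of the six translations can then be compared against protein sequences, which is why a frameshift or an unknown reading frame in the query does not prevent a match.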

tBLASTn (protein sequence searched against translated nucleotide sequences)

In this type of BLAST search, a protein query sequence is compared against the six-frame translation of a nucleotide sequence database. tBLASTn can find homologous protein-coding regions in unannotated nucleotide sequences such as expressed sequence tags (ESTs) and draft genome records (HTG), located in the BLAST databases est and htgs, respectively. ESTs are short, single-read cDNA sequences. They make up the largest pool of sequence data for many organisms and include portions of transcripts from many uncharacterized genes. Because ESTs have no annotated coding sequences, there are no corresponding protein translations in the BLAST protein databases. tBLASTn is therefore the only way to search these potential coding regions at the protein level. Another large source of unannotated coding regions is the HTG sequences, which are draft sequences from various genome projects or large genomic clones.

BLASTp (Protein BLAST)

This type of BLAST search compares one or more protein query sequences to subject protein sequences or to a database of protein sequences. It is used to identify protein sequences.

BLAST Scores

  • Once BLAST has found a database sequence similar to the query, it is essential to know whether the alignment is good and whether it reflects a possible biological relationship. BLAST therefore uses statistical theory to produce a bit score for each aligned pair.
  • The bit score indicates the quality of the alignment: the higher the score, the better the alignment.
  • The score is calculated from the similar or identical residues that are aligned and the gaps introduced while aligning the sequences.
  • A “substitution matrix” supplies the score for aligning any possible pair of residues.
  • BLOSUM62 is the default matrix for most BLAST programs. The exceptions are BLASTn and MegaBLAST, which perform nucleotide-nucleotide comparisons and do not use protein-specific matrices.
  • Because bit scores are normalized, scores from different alignments can be compared even when different matrices were used.
  • The bit score does not depend on the size of the database, so the same hit receives the same value in databases of different sizes.
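As a rough illustration of how a raw alignment score becomes a bit score, the sketch below applies the standard normalization S' = (λS − ln K) / ln 2, where S is the raw score and λ and K are statistical parameters of the scoring system. The λ and K values used here are illustrative assumptions, roughly in the range reported for gapped BLOSUM62 searches; the values that apply to a given search are printed in the BLAST report.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lam * S - ln k) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Assumed, illustrative parameters (not taken from a real search report).
LAMBDA = 0.267
K = 0.041

for raw in (50, 100, 200):
    print(raw, round(bit_score(raw, LAMBDA, K), 1))
```

Because λ and K absorb the scale of the scoring system, two alignments scored with different matrices can be compared through their bit scores, and nothing in the formula refers to the database size.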

BLAST E-value

  • The E-value (Expect value) is a statistical measure that BLAST reports for each aligned pair of sequences. It indicates whether the alignment is good and whether the two sequences are a meaningful match.
  • The E-value is the number of hits of similar quality (score) that could be expected just by chance. An E-value of 10 means that up to 10 such hits can be expected to be found by chance.
  • Because it indicates how likely a given sequence match is to be purely due to chance, the E-value is used as the first quality filter for BLAST search results.
  • The lower the E-value, the better the match. If E is less than 1e-50, there is high confidence that the database match is the result of a homologous relationship.
  • If E is between 0.01 and 10, the match is considered non-significant, although it may still reflect a weak homology relationship.
  • If E is greater than 10, the sequences under consideration are either unrelated or only extremely distantly related.
  • The E-value can be viewed as a bit score corrected for the size of the sequence database, so it depends on the size of the database that was searched.
  • The same hit therefore gets a better (lower) E-value when it is found in a smaller database.
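The dependence on database size can be sketched with the commonly used relationship E = m × n × 2^(−S'), where S' is the bit score, m is the query length, and n is the total length of the database. This is a simplified sketch: it assumes plain lengths, whereas BLAST uses corrected “effective” lengths, and the numbers below are made up for illustration.

```python
import math


def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits with at least this bit score: m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Same hypothetical hit (bit score 50, query of 100 letters) in two databases.
small_db = expect_value(50, 100, 1e6)  # 1 million letters
large_db = expect_value(50, 100, 1e9)  # 1 billion letters
print(f"{small_db:.2e}")
print(f"{large_db:.2e}")
```

The bit score is identical in both cases, but the E-value grows in proportion to the search space, which is why the hit looks better in the smaller database.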

Uses of BLAST

  • Searching a database, for example:
    • Working out what a newly isolated sequence might be.
    • Identifying its name and function.
  • Performing a pairwise alignment.
  • Designing PCR primers.
  • Checking PCR primers for specificity.
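For the database-search use case, results are typically filtered by E-value before the best hit is inspected. The sketch below parses BLAST+ tabular output (`-outfmt 6`, whose default columns are listed in `COLUMNS`) and keeps hits under a chosen E-value cutoff. The two example rows and the `1e-5` cutoff are invented for illustration and do not come from a real search.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text):
    """Parse tab-separated BLAST output into a list of dicts, skipping comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


def best_hits(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, sorted by descending bit score."""
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["bitscore"], reverse=True)


# Made-up example rows: one strong hit and one that is likely due to chance.
example = (
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "q1\tsubjB\t75.0\t40\t10\t0\t5\t44\t300\t339\t2.5\t22.3\n"
)

top = best_hits(parse_tabular(example))
print([hit["sseqid"] for hit in top])
```

The name and annotated function of the top remaining subject sequence then give a first hypothesis about what the query sequence is.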

Applications of BLAST

  • DNA mapping: when working with a known species and looking for a specific gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest with relevant sequences in the database. BLAST thus helps to identify and map genes between known and unknown species.
  • Domain location: BLAST helps to identify and locate the domains in a protein sequence of interest.
  • Comparison: BLAST is used to compare sequences between different or similar species, which helps to identify genes or functions that the species have in common.
  • Identification of species: BLAST can be used to identify a species by comparing its DNA sequences with those of other organisms.
  • Establishing phylogeny: after sequences have been aligned with BLAST, the results can be used to examine the phylogenetic relationships between known and unknown species.

