Choosing the correct BLAST algorithm

SequenceServer has an auto-detection feature that selects the appropriate BLAST algorithm for your input data and databases.

There are five basic BLAST algorithms: blastp, blastn, tblastx, tblastn, and blastx. Each has a different use case, so it is worth understanding which one suits your analysis. This post will help you choose the right one.

The appropriate BLAST algorithm choice depends on what you’re trying to do.

As biologists, we work with nucleotide sequences and protein (i.e., amino-acid) sequences. Several versions of BLAST exist so we can analyze both types of sequences. Are we searching with a nucleotide sequence or a protein sequence? Are we comparing that to a database of amino-acid sequences such as UniRef90 or to a database of nucleotide sequences such as the Telomere-to-Telomere human genome?

The correct BLAST algorithm depends on the type of query sequence and the type of database sequence. Below is a summary overview from our 2019 Mol Biol Evol paper:

Overview of BLAST algorithms and how they are used
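
The pairing of query type and database type with each algorithm can also be written as a small lookup table. The Python sketch below is only an illustration: the names `BLAST_PROGRAMS` and `candidate_programs` are made up for this post and are not part of BLAST or SequenceServer.

```python
# Which BLAST programs accept which (query type, database type) pairing.
# The dictionary and function names are hypothetical, chosen for illustration.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def candidate_programs(query_type, database_type):
    """Return the BLAST programs that fit the given sequence types."""
    key = (query_type, database_type)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown sequence types: {key}")
    return BLAST_PROGRAMS[key]


print(candidate_programs("nucleotide", "protein"))     # ['blastx']
print(candidate_programs("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
```

Only the nucleotide-versus-nucleotide pairing leaves a real choice between two programs.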

Choosing the wrong algorithm can lead to incorrect results

For example, if you search with a nucleotide query sequence but run blastp, BLAST will still run. But it will give you incorrect results: false negatives. You will erroneously conclude that there is no similarity between your query sequence and the selected database. You should have used blastn, blastx or tblastx, depending on your database and on the expected evolutionary distance between your query and the sequences you are comparing against.

SequenceServer automatically chooses the right algorithm depending on your query and database sequence types

So, if you’re running BLAST locally or at NCBI, you need to know the type of query sequence and the type of database sequence. Think carefully before clicking.

However, if you’re using SequenceServer, no need to worry. SequenceServer automatically chooses the appropriate algorithm. Indeed, it has an “automagic” selection mechanism that identifies query type and database type, and selects the BLAST algorithm that will work best. You can focus on the science and avoid costly mistakes.
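
To give a rough idea of how a query type can be detected, here is a minimal Python sketch. It is not SequenceServer's actual implementation: the function name, the set of letters counted as nucleotides, and the 90% threshold are all assumptions made for illustration.

```python
# Letters treated as nucleotide codes in this sketch (an assumption;
# the full IUPAC ambiguity alphabet is larger).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the letter composition.

    The 0.9 threshold is an arbitrary illustrative value.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    matches = sum(c in NUCLEOTIDE_LETTERS for c in letters)
    fraction = matches / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


print(guess_sequence_type("ATGGCCATTGTAATGGGCCGC"))  # nucleotide
print(guess_sequence_type("MKVLAAGIVGLLLAQPSFA"))    # protein
```

A composition-based guess like this can be fooled by unusual sequences, for example a short peptide made up mostly of alanine, cysteine, glycine and threonine.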

In the screenshot below, a biologist pasted some nucleotide sequences as the query, and selected a protein database. SequenceServer auto-detected this and consequently selected blastx, the only algorithm appropriate for comparing nucleotide sequences to a protein database.

`blastn` vs. `tblastx`: two options for comparing nucleotide sequences

Things are a bit more complex if you search with nucleotide query sequences against nucleotide databases. You have a choice between blastn and tblastx. Why are there two algorithms that seemingly do the same thing? What are the tradeoffs, and which should you choose?

Algorithmic differences between `blastn` and `tblastx`

In short, blastn does comparisons in nucleotide space: it compares nucleotides directly. It searches with both the forward sequence and its reverse complement.
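
As a reminder of what the reverse complement is, here is a minimal Python sketch. The function name is our own, and the sketch handles only the four unambiguous bases.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # GGCCAT
```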

In contrast, tblastx performs its comparisons in amino-acid space. To do so, it translates the nucleotide query sequence in all six possible reading frames (three on the forward strand and three on the reverse-complement strand). It does the same with every sequence in the nucleotide database. Thus, each query sequence is effectively compared to each database sequence in thirty-six (6 × 6) reading-frame combinations.
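
Six-frame translation can be sketched in a few lines of Python. This is an illustration of the idea, not BLAST's implementation: the function names are ours, the standard genetic code is assumed, and codons containing anything other than A, C, G or T are shown as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


query_frames = six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA")
print(query_frames["+1"])  # MAIVMGR*

# Six query frames times six database frames gives 36 combinations.
print(len(list(product(query_frames, repeat=2))))  # 36
```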

Tradeoffs between `blastn` and `tblastx`

The algorithmic differences between blastn and tblastx create multiple tradeoffs:

- blastn is faster because it makes far fewer comparisons, and each comparison is more straightforward than in tblastx.
- blastn is more precise for highly similar nucleotide sequences.
- tblastx is more sensitive for divergent sequences: it detects similarity among distantly related sequences better than blastn. This is because nucleotide sequences diverge faster than the amino-acid sequences they encode. The genetic code is redundant (4 * 4 * 4 = 64 possible codons encode only 20 amino acids plus the “stop signal”), so different nucleotide sequences can encode identical amino-acid sequences.
- Only use tblastx for protein-coding sequences. Translating nucleotide sequences into protein sequences isn’t always meaningful, for example for non-coding RNAs, conserved non-coding elements, or primer sequences.
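
The redundancy of the genetic code can be made concrete with a toy example. In the Python sketch below, the two nucleotide sequences are invented for illustration: they share only 40% of their nucleotides, yet they encode exactly the same peptide. The helper names are ours, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons in frame +1."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up coding sequences that use different synonymous codons.
seq_a = "CTTTCTCGTGGTGCT"
seq_b = "TTGAGCAGAGGAGCA"

print(identity(seq_a, seq_b))                        # 0.4
print(translate(seq_a), translate(seq_b))            # LSRGA LSRGA
print(identity(translate(seq_a), translate(seq_b)))  # 1.0
```

At the nucleotide level these two sequences look only weakly related, while their translations are identical. This is the situation in which a translated search can find similarity that a nucleotide search misses.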

Conclusion

It’s crucial to choose the right algorithm for your data types and question. SequenceServer will automatically choose what works for the sequence types you’re entering. But if you’re running BLAST locally or at NCBI, you must carefully think through which types of query and database sequences you’re comparing.

For specific applications, additional adjustments are needed. For example,

- for verifying primer sequences, you’ll want to use blastn and tweak other parameters such as word size and the E-value threshold.
- to identify protein-coding genes that are orthologous between species for which you have protein-coding genesets, you’ll want to use blastp. But if you only have transcriptome assemblies, tblastx may be more appropriate.
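
For the primer case, a command line for a local BLAST+ installation could be assembled as in the Python sketch below. The flags (`-task`, `-word_size`, `-evalue`, `-query`, `-db`, `-outfmt`) are standard blastn options, but the function name, the file and database names, and the specific parameter values are assumptions for illustration, not recommendations. The sketch only builds the command; it does not run it.

```python
def primer_blastn_command(query_fasta, database, word_size=7, evalue=1000):
    """Build a blastn command for short queries such as primers.

    The word_size and evalue defaults are illustrative starting points;
    adjust them for your own primers and database.
    """
    return [
        "blastn",
        "-task", "blastn-short",       # blastn mode intended for short queries
        "-query", query_fasta,
        "-db", database,
        "-word_size", str(word_size),  # short seeds so short primers can match
        "-evalue", str(evalue),        # permissive: short hits have high E-values
        "-outfmt", "6",                # tabular output
    ]


# "primers.fasta" and "my_genome" are hypothetical names.
command = primer_blastn_command("primers.fasta", "my_genome")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once BLAST+ is installed and the database has been built.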
