Taxonomic filtering to improve your BLAST searches

What is taxonomy, and why BLAST with taxonomic restriction?

Taxonomy is the science of classifying and naming organisms. It aims to organize the vast diversity of life on Earth into a system that reflects evolutionary relationships. A taxon is a related group of organisms. The plural of taxon is taxa. Examples of taxa include:

We can use such classification to restrict BLAST to subsets of large databases. Restricting BLAST searches to species or groups of taxa can make it easier to interpret results.

For BLAST, taxonomic filtering requires taxonomic identifiers, or “taxids”. These taxids can be found in the NCBI Taxonomy database. Filtering also requires a BLAST database that stores taxid information for its sequences. This is the case for very large databases, including NCBI (nt/nr) and UniProtKB/Swiss-Prot, but only for some smaller databases.

Advantages of taxonomic restriction

Restricting the BLAST search to specific taxonomic groups has some advantages, which include:

Taxonomic restriction doesn’t really speed up BLAST

Unfortunately, taxonomic filtering with BLAST does not directly reduce the initial search space. Rather than BLAST immediately focusing on the targeted data subset, it first searches the entire dataset for hits and then performs taxonomic restriction. Therefore, the computational effort and time of a taxonomically restricted versus unrestricted BLAST search are similar.

Using different taxonomy levels to restrict BLAST searches

We can limit BLAST searches to specific taxonomic groups at any taxonomic level, such as order, genus, or species. A higher-level taxon includes all the lower-level taxa within it. For example, if we limit the search to the order Diptera, the search will also cover the genus Drosophila and the species Drosophila melanogaster, since both are within Diptera.

NCBI Taxonomy website showing Drosophila melanogaster page that has the taxid number at the top.
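
The way a higher-level taxon contains lower-level ones can be illustrated with a small sketch. The lineage here is deliberately simplified (intermediate ranks such as family are skipped), and the `PARENT` table and `is_within` helper are hypothetical names used only for illustration; they are not part of BLAST or SequenceServer.

```python
# Simplified child -> parent links; real NCBI lineages contain many more ranks.
PARENT = {
    "Drosophila melanogaster": "Drosophila",  # species -> genus
    "Drosophila": "Diptera",                  # genus -> order (family skipped)
    "Camponotus floridanus": "Camponotus",    # species -> genus
    "Camponotus": "Hymenoptera",              # genus -> order (family skipped)
}


def is_within(taxon, ancestor):
    """Return True if `taxon` equals `ancestor` or descends from it."""
    current = taxon
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None once we run out of known parents
    return False


for name in ("Drosophila melanogaster", "Drosophila", "Camponotus floridanus"):
    print(name, "is within Diptera:", is_within(name, "Diptera"))
```

Restricting a search to Diptera therefore keeps hits from the first two taxa and drops hits from the ant.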

Example: Finding vitellogenin genes in particular ant genera

Vitellogenin is important for egg yolk. We identified a predicted vitellogenin gene in the ant Camponotus floridanus. For our research question, we want to find other possible vitellogenin genes in the ant genera Camponotus, Formica, and Solenopsis. We can use BLAST with taxonomic restriction to search within the NCBI nt database, which limits the returned hits to just these three genera.

For instance, you can use the NCBI taxonomy identifiers for these genera:

Once we have our taxids, we enter them in SequenceServer’s “Advanced Parameters” field before starting the BLAST search. The taxids go after the -taxids option, with multiple IDs separated by commas. We can instead exclude taxa with the -negative_taxids option, but it cannot be used at the same time as -taxids.
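
As a minimal sketch of how such a parameter string can be put together, the helper below joins a list of taxids with commas and refuses to combine the two options. The function name `build_taxid_params` is hypothetical, and the taxids in the example are placeholders rather than real identifiers for the ant genera.

```python
def build_taxid_params(taxids=None, negative_taxids=None):
    """Build a '-taxids ...' or '-negative_taxids ...' parameter string."""
    if taxids and negative_taxids:
        # The two options are mutually exclusive in a single search
        raise ValueError("-taxids and -negative_taxids cannot be combined")
    if taxids:
        option, ids = "-taxids", taxids
    elif negative_taxids:
        option, ids = "-negative_taxids", negative_taxids
    else:
        return ""
    # Taxids are integers; int() rejects anything that is not numeric
    return option + " " + ",".join(str(int(t)) for t in ids)


# Placeholder taxids for illustration only; look up real ones in NCBI Taxonomy.
print(build_taxid_params(taxids=[1111, 2222, 3333]))
# -taxids 1111,2222,3333
```

The resulting string can be pasted into the advanced parameters field alongside any other BLAST options.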

We can simultaneously use other BLAST parameters, such as E-value cutoff. The following advanced parameters retain only the strongest hits for our three focal ant genera.

You need to use the advanced parameters option to add multiple parameter settings, including the taxids for the ant genera we are interested in.

The top of SequenceServer’s BLAST report also indicates which parameters were used; in the screenshot below, the taxid restriction is circled in red.

When we run the BLAST, the output summary includes the taxonomic restrictions that we put in place.

Thanks to the taxid restriction, the above report focuses exclusively on the taxa we are interested in. Without this restriction, we would also have obtained many other hits from related genera (including Polyrhachis, Cataglyphis, Nylanderia, and Cardiocondyla).

NOTE: The E-values differ for the same subject hits with and without taxonomic restriction. The E-values are lower (more significant) in the taxonomically restricted BLAST. This highlights how taxonomic restriction can be used to gain further confidence by limiting searches to only the relevant taxonomic groups.
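
One plausible reason for the lower E-values is that an E-value scales with the size of the search space. Under the standard relationship between bit score and E-value, E = m × n × 2^(−S′), where m is the query length, n the database length, and S′ the bit score, the same hit gets a smaller E-value when n shrinks. The sketch below assumes that the restricted search computes its statistics on the taxon subset only, and it ignores BLAST’s effective-length corrections. All lengths and the bit score are made-up numbers, and `expected_evalue` is a hypothetical helper.

```python
def expected_evalue(query_len, db_len, bit_score):
    """Approximate E-value: E = m * n * 2^(-S'), ignoring length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 5_000          # made-up query length (nucleotides)
BIT_SCORE = 200.0          # made-up bit score for one subject hit
FULL_DB_LEN = 10**12       # made-up size of the unrestricted database
RESTRICTED_DB_LEN = 10**9  # made-up size of the taxon-restricted subset

full = expected_evalue(QUERY_LEN, FULL_DB_LEN, BIT_SCORE)
restricted = expected_evalue(QUERY_LEN, RESTRICTED_DB_LEN, BIT_SCORE)

print(f"unrestricted E-value: {full:.2e}")
print(f"restricted E-value:   {restricted:.2e}")
```

With an identical alignment and bit score, the E-value drops in proportion to the reduction in database size.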

Without taxonomic restrictions, the same BLAST search returns many more hits. However, many of these hits then have to be filtered out because they are not of interest.

Taxonomically restricted BLASTs with SequenceServer

SequenceServer makes BLAST with taxonomic restrictions straightforward. Your results with the taxonomic restriction are also saved to your BLAST History, allowing you to keep track of the taxids you have used. Why not have a go at taxonomically restricted BLASTs with a free trial of SequenceServer?

Happy BLASTing!
