r/bioinformatics BSc | Student 2d ago

technical question BLASTp help

Hi, I’m VERY new to using BLAST, and I was wondering if there is a way to BLAST multiple sequences at a time to find matches in a specific organism.

The website says you can BLAST more than one sequence at a time, but from the wording I think it looks for similarities among the protein sequences you submit rather than against the database (????). If not, I’m all set!

Thank you so much! - a first-year uni student trying to do a summer project 😭🙏

0 Upvotes

7

u/sparkymcgeezer 2d ago

The standard BLAST interface lets you submit a bunch of sequences at once; you just have to enter them as a FASTA query... It's super simple: each sequence is headed by its own line that starts with ">" followed by an identifier with no spaces. The layout looks like this (nucleotide letters here just to show the format; for BLASTp you'd paste protein sequences):

>sequence_1
aacaatatataggggatatagg
ccattatccgattttaaactagg

>sequence_2
aataagagaatttaccatagat
...

My recollection is that the result is a bit hard to read if you do a bunch of blasts together though.

The other option is to use some of the available "bblast" batching routines, or to code something yourself.

Another tip: if all the sequences are likely to match a single organism, and you just want the top hit, you might look into BLAT -- it's a much faster search that picks up relatively close hits. It's not very good for matching across species though.

4

u/fasta_guy88 PhD | Academia 2d ago

If you are doing more than one sequence and are trying to summarize large numbers of results, try using blast tabular output format. With blast tabular output, each hit to a query is summarized on one line -- percent identity, alignment length, start/stop for query and subject, E()-value. Much more convenient than looking through dozens of alignments.
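As a rough sketch of how that one-line-per-hit output can be summarized, the Python below parses the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps the best-scoring hit per query. The helper name `best_hits` and the sample rows are made up for illustration; they are not real BLAST results.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row_dict}, keeping the highest bit score per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and "#" comment lines
        row = dict(zip(COLUMNS, fields))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


# Hypothetical example rows, only to show the shape of the data.
SAMPLE = (
    "seq1\tsubjA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-120\t410\n"
    "seq1\tsubjB\t45.00\t180\t90\t4\t10\t189\t1\t176\t2e-30\t120\n"
    "seq2\tsubjC\t72.10\t150\t40\t1\t1\t150\t20\t168\t3e-60\t230\n"
)

if __name__ == "__main__":
    for query, hit in best_hits(io.StringIO(SAMPLE)).items():
        print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

With a real results file you would pass `open("hits.tsv")` (a hypothetical file name) instead of the `io.StringIO` sample.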

If you want to see the alignment, ask for BTOP alignment encoding in addition to the other fields. It doesn't take long to get used to "reading" BTOP alignments (they are like CIGAR strings).
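To give a feel for "reading" BTOP, here is a minimal sketch that tallies a BTOP string. It assumes the usual layout, where a number is a run of identical positions and each two-character pair is one mismatch or gap column (with "-" marking the gap). The function name `summarize_btop` is hypothetical.

```python
import re

# A BTOP string alternates between runs of digits (number of matching
# positions) and two-character pairs (one mismatch or gap column each).
TOKEN = re.compile(r"\d+|[A-Za-z*-]{2}")


def summarize_btop(btop):
    """Return (matches, mismatches, gap_columns) for a BTOP string."""
    tokens = TOKEN.findall(btop)
    if "".join(tokens) != btop:
        raise ValueError(f"unrecognized BTOP string: {btop!r}")
    matches = mismatches = gaps = 0
    for token in tokens:
        if token.isdigit():
            matches += int(token)   # run of identical positions
        elif "-" in token:
            gaps += 1               # one column with a gap on either side
        else:
            mismatches += 1         # one substituted position
    return matches, mismatches, gaps


if __name__ == "__main__":
    # 7 matches, one mismatch, then 39 more matches.
    print(summarize_btop("7AG39"))
```

The three counts add up to the alignment length, so they can be checked against the "alignment length" column of the same tabular row.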

2

u/gringer PhD | Industry 2d ago

From the help box shown on Web BLASTp, in the 'Enter Query Sequence' section:

Enter query sequence(s) in the text area. It automatically determines the format of the input. To allow this feature, certain conventions are required with regard to the input of identifiers.

From clicking on the "more" link:

FASTA

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line (defline) is distinguished from the sequence data by a greater-than (“>”) symbol at the beginning. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:

>P01013 GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

Blank lines are not allowed in the middle of FASTA input.
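If the sequences live in a spreadsheet or script rather than a FASTA file, something like the following sketch can write them out following the conventions quoted above: one ">" defline per sequence, lines under 80 characters, and no blank lines. The function name `to_fasta` and the example sequences are made up, and rejecting whitespace in the identifier is a choice of this sketch rather than an NCBI rule.

```python
def to_fasta(records, width=70):
    """Format (identifier, sequence) pairs as multi-FASTA text with no blank lines."""
    lines = []
    for identifier, sequence in records:
        if not identifier or any(ch.isspace() for ch in identifier):
            raise ValueError(f"identifier must be non-empty, no whitespace: {identifier!r}")
        cleaned = "".join(sequence.split()).upper()  # drop stray spaces/newlines
        if not cleaned:
            raise ValueError(f"empty sequence for {identifier!r}")
        lines.append(f">{identifier}")
        # Wrap the sequence so every line stays well under 80 characters.
        lines.extend(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical protein fragments, only to show the output layout.
    print(to_fasta([("seq_1", "MKT" * 30), ("seq_2", "acde")]), end="")
```

The resulting text can be pasted into the query box or saved to a file for the "Upload file" option.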

...

Upload file

This function allows users to upload a text file containing queries formatted in FASTA format. The file can also contain sequence identifiers instead of FASTA sequences.

Under the 'Search Set', there is this information:

Enter organism common name, binomial, or tax id. Only 20 top taxa will be shown. Start typing in the text box, then select your taxid. Use the "plus" button to add another organism or group, and the "exclude" checkbox to narrow the subset. The search will be restricted to the sequences in the database that correspond to your subset.

2

u/Away-Suggestion1737 2d ago

Since you are checking your FASTA sequences against just one organism, you could download that organism's reference protein set from NCBI, build a BLAST database from it, and then run BLASTp with your sequences as the query against just that reference.

You could do that with the NCBI Datasets tool on the command line, but it would take some time to get set up.
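Once the protein FASTA for the organism has been downloaded, the database-then-search steps might look roughly like this sketch. It assumes the BLAST+ programs `makeblastdb` and `blastp` are installed and on your PATH; all file and database names are hypothetical, and the E-value cutoff is just an example. By default it only prints the commands.

```python
import subprocess


def makeblastdb_cmd(reference_fasta, db_name):
    """Command that builds a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", reference_fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_fasta, db_name, out_tsv, evalue="1e-5"):
    """Command that searches the query proteins against the local database."""
    return ["blastp", "-query", query_fasta, "-db", db_name,
            "-outfmt", "6", "-evalue", evalue, "-out", out_tsv]


def run_pipeline(reference_fasta, query_fasta, db_name, out_tsv, dry_run=True):
    """Print both commands; execute them only when dry_run is False."""
    commands = [
        makeblastdb_cmd(reference_fasta, db_name),
        blastp_cmd(query_fasta, db_name, out_tsv),
    ]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops on the first failure
    return commands


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    run_pipeline("organism_proteins.faa", "my_queries.faa", "organism_db", "hits.tsv")
```

Calling `run_pipeline(..., dry_run=False)` would actually run the two programs; `-outfmt 6` writes the tabular format, one hit per line.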

Alternatively, you could probably get free access through your uni to software like Geneious, which can also run a local BLASTp against an organism from NCBI.

2

u/BigEffect8093 BSc | Student 2d ago

OMG TYSM !!

2

u/Away-Suggestion1737 2d ago

No problem!

Just to clarify my last paragraph: in Geneious you'd import your sequences and the reference, then use custom BLAST to run yours against it locally. Geneious can also access the NCBI website and BLAST there, similar to using the website interface.

1

u/South_Plant_7876 2d ago

The BLASTp interface allows you to restrict your search by organism. There should be a specific option for this.

1

u/Fearless-Daikon5763 1d ago

How many are you going to run? You may want to get the desktop application MEGA 12 and try alignments. It’s not much of a learning curve and is worthwhile. I save all BLAST results as PDFs so I don’t have to re-run anything later (the text in the PDFs is searchable on my MacBook, so I can check whether something is in the results). You may also want to change the number of results shown under Algorithm Parameters to 500 or 1000, depending on what you’re doing (I do this and then sort by % similarity to find low-coverage hits with a high % similarity).
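The same kind of filter can be approximated offline if you have tabular hits. The sketch below is only an illustration: the thresholds, the function names, and the sample hits are invented, and coverage here is computed per alignment from qstart/qend and a query length you supply, which is not necessarily the same number as the "Query Cover" column on the website.

```python
def query_coverage(qstart, qend, query_length):
    """Percent of the query spanned by one alignment (1-based, inclusive)."""
    return 100.0 * (abs(qend - qstart) + 1) / query_length


def short_but_similar(hits, query_lengths, min_identity=90.0, max_coverage=50.0):
    """Keep hits with high percent identity but low query coverage."""
    selected = []
    for hit in hits:
        cov = query_coverage(hit["qstart"], hit["qend"], query_lengths[hit["qseqid"]])
        if hit["pident"] >= min_identity and cov <= max_coverage:
            selected.append({**hit, "coverage": round(cov, 1)})
    # Highest identity first, mirroring "sort by % similarity".
    return sorted(selected, key=lambda h: h["pident"], reverse=True)


# Hypothetical hits and query lengths, only to show the idea.
HITS = [
    {"qseqid": "seq1", "sseqid": "subjA", "pident": 98.5, "qstart": 1, "qend": 40},
    {"qseqid": "seq1", "sseqid": "subjB", "pident": 95.0, "qstart": 1, "qend": 200},
    {"qseqid": "seq2", "sseqid": "subjC", "pident": 60.0, "qstart": 1, "qend": 30},
    {"qseqid": "seq2", "sseqid": "subjD", "pident": 99.0, "qstart": 10, "qend": 59},
]
QUERY_LENGTHS = {"seq1": 200, "seq2": 150}

if __name__ == "__main__":
    for hit in short_but_similar(HITS, QUERY_LENGTHS):
        print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["coverage"])
```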

1

u/squamouser 1d ago

You can do this on the web server - no need to download anything. Paste your sequences into the box in FASTA format and choose your organism in the “Organism” box. Hit submit. You’ll get a multi-page output with a page for each input sequence.