TABLE OF CONTENTS
- Check your results and the searched sequences
- If you have no results, check how you searched
- Refine your results with the Identity filters
- The right percentage (%) identity is…
- Other filters to consider
- Learn more about OBS specific columns and display
- The alignment tab: a powerful display of sequences
Check your results and the searched sequences
When launching an OBS run, you need to set a maximum number of unique sequences. Since you cannot know a priori how many results you will get, you can leave it at the default or raise it somewhat, to 500 for instance. Once the results are computed, check whether that number was high enough: go to the last page and look at the worst results. If they are below what you would consider good, you have all the results you want. If the worst results are still good enough, the limit has probably cut off other good hits, so re-launch your OBS run with a higher maximum number of unique sequence results.
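The "check the worst results" rule can be written down as a small decision function. This is a minimal sketch, assuming each unique sequence result carries a percentage identity and that "good enough" is a threshold you pick yourself; the function name and the 80% default are hypothetical and not part of OBS.

```python
def cap_may_have_truncated(identities, max_unique, good_threshold=80.0):
    """Return True if the run likely hit its cap before running out of good hits.

    identities: % identity of each unique sequence result (any order).
    max_unique: maximum number of unique sequences requested for the run.
    good_threshold: hypothetical cut-off for what you consider a good hit.
    """
    if len(identities) < max_unique:
        # Fewer results than the cap: the search returned everything it found.
        return False
    # The cap was reached: look at the worst result, as on the last page.
    worst = min(identities)
    # If even the worst result is still good, more good hits may lie beyond the cap.
    return worst >= good_threshold


print(cap_may_have_truncated([99.0, 95.5, 91.0], max_unique=3))  # True
```

A True result is the signal to re-launch with a higher maximum; a False result means the cap was either not reached or already cut into poor hits.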
More at How to search for a sequence
If you have no results, check how you searched
That is always worrisome, indeed. Do I have 0 results because my sequence or parameters are not correct, or are there really 0 results (which is usually good news for you!)? If you have done a MOTIF search, there is a real possibility of getting 0 results, since your motif might be too restrictive; you might want to loosen it a bit to verify that you are not mistaken. If you have done a BLAST search with a very short sequence, the setup might be the cause. For very short DNA sequences (such as a 10-residue primer) or a particularly short CDR (5 amino acids long, for instance), you need to select the “short input sequences” flag and increase the Max E-value to its maximum (5M). Those are common mistakes. When in doubt, our wonderful support team is always there to help.
More at Biosequence: MOTIF searching
Refine your results with the Identity filters
Let us first define what percentage identity means. A percentage identity is always computed over something: the query, the subject or the alignment. The percentage identity over the query is the percentage of query residues that match; the same goes for the subject and the alignment. Note that this can look a bit strange when there are gaps. For instance, if all query residues match but gaps are introduced in the query sequence, 100% of the query residues still match. Thus, a 100% query identity does not mean that a perfectly matching sequence was found.
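To make the three denominators concrete, here is a minimal sketch that computes the query, subject and alignment identities of one pairwise alignment. It assumes the aligned region is given as two equal-length strings with "-" for gaps, and that the alignment length counts gap columns; the exact counting rules used by OBS may differ, and the function name is hypothetical.

```python
def percent_identities(aligned_query, aligned_subject, query_len, subject_len):
    """Percentage identities of a pairwise alignment over query, subject and alignment.

    aligned_query / aligned_subject: the aligned region, same length, '-' marks a gap.
    query_len / subject_len: full lengths of the original (ungapped) sequences.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")
    # A match is a column where both residues are present and identical.
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject)
        if q != "-" and q == s
    )
    return {
        "query": 100.0 * matches / query_len,
        "subject": 100.0 * matches / subject_len,
        "alignment": 100.0 * matches / len(aligned_query),
    }


# Every residue of the 4-residue query matches, but a gap was opened in the query.
print(percent_identities("AC-GT", "ACAGT", query_len=4, subject_len=5))
# {'query': 100.0, 'subject': 80.0, 'alignment': 80.0}
```

The example reproduces the case described above: the query identity is 100% even though the subject carries an extra residue, which only the subject and alignment identities reveal.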
So, which one should you use? Let’s say you want to find subject sequences similar to your query or queries. In this case, you can set both the %identity over the query and the %identity over the subject to a high value, say, 80. That guarantees that all your hits will be very close, i.e. queries and subjects will be very similar to each other. Another case is when you have one or more short queries and you want to find them embedded in subject sequences (think CDRs within chains). Here you will want to set only the %query identity.
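The two strategies differ only in which identities are constrained. The sketch below expresses them as predicates over a hit; the dictionary keys, function names and sample hits are hypothetical, and 80 is just the example value used above.

```python
def keep_similar(hit, min_id=80.0):
    """Close homologues: require high identity over both the query and the subject."""
    return hit["query_id"] >= min_id and hit["subject_id"] >= min_id


def keep_embedded(hit, min_id=80.0):
    """Short query (e.g. a CDR) inside a longer subject: only the query side matters."""
    return hit["query_id"] >= min_id


hits = [
    {"name": "full-length homologue", "query_id": 92.0, "subject_id": 88.0},
    {"name": "CDR inside a whole chain", "query_id": 100.0, "subject_id": 4.5},
]
print([h["name"] for h in hits if keep_similar(h)])
# ['full-length homologue']
print([h["name"] for h in hits if keep_embedded(h)])
# ['full-length homologue', 'CDR inside a whole chain']
```

Setting a subject identity threshold in the CDR case would discard the embedded hit, because a short query can only ever cover a small fraction of a long chain.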
The right percentage (%) identity is…
Unfortunately, and this gets asked often, there is no single good answer to this question. Some patents claim sequences with 70% identity, others 90%. If the sequences are very short, patents tend to mention or claim the number of mismatches or specific substitutions instead. Roughly speaking, 80% identity over the query or subject is generally considered a good percentage identity, but again, this may vary from case to case.
Other filters to consider
Here is a list of other filters and their uses:
• Number of errors
o This is the number of errors in the alignment, errors being mismatches and gaps. It can be used to control the quality of the alignment with more finesse.
• Number of gaps
o When one wants to separate gapped alignments from non-gapped alignments.
• Limit to claims
o Only claimed sequences will be shown as hits.
• Query name
o When using several query sequences, selecting all or some query names shows only the families where all the selected queries have hits. This is particularly useful when one wants to find families that have hits for all of one’s CDRs.
• Subject length and alignment length
o Those filters apply to the length of the subject or alignment. One uses those filters to control for long subjects (think genomic subsequences) or very long alignments which can occur with MOTIF searching.
• Organism
o We normalize organism names linked to patent sequences up to a certain point. Though not a perfect system, it allows for finer control over aligned sequences.
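Two of these filters have logic that is easy to misread: what counts as an error, and how the query name filter combines several queries. This is a minimal sketch under stated assumptions: alignments are equal-length strings with "-" for gaps, each gap column counts as one error (OBS may count gaps differently, e.g. per gap opening), and the function names are hypothetical.

```python
def count_errors(aligned_query, aligned_subject):
    """Return (errors, gaps), where errors = mismatches + gaps.

    Assumption: each gap column counts as one error.
    """
    mismatches = gaps = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            gaps += 1
        elif q != s:
            mismatches += 1
    return mismatches + gaps, gaps


def family_passes_query_filter(queries_with_hits, selected_queries):
    """Keep a family only if every selected query has at least one hit in it."""
    return set(selected_queries) <= set(queries_with_hits)


print(count_errors("AC-GT", "ACAGA"))  # (2, 1): one gap plus one mismatch
print(family_passes_query_filter({"CDR-H1", "CDR-H2", "CDR-H3"}, ["CDR-H1", "CDR-H3"]))  # True
print(family_passes_query_filter({"CDR-H1"}, ["CDR-H1", "CDR-H3"]))  # False
```

The query name filter is a subset test rather than an "any" test: a family with hits for only one of the selected CDRs is dropped.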
More at Biosequence specific filters
Learn more about OBS specific columns and display
There are 7 OBS-specific columns that can be shown next to the FAMPAT columns (Title, Assignee, …). The “display” menu (next to the printer symbol) above the family rows controls which columns are displayed. Note that these numbers are computed only once, when you open your results: subsequent filtering does not change them.
• Best %QID
o The largest percentage identity over the query for this family
• Claimed seq.
o Yes or no: Is any hit subject sequence claimed?
• Unique seq. hits
o The number of unique sequences that are hits
• Longest Alignment
o The length of the longest alignment among this family's hits
• Nb queries w/ hits
o If you use several queries, the number of queries that have hits in this family, otherwise it is 1.
• Nb pub. w/ hits
o Number of different publications of the family that have hits.
• List of queries w/ hits
o The names of the queries that have hits in the family.
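The columns above are simple aggregates over a family's hits. The sketch below derives all seven from a list of hit records, assuming a hypothetical `Hit` record with the fields shown; it is an illustration of the definitions, not how OBS computes them internally.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one hit within a patent family.
    query: str
    publication: str
    sequence: str          # raw subject sequence
    query_identity: float  # % identity over the query
    alignment_length: int
    claimed: bool


def family_columns(hits):
    """Summarise one family's hits following the column definitions above."""
    queries = sorted({h.query for h in hits})
    return {
        "Best %QID": max(h.query_identity for h in hits),
        "Claimed seq.": "Yes" if any(h.claimed for h in hits) else "No",
        "Unique seq. hits": len({h.sequence for h in hits}),
        "Longest Alignment": max(h.alignment_length for h in hits),
        "Nb queries w/ hits": len(queries),
        "Nb pub. w/ hits": len({h.publication for h in hits}),
        "List of queries w/ hits": queries,
    }


hits = [
    Hit("CDR-H3", "WO-A", "ARDYW", 100.0, 5, True),
    Hit("CDR-H3", "US-B", "ARDYW", 100.0, 5, False),
    Hit("VH", "WO-A", "EVQLVESGG", 88.9, 9, True),
]
print(family_columns(hits))
```

Because the real columns are computed only once, this corresponds to calling the function on the unfiltered hits when the results are opened; later filters would not trigger a recomputation.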
The alignment tab: a powerful display of sequences
The right-hand side alignment tab is dynamically recomputed depending on the filters you have used. It shows alignment information on the currently highlighted family.
It first shows the queries with hits, then the publications in this family with the number of hits per publication and the total number of sequences known in each publication.
Following those headers, the query name is visible. You can click the little triangle on its left to open and close all the alignments that follow for that query; for instance, closing them lets you see the hits of other queries.
The alignment area starts with a “sequence” header and a list of publications and SEQ ID NOs. This sequence is common, within this family, to all the listed combinations of publication and SEQ ID NO. Clicking on the word “sequence” pops up a small window with the raw sequence.
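The "sequence" header groups every (publication, SEQ ID NO) pair that discloses the same raw sequence. A minimal sketch of that grouping, using hypothetical publication labels and a hypothetical function name:

```python
from collections import defaultdict


def group_by_sequence(entries):
    """Map each raw sequence to the (publication, SEQ ID NO) pairs that disclose it."""
    groups = defaultdict(list)
    for publication, seq_id_no, sequence in entries:
        groups[sequence].append((publication, seq_id_no))
    return dict(groups)


entries = [
    ("WO-A", 12, "ARDYW"),
    ("US-B", 7, "ARDYW"),
    ("WO-A", 13, "EVQLVESGG"),
]
groups = group_by_sequence(entries)
print(groups["ARDYW"])  # [('WO-A', 12), ('US-B', 7)]
```

Grouping this way is why a single alignment can stand for several publications: the same sequence often reappears under different SEQ ID NOs across members of a family.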
The alignments are shown in two forms: a graphical representation and, by clicking on “Details”, the traditional textual alignment. The graphical representation packs a lot of information into a compact view. It includes the query and subject sequence sizes, the start and stop of the alignment, frames or strands (FW for forward, REV for reverse complement, -3 to +3 for forward and reverse complement frames), the number of errors and matches, and the BLAST scores and e-values. Note that coordinates are always given in the original sequence, even when the sequence is used in a specific frame.
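Reporting original coordinates for a translated frame means mapping each codon back onto the nucleotide sequence. This sketch assumes the common six-frame convention (frames +1..+3 start reading the forward strand at nucleotide 1..3, frames -1..-3 start reading the reverse complement at its nucleotide 1..3) and returns reverse-frame spans with start > end; the exact way OBS orders reverse coordinates is an assumption, and the function name is hypothetical.

```python
def protein_pos_to_nucleotides(aa_pos, frame, seq_len):
    """Original-sequence nucleotide span of codon `aa_pos` (1-based) in `frame`.

    Frames +1..+3 read the forward strand starting at nucleotide 1..3;
    frames -1..-3 read the reverse complement starting at its nucleotide 1..3.
    Reverse-frame spans are returned as (start, end) with start > end.
    """
    if frame not in (-3, -2, -1, 1, 2, 3):
        raise ValueError("frame must be one of -3..-1 or +1..+3")
    start = abs(frame) + 3 * (aa_pos - 1)  # position on the strand being read
    end = start + 2
    if end > seq_len:
        raise ValueError("codon runs past the end of the sequence")
    if frame > 0:
        return start, end
    # Reverse-complement position p corresponds to original position seq_len - p + 1.
    return seq_len - start + 1, seq_len - end + 1


print(protein_pos_to_nucleotides(1, 1, 12))   # (1, 3)
print(protein_pos_to_nucleotides(1, -1, 12))  # (12, 10)
```

For a 12-nucleotide sequence, the first codon of frame -1 covers original nucleotides 12 down to 10, which is what an alignment reported in original coordinates would show.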
The detailed alignment is more textual and is followed by additional alignment features (number of gaps, …) and details on each published version, such as claim status and organism.
More at How to read your alignments
Lastly, please do not hesitate to ask our support team (email@example.com) when in doubt.