You can search with a motif. Searching for a group of CDRs, specific peptide patterns or SNPs has never been simpler.
Motif syntax
The full syntax is as follows:
- A letter matches itself (ambiguity characters are expanded)
- . (dot) for any letter
- ? for the previous entity 0 or 1 time
- * for the previous entity 0 or more times
- + for the previous entity 1 or more times
- [ ] encloses a list of alternative letters: [EK] matches E or K
- [^ ] matches any letter except those listed after the ^: [^P] matches any letter but P
- ( ) to group entities
- (|) separates alternatives inside a group: (DYR|SYW) matches DYR or SYW
- {n} where n is a number: the previous entity matches exactly n times.
- {n,m} where n and m are numbers: the previous entity matches at least n times and at most m times. Either n or m can be left empty to remove that bound. {1,5}: from 1 to 5 times.
- ^ means must start with: ^ATC matches only sequences that start with ATC
- $ means must end with: ATC$ matches only sequences that end with ATC
- DNA and amino acid ambiguity characters are fully expanded. For instance, the DNA ambiguity code B (any base except A) is expanded to [BCGTU], and T and U are both expanded to [TU], …
- \X is a special case: in motif searches against proteins, it matches an actual X in the sequence instead of being expanded as an ambiguity character. This escape only works with X; \P or \A are not supported.
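To make the expansion rules concrete, here is a minimal Python sketch that rewrites a motif into a standard regular expression. It assumes a motif behaves like an ordinary regular expression once ambiguity letters are expanded, which may not match the tool's internals. The function name motif_to_regex and both expansion tables are hypothetical: only the B, T and U entries come from the list above, and the protein X entry is an assumption.

```python
import re

# Hypothetical expansion tables. The B, T and U entries follow the text above;
# the protein X entry (any standard amino acid, or X itself) is an assumption.
DNA_EXPANSIONS = {"B": "BCGTU", "T": "TU", "U": "TU"}
PROTEIN_EXPANSIONS = {"X": "ACDEFGHIKLMNPQRSTVWYX"}


def motif_to_regex(motif: str, expansions: dict[str, str]) -> str:
    """Rewrite a motif as a Python regular expression (illustrative sketch).

    Ambiguity letters become character classes, or are spliced into an
    enclosing [ ] list. The escape \\X is kept as a literal X.
    """
    out = []
    in_class = False  # True while inside [ ... ]
    i = 0
    while i < len(motif):
        if motif.startswith("\\X", i):
            out.append("X")  # literal X, never expanded
            i += 2
            continue
        ch = motif[i]
        if ch == "\\":
            raise ValueError("only \\X is supported as an escape")
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch in expansions:
            letters = expansions[ch]
            # Inside [ ] add the letters directly; outside, wrap them in [ ].
            out.append(letters if in_class else f"[{letters}]")
        else:
            out.append(ch)  # operators and unambiguous letters pass through
        i += 1
    return "".join(out)


print(motif_to_regex("ATB", DNA_EXPANSIONS))        # A[TU][BCGTU]
print(motif_to_regex("[^B]", DNA_EXPANSIONS))       # [^BCGTU]
print(motif_to_regex("G\\XP", PROTEIN_EXPANSIONS))  # GXP
```

The expanded pattern can then be passed to re.search. Splicing letters into an existing [ ] list, rather than nesting brackets, keeps [^B] meaning "not B, C, G, T or U".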
Search examples:
A simple motif with an alternative amino acid:
[EK]FWEVISDEHGIDPS
Three CDRs separated by gaps of any length:
SYWMY.*RIDPNSGSTKYNEKFKN.*DYRKGLYAMDY
Note that .* matches any number of residues, including none.
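The gap behaviour of .* can be checked with Python's re module, assuming the tool's motifs follow standard regular-expression semantics. The test sequences are hypothetical, built by joining the three CDRs from the example with filler residues, with no filler, and in reverse order.

```python
import re

# The three-CDR motif from the example above.
CDR_MOTIF = "SYWMY.*RIDPNSGSTKYNEKFKN.*DYRKGLYAMDY"

# Hypothetical test sequences assembled from the same three CDRs.
cdrs = ["SYWMY", "RIDPNSGSTKYNEKFKN", "DYRKGLYAMDY"]
sequences = {
    "with_gaps": "GGG".join(cdrs),           # filler residues between CDRs
    "no_gaps": "".join(cdrs),                # CDRs directly adjacent
    "wrong_order": "".join(reversed(cdrs)),  # same CDRs, reversed order
}

for name, seq in sequences.items():
    found = re.search(CDR_MOTIF, seq) is not None
    print(f"{name}: {found}")
# with_gaps: True
# no_gaps: True
# wrong_order: False
```

The motif still requires the CDRs in the given order; only the distance between them is free.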
One of three alternative starting triplets (one of which contains a wildcard), and a run of one to four H or W:
^(DYR|SYW|W.W)EVISDE[HW]{1,4}GID
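Each part of this motif can be probed separately. The Python sketch below assumes standard regular-expression semantics; the sequences are hypothetical and chosen so that each one exercises a single rule.

```python
import re

MOTIF = "^(DYR|SYW|W.W)EVISDE[HW]{1,4}GID"

# Hypothetical sequences probing each part of the motif,
# paired with whether they should match.
examples = {
    "DYREVISDEHGID": True,       # first alternative, a single H
    "WAWEVISDEHWHWGID": True,    # W.W accepts any middle letter; four H/W
    "WAWEVISDEGID": False,       # no H or W: {1,4} needs at least one
    "WAWEVISDEHHHHHGID": False,  # five H: more than {1,4} allows
    "ADYREVISDEHGID": False,     # ^ forbids residues before the triplet
}

for seq, expected in examples.items():
    found = re.search(MOTIF, seq) is not None
    print(f"{seq}: {found} (expected {expected})")
```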
An exact sequence (starts with ^, ends with $):
^RIDPNSGSTKYNEKFKN$
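A list of mutations (S24G, S33T, S53G, S78N, S101N, G128A and L217Q), matching the wild type, the mutant, or either residue at each position. Each .{n} skips n residues, so ^.{23}S requires an S at position 24: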
>motif_WT
^.{23}S.{8}S.{19}S.{24}S.{22}S.{26}G.{88}L
>motif_MUT
^.{23}G.{8}T.{19}G.{24}N.{22}N.{26}A.{88}Q
>motif_BOTH
^.{23}[SG].{8}[ST].{19}[SG].{24}[SN].{22}[SN].{26}[GA].{88}[LQ]
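Writing these positional motifs by hand means computing every gap: a residue at position p that follows one at position q needs .{p - q - 1} in front of it, and the first one needs .{p - 1}. The Python sketch below does that arithmetic. It assumes mutations are written as wild-type letter, 1-based position, mutant letter, listed in increasing position order; the function name mutation_motifs is hypothetical.

```python
import re


def mutation_motifs(mutations: list[str]) -> tuple[str, str, str]:
    """Build wild-type, mutant and combined positional motifs.

    Each mutation is written as <wt residue><1-based position><mutant residue>,
    e.g. "S24G", and mutations must be in increasing position order.
    """
    wt_parts, mut_parts, both_parts = ["^"], ["^"], ["^"]
    previous = 0  # position of the last residue already placed
    for m in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", m)
        if match is None:
            raise ValueError(f"unrecognised mutation: {m}")
        wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
        gap = pos - previous - 1  # residues to skip before this position
        if gap < 0:
            raise ValueError("mutations must be in increasing position order")
        skip = f".{{{gap}}}" if gap else ""  # e.g. ".{23}"; nothing if adjacent
        wt_parts.append(f"{skip}{wt}")
        mut_parts.append(f"{skip}{mut}")
        both_parts.append(f"{skip}[{wt}{mut}]")
        previous = pos
    return "".join(wt_parts), "".join(mut_parts), "".join(both_parts)


motifs = mutation_motifs(["S24G", "S33T", "S53G", "S78N", "S101N", "G128A", "L217Q"])
print(*motifs, sep="\n")
```

For the list above, the three printed motifs are identical to motif_WT, motif_MUT and motif_BOTH.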
When is a motif search (unexpectedly) useful?
Here are a few cases, some of which you might not have thought of, where a motif search is useful:
- Searching with an extremely short sequence. BLAST cannot use amino acid sequences shorter than 4 residues, so to search with a 3-residue sequence you need the motif search.
- Looking for sequences that exactly match your query? Just use the motif search and add ^ at the beginning of your sequence and $ at the end.
- Similarly, if you are looking for sequences that match your query exactly or contain it within a larger sequence, just use the query sequence as is.
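These cases differ only in anchoring. The Python sketch below assumes standard regular-expression semantics and a query made of plain letters (no motif operators). The small in-memory collection and the names database and search are hypothetical, not part of the tool.

```python
import re

# Hypothetical collection of sequences to search.
database = {
    "seq1": "RIDPNSGSTKYNEKFKN",
    "seq2": "QVQLRIDPNSGSTKYNEKFKNWGQ",
    "seq3": "QVQLSYWMYWGQ",
}


def search(pattern: str) -> list[str]:
    """Return the names of sequences that contain a match for pattern."""
    return [name for name, seq in database.items() if re.search(pattern, seq)]


query = "RIDPNSGSTKYNEKFKN"
print(search(f"^{query}$"))  # exact match only: ['seq1']
print(search(query))         # exact or contained: ['seq1', 'seq2']
print(search("WGQ"))         # 3-residue query: ['seq2', 'seq3']
```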