You can search with a motif.  Searching for a group of CDRs, specific peptide patterns or SNPs has never been simpler.  


Motif syntax

The full syntax is as follows:

  • A letter matching itself (ambiguity characters are expanded)
  • . (dot) for any letter
  • ? for the previous entity 0 or 1 time
  • * for the previous entity 0 or more times
  • + for the previous entity 1 or more times
  • [ ] which contains a list of alternative letters
  • [^ ] which matches any letter except those listed after the ^
  • ( ) to group entities
  • (|) for alternatives
  • {n} where n is a number: the previous entity matches exactly n times.
  • {n,m} where n and m are numbers: the previous entity matches at least n and at most m times.  Either n or m can be left empty, meaning no limit on that side.  {1,5}: from 1 to 5 times.
  • ^ meaning must start with.  ^ATC: must start with ATC
  • $ meaning must end with.  ATC$: must end with ATC
  • DNA and amino acid ambiguity characters are fully expanded.  For instance DNA ambiguity B (meaning all but A) is expanded to [BCGTU], T and U are expanded to [TU], …
  • \X is a special case.  In motif searches against proteins, it matches an X.  This notation only works as \X, i.e. it does not work with other letters such as \P or \A.
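
The ambiguity expansion described above can be illustrated with a short Python sketch.  It is not the search engine's own code: the table IUPAC_DNA and the helpers expand_letter and expand_motif are hypothetical names introduced here, and the sketch covers DNA letters only (it does not handle \X or amino acid ambiguity).  It assumes each letter expands to itself plus the bases it stands for, with T and U treated as interchangeable, which reproduces the [BCGTU] and [TU] examples given in the list.

```python
import re

# Standard IUPAC DNA ambiguity codes (assumed table, not taken from the tool).
IUPAC_DNA = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_letter(letter: str) -> str:
    """Return the letters one motif letter stands for, in sorted order."""
    members = set(letter) | set(IUPAC_DNA.get(letter, ""))
    # T and U are treated as equivalent, as in the [TU] example above.
    if members & {"T", "U"}:
        members |= {"T", "U"}
    return "".join(sorted(members))


def expand_motif(motif: str) -> str:
    """Rewrite a DNA motif so that every letter becomes a character class."""
    out = []
    in_class = False  # True while we are between [ and ]
    for ch in motif:
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch.isalpha():
            members = expand_letter(ch)
            if in_class or len(members) == 1:
                out.append(members)  # already inside brackets, or unambiguous
            else:
                out.append("[" + members + "]")
        else:
            out.append(ch)  # operators such as . ? * + ( ) | { } ^ $ pass through
    return "".join(out)


print(expand_letter("B"))       # BCGTU
print(expand_motif("^ATB"))     # ^A[TU][BCGTU]
print(bool(re.search(expand_motif("ATB"), "AUG")))  # True
```

The expanded motif is an ordinary regular expression, so it can be tried out with Python's re module before running a real search.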


Search examples:

A simple motif with an alternative amino acid:

[EK]FWEVISDEHGIDPS

Three CDRs separated by any number of residues:

SYWMY.*RIDPNSGSTKYNEKFKN.*DYRKGLYAMDY

Note that .* matches any stretch of letters, including an empty one.
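
To see how .* behaves, the motif can be tried with Python's re module, which uses the same operators for this pattern.  This is only an approximation of the motif search, and the three test sequences below are made up for illustration.

```python
import re

motif = "SYWMY.*RIDPNSGSTKYNEKFKN.*DYRKGLYAMDY"

# Hypothetical subject sequences, invented for this example.
spaced = "QVQLSYWMYWVKQRIDPNSGSTKYNEKFKNKATLTDYRKGLYAMDYWGQG"
adjacent = "SYWMYRIDPNSGSTKYNEKFKNDYRKGLYAMDY"
wrong_order = "RIDPNSGSTKYNEKFKNAAASYWMYAAADYRKGLYAMDY"

# CDRs separated by other residues: match.
print(bool(re.search(motif, spaced)))       # True
# CDRs directly next to each other: .* matches the empty string, so still a match.
print(bool(re.search(motif, adjacent)))     # True
# Same CDRs in a different order: no match, because the motif fixes the order.
print(bool(re.search(motif, wrong_order)))  # False
```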

Alternative starting triplets (one of which contains a wildcard), followed by EVISDE, one to four H or W, then GID:

^(DYR|SYW|W.W)EVISDE[HW]{1,4}GID

An exact sequence (starts with ^, ends with $):

^RIDPNSGSTKYNEKFKN$

A list of mutations (S24G, S33T, S53G, S78N, S101N, G128A and L217Q), written as three motifs that match the wild type, the mutant, or either one at each position:

>motif_WT

^.{23}S.{8}S.{19}S.{24}S.{22}S.{26}G.{88}L

>motif_MUT

^.{23}G.{8}T.{19}G.{24}N.{22}N.{26}A.{88}Q

>motif_BOTH

^.{23}[SG].{8}[ST].{19}[SG].{24}[SN].{22}[SN].{26}[GA].{88}[LQ]
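
The spacers in these motifs follow from the mutation positions: the gap before each residue is its position minus the previous position minus one.  A minimal Python sketch of that arithmetic is shown here; motifs_from_mutations is a hypothetical helper written for this page, not a feature of the search tool, and it assumes single-letter substitutions written as in the list above (for example S24G).

```python
import re


def motifs_from_mutations(mutations):
    """Build (wild-type, mutant, either) motifs from labels such as 'S24G'."""
    parsed = []
    for label in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if m is None:
            raise ValueError(f"unrecognised mutation label: {label}")
        parsed.append((m.group(1), int(m.group(2)), m.group(3)))
    parsed.sort(key=lambda item: item[1])  # order by position

    wt, mut, both = ["^"], ["^"], ["^"]
    previous = 0  # position of the last residue already written
    for ref, pos, alt in parsed:
        gap = pos - previous - 1  # residues between the two fixed positions
        spacer = f".{{{gap}}}" if gap > 0 else ""
        wt.append(spacer + ref)
        mut.append(spacer + alt)
        both.append(f"{spacer}[{ref}{alt}]")
        previous = pos
    return "".join(wt), "".join(mut), "".join(both)


wt, mut, both = motifs_from_mutations(
    ["S24G", "S33T", "S53G", "S78N", "S101N", "G128A", "L217Q"]
)
print(wt)    # ^.{23}S.{8}S.{19}S.{24}S.{22}S.{26}G.{88}L
print(mut)   # ^.{23}G.{8}T.{19}G.{24}N.{22}N.{26}A.{88}Q
print(both)  # ^.{23}[SG].{8}[ST].{19}[SG].{24}[SN].{22}[SN].{26}[GA].{88}[LQ]
```

Because the motifs start with ^, positions are counted from the first residue of the subject sequence.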


When is a motif search (unexpectedly) useful?

Here are a few cases where a motif search is useful that you might not have thought of:

  • Searching with an extremely short sequence.  BLAST cannot use amino acid sequences shorter than 4 residues, so to search with a 3-residue sequence you need to use the motif search.
  • Are you looking for cases where your exact sequence is found?  Just use the motif search and add ^ at the beginning of your sequence and $ at the end.
  • Similarly, if you are looking for cases where your sequence is found exactly or included in a larger sequence, just use the query sequence as is.
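
The difference between these cases can be sketched with Python's re module, used here only as a stand-in for the motif search.  The subjects dictionary and the hits helper are made up for illustration.

```python
import re

# Hypothetical subject sequences, invented for this example.
subjects = {
    "cdr_only": "RIDPNSGSTKYNEKFKN",
    "in_longer_chain": "SYWMYWVRIDPNSGSTKYNEKFKNKATL",
    "unrelated": "DYRKGLYAMDY",
}


def hits(motif):
    """Return the names of the subjects in which the motif is found."""
    return [name for name, seq in subjects.items() if re.search(motif, seq)]


# Anchored with ^ and $: only a subject identical to the query matches.
print(hits("^RIDPNSGSTKYNEKFKN$"))  # ['cdr_only']
# Query used as is: exact subjects and longer subjects containing it both match.
print(hits("RIDPNSGSTKYNEKFKN"))    # ['cdr_only', 'in_longer_chain']
# A 3-residue query works like any other motif.
print(hits("DYR"))                  # ['unrelated']
```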