What is a FASTA format

Modified on Fri, 2 Oct, 2020 at 5:14 PM


  

The FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. This description line begins with a '>' and gives a name or a unique identifier to the sequence. It may also contain additional information. 


 


A more complete example is shown below.  It contains identifiers, descriptions and multiple sequences.


>sp|J7RUA5|CAS9_STAAU Start of CRISPR-associated endonuclease Cas9 OS=Staphylococcus aureus

MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRR

RHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHN

VNEVEEDTGNELS

>sp|Q99ZW2|CAS9_STRP1 Start of CRISPR-associated endonuclease Cas9/Csn1 OS=Streptococcus pyogenes

MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAE

ATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFG

NIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSD

VDKLFIQLVQT

>sp|G3ECR1|CAS9_STRTR Start of CRISPR-associated endonuclease Cas9 OS=Streptococcus thermophilus

MLFNKCIIISINLDFSNKEKCMTKPYSIGLDIGTNSVGWAVITDNYKVPSKKMKVLGNTS

KKYIKKNLLGVLLFDSGITAEGRRLKRTARRRYTRRRNRILYLQEIFSTEMATLDDAFFQ

RLDDSFLVPDDKRDSKYPIF


An identifier is composed of alphanumeric characters, _ (underscores) and - (hyphens).  Do not put spaces in an identifier.


Was this article helpful?

That’s Great!

Thank you for your feedback

Sorry! We couldn't be helpful

Thank you for your feedback

Let us know how can we improve this article!

Select at least one of the reasons
CAPTCHA verification is required.

Feedback sent

We appreciate your effort and will try to fix the article