MPEP § 2423 — Symbols and Format To Be Used for Nucleotide and/or Amino Acid Sequence Data for WIPO ST.25 (Annotated Rules)

§2423 Symbols and Format To Be Used for Nucleotide and/or Amino Acid Sequence Data for WIPO ST.25

USPTO MPEP version: BlueIron's Update: 2025-12-31

This page consolidates and annotates all enforceable requirements under MPEP § 2423, including statutory authority, regulatory rules, examiner guidance, and practice notes. It is provided as guidance, with links to the ground-truth sources. It is for information only and is not legal advice.


This section addresses Symbols and Format To Be Used for Nucleotide and/or Amino Acid Sequence Data for WIPO ST.25. Primary authority: 37 CFR 1.822. Contains: 12 requirements.

Key Rules

Topic

Sequence Listing Content

24 rules
Statutory · Informative · Always
[mpep-2423-1becb4a565ee284057b99005]
Symbols and Format for Nucleotide/Amino Acid Sequences
Note:
This rule outlines the symbols and format to be used for nucleotide and amino acid sequence data in applications filed before July 1, 2022.

[Editor Note: This section is not applicable to applications filed on or after July 1, 2022, having disclosures of nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). See MPEP §§ 2412 – 2419 for guidance on WIPO ST.26 requirements for applications filed on or after July 1, 2022.]

37 CFR 1.77 · 37 CFR 1.831(b) · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Required · Always
[mpep-2423-0d52f6ab6415cc159b8004b2]
Symbols and Format for Sequence Data Must Conform to Requirements
Note:
The symbols and format used for nucleotide and/or amino acid sequence data must adhere to the specific requirements outlined in paragraphs (b) through (e) of this section.

(a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall conform to the requirements of paragraphs (b) through (e) of this section.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-31f6aa5f1f79390625033121]
Code for Nucleotide and Amino Acid Sequences Must Conform to Appendices A and C
Note:
The code used to represent nucleotide and amino acid sequences must follow appendices A and C; modified bases and modified or unusual amino acids listed in appendices B and D may instead be shown as the corresponding unmodified base or amino acid.

(b) The code for representing the nucleotide and/or amino acid sequence characters shall conform to the code set forth in appendices A and C to this subpart. No code other than that specified in these sections shall be used in nucleotide and amino acid sequences. A modified base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed in appendices B and D to this subpart, and the modification is also set forth in the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in appendices A and C, shall be listed in a given sequence as “n” or “Xaa,” respectively, with further information, as appropriate, given in the Feature section, by including one or more feature keys listed in appendices E and F to this subpart.
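A minimal Python sketch of a character check under paragraph (b). The code sets below are assumptions standing in for appendices A and C (the authoritative lists are reproduced in MPEP § 2422(I)), and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed contents of appendix A: lowercase one-letter nucleotide codes.
APPENDIX_A = set("acgturymkswbdhvn")

# Assumed subset of appendix C: three-letter amino acid codes.
APPENDIX_C = {
    "Ala", "Arg", "Asn", "Asp", "Asx", "Cys", "Gln", "Glu", "Glx", "Gly",
    "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
    "Tyr", "Val", "Xaa",
}

def invalid_nucleotides(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every base not in appendix A."""
    return [(i, ch) for i, ch in enumerate(seq, start=1) if ch not in APPENDIX_A]

def invalid_amino_acids(residues: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based position, code) for every residue not in appendix C."""
    return [(i, r) for i, r in enumerate(residues, start=1) if r not in APPENDIX_C]
```

For example, `invalid_nucleotides("acgtx")` flags position 5. Uppercase bases are also flagged, since paragraph (c)(1) calls for the lowercase codes.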

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-1a1664a9abef0e7a01f48244]
Nucleotide and Amino Acid Sequence Codes Must Conform to Specified Standards
Note:
Nucleotide and amino acid sequence characters must use only the codes in appendices A and C. A modified base or modified or unusual amino acid listed in appendices B and D may be represented by its unmodified counterpart, provided the modification is also set forth in the Feature section.

(b) The code for representing the nucleotide and/or amino acid sequence characters shall conform to the code set forth in appendices A and C to this subpart. No code other than that specified in these sections shall be used in nucleotide and amino acid sequences. A modified base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed in appendices B and D to this subpart, and the modification is also set forth in the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in appendices A and C, shall be listed in a given sequence as “n” or “Xaa,” respectively, with further information, as appropriate, given in the Feature section, by including one or more feature keys listed in appendices E and F to this subpart.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Permitted · Always
[mpep-2423-6b4eb7f87621659c43564ae5]
Modified Base or Amino Acid May Use Unmodified Code If Listed
Note:
A modified base or modified or unusual amino acid may be represented by the code for the corresponding unmodified base or amino acid if it is listed in appendices B and D and the modification is also set forth in the Feature section.

(b) The code for representing the nucleotide and/or amino acid sequence characters shall conform to the code set forth in appendices A and C to this subpart. No code other than that specified in these sections shall be used in nucleotide and amino acid sequences. A modified base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed in appendices B and D to this subpart, and the modification is also set forth in the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in appendices A and C, shall be listed in a given sequence as “n” or “Xaa,” respectively, with further information, as appropriate, given in the Feature section, by including one or more feature keys listed in appendices E and F to this subpart.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-bcbad5ccf9bf4511d6113802]
Representation of Unlisted Nucleotide and Amino Acid Characters
Note:
Each occurrence of a base or amino acid that does not appear in appendices A and C, and is not presented under the appendix B and D exception, must be listed as 'n' or 'Xaa', respectively, with further information, as appropriate, given in the Feature section using one or more feature keys from appendices E and F.

(b) The code for representing the nucleotide and/or amino acid sequence characters shall conform to the code set forth in appendices A and C to this subpart. No code other than that specified in these sections shall be used in nucleotide and amino acid sequences. A modified base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed in appendices B and D to this subpart, and the modification is also set forth in the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in appendices A and C, shall be listed in a given sequence as “n” or “Xaa,” respectively, with further information, as appropriate, given in the Feature section, by including one or more feature keys listed in appendices E and F to this subpart.
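A minimal Python sketch of the decision order in paragraph (b) for amino acids: keep appendix C codes, present appendix D residues as their unmodified counterpart, and list everything else as "Xaa", recording each non-standard position for the Feature section. Both code tables are assumptions (the appendix D entry is illustrative only), and `encode_residues` is a hypothetical name; choosing the actual appendix F feature key is left out.

```python
from typing import List, Tuple

# Assumed appendix C codes; the authoritative list is in MPEP § 2422(I).
APPENDIX_C = {
    "Ala", "Arg", "Asn", "Asp", "Asx", "Cys", "Gln", "Glu", "Glx", "Gly",
    "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp",
    "Tyr", "Val", "Xaa",
}
# Hypothetical excerpt of appendix D: modified residue -> unmodified residue.
APPENDIX_D = {"Hyp": "Pro"}

def encode_residues(residues: List[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return the listed codes and (position, original residue) notes for the Feature section."""
    listed: List[str] = []
    features: List[Tuple[int, str]] = []
    for pos, res in enumerate(residues, start=1):
        if res in APPENDIX_C:
            listed.append(res)
        elif res in APPENDIX_D:
            # Listed in appendix D: present as the unmodified residue and
            # set forth the modification in the Feature section.
            listed.append(APPENDIX_D[res])
            features.append((pos, res))
        else:
            # Not in appendix C or D: list as "Xaa" with details in the Feature section.
            listed.append("Xaa")
            features.append((pos, res))
    return listed, features
```

The same pattern applies to nucleotides, with appendix B in place of D and "n" in place of "Xaa".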

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-33cfb0bf43ff60dc6d037619]
Nucleotide Sequence Must Use Lowercase Codes
Note:
The nucleotide sequence must be listed using lowercase letters representing the one-letter codes for bases as specified in appendix A.

(c) Format representation of nucleotides. (1) A nucleotide sequence shall be listed using the lowercase letter for representing the one-letter code for the nucleotide bases set forth in appendix A to this subpart.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-0c6be161f7fc336cc20e779c]
Bases in Nucleotide Sequences Must Be Grouped by 10 Except Coding Parts
Note:
The bases in a nucleotide sequence (including introns) must be listed in groups of 10, except in the coding parts. Leftover bases, fewer than 10 in number, at the end of a noncoding part must be grouped together and separated from adjacent groups of 10 or 3 bases by a space.

(c) Format representation of nucleotides.

(2) The bases in a nucleotide sequence (including introns) shall be listed in groups of 10 bases except in the coding parts of the sequence.
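A minimal Python sketch of the grouping in paragraph (c)(2), together with the leftover rule from its second sentence: noncoding parts are cut into groups of 10 with a shorter final group, and a coding part is cut into codons. It assumes a single coding region given as hypothetical 0-based, end-exclusive indices.

```python
from typing import List, Optional, Tuple

def _tens(part: str) -> List[str]:
    # Groups of 10; any leftover (fewer than 10 bases) forms one final, shorter group.
    return [part[i:i + 10] for i in range(0, len(part), 10)]

def layout_groups(seq: str, cds: Optional[Tuple[int, int]] = None) -> List[str]:
    """Split seq into 10-base groups, except for one coding region listed as codons.

    cds is a hypothetical (start, end) pair of 0-based, end-exclusive indices.
    """
    if cds is None:
        return _tens(seq)
    start, end = cds
    if (end - start) % 3:
        raise ValueError("coding region length must be a multiple of 3")
    coding = seq[start:end]
    codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
    return _tens(seq[:start]) + codons + _tens(seq[end:])
```

Joining the result with `" ".join(...)` gives the space-separated groups the rule describes.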

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-d9e54f93dde756f2fc1f2c68]
Coding Parts Must Be Listed as Triplets
Note:
The bases in the coding parts of a nucleotide sequence must be listed as triplets (codons), with corresponding amino acids listed below.

(c) Format representation of nucleotides.

(3) The bases in the coding parts of a nucleotide sequence shall be listed as triplets (codons).

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-79f706d272a19d936c859130]
Amino Acids Must Be Listed Below Corresponding Codons
Note:
The amino acids corresponding to the codons in coding parts of a nucleotide sequence must be listed immediately below each codon.

(c) Format representation of nucleotides.

The amino acids corresponding to the codons in the coding parts of a nucleotide sequence shall be listed immediately below the corresponding codons.
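A minimal Python sketch of the codon/amino acid layout. It assumes translation has already been done elsewhere, so the caller supplies one three-letter code per codon; `codon_block` is a hypothetical helper.

```python
from typing import List

def codon_block(codons: List[str], residues: List[str]) -> str:
    """Return two lines: the codons, and beneath them the matching three-letter residues."""
    if len(codons) != len(residues):
        raise ValueError("need exactly one residue per codon")
    if any(len(c) != 3 for c in codons) or any(len(r) != 3 for r in residues):
        raise ValueError("codons and residue codes must be 3 characters wide")
    # Both rows use 3-character tokens joined by one space, so each residue
    # lands in the same columns as its codon.
    return " ".join(codons) + "\n" + " ".join(residues)
```

Because a codon and a three-letter code have the same width, no extra padding is needed to keep each amino acid directly below its codon.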

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-a309e4e536253ad4c466b746]
Amino Acid Below Two Nucleotides in Codon Spanning Intron
Note:
When a codon spans an intron, the corresponding amino acid must be listed below the portion containing two nucleotides.

(c) Format representation of nucleotides.

Where a codon spans an intron, the amino acid symbol shall be listed below the portion of the codon containing two nucleotides.
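A minimal Python sketch of the split-codon rule: given the part of the codon before the intron and the part after it, it reports which side holds two nucleotides and therefore carries the amino acid symbol. The function name is hypothetical.

```python
def amino_acid_side(left: str, right: str) -> str:
    """Return 'left' or 'right': the part of an intron-split codon holding two bases."""
    if not left or not right or len(left) + len(right) != 3:
        raise ValueError("a split codon must be 1 + 2 or 2 + 1 bases")
    return "left" if len(left) == 2 else "right"
```

For a codon split as "at" | intron | "g", the symbol goes under "at"; for "a" | intron | "tg", it goes under "tg".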

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Required · Always
[mpep-2423-81a871aa9b1cb8d8d4777910]
Nucleotide Sequence Lines Limited to 16 Codons or 60 Bases
Note:
A nucleotide sequence must be listed with a maximum of 16 codons or 60 bases per line, with a space between each codon or group of 10 bases.

(c) Format representation of nucleotides.

(4) A nucleotide sequence shall be listed with a maximum of 16 codons or 60 bases per line, with a space provided between each codon or group of 10 bases.
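A minimal Python sketch of the per-line limit in paragraph (c)(4). To keep the limit simple it assumes every group on a line is of one kind: either all codons (16 per line) or all groups of 10 (6 per line, i.e. 60 bases). `wrap_groups` is a hypothetical name.

```python
from typing import List

def wrap_groups(groups: List[str], coding: bool) -> List[str]:
    """Join groups into lines of at most 16 codons, or 6 groups of 10 bases."""
    per_line = 16 if coding else 6
    return [" ".join(groups[i:i + per_line]) for i in range(0, len(groups), per_line)]
```

Real listings can mix noncoding groups and codons on one line; handling that case would need a per-line base count rather than a fixed group count.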

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-9b6dbdbe983a8eb4b546230c]
Nucleotide Sequence Must Be Represented by a Single Strand in 5' to 3' Direction
Note:
The nucleotide sequence must be shown using only one strand, moving from left to right in the 5' to 3' direction.

(c) Format representation of nucleotides.

(5) A nucleotide sequence shall be represented, only by a single strand, in the 5' to 3' direction, from left to right.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-d3769f15230bc96e219dbf01]
Nucleotide Bases Must Start at Sequence Number 1
Note:
The enumeration of nucleotide bases must start at the first base of the sequence, which is numbered 1.

(c) Format representation of nucleotides.

(6) The enumeration of nucleotide bases shall start at the first base of the sequence with number 1.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-7778cbd8ce416763d61acce5]
Nucleotide Base Enumeration Must Appear in Right Margin
Note:
The enumeration must appear in the right margin, next to each line of one-letter base codes, and must give the number of the last base on that line.

(c) Format representation of nucleotides.

The enumeration shall appear in the right margin, next to the line containing the one-letter codes for the bases and giving the number of the last base of that line.
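A minimal Python sketch of right-margin enumeration: the count runs continuously from base 1 across all lines, spaces between groups are not counted, and each line ends with the number of its last base. The column widths are assumptions chosen only so numbers line up.

```python
from typing import List

def number_lines(lines: List[str], text_width: int = 66, num_width: int = 6) -> List[str]:
    """Append, in a right margin, the number of the last base on each line."""
    out: List[str] = []
    total = 0
    for line in lines:
        # Count bases only; the spaces between groups are layout, not sequence.
        total += sum(1 for ch in line if not ch.isspace())
        out.append(f"{line:<{text_width}}{total:>{num_width}}")
    return out
```

With the default widths, a full 60-base line (65 characters including spaces) still leaves at least one space before its number.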

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Required · Always
[mpep-2423-06e32221d20d62cd5fd75fb4]
Representation of Amino Acids Must Follow Specific Format
Note:
The amino acids in a protein or peptide sequence must be listed using three-letter abbreviations, at most 16 per line, in the amino to carboxy direction, and enumerated and split at internal terminators as set out in (d)(1) through (d)(5).
(d) Representation of amino acids.
  • (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter abbreviation, with the first letter as an upper case character, as in Appendix C to this subpart.
  • (2) A protein or peptide sequence shall be listed with a maximum of 16 amino acids per line, with a space provided between each amino acid.
  • (3) An amino acid sequence shall be represented in the amino to carboxy direction, from left to right, and the amino and carboxy groups shall not be represented in the sequence.
  • (4) The enumeration of amino acids may start at the first amino acid of the first mature protein, with the number 1. When represented, the amino acids preceding the mature protein, (e.g., pre‑sequences, pro-sequences, pre‑pro-sequences, and signal sequences) shall have negative numbers, counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids shall start at the first amino acid at the amino terminal as number 1, and shall appear below every five amino acids of the sequence. The enumeration method for amino acid sequences that is set forth in this section remains applicable for amino acid sequences that are circular in configuration, with the exception that the designation of the first amino acid of the sequence may be made at the option of the applicant.
  • (5) An amino acid sequence that contains internal terminator symbols (e.g., “Ter,” “*,” or “.,” etc.) may not be represented as a single amino acid sequence but shall be represented as separate amino acid sequences.
37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-4246e00409f8cdba4b45646a]
Protein Sequence Limited to 16 Amino Acids per Line
Note:
A protein or peptide sequence must be listed with a maximum of 16 amino acids per line, with a space between each amino acid.

(d) Representation of amino acids.

(2) A protein or peptide sequence shall be listed with a maximum of 16 amino acids per line, with a space provided between each amino acid.
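A minimal Python sketch combining (d)(2) with the default enumeration in (d)(4): residues go 16 per line, and a number row beneath each line labels residue 1 and every fifth residue. Placing each number under the first letter of its residue, and labelling 1, 5, 10, 15 and so on, are layout assumptions, and `protein_lines` is a hypothetical name.

```python
from typing import List

def protein_lines(residues: List[str]) -> List[str]:
    """Lay out residues 16 per line, each line followed by a number row."""
    out: List[str] = []
    for i in range(0, len(residues), 16):
        chunk = residues[i:i + 16]
        out.append(" ".join(chunk))
        row = ""
        for j in range(len(chunk)):
            n = i + j + 1  # numbering starts at 1 at the amino terminus
            if n == 1 or n % 5 == 0:
                # Each residue occupies 3 characters plus 1 separating space.
                row = row.ljust(4 * j) + str(n)
        out.append(row)
    return out
```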

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Prohibited · Always
[mpep-2423-3e7bd0bdab1bbc2d49d8cbff]
Amino to Carboxy Direction Required for Sequences
Note:
The amino acid sequence must be represented from the amino end to the carboxy end, without including the amino and carboxy groups.

(d) Representation of amino acids.

(3) An amino acid sequence shall be represented in the amino to carboxy direction, from left to right, and the amino and carboxy groups shall not be represented in the sequence.
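A minimal Python sketch of (d)(3), assuming the input is a peptide written amino to carboxy in hyphenated notation such as "H-Gly-Ala-OH". The terminal amino and carboxy groups are dropped so that only residues remain. The accepted spellings of those groups are assumptions.

```python
from typing import List

N_TERMINAL = {"H", "H2N", "NH2"}  # assumed spellings of a free amino group
C_TERMINAL = {"OH", "COOH"}       # assumed spellings of a free carboxy group

def residues_from_peptide(notation: str) -> List[str]:
    """Turn 'H-Gly-Ala-Ser-OH' style notation into residues, amino to carboxy."""
    parts = [p for p in notation.split("-") if p]
    if parts and parts[0] in N_TERMINAL:
        parts = parts[1:]   # amino group is not represented in the sequence
    if parts and parts[-1] in C_TERMINAL:
        parts = parts[:-1]  # carboxy group is not represented in the sequence
    return parts
```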

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Permitted · Always
[mpep-2423-395e13b69c1404e1862e7c9e]
Amino Acid Enumeration May Start at First Mature Protein
Note:
The enumeration of amino acids may begin at the first amino acid of the first mature protein, numbered 1; when the preceding amino acids are shown, they receive negative numbers.

(d) Representation of amino acids.

(4) The enumeration of amino acids may start at the first amino acid of the first mature protein, with the number 1.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Required · Always
[mpep-2423-dacf791b9c076ec9c9815a1b]
Negative Numbers for Amino Acids Before Mature Protein
Note:
When represented, amino acids preceding the mature protein (such as pre-sequences, pro-sequences, pre-pro-sequences, and signal sequences) must be given negative numbers, counting backwards from the amino acid immediately preceding number 1.

(d) Representation of amino acids.

When represented, the amino acids preceding the mature protein, (e.g., pre‑sequences, pro-sequences, pre‑pro-sequences, and signal sequences) shall have negative numbers, counting backwards starting with the amino acid next to number 1.
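A minimal Python sketch of the numbering this produces, reading "counting backwards starting with the amino acid next to number 1" as: the residue just before number 1 is -1, and no residue is numbered 0. The function name is hypothetical.

```python
from typing import List

def mature_numbering(pre_length: int, mature_length: int) -> List[int]:
    """Numbers for preceding residues (-pre_length .. -1) followed by the mature protein (1 .. n)."""
    return list(range(-pre_length, 0)) + list(range(1, mature_length + 1))
```

For a 3-residue signal sequence followed by a 4-residue mature protein this gives -3, -2, -1, 1, 2, 3, 4.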

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format
Statutory · Required · Always
[mpep-2423-36e600b0658e9381814932ae]
Amino Acid Sequence Must Start at N-Terminal
Note:
Where the mature-protein option is not used, the enumeration must start at the first amino acid at the amino terminal as number 1, and the numbers must appear below every fifth amino acid of the sequence.

(d) Representation of amino acids.

Otherwise, the enumeration of amino acids shall start at the first amino acid at the amino terminal as number 1, and shall appear below every five amino acids of the sequence.

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Permitted · Always
[mpep-2423-fedb2bd443d86bc176b75328]
Option for Designating First Amino Acid in Circular Sequences
Note:
For a circular amino acid sequence, the applicant may choose which amino acid is designated as the first; the enumeration method otherwise still applies.

(d) Representation of amino acids.

The enumeration method for amino acid sequences that is set forth in this section remains applicable for amino acid sequences that are circular in configuration, with the exception that the designation of the first amino acid of the sequence may be made at the option of the applicant.
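A minimal Python sketch of applying the applicant's choice: the circular sequence is rotated so the chosen element becomes number 1, after which the ordinary enumeration runs from there. It works on a string of bases or a list of residues, so it also fits circular nucleotide sequences under (c)(7). The function name is hypothetical.

```python
def rotate_circular(seq, first: int):
    """Re-express a circular sequence so the element at 0-based index `first` becomes number 1."""
    if not 0 <= first < len(seq):
        raise IndexError("first must index into the sequence")
    return seq[first:] + seq[:first]
```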

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Prohibited · Always
[mpep-2423-b0809c59cdd7d45fd3c4a565]
Amino Acid Sequences With Terminators Must Be Separated
Note:
An amino acid sequence containing internal terminator symbols (e.g., "Ter," "*," or ".") must be represented as separate amino acid sequences rather than as a single sequence.

(d) Representation of amino acids.

(5) An amino acid sequence that contains internal terminator symbols (e.g., “Ter,” “*,” or “.,” etc.) may not be represented as a single amino acid sequence but shall be represented as separate amino acid sequences.
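A minimal Python sketch of (d)(5): a residue list is split at terminator symbols into separate sequences. The terminator set mirrors the examples in the rule ("Ter," "*," "."); the rule's "etc." means the set may need extending. The terminator symbols themselves are dropped.

```python
from typing import List

TERMINATORS = {"Ter", "*", "."}

def split_at_terminators(residues: List[str]) -> List[List[str]]:
    """Split a residue list at terminator symbols into separate amino acid sequences."""
    pieces: List[List[str]] = []
    current: List[str] = []
    for res in residues:
        if res in TERMINATORS:
            if current:
                pieces.append(current)
            current = []
        else:
            current.append(res)
    if current:
        pieces.append(current)
    return pieces
```

Each returned piece would then be listed as its own sequence.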

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Statutory · Informative · Always
[mpep-2423-4e1226ed9e231251cc49cdfa]
Appendices A-F Reproduced in MPEP §2422(I)
Note:
The appendices referenced in 37 CFR 1.822 are reproduced in MPEP § 2422(I).

Appendices A through F referenced in 37 CFR 1.822 are reproduced in MPEP § 2422(I).

37 CFR 1.822 · Sequence Listing Content · Sequence Listing Format · Sequence Listing Requirements
Topic

Sequence Listing Format

6 rules
Statutory · Required · Always
[mpep-2423-45906d672c2f88465e6abfa7]
Format for Representing Nucleotides in Sequences
Note:
Describes how nucleotide sequences must be formatted, including grouping bases and listing amino acids corresponding to coding parts.
(c) Format representation of nucleotides.
  • (1) A nucleotide sequence shall be listed using the lowercase letter for representing the one-letter code for the nucleotide bases set forth in appendix A to this subpart.
  • (2) The bases in a nucleotide sequence (including introns) shall be listed in groups of 10 bases except in the coding parts of the sequence. Leftover bases, fewer than 10 in number, at the end of noncoding parts of a sequence shall be grouped together and separated from adjacent groups of 10 or 3 bases by a space.
  • (3) The bases in the coding parts of a nucleotide sequence shall be listed as triplets (codons). The amino acids corresponding to the codons in the coding parts of a nucleotide sequence shall be listed immediately below the corresponding codons. Where a codon spans an intron, the amino acid symbol shall be listed below the portion of the codon containing two nucleotides.
  • (4) A nucleotide sequence shall be listed with a maximum of 16 codons or 60 bases per line, with a space provided between each codon or group of 10 bases.
  • (5) A nucleotide sequence shall be represented, only by a single strand, in the 5' to 3' direction, from left to right.
  • (6) The enumeration of nucleotide bases shall start at the first base of the sequence with number 1. The enumeration shall be continuous through the whole sequence in the direction 5' to 3'. The enumeration shall appear in the right margin, next to the line containing the one-letter codes for the bases and giving the number of the last base of that line.
  • (7) For those nucleotide sequences that are circular in configuration, the enumeration method set forth in paragraph (c)(6) of this section remains applicable with the exception that the designation of the first base of the nucleotide sequence may be made at the option of the applicant.
37 CFR 1.822 · Sequence Listing Format · Sequence Listing Content · Sequence Listing Requirements
Statutory · Required · Always
[mpep-2423-728e6c1b5ef1214a5ba09b24]
Noncoding Bases Must Be Grouped Separately
Note:
Leftover bases, fewer than 10 in number, at the end of a noncoding part of a sequence must be grouped together and separated from adjacent groups of 10 or 3 bases by a space.

(c) Format representation of nucleotides.

Leftover bases, fewer than 10 in number, at the end of noncoding parts of a sequence shall be grouped together and separated from adjacent groups of 10 or 3 bases by a space.

37 CFR 1.822 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
Statutory · Required · Always
[mpep-2423-fde032f7375e6f733504a368]
Nucleotide Bases Must Be Continuously Enumerated From 5' to 3'
Note:
The enumeration of nucleotide bases must start at the 5' end and continue continuously through the entire sequence.

(c) Format representation of nucleotides.

The enumeration shall be continuous through the whole sequence in the direction 5' to 3'.

37 CFR 1.822 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
Statutory · Required · Always
[mpep-2423-0c95c1f70e2002e7a6f48be6]
Amino Acid Sequences Must Use Three-Letter Abbreviations
Note:
The amino acids in a protein or peptide sequence must be listed using three-letter abbreviations with the first letter as an uppercase character, as specified in Appendix C.

(d) Representation of amino acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter abbreviation, with the first letter as an upper case character, as in Appendix C to this subpart.
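A minimal Python sketch for data held as one-letter protein strings: each letter is converted to the three-letter abbreviation with an uppercase first letter. The table covers the 20 standard amino acids plus B, Z, and X (Asx, Glx, Xaa); confirm each code against appendix C in MPEP § 2422(I). Unknown letters raise a `KeyError`.

```python
from typing import List

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}

def to_three_letter(one_letter: str) -> List[str]:
    """Convert a one-letter protein string (e.g. 'MAG') to three-letter codes."""
    return [THREE_LETTER[ch] for ch in one_letter.upper()]
```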

37 CFR 1.822 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
Statutory · Required · Always
[mpep-2423-9b19ba7321c1bd66668199ef]
Sequences With Gaps Must Be Separated
Note:
A sequence with one or more gaps must be represented as a plurality of separate sequences, each with its own sequence identifier; the number of separate sequences equals the number of continuous strings of sequence data.

(e) A sequence with a gap or gaps shall be represented as a plurality of separate sequences, with separate sequence identifiers (§ 1.823(a)(5)), with the number of separate sequences being equal in number to the number of continuous strings of sequence data. A sequence composed of one or more noncontiguous segments of a larger sequence or segments from different sequences shall be presented as a separate sequence.
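A minimal Python sketch of the first sentence of paragraph (e): a gapped sequence is split into its continuous strings, and each string gets its own identifier. The gap marker "-" and the consecutive integer identifiers are hypothetical choices for illustration, not part of the rule.

```python
from typing import Dict

def split_at_gaps(seq: str, first_id: int, gap: str = "-") -> Dict[int, str]:
    """Split a gapped sequence into continuous strings, each keyed by its own identifier."""
    pieces = [p for p in seq.split(gap) if p]
    return {first_id + k: piece for k, piece in enumerate(pieces)}
```

The number of entries returned equals the number of continuous strings, which is the count the rule requires.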

37 CFR 1.823(a)(5) · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
Statutory · Required · Always
[mpep-2423-18b17b1f08533050a201b277]
Noncontiguous Sequence Segments Must Be Presented Separately
Note:
A sequence composed of one or more noncontiguous segments of a larger sequence, or of segments from different sequences, must be presented as its own separate sequence.

(e) A sequence with a gap or gaps shall be represented as a plurality of separate sequences, with separate sequence identifiers (§ 1.823(a)(5)), with the number of separate sequences being equal in number to the number of continuous strings of sequence data. A sequence composed of one or more noncontiguous segments of a larger sequence or segments from different sequences shall be presented as a separate sequence.

37 CFR 1.823(a)(5) · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content

Citations

Primary topic · Citation
Sequence Listing Content · 37 CFR § 1.822
Sequence Listing Format · 37 CFR § 1.823(a)(5)
Sequence Listing Content · 37 CFR § 1.831(b)
Sequence Listing Content · MPEP § 2412
Sequence Listing Content · MPEP § 2422(I)

Source Text from USPTO’s MPEP

This is an exact copy of the MPEP from the USPTO. It is here for your reference to see the section in context.
