MPEP § 2413.01(a) — The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 (Annotated Rules)

§2413.01(a) The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8

USPTO MPEP version: BlueIron update of 2025-12-31

This page consolidates and annotates all enforceable requirements under MPEP § 2413.01(a), including statutory authority, regulatory rules, examiner guidance, and practice notes. It is provided as guidance, with links to the primary sources. It is for information only and is not legal advice.


This section addresses the requirement that the “Sequence Listing XML” be a single file encoded using Unicode UTF-8. Primary authority: 37 CFR 1.831(b) and 37 CFR 1.833. Contains: 4 requirements, 2 permissions, and 1 informative statement.

Key Rules

Topic

Sequence Listing Format

6 rules
Statutory · Permitted · Always
[mpep-2413-01-a-92e05d79d8357620a301be4e]
XML Format for Sequence Listings Required
Note:
The 'Sequence Listing XML' must be a single file encoded using Unicode UTF-8 for applications with a filing date (or, for national phase applications, an international filing date) on or after July 1, 2022.

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

Source: 37 CFR 1.831(b) · Topics: Sequence Listing Format, Sequence Listing Requirements, Sequence Listing Content
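
The applicability condition in the editor note can be expressed as a small date check. What follows is a minimal Python sketch, not USPTO software. The function name `st26_applies` and its parameters are hypothetical. Whether an application actually discloses sequences "as defined in 37 CFR 1.831(b)" is passed in as a boolean; the code does not determine it.

```python
from datetime import date
from typing import Optional

# Effective date taken from the editor note above.
ST26_EFFECTIVE_DATE = date(2022, 7, 1)


def st26_applies(
    filing_date: date,
    has_sequence_disclosure: bool,
    is_national_phase: bool = False,
    international_filing_date: Optional[date] = None,
) -> bool:
    """Hypothetical applicability check for MPEP 2413.01(a).

    `has_sequence_disclosure` stands in for the 37 CFR 1.831(b)
    determination, which this sketch does not perform.
    """
    if not has_sequence_disclosure:
        return False
    if is_national_phase:
        # National phase applications are judged by the international filing date.
        if international_filing_date is None:
            raise ValueError("national phase application needs an international filing date")
        relevant_date = international_filing_date
    else:
        relevant_date = filing_date
    return relevant_date >= ST26_EFFECTIVE_DATE
```

For example, a national phase application entering in 2023 from an international application filed on June 30, 2022 falls outside this section, because the international filing date governs.
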
Statutory · Required · Always
[mpep-2413-01-a-1f09a6a207e1ec039b784ff4]
Sequence Listing XML Must Be Single File UTF-8
Note:
The 'Sequence Listing XML' must be presented as a single XML 1.0 file encoded in Unicode UTF-8. Its character set must comply with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26.

(a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF-8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).

Source: 37 CFR 1.831(a) · Topics: Sequence Listing Format, Sequence Listing Requirements, Sequence Listing Content
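
The quoted requirement ("a single file in XML 1.0 encoded using Unicode UTF-8") suggests three mechanical checks on one candidate file:

- the bytes decode as UTF-8;
- any XML declaration says version 1.0 and, if it names an encoding, UTF-8;
- the document is well-formed.

This minimal sketch uses Python's standard `xml.etree.ElementTree`. The function name is hypothetical, and the sketch has several limits:

- It assumes the submission already consists of one file.
- It does not validate against the ST.26 DTD or apply the character-set rules.
- It treats an absent declaration as acceptable, since XML 1.0 then defaults to UTF-8. An actual validator may be stricter.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(
    rb'<\?xml\s+version\s*=\s*["\']([^"\']*)["\']'
    rb'(?:\s+encoding\s*=\s*["\']([^"\']*)["\'])?'
)


def check_xml10_utf8(raw: bytes) -> list[str]:
    """Return problems found in the bytes of one candidate Sequence Listing XML file."""
    body = raw[len(codecs.BOM_UTF8):] if raw.startswith(codecs.BOM_UTF8) else raw
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    problems: list[str] = []
    decl = _XML_DECL.match(body)
    if decl is not None:
        version, encoding = decl.group(1), decl.group(2)
        if version != b"1.0":
            problems.append(f"XML version is {version.decode()}, expected 1.0")
        if encoding is not None and encoding.upper() != b"UTF-8":
            problems.append(f"declared encoding is {encoding.decode()}, expected UTF-8")

    # Only attempt a parse once the declaration looks right.
    if not problems:
        try:
            ET.fromstring(raw)
        except ET.ParseError as exc:
            problems.append(f"not well-formed XML: {exc}")
    return problems
```
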
Statutory · Required · Always
[mpep-2413-01-a-4f6a5a5a46049eba88f14afb]
Sequence Listing XML Must Be Single File Encoded UTF-8
Note:
The entire sequence listing must be presented as a single file encoded in Unicode UTF-8. Four free-text elements may use a broad Unicode repertoire; all other elements and attributes are limited to printable Basic Latin characters.
According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
  • (1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below; and
  • (2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40). See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

Source: 37 CFR 1.833 · Topics: Sequence Listing Format, Sequence Listing Requirements, Sequence Listing Content
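
The two character restrictions above can be sketched as per-element checks over a parsed document. This Python example is an illustration under stated assumptions:

- The helper names are hypothetical.
- Element names are matched without namespaces.
- Whitespace-only text inside container elements, and text between elements, is treated as layout and skipped.
- Text is checked after the parser has expanded entity references, so an escaped `&amp;` is tested as `&`.

```python
import xml.etree.ElementTree as ET

# Restriction (1) applies to the text of these four elements only.
FREE_TEXT_ELEMENTS = {
    "ApplicantName",
    "InventorName",
    "InventionTitle",
    "NonEnglishQualifier_value",
}


def free_text_char_ok(ch: str) -> bool:
    """Restriction (1): an XML 1.0 character, excluding U+0000-U+001F and U+007F-U+009F."""
    cp = ord(ch)
    # XML 1.0 "Char" ranges with the two control-code ranges removed.
    return (
        0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def basic_latin_char_ok(ch: str) -> bool:
    """Restriction (2): printable Basic Latin, U+0020 through U+007E."""
    return 0x20 <= ord(ch) <= 0x7E


def find_char_violations(root: ET.Element) -> list[tuple[str, str, str]]:
    """Return (element tag, "text" or attribute name, offending character) triples."""
    violations = []
    for elem in root.iter():
        text = elem.text or ""
        # Assumption: whitespace-only text in a container element is layout, not data.
        # Tail text (between sibling elements) is not inspected at all.
        if len(elem) and not text.strip():
            text = ""
        check = free_text_char_ok if elem.tag in FREE_TEXT_ELEMENTS else basic_latin_char_ok
        violations.extend((elem.tag, "text", ch) for ch in text if not check(ch))
        # Attributes always fall under restriction (2).
        for name, value in elem.attrib.items():
            violations.extend((elem.tag, name, ch) for ch in value if not basic_latin_char_ok(ch))
    return violations
```

Under this sketch, "Société" passes inside ApplicantName but is flagged inside any element not listed in restriction (1).
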
Statutory · Required · Always
[mpep-2413-01-a-ec349bc270336cf104d6bf3f]
Sequence Listing XML Character Restrictions
Note:
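Apart from ApplicantName, InventorName, InventionTitle and NonEnglishQualifier_value, all elements and attributes of the Sequence Listing XML must use printable characters from the Unicode Basic Latin code table (U+0020 to U+007E). The file must be encoded using UTF-8, and reserved characters must be replaced as specified.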

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:

(2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E).

Source: 37 CFR 1.833 · Topics: Sequence Listing Format, Sequence Listing Requirements, Sequence Listing Content
Statutory · Required · Always
[mpep-2413-01-a-b69cec4ccee087180e202603]
Reserved Characters Must Be Replaced In Sequence Listing XML
Note:
The reserved characters ", &, ', <, and > must be replaced in the Sequence Listing XML as specified in WIPO Standard ST.26, paragraph 40.

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:

The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40). See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

Source: 37 CFR 1.833 · Topics: Sequence Listing Format, Sequence Listing Requirements, Sequence Listing Content
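
The replacement table that paragraph 40 refers to is not reproduced on this page. Assuming it lists the five predefined XML entities (`&quot;`, `&amp;`, `&apos;`, `&lt;`, `&gt;`), the replacement step can be sketched in Python as follows. `escape_reserved` is a hypothetical helper name.

```python
# Assumed mapping: the five XML 1.0 predefined entities.
PREDEFINED_ENTITIES = {
    '"': "&quot;",
    "&": "&amp;",
    "'": "&apos;",
    "<": "&lt;",
    ">": "&gt;",
}
_ESCAPE_TABLE = str.maketrans(PREDEFINED_ENTITIES)


def escape_reserved(text: str) -> str:
    """Replace each reserved character with its predefined entity reference.

    str.translate maps every character in a single pass, so an "&"
    produced by one replacement is never escaped a second time.
    """
    return text.translate(_ESCAPE_TABLE)
```

For example, `escape_reserved("5' & 3\" <cap>")` returns `5&apos; &amp; 3&quot; &lt;cap&gt;`.

The single-pass translation avoids a common bug in chained `replace` calls. If `<` is turned into `&lt;` first and `&` is escaped afterwards, the result is `&amp;lt;`. A general-purpose XML serializer may escape only some of these characters in element content, so the sketch maps all five explicitly.
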
Statutory · Permitted · Always
[mpep-2413-01-a-57d34d4cb8d2dc0e17173064]
Character Entity References for Sequence Listing XML
Note:
Only the predefined character entity references listed in WIPO Standard ST.26, paragraph 41, are permitted in a Sequence Listing XML.

The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).

Source: 37 CFR 1.833 · Topics: Sequence Listing Format, Sequence Listing Content, Sequence Listing Requirements
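
One rough way to flag entity references other than the predefined ones is to scan the raw file text before parsing. A parser may reject an undeclared entity or silently expand one declared in a DOCTYPE, so the raw text is the clearer place to look. This Python sketch is hypothetical in several ways:

- It only examines named references such as `&nbsp;`.
- It treats numeric character references such as `&#x41;` as out of scope. This is an assumption, not a reading of paragraph 41.
- It does not skip comments or CDATA sections.

```python
import re

# The five predefined entity names (quot, amp, apos, lt, gt).
PREDEFINED = {"quot", "amp", "apos", "lt", "gt"}

# Named entity references like &amp; or &nbsp; (numeric &#...; forms do not match).
_ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9._:-]*);")


def disallowed_entity_refs(raw_xml: str) -> list[str]:
    """Return names of entity references in the raw XML text that are not predefined."""
    return [name for name in _ENTITY_REF.findall(raw_xml) if name not in PREDEFINED]
```
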
Topic

Sequence Listing Content

1 rule
Statutory · Informative · Always
[mpep-2413-01-a-a31d20c5701f009b2e78e8d9]
XML Format for Sequence Listings Required
Note:
The sequence listing must be in a single file encoded using Unicode UTF-8.

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

Source: 37 CFR 1.831(b) · Topics: Sequence Listing Content, Sequence Listing Requirements, Sequence Listing Format

Citations

Primary topic | Citation
Sequence Listing Format | 37 CFR § 1.831(a)
Sequence Listing Content, Sequence Listing Format | 37 CFR § 1.831(b)
Sequence Listing Format | 37 CFR § 1.833
Sequence Listing Format | 37 CFR § 1.839
Sequence Listing Format | MPEP § 2413.01(f)
Sequence Listing Format | MPEP § 2413.01(g)

Source Text from USPTO’s MPEP

This is an exact copy of the MPEP from the USPTO. It is here for your reference to see the section in context.
