MPEP § 2413.01(a) — The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 (Annotated Rules)

This page consolidates and annotates all enforceable requirements under MPEP § 2413.01(a), including statutory authority, regulatory rules, examiner guidance, and practice notes. It is provided as guidance, with links to the ground truth sources. This is information only; it is not legal advice.

The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8

This section addresses the requirement that the “Sequence Listing XML” be a single file encoded using Unicode UTF-8. Primary authority: 37 CFR 1.831(b) and 37 CFR 1.833. Contains: 3 requirements, 2 permissions, and 1 other statement.

Key Rules

Statutory · Permitted · Always

[mpep-2413-01-a-92e05d79d8357620a301be4e]

XML Format for Sequence Listings Required

Note:

The 'Sequence Listing XML' must be a single file encoded using Unicode UTF-8 for applications filed on or after July 1, 2022.

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

Jump to MPEP Source · 37 CFR 1.831(b) · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content

Statutory · Required · Always

[mpep-2413-01-a-1f09a6a207e1ec039b784ff4]

Sequence Listing XML Must Be Single File UTF-8

Note:

The 'Sequence Listing XML' must be presented as a single file encoded in Unicode UTF-8, complying with specific character set requirements.

(a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF–8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).

Jump to MPEP Source · 37 CFR 1.831(a) · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
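As an illustration of the single-file, XML 1.0, UTF-8 requirement, the following Python sketch checks that a file's bytes decode as UTF-8 and that its XML declaration names version 1.0 and, if an encoding is given, UTF-8. The function name `check_encoding_and_declaration` is hypothetical, and treating a missing XML declaration as a problem is an assumption of this sketch rather than something stated in this section. It is not a substitute for validation against WIPO Standard ST.26.

```python
import re

# Matches an XML declaration with a version and an optional encoding attribute.
DECLARATION = re.compile(
    r"""<\?xml\s+version=["']([^"']+)["'](?:\s+encoding=["']([^"']+)["'])?"""
)


def check_encoding_and_declaration(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc.reason}"]

    problems = []
    # Ignore a leading byte order mark, if any, before looking for the declaration.
    match = DECLARATION.match(text.lstrip("\ufeff"))
    if match is None:
        # Assumption of this sketch: a declaration is expected at the start.
        problems.append("missing XML declaration")
        return problems

    version, encoding = match.group(1), match.group(2)
    if version != "1.0":
        problems.append(f"XML version is {version}, expected 1.0")
    if encoding is not None and encoding.upper() != "UTF-8":
        problems.append(f"declared encoding is {encoding}, expected UTF-8")
    return problems
```

In practice the bytes would come from reading the one listing file in binary mode, for example `open(path, "rb").read()`, where `path` is whatever file is being checked.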

Statutory · Required · Always

[mpep-2413-01-a-4f6a5a5a46049eba88f14afb]

Sequence Listing XML Must Be Single File Encoded UTF-8

Note:

The entire sequence listing must be presented as a single file encoded in Unicode UTF-8, with specific character encoding restrictions.

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
(1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below; and
(2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40). See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

Jump to MPEP Source · 37 CFR 1.833 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
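The two character-set tiers in restrictions (1) and (2) can be sketched as a small Python check. The names `FREE_TEXT_ELEMENTS` and `invalid_chars` are hypothetical, and the sketch assumes it receives the decoded text value of an element or attribute (after XML parsing), so reserved-character replacement is a separate concern not handled here.

```python
# Elements that restriction (1) allows to carry any valid Unicode character
# other than the listed control code points.
FREE_TEXT_ELEMENTS = {
    "ApplicantName",
    "InventorName",
    "InventionTitle",
    "NonEnglishQualifier_value",
}


def is_excluded_control(code_point: int) -> bool:
    """True for the control code points 0000-001F and 007F-009F."""
    return code_point <= 0x001F or 0x007F <= code_point <= 0x009F


def invalid_chars(element_name: str, value: str) -> list[str]:
    """Return the code points in value that the element's tier does not allow."""
    if element_name in FREE_TEXT_ELEMENTS:
        # Tier (1): anything except the excluded control code points.
        bad = [ch for ch in value if is_excluded_control(ord(ch))]
    else:
        # Tier (2): printable Basic Latin only, code points 0020 through 007E.
        bad = [ch for ch in value if not 0x0020 <= ord(ch) <= 0x007E]
    return [f"U+{ord(ch):04X}" for ch in bad]
```

For example, an accented title passes in `InventionTitle` but the same text would be flagged in any element outside the four listed in restriction (1); the element name `INSDQualifier_value` used in the checks for this sketch is only an illustrative stand-in for such an element.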

Statutory · Required · Always

[mpep-2413-01-a-ec349bc270336cf104d6bf3f]

Sequence Listing XML Character Restrictions

Note:

Outside the four free-text elements, the Sequence Listing XML must use only printable characters from the Unicode Basic Latin code table (U+0020 to U+007E); the file must be encoded in UTF-8, with reserved characters replaced as specified.

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
…
(2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E).

Jump to MPEP Source · 37 CFR 1.833 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content

Statutory · Required · Always

[mpep-2413-01-a-b69cec4ccee087180e202603]

Reserved Characters Must Be Replaced In Sequence Listing XML

Note:

The reserved characters ", &, ', <, and > must be replaced in the Sequence Listing XML with the predefined entities specified in WIPO Standard ST.26.

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
…
The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40). See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

Jump to MPEP Source · 37 CFR 1.833 · Sequence Listing Format · Sequence Listing Requirements · Sequence Listing Content
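A minimal Python sketch of the replacement step follows. It maps each of the five reserved characters to the corresponding XML predefined entity (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`). The names `PREDEFINED_ENTITIES` and `escape_reserved` are hypothetical, and the sketch assumes the input is a raw, not yet escaped, value destined for element content or an attribute value.

```python
# The five reserved characters and their XML predefined entities.
PREDEFINED_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}


def escape_reserved(value: str) -> str:
    """Replace each reserved character with its predefined entity.

    Working character by character avoids double-escaping the ampersand
    that each entity itself begins with.
    """
    return "".join(PREDEFINED_ENTITIES.get(ch, ch) for ch in value)
```

Applying the function twice to the same value would escape the ampersands of the entities already inserted, so it should run once, on unescaped input.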

Statutory · Permitted · Always

[mpep-2413-01-a-57d34d4cb8d2dc0e17173064]

Character Entity References for Sequence Listing XML

Note:

Only the predefined character entity references listed in WIPO Standard ST.26, paragraph 41, are permitted in a Sequence Listing XML.

The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).

Jump to MPEP Source · 37 CFR 1.833 · Sequence Listing Format · Sequence Listing Content · Sequence Listing Requirements

Statutory · Informative · Always

[mpep-2413-01-a-a31d20c5701f009b2e78e8d9]

XML Format for Sequence Listings Required

Note:

The sequence listing must be in a single file encoded using Unicode UTF-8.

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

Jump to MPEP Source · 37 CFR 1.831(b) · Sequence Listing Content · Sequence Listing Requirements · Sequence Listing Format

Citations

Primary topic	Citation
Sequence Listing Format	37 CFR § 1.831(a)
Sequence Listing Content, Sequence Listing Format	37 CFR § 1.831(b)
Sequence Listing Format	37 CFR § 1.833
Sequence Listing Format	37 CFR § 1.839
Sequence Listing Format	MPEP § 2413.01(f)
Sequence Listing Format	MPEP § 2413.01(g)

Source Text from USPTO’s MPEP

This is an exact copy of the MPEP from the USPTO. It is here for your reference to see the section in context.

2413.01(a) The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 [R-01.2024]

[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]

37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.

(a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF–8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
*****

According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:

(1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth the table below; and
(2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40).

See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”

WIPO Standard ST.26 specifies that in an XML instance of a “Sequence Listing XML”, numeric character references must not be used and the following reserved characters must be replaced by the corresponding predefined entities when used in a value of an attribute or content of an element:

List of Reserved Characters and Predefined Entities
Reserved Character	Predefined Entities
<	&lt;
>	&gt;
&	&amp;
"	&quot;
'	&apos;

Reproduced from WIPO Standard ST.26, paragraph 41. See also WIPO Standard ST.26, paragraph 41, footnote 1 of the WIPO standard for details about “numeric character references.”

The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).
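The prohibition on numeric character references and on any entity other than the five predefined ones can be illustrated with a short Python scan of the raw XML text. The names `ALLOWED_ENTITIES` and `disallowed_references` are hypothetical. The sketch is a plain regular-expression scan over the whole text, so it also inspects comments and similar regions; that simplification is an assumption of this example, not a rule from the standard.

```python
import re

# Names of the five predefined entities: &lt; &gt; &amp; &quot; &apos;
ALLOWED_ENTITIES = {"lt", "gt", "amp", "quot", "apos"}

# Decimal references (&#233;), hexadecimal references (&#xE9;),
# and named entity references (&eacute;).
REFERENCE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.\-:]*);")


def disallowed_references(raw_xml: str) -> list[str]:
    """Return every numeric or non-predefined reference found, in order."""
    found = []
    for match in REFERENCE.finditer(raw_xml):
        name = match.group(1)
        if name.startswith("#") or name not in ALLOWED_ENTITIES:
            found.append(match.group(0))
    return found
```

The scan must run on the unparsed text, because an XML parser resolves references before the application sees the content and the distinction would be lost.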

BlueIron Last Updated: 2025-12-31
