MPEP § 2413.01(a) — The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 (Annotated Rules)
This page consolidates and annotates all enforceable requirements under MPEP § 2413.01(a), including statutory authority, regulatory rules, examiner guidance, and practice notes. It is provided as guidance, with links to the authoritative sources. It is for information only and is not legal advice.
The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8
This section addresses the requirement that the “Sequence Listing XML” be a single file encoded using Unicode UTF-8. Primary authority: 37 CFR 1.831(b) and 37 CFR 1.833. Contains: 3 requirements, 2 permissions, and 1 other statement.
Key Rules
Sequence Listing Format
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
(a) The “Sequence Listing XML” as required by § 1.831(a) must be presented as a single file in XML 1.0 encoded using Unicode UTF-8, where the character set complies with paragraphs 40 and 41 and Annex IV of WIPO Standard ST.26 (incorporated by reference, see § 1.839).
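As a rough illustration of what "a single file in XML 1.0 encoded using Unicode UTF-8" means in practice, the Python sketch below checks the raw bytes of one file: that they decode as UTF-8, that any XML declaration names version 1.0 and UTF-8, and that the content parses as well-formed XML. The function name `check_utf8_xml_file` is hypothetical. The sketch does not validate against the ST.26 DTD or the Annex IV character rules, and it is not a substitute for the official validation tools.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration at the start of the text, capturing the
# version and (optionally) the declared encoding.
_XML_DECL = re.compile(
    r"""<\?xml\s+version\s*=\s*["']([^"']+)["']"""
    r"""(?:\s+encoding\s*=\s*["']([^"']+)["'])?"""
)


def check_utf8_xml_file(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means none were detected.

    Hypothetical helper: a sketch, not an official ST.26 validator.
    """
    problems: list[str] = []
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # If the bytes are not UTF-8, the remaining checks are not meaningful.
        return [f"not valid UTF-8: {exc}"]

    # Ignore a leading byte order mark when looking for the declaration.
    decl = _XML_DECL.match(text.lstrip("\ufeff"))
    if decl:
        version, encoding = decl.group(1), decl.group(2)
        if version != "1.0":
            problems.append(f"XML version is {version!r}, expected '1.0'")
        if encoding is not None and encoding.upper() != "UTF-8":
            problems.append(f"declared encoding is {encoding!r}, expected 'UTF-8'")

    try:
        # Well-formedness only; external DTDs are not loaded or checked.
        ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems
```

Checking bytes rather than an already-decoded string matters here, because a file saved in another encoding can still look correct once a program has decoded it.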
According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
- (1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively) must be replaced as set forth in the table below; and
- (2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively) must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40). See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”
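The two character-set rules above can be sketched as a per-value check. This Python example is illustrative only: the names `invalid_characters` and `FREE_TEXT_ELEMENTS` are hypothetical, and it assumes the value has already been parsed, so entities such as `&amp;` have been resolved to the characters they stand for. Any element or attribute name not in the set is treated under rule (2).

```python
# Elements that rule (1) allows to use the wider Unicode repertoire.
FREE_TEXT_ELEMENTS = {
    "ApplicantName",
    "InventorName",
    "InventionTitle",
    "NonEnglishQualifier_value",
}


def _is_xml10_char(cp: int) -> bool:
    # The "Char" production of the XML 1.0 specification.
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_characters(name: str, value: str) -> list[str]:
    """Return code points in `value` not permitted for element/attribute `name`.

    Hypothetical helper: a sketch of rules (1) and (2), not an official validator.
    """
    bad = []
    for ch in value:
        cp = ord(ch)
        if name in FREE_TEXT_ELEMENTS:
            # Rule (1): any valid XML 1.0 character except the control
            # code points 0000-001F and 007F-009F.
            ok = _is_xml10_char(cp) and not (cp <= 0x1F or 0x7F <= cp <= 0x9F)
        else:
            # Rule (2): printable Basic Latin only, 0020 through 007E.
            ok = 0x20 <= cp <= 0x7E
        if not ok:
            bad.append(f"U+{cp:04X}")
    return bad
```

For example, `invalid_characters("InventionTitle", "Método")` returns an empty list, while the same value in an element governed by rule (2), such as `INSDSeq_sequence`, is reported as `["U+00E9"]`. Note that rule (1) excludes the tab character (U+0009) even though XML 1.0 itself allows it.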
The only character entity references permitted are the predefined entities `&lt;`, `&gt;`, `&amp;`, `&quot;`, and `&apos;` (WIPO Standard ST.26, paragraph 41).
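One way to spot references that this rule excludes is to scan the raw XML text for `&...;` sequences. The Python sketch below is a naive illustration under that assumption: the name `disallowed_references` is hypothetical, and the scan does not distinguish element content from comments or CDATA sections, so it may flag text that is not actually a reference.

```python
import re

# The five predefined entities permitted by ST.26, paragraph 41.
PREDEFINED_ENTITIES = {"lt", "gt", "amp", "quot", "apos"}

# Matches numeric character references (&#60; or &#x3C;) and
# named entity references (&name;).
_REFERENCE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);")


def disallowed_references(xml_text: str) -> list[str]:
    """List every reference other than the five predefined entities.

    Hypothetical helper: a naive text scan, not an XML-aware validator.
    """
    found = []
    for match in _REFERENCE.finditer(xml_text):
        if match.group(1) not in PREDEFINED_ENTITIES:
            # Numeric references and other named entities are both reported.
            found.append(match.group(0))
    return found
```

A scan of the raw text is used because an XML parser resolves references before the application sees the values, so a parsed tree no longer shows whether `<` was written as `&lt;` or `&#60;`.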
Citations
| Primary topic | Citation |
|---|---|
| Sequence Listing Format | 37 CFR § 1.831(a) |
| Sequence Listing Content; Sequence Listing Format | 37 CFR § 1.831(b) |
| Sequence Listing Format | 37 CFR § 1.833 |
| Sequence Listing Format | 37 CFR § 1.839 |
| Sequence Listing Format | MPEP § 2413.01(f) |
| Sequence Listing Format | MPEP § 2413.01(g) |
Source Text from USPTO’s MPEP
This is an exact copy of the MPEP text published by the USPTO, reproduced here so you can read the section in context.
Official MPEP § 2413.01(a) — The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8
Source: USPTO
2413.01(a) The “Sequence Listing XML” is a Single File Encoded Using Unicode UTF-8 [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.833 Requirements for a “Sequence Listing XML” for nucleotide and/or amino acid sequences as part of a patent application filed on or after July 1, 2022.
According to 37 CFR 1.833, the entire “Sequence Listing XML” must be presented as a single file. WIPO Standard ST.26 specifies that the file must be encoded using Unicode UTF-8, with the following restrictions:
- (1) the information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F. The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below; and
- (2) the information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E). The reserved characters ", &, ', <, and > (Unicode code points 0022, 0026, 0027, 003C and 003E respectively), must be replaced as set forth in the table below (WIPO Standard ST.26, paragraph 40).
See MPEP § 2413.01(f) for details about the “general information part” and MPEP § 2413.01(g) for details about the “sequence data part.”
WIPO Standard ST.26 specifies that in an XML instance of a “Sequence Listing XML”, numeric character references must not be used and the following reserved characters must be replaced by the corresponding predefined entities when used in a value of an attribute or content of an element:
| Reserved Character | Predefined Entity |
|---|---|
| `<` | `&lt;` |
| `>` | `&gt;` |
| `&` | `&amp;` |
| `"` | `&quot;` |
| `'` | `&apos;` |
Reproduced from WIPO Standard ST.26, paragraph 41. See also footnote 1 to paragraph 41 of WIPO Standard ST.26 for details about “numeric character references.”
The only character entity references permitted are the predefined entities set forth above (WIPO Standard ST.26, paragraph 41).
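The replacement table can be applied with a simple character-by-character mapping. The Python sketch below is illustrative: the name `escape_st26` is hypothetical, and it assumes each value is escaped exactly once before being written into the file. Mapping one character at a time means the `&` introduced by `&lt;` is never escaped a second time.

```python
# Reserved characters and their predefined entities, per the table above.
_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}


def escape_st26(value: str) -> str:
    """Replace every reserved character with its predefined entity.

    Hypothetical helper: numeric character references are never produced.
    """
    return "".join(_ESCAPES.get(ch, ch) for ch in value)
```

For example, `escape_st26("5' <b> & \"x\"")` yields `5&apos; &lt;b&gt; &amp; &quot;x&quot;`. Python's standard `xml.sax.saxutils.escape(value, {'"': "&quot;", "'": "&apos;"})` produces the same result. General-purpose serializers such as `xml.etree.ElementTree` do not necessarily escape the quotation mark and apostrophe in element content, so their output may need checking against this table.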