MPEP § 2412.05(b) — Representation and Symbols of Nucleotide Sequence Data (Annotated Rules)
§2412.05(b) Representation and Symbols of Nucleotide Sequence Data
This page consolidates and annotates all enforceable requirements under MPEP § 2412.05(b), including statutory authority, regulatory rules, examiner guidance, and practice notes. It is provided as guidance, with links to the ground truth sources. This is information only; it is not legal advice.
Representation and Symbols of Nucleotide Sequence Data
This section addresses Representation and Symbols of Nucleotide Sequence Data. Primary authority: 37 CFR 1.831(b) and 37 CFR 1.832. Contains: 11 requirements, 1 prohibition, 1 guidance statement, and 4 permissions.
Key Rules
Sequence Listing Content
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
(b) The representation and symbols for nucleotide sequence data shall conform to the requirements of paragraphs (b)(1) through (4) of this section.
- (1) A nucleotide sequence must be represented in the manner described in paragraphs 11–12 of WIPO Standard ST.26.
- (2) All nucleotides, including nucleotide analogs, modified nucleotides, and “unknown” nucleotides, within a nucleotide sequence must be represented using the symbols set forth in paragraphs 13–16, 19, and 21 of WIPO Standard ST.26.
- (3) Modified nucleotides within a nucleotide sequence must be described in the manner discussed in paragraphs 17, 18, and 19 of WIPO Standard ST.26.
- (4) A region containing a known number of contiguous “a,” “c,” “g,” “t,” or “n” residues for which the same description applies may be jointly described in the manner described in paragraph 22 of WIPO Standard ST.26.
WIPO Standard ST.26, paragraph 11, provides that a nucleotide sequence must be represented only by a single strand, in the 5’ to 3’ direction from left to right, or in the direction from left to right that mimics the 5’ to 3’ direction. The designations 5’ and 3’ or any other similar designations must not be included in the sequence. A double-stranded nucleotide sequence disclosed by enumeration of the residues of both strands must be represented as:
- (a) a single sequence or as two separate sequences, each assigned its own sequence identifier, where the two separate strands are fully complementary to each other, or
- (b) two separate sequences, each assigned its own sequence identifier, where the two strands are not fully complementary to each other.
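As a practice illustration of the paragraph 11 rule, the following minimal Python sketch decides how many sequences a double-stranded disclosure needs. The function names are hypothetical and are not part of ST.26 or the MPEP. It assumes both strands are already written 5’ to 3’ and contain only “a”, “c”, “g”, and “t”.

```python
# Illustrative helpers with hypothetical names; not taken from ST.26 or the MPEP.
# Assumes both strands are written 5' to 3' using only a, c, g, t.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def strands_to_list(top: str, bottom: str) -> list[str]:
    """Return the strands that need their own sequence identifier.

    Fully complementary strands: a single sequence suffices
    (listing both separately is also permitted under option (a)).
    Not fully complementary: both strands are listed separately (option (b)).
    """
    if reverse_complement(top) == bottom:
        return [top]
    return [top, bottom]
```

For example, `strands_to_list("aacg", "cgtt")` yields one sequence because the strands are fully complementary, while an overhang or mismatch on either strand yields two.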
WIPO Standard ST.26, paragraph 12, provides that the first nucleotide presented in the sequence is residue position number 1. When nucleotide sequences are circular in configuration, applicant must choose the nucleotide in residue position number 1. Numbering is continuous throughout the entire sequence in the 5’ to 3’ direction, or in the direction that mimics the 5’ to 3’ direction. The last residue position number must equal the number of nucleotides in the sequence.
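The numbering rule in paragraph 12 can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes the applicant identifies the chosen starting residue of a circular sequence by a 0-based index into the sequence as originally written.

```python
# Hypothetical helpers illustrating paragraph 12; not an official tool.

def rotate_circular(seq: str, start_index: int) -> str:
    """Rewrite a circular sequence so the residue at the 0-based
    start_index becomes residue position number 1."""
    return seq[start_index:] + seq[:start_index]


def residue_positions(seq: str) -> list[tuple[int, str]]:
    """Pair each residue with its 1-based position, numbered continuously."""
    return list(enumerate(seq, start=1))
```

Because numbering starts at 1 and is continuous, the last position returned by `residue_positions` always equals `len(seq)`, which matches the requirement that the last residue position number equal the number of nucleotides.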
WIPO Standard ST.26, paragraph 13, provides that all nucleotides in a sequence must be represented using the symbols as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). Only lower-case letters must be used. Any symbol used to represent a nucleotide is the equivalent of only one residue.
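A simple pre-filing check for the paragraph 13 symbol rule might look like the sketch below. Table 1 is not reproduced on this page, so the allowed set here is an assumption based on the standard IUPAC nucleotide codes and should be verified against MPEP § 2412.03(a). The function name is hypothetical.

```python
# Assumed symbol set (standard IUPAC nucleotide codes, lower case only).
# Verify against Table 1 in MPEP § 2412.03(a) before relying on it.
ALLOWED_SYMBOLS = frozenset("acgtmrwsykvhdbn")


def invalid_symbols(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for every character that is
    not an allowed lower-case nucleotide symbol."""
    return [
        (pos, ch)
        for pos, ch in enumerate(seq, start=1)
        if ch not in ALLOWED_SYMBOLS
    ]
```

Upper-case letters, “u”, digits, spaces, and designations such as 5’ or 3’ would all be reported by this check.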
WIPO Standard ST.26, paragraph 14, sets forth that the symbol “t” will be construed as thymine in deoxyribonucleic acid (DNA) and uracil in ribonucleic acid (RNA). Uracil in DNA or thymine in RNA is considered a modified nucleotide and must be further described in a feature table. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
WIPO Standard ST.26, paragraph 15, provides that where an ambiguity symbol (representing two or more alternative nucleotides) is appropriate, the most restrictive symbol should be used, as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). For example, if a nucleotide in a given position could be “a” or “g”, then “r” should be used, rather than “n”. The symbol “n” will be construed as any one of “a”, “c”, “g”, or “t/u” except where it is used with a further description in a feature table. The symbol “n” must not be used to represent anything other than a nucleotide. A single modified or “unknown” nucleotide may be represented by the symbol “n”, together with a further description in a feature table. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.” For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions relative to a primary sequence, see MPEP § 2412.05(c); and also MPEP § 2413.01(g), subsection XII for information on variants.
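The "most restrictive symbol" recommendation in paragraph 15 amounts to a lookup from a set of alternative bases to the one symbol that covers exactly that set. The sketch below assumes the standard IUPAC ambiguity codes, which are believed to match Table 1 but are not reproduced on this page. The names are hypothetical.

```python
# Assumed mapping (standard IUPAC ambiguity codes); verify against
# Table 1 in MPEP § 2412.03(a).
SYMBOL_TO_BASES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt",
    "n": "acgt",
}
BASES_TO_SYMBOL = {frozenset(b): s for s, b in SYMBOL_TO_BASES.items()}


def most_restrictive_symbol(alternatives) -> str:
    """Return the symbol covering exactly the given alternative bases."""
    bases = frozenset(alternatives)
    if not bases or not bases <= frozenset("acgt"):
        raise ValueError("alternatives must be drawn from a, c, g, t")
    return BASES_TO_SYMBOL[bases]
```

With this lookup, a position that could be “a” or “g” resolves to “r”, and “n” is returned only when all four bases are possible.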
WIPO Standard ST.26, paragraph 16, sets forth that modified nucleotides should be represented in the sequence as the corresponding unmodified nucleotides, i.e., “a”, “c”, “g” or “t” whenever possible. Any modified nucleotide in a sequence that cannot otherwise be represented by any other symbol in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), i.e., an “other” nucleotide, such as a non-naturally occurring nucleotide, must be represented by the symbol “n”. The symbol “n” is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 21, provides that any “unknown” nucleotide must be represented by the symbol “n” in the sequence. An “unknown” nucleotide should be further described in a feature table using the feature key “unsure”. The symbol “n” is the equivalent of only one residue. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.”
WIPO Standard ST.26, paragraph 17, specifies that a modified nucleotide must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Table 2: List of Modified Nucleotides, below, as the qualifier value. See MPEP § 2413.01(g) subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g) subsections V and VI, for more information regarding use of a qualifier. If the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier. The abbreviations (or full names) provided in Table 2 must not be used in the sequence itself.
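To make the paragraph 17 structure concrete, the sketch below builds a “modified_base” feature with Python's standard `xml.etree.ElementTree` module. The element names other than INSDFeature and INSDFeature_location are not stated on this page; they are assumed from the ST.26 naming pattern and should be checked against the standard. The helper names are hypothetical, and the sketch does not validate abbreviations against Table 2.

```python
import xml.etree.ElementTree as ET


def make_feature(key, location, qualifiers):
    """Build an INSDFeature element. Child element names are assumed
    from the ST.26 naming pattern; confirm them against the standard."""
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = key
    ET.SubElement(feature, "INSDFeature_location").text = location
    quals = ET.SubElement(feature, "INSDFeature_quals")
    for name, value in qualifiers:
        qual = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qual, "INSDQualifier_name").text = name
        ET.SubElement(qual, "INSDQualifier_value").text = value
    return feature


def modified_base_feature(location, abbreviation, full_name=None):
    """Describe a modified nucleotide: mandatory mod_base qualifier,
    plus a note with the full name when the abbreviation is OTHER."""
    if abbreviation == "OTHER" and not full_name:
        raise ValueError("OTHER requires the unabbreviated name in a note")
    qualifiers = [("mod_base", abbreviation)]
    if full_name:
        qualifiers.append(("note", full_name))
    return make_feature("modified_base", location, qualifiers)
```

Calling `ET.tostring(modified_base_feature("4", "OTHER", "uracil"), encoding="unicode")` serializes one such feature. The abbreviation and name belong only in the feature table; the sequence itself still carries the plain symbol.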
WIPO Standard ST.26, paragraph 18, describes that a nucleotide sequence including one or more regions of consecutive modified nucleotides that share the same backbone moiety must be further described in a feature table as required for a modified nucleotide. See MPEP § 2413.01(g), subsection I, for information regarding a feature table and MPEP § 2412.03(e) regarding modified nucleotides. The modified nucleotides of each such region may be jointly described in a single INSDFeature element of a “feature table” as described below. See MPEP § 2413.01(g), subsection I, for information regarding INSDFeature elements of a feature table. The most restrictive unabbreviated chemical name that encompasses all of the modified nucleotides in the range or a list of the chemical names of all the nucleotides in the range must be provided as the value in the “note” qualifier. For example, a glycol nucleic acid sequence containing “a”, “c”, “g”, or “t” nucleobases may be described in the “note” qualifier as “2,3-dihydroxypropyl nucleosides.” Alternatively, the same sequence may be described in the “note” qualifier as “2,3-dihydroxypropyladenine, 2,3-dihydroxypropylthymine, 2,3-dihydroxypropylguanine, or 2,3-dihydroxypropylcytosine.” Where an individual modified nucleotide in the region includes an additional modification, then the modified nucleotide must also be further described in the feature table as required for a modified nucleotide.
Sequence Listing Format
(b) The representation and symbols for nucleotide sequence data shall conform to the requirements of paragraphs (b)(1) through (4) of this section.
…
(4) A region containing a known number of contiguous “a,” “c,” “g,” “t,” or “n” residues for which the same description applies may be jointly described in the manner described in paragraph 22 of WIPO Standard ST.26.
WIPO Standard ST.26, paragraph 19, specifies that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
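The paragraph 19 treatment of uracil in DNA can be illustrated with a short Python sketch. It assumes, purely for illustration, that a working copy of a DNA sequence marks uracil residues with “u”. The function name and the dictionary layout are hypothetical and are not the XML form required in a filing.

```python
# Hypothetical helper; the "u" input convention and the dict layout are
# assumptions for illustration only.

def describe_uracil_in_dna(raw: str):
    """Return the sequence with uracil written as "t", plus one
    modified_base description per uracil position (1-based)."""
    sequence = raw.replace("u", "t")
    features = [
        {
            "key": "modified_base",
            "location": str(pos),
            "qualifiers": {"mod_base": "OTHER", "note": "uracil"},
        }
        for pos, ch in enumerate(raw, start=1)
        if ch == "u"
    ]
    return sequence, features
```

The same pattern would apply to thymine in RNA, with “thymine” as the note value. In that case the residue is already written “t”, so its positions would have to be supplied separately.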
WIPO Standard ST.26, paragraph 19, provides that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g), subsections V and VI, for more information regarding use of a qualifier.
WIPO Standard ST.26, paragraph 22, specifies that a region containing a known number of contiguous “a”, “c”, “g”, “t”, or “n” residues for which the same description applies may be jointly described using a single INSDFeature element with the syntax “x..y” as the location descriptor in the element INSDFeature_location. See MPEP § 2413.01(g) subsection IV, for information regarding INSDFeature_location. For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions, see MPEP § 2412.05(c) and MPEP § 2413.01(g), subsection XI.
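The “x..y” location syntax in paragraph 22 can be produced by collapsing runs of contiguous positions, as in the minimal sketch below. It assumes every position passed in shares the same description, and that a lone position is written as a bare number; the latter is an assumption to confirm against MPEP § 2413.01(g), subsection IV. The function name is hypothetical.

```python
# Hypothetical helper for building location descriptors.

def location_descriptors(positions):
    """Collapse 1-based positions that share one description into
    "x..y" ranges; a lone position is returned as "x" (assumed form)."""
    def fmt(first, last):
        return f"{first}..{last}" if last > first else str(first)

    descriptors = []
    run_start = prev = None
    for pos in sorted(set(positions)):
        if prev is not None and pos == prev + 1:
            prev = pos  # extend the current contiguous run
            continue
        if run_start is not None:
            descriptors.append(fmt(run_start, prev))
        run_start = prev = pos  # start a new run
    if run_start is not None:
        descriptors.append(fmt(run_start, prev))
    return descriptors
```

Each returned descriptor would go into its own INSDFeature_location, so a contiguous run needs only one INSDFeature element rather than one per residue.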
Sequence Listing Requirements
WIPO Standard ST.26, paragraph 17, specifies that a modified nucleotide must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Table 2: List of Modified Nucleotides, below, as the qualifier value. See MPEP § 2413.01(g) subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g) subsections V and VI, for more information regarding use of a qualifier. If the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier. The abbreviations (or full names) provided in Table 2 must not be used in the sequence itself.
Citations
| Primary topic | Citation |
|---|---|
| Sequence Listing Content; Sequence Listing Format | 37 CFR § 1.831(b) |
| Sequence Listing Content; Sequence Listing Format | MPEP § 2412.03(a) |
| Sequence Listing Content; Sequence Listing Format | MPEP § 2412.03(e) |
| Sequence Listing Content; Sequence Listing Format | MPEP § 2412.05(c) |
| Sequence Listing Content; Sequence Listing Format; Sequence Listing Requirements | MPEP § 2413.01(g) |
Source Text from USPTO’s MPEP
This is an exact copy of the MPEP from the USPTO. It is here for your reference to see the section in context.
Official MPEP § 2412.05(b) — Representation and Symbols of Nucleotide Sequence Data
Source: USPTO
2412.05(b) Representation and Symbols of Nucleotide Sequence Data [R-01.2024]
[Editor Note: This section is applicable to all applications with a filing date, or, for national phase applications, an international filing date, on or after July 1, 2022, having disclosure of one or more nucleotide and/or amino acid sequences as defined in 37 CFR 1.831(b). Formatting representations of XML (eXtensible Markup Language) elements in this section appear different than shown in Standard ST.26, which may be accessed at: www.wipo.int/export/sites/www/standards/en/pdf/03-26-01.pdf.]
37 CFR 1.832 Representation of nucleotide and/or amino acid sequence data in the “Sequence Listing XML” part of a patent application filed on or after July 1, 2022.
*****
- (b) The representation and symbols for nucleotide sequence data shall conform to the requirements of paragraphs (b)(1) through (4) of this section.
- (1) A nucleotide sequence must be represented in the manner described in paragraphs 11–12 of WIPO Standard ST.26.
- (2) All nucleotides, including nucleotide analogs, modified nucleotides, and “unknown” nucleotides, within a nucleotide sequence must be represented using the symbols set forth in paragraphs 13–16, 19, and 21 of WIPO Standard ST.26.
- (3) Modified nucleotides within a nucleotide sequence must be described in the manner discussed in paragraphs 17, 18, and 19 of WIPO Standard ST.26.
- (4) A region containing a known number of contiguous “a,” “c,” “g,” “t,” or “n” residues for which the same description applies may be jointly described in the manner described in paragraph 22 of WIPO Standard ST.26.
*****
WIPO Standard ST.26, paragraph 11, provides that a nucleotide sequence must be represented only by a single strand, in the 5’ to 3’ direction from left to right, or in the direction from left to right that mimics the 5’ to 3’ direction. The designations 5’ and 3’ or any other similar designations must not be included in the sequence. A double-stranded nucleotide sequence disclosed by enumeration of the residues of both strands must be represented as:
- (a) a single sequence or as two separate sequences, each assigned its own sequence identifier, where the two separate strands are fully complementary to each other, or
- (b) two separate sequences, each assigned its own sequence identifier, where the two strands are not fully complementary to each other.
WIPO Standard ST.26, paragraph 12, provides that the first nucleotide presented in the sequence is residue position number 1. When nucleotide sequences are circular in configuration, applicant must choose the nucleotide in residue position number 1. Numbering is continuous throughout the entire sequence in the 5’ to 3’ direction, or in the direction that mimics the 5’ to 3’ direction. The last residue position number must equal the number of nucleotides in the sequence.
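Annotation (not part of the MPEP text): the choice in paragraph 11 between one sequence and two depends on whether the strands are fully complementary, and paragraph 12 ties the last residue position to the sequence length. The following Python sketch illustrates both checks. The function names are hypothetical, and the sketch assumes both strands are written 5' to 3' using only lower-case "a", "c", "g", and "t", with equal lengths and no overhangs.

```python
# Hypothetical helper names; they do not come from ST.26 or the MPEP.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case a/c/g/t sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def fully_complementary(strand_1: str, strand_2: str) -> bool:
    """True when strand_2 (5'->3') pairs with every residue of strand_1 (5'->3')."""
    return strand_2 == reverse_complement(strand_1)


def last_residue_position(seq: str) -> int:
    """Numbering starts at 1, so the last position equals the sequence length."""
    return len(seq)
```

Under this sketch, `fully_complementary("aacg", "cgtt")` is true, so either option (a) presentation would be available; a mismatch at any position would point to option (b), two separate sequences.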
II. SYMBOLS FOR A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 13, provides that all nucleotides in a sequence must be represented using the symbols as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). Only lower-case letters must be used. Any symbol used to represent a nucleotide is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 14, sets forth that the symbol “t” will be construed as thymine in deoxyribonucleic acid (DNA) and uracil in ribonucleic acid (RNA). Uracil in DNA or thymine in RNA is considered a modified nucleotide and must be further described in a feature table. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
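Annotation (not part of the MPEP text): a simple pre-filing check can flag characters that would not satisfy paragraphs 13 and 14. Table 1 is not reproduced on this page, so the symbol set below is an assumption (the four bases plus the standard ambiguity codes) that should be verified against MPEP § 2412.03(a). The function name is hypothetical.

```python
# Assumed contents of Table 1 (see MPEP § 2412.03(a)); verify before relying on it.
ALLOWED_SYMBOLS = set("acgtmrwsykvhdbn")


def invalid_positions(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for each character outside ALLOWED_SYMBOLS."""
    return [
        (position, char)
        for position, char in enumerate(seq, start=1)
        if char not in ALLOWED_SYMBOLS
    ]
```

Because the check is case-sensitive and the assumed set has no "u", an upper-case letter or a "u" is reported along with any gap or punctuation character.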
WIPO Standard ST.26, paragraph 15, provides that where an ambiguity symbol (representing two or more alternative nucleotides) is appropriate, the most restrictive symbol should be used, as listed in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)). For example, if a nucleotide in a given position could be “a” or “g”, then “r” should be used, rather than “n”. The symbol “n” will be construed as any one of “a”, “c”, “g”, or “t/u” except where it is used with a further description in a feature table. The symbol “n” must not be used to represent anything other than a nucleotide. A single modified or “unknown” nucleotide may be represented by the symbol “n”, together with a further description in a feature table. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.” For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions relative to a primary sequence, see MPEP § 2412.05(c); and also MPEP § 2413.01(g), subsection XII for information on variants.
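Annotation (not part of the MPEP text): the "most restrictive symbol" guidance in paragraph 15 amounts to a lookup from the set of possible bases to a single ambiguity code. The sketch below uses the standard IUPAC ambiguity codes, which are assumed here to match Table 1 (see MPEP § 2412.03(a)). The names are hypothetical.

```python
# Standard IUPAC nucleotide codes, assumed to match Table 1.
CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "m": "ac", "r": "ag", "w": "at", "s": "cg", "y": "ct", "k": "gt",
    "v": "acg", "h": "act", "d": "agt", "b": "cgt",
    "n": "acgt",
}

# Invert the table: each set of alternative bases maps to exactly one symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in CODES.items()}


def most_restrictive_symbol(bases) -> str:
    """Return the single symbol covering exactly the given alternative bases."""
    key = frozenset(bases)
    if key not in SYMBOL_FOR:
        raise ValueError(f"no symbol for bases: {sorted(key)}")
    return SYMBOL_FOR[key]
```

For the example in the text, `most_restrictive_symbol({"a", "g"})` returns "r" rather than "n".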
WIPO Standard ST.26, paragraph 16, sets forth that modified nucleotides should be represented in the sequence as the corresponding unmodified nucleotides, i.e., “a”, “c”, “g” or “t” whenever possible. Any modified nucleotide in a sequence that cannot otherwise be represented by any other symbol in Table 1: List of Nucleotides Symbols (see MPEP § 2412.03(a)), i.e., an “other” nucleotide, such as a non-naturally occurring nucleotide, must be represented by the symbol “n”. The symbol “n” is the equivalent of only one residue.
WIPO Standard ST.26, paragraph 19, specifies that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsection I for more detail regarding a “feature table.”
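Annotation (not part of the MPEP text): the feature key and qualifiers named in paragraph 19 can be pictured as a small XML fragment. The Python sketch below builds one with the standard library. Only INSDFeature and INSDFeature_location are named on this page; the other element names (INSDFeature_key, INSDFeature_quals, INSDQualifier, INSDQualifier_name, INSDQualifier_value) are assumed to follow the ST.26 DTD and should be checked against MPEP § 2413.01(g). The function name and the position used are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def modified_base_feature(position: int, mod_base: str,
                          note: Optional[str] = None) -> ET.Element:
    """Build an INSDFeature element describing one modified nucleotide."""
    feature = ET.Element("INSDFeature")
    ET.SubElement(feature, "INSDFeature_key").text = "modified_base"
    ET.SubElement(feature, "INSDFeature_location").text = str(position)
    quals = ET.SubElement(feature, "INSDFeature_quals")
    for name, value in (("mod_base", mod_base), ("note", note)):
        if value is None:
            continue  # omit the note qualifier when no note is supplied
        qualifier = ET.SubElement(quals, "INSDQualifier")
        ET.SubElement(qualifier, "INSDQualifier_name").text = name
        ET.SubElement(qualifier, "INSDQualifier_value").text = value
    return feature


# Uracil at (hypothetical) position 5 of a DNA sequence, shown as "t" in the sequence.
uracil_feature = modified_base_feature(5, "OTHER", "uracil")
xml_text = ET.tostring(uracil_feature, encoding="unicode")
```

The same helper would describe thymine in RNA by passing "thymine" as the note value.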
WIPO Standard ST.26, paragraph 21, provides that any “unknown” nucleotide must be represented by the symbol “n” in the sequence. An “unknown” nucleotide should be further described in a feature table using the feature key “unsure”. The symbol “n” is the equivalent of only one residue. See MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table.”
III. DESCRIPTION OF MODIFIED NUCLEOTIDES WITHIN A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 17, specifies that a modified nucleotide must be further described in a feature table (see MPEP § 2413.01(g), subsection I, for more detail regarding a “feature table”) using the feature key “modified_base” and the mandatory qualifier “mod_base” in conjunction with a single abbreviation from Table 2: List of Modified Nucleotides, below, as the qualifier value. See MPEP § 2413.01(g) subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g) subsections V and VI, for more information regarding use of a qualifier. If the abbreviation is “OTHER”, the complete unabbreviated name of the modified nucleotide must be provided as the value in a “note” qualifier. For a listing of alternative modified nucleotides, the qualifier value “OTHER” may be used in conjunction with a further “note” qualifier. The abbreviations (or full names) provided in Table 2 must not be used in the sequence itself.
WIPO Standard ST.26, paragraph 18, describes that a nucleotide sequence including one or more regions of consecutive modified nucleotides that share the same backbone moiety must be further described in a feature table as required for a modified nucleotide. See MPEP § 2413.01(g), subsection I, for information regarding a feature table and MPEP § 2412.03(e) regarding modified nucleotides. The modified nucleotides of each such region may be jointly described in a single INSDFeature element of a “feature table” as described below. See MPEP § 2413.01(g), subsection I, for information regarding INSDFeature elements of a feature table. The most restrictive unabbreviated chemical name that encompasses all of the modified nucleotides in the range or a list of the chemical names of all the nucleotides in the range must be provided as the value in the “note” qualifier. For example, a glycol nucleic acid sequence containing “a”, “c”, “g”, or “t” nucleobases may be described in the “note” qualifier as “2,3-dihydroxypropyl nucleosides.” Alternatively, the same sequence may be described in the “note” qualifier as “2,3-dihydroxypropyladenine, 2,3-dihydroxypropylthymine, 2,3-dihydroxypropylguanine, or 2,3-dihydroxypropylcytosine.” Where an individual modified nucleotide in the region includes an additional modification, then the modified nucleotide must also be further described in the feature table as required for a modified nucleotide.
WIPO Standard ST.26, paragraph 19, provides that uracil in DNA or thymine in RNA are considered modified nucleotides and must be represented in the sequence as “t” and be further described in a feature table using the feature key “modified_base”, the qualifier “mod_base” with “OTHER” as the qualifier value and the qualifier “note” with “uracil” or “thymine”, respectively, as the qualifier value. See MPEP § 2413.01(g), subsections II and III, for more information regarding use of a feature key; and MPEP § 2413.01(g), subsections V and VI, for more information regarding use of a qualifier.
Table 2: List of Modified Nucleotides
| Abbreviation | Definition |
|---|---|
| ac4c | 4-acetylcytidine |
| chm5u | 5-(carboxyhydroxymethyl)uridine |
| cm | 2′-O-methylcytidine |
| cmnm5s2u | 5-carboxymethylaminomethyl-2-thiouridine |
| cmnm5u | 5-carboxymethylaminomethyluridine |
| dhu | dihydrouridine |
| fm | 2′-O-methylpseudouridine |
| gal q | beta, D-galactosylqueuosine |
| gm | 2′-O-methylguanosine |
| i | inosine |
| i6a | N6-isopentenyladenosine |
| m1a | 1-methyladenosine |
| m1f | 1-methylpseudouridine |
| m1g | 1-methylguanosine |
| m1i | 1-methylinosine |
| m22g | 2,2-dimethylguanosine |
| m2a | 2-methyladenosine |
| m2g | 2-methylguanosine |
| m3c | 3-methylcytidine |
| m4c | N4-methylcytosine |
| m5c | 5-methylcytidine |
| m6a | N6-methyladenosine |
| m7g | 7-methylguanosine |
| mam5u | 5-methylaminomethyluridine |
| mam5s2u | 5-methoxyaminomethyl-2-thiouridine |
| man q | beta, D-mannosylqueuosine |
| mcm5s2u | 5-methoxycarbonylmethyl-2-thiouridine |
| mcm5u | 5-methoxycarbonylmethyluridine |
| mo5u | 5-methoxyuridine |
| ms2i6a | 2-methylthio-N6-isopentenyladenosine |
| ms2t6a | N-((9-beta-D-ribofuranosyl-2-methylthiopurine-6-yl)carbamoyl)threonine |
| mt6a | N-((9-beta-D-ribofuranosylpurine-6-yl)N-methylcarbamoyl)threonine |
| mv | uridine-5-oxyacetic acid-methylester |
| o5u | uridine-5-oxyacetic acid |
| osyw | wybutoxosine |
| p | pseudouridine |
| q | queuosine |
| s2c | 2-thiocytidine |
| s2t | 5-methyl-2-thiouridine |
| s2u | 2-thiouridine |
| s4u | 4-thiouridine |
| m5u | 5-methyluridine |
| t6a | N-((9-beta-D-ribofuranosylpurine-6-yl)-carbamoyl)threonine |
| tm | 2′-O-methyl-5-methyluridine |
| um | 2′-O-methyluridine |
| yw | wybutosine |
| x | 3-(3-amino-3-carboxy-propyl)uridine, (acp3)u |
| OTHER | (requires note qualifier) |
Reproduced from WIPO Standard ST.26, Annex I, Section 2.
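Annotation (not part of the MPEP text): the rules in paragraph 17 for the “mod_base” qualifier value can be expressed as a short check against the abbreviations in Table 2. The sketch below copies the abbreviations from the table above; the function name and the wording of the returned messages are hypothetical.

```python
from typing import Optional

# Abbreviations copied from Table 2 above (comma-separated so "gal q" and "man q" survive).
TABLE_2_ABBREVIATIONS = {
    item.strip()
    for item in (
        "ac4c, chm5u, cm, cmnm5s2u, cmnm5u, dhu, fm, gal q, gm, i, i6a, m1a, "
        "m1f, m1g, m1i, m22g, m2a, m2g, m3c, m4c, m5c, m6a, m7g, mam5u, "
        "mam5s2u, man q, mcm5s2u, mcm5u, mo5u, ms2i6a, ms2t6a, mt6a, mv, o5u, "
        "osyw, p, q, s2c, s2t, s2u, s4u, m5u, t6a, tm, um, yw, x, OTHER"
    ).split(",")
}


def check_mod_base(value: str, note: Optional[str] = None) -> list[str]:
    """Return a list of problems with a mod_base qualifier value (empty if none found)."""
    problems = []
    if value not in TABLE_2_ABBREVIATIONS:
        problems.append(f"'{value}' is not a Table 2 abbreviation")
    elif value == "OTHER" and not note:
        problems.append("'OTHER' needs the full name in a note qualifier")
    return problems
```

A full chemical name such as "5-methylcytidine" is rejected as a qualifier value because only the abbreviation ("m5c") appears in the set.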
IV. JOINTLY DESCRIBING A REGION OF A NUCLEOTIDE SEQUENCE
WIPO Standard ST.26, paragraph 22, specifies that a region containing a known number of contiguous “a”, “c”, “g”, “t”, or “n” residues for which the same description applies may be jointly described using a single INSDFeature element with the syntax “x..y” as the location descriptor in the element INSDFeature_location. See MPEP § 2413.01(g) subsection IV, for information regarding INSDFeature_location. For representation of sequence variants, i.e., alternatives, deletions, insertions or substitutions, see MPEP § 2412.05(c) and MPEP § 2413.01(g), subsection XI.
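Annotation (not part of the MPEP text): the “x..y” location syntax in paragraph 22 is easy to generate and to read back. The Python sketch below does both. The function names are hypothetical, and the sketch assumes 1-based positions with the first position no greater than the last; it does not cover any other location formats.

```python
import re

# Matches only the simple "x..y" form discussed in paragraph 22.
LOCATION_RANGE = re.compile(r"^(\d+)\.\.(\d+)$")


def range_location(first: int, last: int) -> str:
    """Format a contiguous region as an 'x..y' location descriptor."""
    if not 1 <= first <= last:
        raise ValueError("expected 1 <= first <= last")
    return f"{first}..{last}"


def parse_range_location(location: str) -> tuple[int, int]:
    """Split an 'x..y' location descriptor into (first, last)."""
    match = LOCATION_RANGE.match(location)
    if match is None:
        raise ValueError(f"not an 'x..y' location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def residue_count(location: str) -> int:
    """Number of residues covered by an 'x..y' location, both ends inclusive."""
    first, last = parse_range_location(location)
    return last - first + 1
```

For instance, a run of eight contiguous "n" residues starting at position 3 would carry the location value produced by `range_location(3, 10)`.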