How should sequences with gaps or unknowns be represented?

How a sequence with gaps or unknowns is represented depends on whether the number of unknown residues is known:

  1. For sequences with known numbers of unknown residues:
    • Include as one sequence in the sequence listing
    • Use “n” for unknown nucleotides and “X” for unknown amino acids
    • Use exactly one “n” or “X” per unknown residue, so the symbol count matches the known number of unknown residues
  2. For sequences with gaps of an unknown or undisclosed number of residues between defined regions:
    • Do not represent as a single sequence
    • Include each region of specifically defined residues as a separate sequence
    • Assign each region its own sequence identifier
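
The two cases above can be sketched in code. This is an illustrative sketch only, not an official tool or part of WIPO Standard ST.26: the function name `to_listing_entries` and the input model (a string for defined residues, an integer for a gap of known length, `None` for a gap of unknown length) are assumptions made for this example. It does not check whether each region meets the definitions in 37 CFR 1.831(b).

```python
from typing import List, Optional, Tuple, Union

# Hypothetical input model (an assumption for this sketch):
#   str  -> a run of specifically defined residues
#   int  -> a gap with a KNOWN number of unknown residues
#   None -> a gap with an UNKNOWN or undisclosed number of residues
Segment = Union[str, int, None]


def to_listing_entries(
    segments: List[Segment],
    unknown_symbol: str = "n",  # "n" for nucleotides, "X" for amino acids
    first_id: int = 1,
) -> List[Tuple[int, str]]:
    """Return (sequence identifier, sequence) pairs for a gapped sequence."""
    regions: List[str] = []
    current = ""
    for segment in segments:
        if segment is None:
            # Unknown-length gap: close the current region and start a new one.
            if current:
                regions.append(current)
            current = ""
        elif isinstance(segment, int):
            # Known-length gap: stay in the same sequence, one symbol per residue.
            current += unknown_symbol * segment
        else:
            current += segment
    if current:
        regions.append(current)
    # Each region receives its own sequence identifier.
    return [(first_id + i, seq) for i, seq in enumerate(regions)]


nucleotide_entries = to_listing_entries(["atgc", 3, "ggta", None, "ccat"])
protein_entries = to_listing_entries(["MK", 2, "LV", None, "AAGG"], unknown_symbol="X")
print(nucleotide_entries)
print(protein_entries)
```

In this sketch, the known three-residue gap stays inside the first sequence as “nnn”, while the gap of unknown length splits the input into two sequences with separate identifiers.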

The MPEP states: “A sequence that contains regions of specifically defined residues separated by one or more gaps of an unknown or undisclosed number of residues must not be represented in the ‘Sequence Listing XML’ as a single sequence. Each region of specifically defined residues (as encompassed by the definitions in 37 CFR 1.831(b)) must be included in the ‘Sequence Listing XML’ as a separate sequence and assigned its own sequence identifier.”

Topics: MPEP 2400 - Biotechnology, MPEP 2412.05 - Representation And Symbols For Nucleotide And/Or Amino Acid Sequences, Patent Law, Patent Procedure
Tags: Sequence Representation, Sequences With Gaps, Unknown Residues, WIPO Standard ST.26