What character restrictions apply to different parts of the Sequence Listing XML?

MPEP 2413.01(a) sets out character restrictions for two groups of elements in the Sequence Listing XML:

  1. General information and NonEnglishQualifier_value: “The information contained in the elements ApplicantName, InventorName and InventionTitle of the general information part, and the NonEnglishQualifier_value of the sequence data part, may be composed of any valid Unicode characters indicated in the XML 1.0 specification except the Unicode Control code points 0000-001F and 007F-009F.”
  2. All other elements and attributes: “The information contained in all other elements and attributes of the general information part and in all other elements and attributes of the sequence data part must be composed of printable characters (including the space character) from the Unicode Basic Latin code table (i.e., limited to Unicode code points 0020 through 007E – see Annex IV).”

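In short, the applicant name, inventor name, invention title, and NonEnglishQualifier_value may contain non-Latin characters, provided they include no control characters. Every other element and attribute is limited to printable Basic Latin characters (code points 0020 through 007E). These restrictions keep the encoding of the Sequence Listing XML file consistent.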
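
The two rules above can be sketched as a pair of simple character checks. This is a minimal Python illustration, not an official validator: the function names are hypothetical, and the XML 1.0 character ranges in the first check reflect one reading of "valid Unicode characters indicated in the XML 1.0 specification" (the specification's Char production), with the control ranges 0000-001F and 007F-009F excluded.

```python
def is_free_text_allowed(value: str) -> bool:
    """Rule 1 (ApplicantName, InventorName, InventionTitle,
    NonEnglishQualifier_value): XML 1.0 characters, minus the control
    code points 0000-001F and 007F-009F. Hypothetical helper."""
    for ch in value:
        cp = ord(ch)
        # Excluded control ranges quoted in MPEP 2413.01(a).
        if cp <= 0x1F or 0x7F <= cp <= 0x9F:
            return False
        # Remaining ranges of the XML 1.0 Char production (assumed reading).
        in_xml_range = (
            0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
        if not in_xml_range:
            return False
    return True


def is_basic_latin_printable(value: str) -> bool:
    """Rule 2 (all other elements and attributes): only printable
    Basic Latin characters, code points 0020 through 007E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)
```

Under these assumptions, a name such as "Müller" passes the first check but fails the second, so it would be acceptable in InventorName but not in an element covered by the Basic Latin rule.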

