Support set operations in regular expression character classes #152100

Description

@serhiy-storchaka

Feature or enhancement

Implement the set operations of Unicode Technical Standard #18 RL1.3 in re character classes, together with nested sets.

gh-74534 added a FutureWarning in Python 3.7 for the ambiguous constructs (--, &&, ~~, ||, and a leading [) in preparation for this syntax; this issue gives them their new meaning as set operators and nested sets.

Two character sets are combined by an operator, where an operand may be a nested set in brackets:

  • [A--B] — difference: a character in A but not in B.
  • [A&&B] — intersection: a character in both A and B.
  • [A||B] — union: a character in A or B.
  • [A~~B] — symmetric difference: a character in A or B but not both.

All operators have the same precedence and are applied left to right; use a nested set to group operands. A leading ^ complements the result of the whole expression.
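The semantics of the four operators and the left-to-right rule can be sketched with plain Python sets. This only models the proposed behavior; it is not how the re module represents character classes, and the patterns in the comments follow the notation used in this issue rather than syntax that current releases accept.

```python
import string

# Model character classes as plain Python sets (illustration only).
lower = set(string.ascii_lowercase)      # [a-z]
vowels = set("aeiou")                    # [aeiou]
hexdigits = set("0123456789abcdef")      # [0-9a-f]

difference = lower - vowels              # [a-z--[aeiou]]
intersection = lower & hexdigits         # [a-z&&[0-9a-f]]
union = vowels | set("xyz")              # [aeiou||[xyz]]
symmetric = vowels ^ set("abc")          # [aeiou~~[abc]]

# Left to right: [a-z--[aeiou]&&[0-9a-f]] groups as
# ([a-z] -- [aeiou]) && [0-9a-f].
left_to_right = (lower - vowels) & hexdigits

# A nested set changes the grouping: [a-z--[[aeiou]&&[0-9a-f]]].
grouped = lower - (vowels & hexdigits)

print(sorted(left_to_right))  # ['b', 'c', 'd', 'f']
print(len(grouped))           # 24
```

The two results differ, which is why grouping with nested sets matters when operators are mixed.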

For example, [a-z--[aeiou]] matches an ASCII lowercase consonant and [\w&&[a-z]] matches an ASCII lowercase letter.
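For comparison, the same two matches can be written today with lookaheads, which is the usual workaround in the current re module. This is a sketch of an equivalent, not part of the proposal; the variable names are illustrative.

```python
import re

# Difference, like [a-z--[aeiou]]: a negative lookahead rejects the
# subtracted set before [a-z] consumes the character.
consonant = re.compile(r"(?![aeiou])[a-z]")

# Intersection, like [\w&&[a-z]]: a positive lookahead requires the
# character to be in [a-z] as well as in \w.
ascii_lower = re.compile(r"(?=[a-z])\w")

print(consonant.findall("regex sets"))     # ['r', 'g', 'x', 's', 't', 's']
print(ascii_lower.findall("Año 2024 ok"))  # ['o', 'o', 'k']
```

The lookahead forms work but are harder to read and to compose, which is part of what the set syntax would simplify.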

    Labels

    stdlib (Standard Library Python modules in the Lib/ directory), topic-regex, type-feature (A feature request or enhancement)
