Skip to content

datetime/time.fromisoformat: C accelerator accepts an empty fraction before a timezone designator #152157

Description

@tonghuaroot

Bug report

The C accelerator for datetime.fromisoformat() and time.fromisoformat() accepts a
decimal separator (. or ,) that is followed by zero fractional digits when a
timezone designator (+, -, or Z) comes next. The pure-Python implementation
(_pydatetime) correctly rejects these strings with
ValueError: Invalid isoformat string.

Reproduction

>>> import datetime
>>> datetime.datetime.fromisoformat('2020-01-01T12:34:56.+05:00')
datetime.datetime(2020, 1, 1, 12, 34, 56, tzinfo=datetime.timezone(datetime.timedelta(seconds=18000)))
>>> datetime.time.fromisoformat('12:34:56.Z')
datetime.time(12, 34, 56, tzinfo=datetime.timezone.utc)

The C implementation parses these and silently sets microsecond=0. The pure-Python
implementation raises:

>>> import _pydatetime
>>> _pydatetime.datetime.fromisoformat('2020-01-01T12:34:56.+05:00')
Traceback (most recent call last):
  ...
ValueError: Invalid isoformat string: '2020-01-01T12:34:56.+05:00'

C-vs-pure-Python divergence

Input C accelerator Pure-Python (_pydatetime)
'2020-01-01T12:34:56.+05:00' parses, microsecond=0 ValueError
'2020-01-01T12:34:56.Z' parses, microsecond=0 ValueError
'2020-01-01T12:34:56,+05:00' parses, microsecond=0 ValueError
'12:34:56.+05:00' parses, microsecond=0 ValueError
'12:34:56,+05:00' parses, microsecond=0 ValueError
'12:34:56.Z' parses, microsecond=0 ValueError

Control cases that already agree (both reject / both accept), unchanged:

Input C / pure-Python
'12:34:56.' both ValueError
'12:34:56.5' both time(12, 34, 56, 500000)
'12:34:56+05:00' both accept (no fraction)

Which side is wrong

The C accelerator is the lenient (wrong) side; pure-Python is correct.

  • ISO 8601 §4.2.2.4 and RFC 3339 §5.6: a decimal sign in a time fraction must be
    followed by at least one fractional digit. An empty fraction is not valid.
  • The C implementation already rejects the standalone trailing decimal separator
    ('12:34:56.'), so rejecting '12:34:56.+05:00' is consistent with its own
    end-of-string behavior; the lenient outcome here is incidental rather than intentional.

Root cause

In Modules/_datetimemodule.c, parse_hh_mm_ss_ff() consumes the ./, separator and
then computes to_parse as the number of fractional-digit characters. When a timezone
designator follows immediately, the time substring handed to the function ends right after
the separator, so the generic end-of-substring check
(if (p >= p_end) return c != '\0';) fires before the decimal-separator branch and
returns "trailing content present" (1) rather than flagging an empty fraction. The caller
then accepts the value because a timezone is present. The standalone-. case is rejected
only because that same end-of-string check happens to map to a ValueError in the
no-timezone path.

Suggested fix

Reject parse_hh_mm_ss_ff() when a ./, separator is seen but no fractional digit
follows, by handling the decimal-separator case before the generic end-of-substring check.
This matches the pure-Python behavior and the spec. A patch and regression tests are ready.

Alternative the maintainers may prefer: make pure-Python lenient instead. The spec
(≥1 digit required) and the C implementation's own rejection of the standalone trailing
separator both argue for fixing C to reject, so that is what the attached patch does.

Environment

Reproduced on current main (3.16.0a0). Affects all maintained branches that ship the C
accelerator.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    extension-modulesC modules in the Modules dirtype-bugAn unexpected behavior, bug, or error
    No fields configured for issues without a type.

    Projects

    Status
    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions