Bug description
_pydatetime.date.fromisoformat (the pure-Python reference implementation, used when the C
accelerator is unavailable or when _pydatetime is imported directly) returns a wrong-but-plausible
date for several strings that are not valid ISO-8601 dates. The C accelerator raises ValueError
for every one of them. Because the result is a silently incorrect date rather than an error,
malformed input becomes valid-looking data with no signal that anything went wrong.
There are two surface forms of the same underlying defect in _parse_isoformat_date, which slices
fixed-width substrings and calls int() on them without checking that each slice is exactly N
ASCII digits:
(1) int() tolerates a leading + / - / space in a basic-format field.
Here int('+1') == 1, int(' 1') == 1 and int('+9') == 9, so the month / day / week fields parse a
sign or space that is not part of any ISO-8601 date.
(2) The length gate admits 7-character strings, and a fixed-width slice then reads a 1-character
tail.
'2020061' is 7 chars; the gate len(date_string) in (7, 8, 10) lets it through, the month slice
reads '06' and the day slice dtstr[6:8] reads the 1-character tail '1', giving date(2020, 6, 1).
'2020-W2' reads a 1-digit week int('2'). The C parse_digits(p, ..., 2) requires exactly two
digits, so C rejects all of these.
datetime.fromisoformat inherits the same defect via the date branch, e.g.
_pydatetime.datetime.fromisoformat('2020061') returns datetime.datetime(2020, 6, 1, 0, 0) while
the C path raises.
C vs pure-Python

input                                C _datetime   pure-Python _pydatetime
date.fromisoformat('2020+12')        ValueError    date(2020, 1, 2)
date.fromisoformat('+020-06-15')     ValueError    date(20, 6, 15)
date.fromisoformat('2020-W 5')       ValueError    date(2020, 1, 27)
date.fromisoformat('202012+9')       ValueError    date(2020, 12, 9)
date.fromisoformat('2020061')        ValueError    date(2020, 6, 1)
date.fromisoformat('2020123')        ValueError    date(2020, 12, 3)
date.fromisoformat('2020-W2')        ValueError    date(2020, 1, 6)
date.fromisoformat('9999121')        ValueError    date(9999, 12, 1)
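A comparison like the one in this table could be automated with a small differential check. The sketch below is illustrative only and is not the harness used for this report: outcome and divergences are hypothetical helper names. Results are compared through str() on the assumption that date objects from the two modules may not compare equal to each other directly.

```python
def outcome(func, arg):
    """Run func(arg) and reduce the result to a comparable tuple."""
    try:
        # str() makes results from two different implementations comparable.
        return ("ok", str(func(arg)))
    except Exception as exc:
        return ("error", type(exc).__name__)


def divergences(cases, impl_a, impl_b):
    """Return (case, outcome_a, outcome_b) for every input where the outcomes differ."""
    rows = []
    for case in cases:
        a, b = outcome(impl_a, case), outcome(impl_b, case)
        if a != b:
            rows.append((case, a, b))
    return rows


if __name__ == "__main__":
    try:
        # Assumption: both modules are importable on this interpreter.
        import _datetime
        import _pydatetime
    except ImportError:
        print("C or pure-Python datetime module not available; nothing to compare")
    else:
        cases = ["2020+12", "+020-06-15", "2020-W 5", "202012+9",
                 "2020061", "2020123", "2020-W2", "9999121"]
        rows = divergences(cases,
                           _datetime.date.fromisoformat,
                           _pydatetime.date.fromisoformat)
        for case, c_result, py_result in rows:
            print(f"{case!r}: C={c_result} pure-Python={py_result}")
```

On a build where the two implementations agree, the loop prints nothing.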
Root cause
Lib/_pydatetime.py, _parse_isoformat_date (the function's own comment notes
it "assumes an ASCII-only string of lengths 7, 8 or 10"). On current main the
function body is:
    def _parse_isoformat_date(dtstr):
        # It is assumed that this is an ASCII-only string of lengths 7, 8 or 10,
        # see the comment on Modules/_datetimemodule.c:_find_isoformat_datetime_separator
        if len(dtstr) not in (7, 8, 10):    # line 361
            raise ValueError("Invalid isoformat string")
        year = int(dtstr[0:4])              # line 363
        ...
        weekno = int(dtstr[pos:pos+2])      # line 370 (week field)
        ...
        dayno = int(dtstr[pos:pos+1])       # line 380 (week day field)
        ...
        month = int(dtstr[pos:pos+2])       # line 384 (month field)
        ...
        day = int(dtstr[pos:pos+2])         # line 390 (day field)
The if len(dtstr) not in (7, 8, 10) gate at line 361 only bounds the total
length; date.fromisoformat (the caller, lines 1059-1060) applies the same
length gate before calling in. Neither gate checks the content of the
fixed-width fields. Each field is read with int(dtstr[pos:pos+N]). int()
accepts a leading +/-/whitespace and a short string, so:
- a +/-/space that lands in a month/day/week field is silently consumed
  (form 1), and
- on a 7-char string the day/week slice runs off the end and int() happily
  parses the 1-character remainder (form 2).
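Both forms can be reproduced with a simplified model of the slice-then-int() pattern. The function below is a hypothetical stand-in written for illustration; it is not the code in Lib/_pydatetime.py, it covers only the calendar-date branch (no week dates), and it returns a tuple rather than a date object.

```python
def lenient_parse_calendar_date(dtstr):
    """Simplified model of fixed-width slicing followed by int().

    Illustration only: calendar dates, no week-date branch, no range checks.
    """
    if len(dtstr) not in (7, 8, 10):
        raise ValueError("Invalid isoformat string")
    year = int(dtstr[0:4])
    has_sep = dtstr[4] == "-"
    pos = 4 + has_sep
    month = int(dtstr[pos:pos + 2])      # int() accepts '+1' and ' 1'
    pos += 2
    if (dtstr[pos:pos + 1] == "-") != has_sep:
        raise ValueError("Inconsistent use of dash separator")
    pos += has_sep
    day = int(dtstr[pos:pos + 2])        # on a 7-char string this slice is 1 char
    return year, month, day


# Form 1: a sign inside a field is consumed by int().
print(lenient_parse_calendar_date("2020+12"))     # (2020, 1, 2)
print(lenient_parse_calendar_date("202012+9"))    # (2020, 12, 9)
print(lenient_parse_calendar_date("+020-06-15"))  # (20, 6, 15)
# Form 2: the day slice runs off the end of a 7-char string.
print(lenient_parse_calendar_date("2020061"))     # (2020, 6, 1)
```

The tuples match the pure-Python column of the comparison table for the same inputs.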
Wrong side: pure-Python, which over-accepts. ISO-8601 calendar dates contain
no sign or space inside the date, and have no 1-digit month-day or 1-digit
week; date.fromisoformat's docstring promises a string "in the format emitted
by date.isoformat()". The C accelerator's parse_digits rejects any non-digit
byte and requires the exact field width, then verifies the whole string was
consumed, so C is correct here.
Suggested fix
Validate each slice in _parse_isoformat_date before converting: require that
the year/month/day/week/weekday slice is exactly N ASCII digits (mirroring the
C parse_digits). The module already has an _is_ascii_digit helper used by
the fraction path (_parse_hh_mm_ss_ff does all(map(_is_ascii_digit, tstr[pos:]))), so reusing it keeps the check
consistent, e.g. raise ValueError unless len(s) == N and all(map(_is_ascii_digit, s)) before calling int(s). That
makes the malformed basic-format strings above raise ValueError on the
pure-Python path exactly as the C accelerator does, and closes the 7-char
short-slice hole at the same time. (The length gate (7, 8, 10) can stay; the
per-field width check is what rejects '2020061', since its day slice is then a
1-char string.)
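A minimal sketch of the per-field check described above. The name parse_exact_digits is hypothetical, and the inline membership test stands in for the module's _is_ascii_digit helper so that the example is self-contained; an actual patch would likely reuse that helper and the existing error message.

```python
def parse_exact_digits(field, width):
    """Return int(field) only if field is exactly `width` ASCII digits.

    Hypothetical helper: rejects signs, whitespace, short slices and
    non-ASCII digits before int() ever sees the string.
    """
    if len(field) != width or not all(c in "0123456789" for c in field):
        raise ValueError(f"Invalid isoformat string field: {field!r}")
    return int(field)


print(parse_exact_digits("06", 2))       # 6
for bad in ("+1", " 5", "+9", "1"):
    try:
        parse_exact_digits(bad, 2)
    except ValueError as exc:
        print("rejected:", exc)
```

Under this check the day slice of '2020061' (the 1-char string '1') and the month slice of '2020+12' ('+1') both raise ValueError instead of producing a date.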
The same slice-based _parse_isoformat_date exists in the 3.14 / 3.15
branches (the pure-Python module is _pydatetime in all of them), so those
branches are affected wherever the pure-Python path is exercised (the C
accelerator masks it when present).
Environment
- main, 3.16.0a0.
- On main, the length gate in _parse_isoformat_date is already an if len(dtstr) not in (7, 8, 10): raise ValueError (issue #152060, "datetime.fromisoformat() raises AssertionError instead of ValueError in the pure-Python implementation" / PR #152061, "gh-152060: Fix datetime.fromisoformat() raising AssertionError in pure Python", merged, replaced the earlier assert len(dtstr) in (7, 8, 10) so that a bad length raises ValueError instead of AssertionError). That change only touched the length-gate exception type; it did not touch the int(dtstr[...]) slices, so both mis-parse forms above still reproduce on main. (A pre-PR #152061 checkout still carries the assert; the slice defect is the same either way.)
Relation to existing issues
This is distinct from the known nearby issues:
- gh-107779 ("… separator", open; PR #107791, "gh-107779: Check for the separator between date and time in datetime.datetime.fromisoformat") is about _find_isoformat_datetime_separator returning an index that is not a real separator in datetime.fromisoformat. That is a different function and mechanism; it does not address the int()-slice leniency in _parse_isoformat_date. Verified on this build that '2024-01-17T15:21:00-0800' (the basic/extended-mixing class) is accepted by both implementations, i.e. it is not a C-vs-pure-Python divergence.
- gh-152060: a wrong-length dtstr used to raise AssertionError (from the old assert len(...)); PR #152061 ("gh-152060: Fix datetime.fromisoformat() raising AssertionError in pure Python") turned that into a ValueError. That is a disjoint defect: it is about the length gate's exception type, whereas this issue is about strings of a valid length (7/8/10) whose fixed-width fields are mis-sliced into a silently wrong value. The per-field slices that PR #152061 left untouched are exactly the ones at fault here.
Found with a differential C-vs-pure-Python fromisoformat testing harness
(AI-assisted, each case hand-verified).