gh-152204: Validate date fields in pure-Python date.fromisoformat #152205

Open
tonghuaroot wants to merge 2 commits into python:main from tonghuaroot:fix-gh-152204-fromisoformat-date-fields

tonghuaroot (Contributor) commented Jun 25, 2026

_pydatetime._parse_isoformat_date reads each fixed-width field by calling int() on a slice, without checking that the slice is exactly N ASCII digits. int() accepts a leading sign and surrounding whitespace, and a slice that runs past the end of the string is simply shorter than N, so several malformed ISO 8601 date strings are silently parsed into a wrong-but-plausible date instead of raising ValueError:

>>> import _pydatetime
>>> _pydatetime.date.fromisoformat('2020+12')
datetime.date(2020, 1, 2)
>>> _pydatetime.date.fromisoformat('+020-06-15')
datetime.date(20, 6, 15)
>>> _pydatetime.date.fromisoformat('2020061')   # 7 chars: day slice reads '1'
datetime.date(2020, 6, 1)
>>> _pydatetime.date.fromisoformat('2020-W2')   # 1-digit week number
datetime.date(2020, 1, 6)
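
The results above follow from two standard Python behaviours rather than anything specific to _pydatetime. A minimal sketch, using only built-ins (the variable names are illustrative and not taken from the patch):

```python
# int() tolerates a leading sign and surrounding whitespace.
signed = int('+1')
padded = int(' 7')

# Slicing past the end of a string yields a shorter slice instead of raising.
short_input = '2020061'
day_slice = short_input[6:8]      # only one character is available here

# '2020+12' has no '-' separator, so the month is read from [4:6]
# and the day from [6:8].
sign_input = '2020+12'
month_slice = sign_input[4:6]
day_slice2 = sign_input[6:8]

print(signed, padded, repr(day_slice), repr(month_slice), repr(day_slice2))
```

Each slice is a valid argument to int(), which is why no ValueError is raised for these inputs.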

The C accelerator rejects all of these via parse_digits(), which requires the exact field width and digit-only content, so this is a divergence between the C and pure-Python implementations. The pure-Python path is used when the _datetime C extension is unavailable, and whenever _pydatetime is used directly.

This PR validates that each field slice (year / month / day / weekno / weekday) is exactly N ASCII digits before converting it, mirroring the C parse_digits(), and extends datetimetester's test_fromisoformat_fails with the affected inputs (the new cases are now rejected by both implementations).
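
The kind of check described here can be sketched as follows. This is not the code from the patch: parse_fixed_digits is a hypothetical helper name, and its signature and error message are assumptions made for illustration only.

```python
def parse_fixed_digits(s, start, width):
    """Hypothetical helper: convert s[start:start+width] to int,
    but only if it is exactly `width` ASCII digits."""
    field = s[start:start + width]
    # len() catches slices that ran off the end of the string;
    # isascii() + isdigit() rejects signs, whitespace and non-ASCII digits.
    if len(field) != width or not (field.isascii() and field.isdigit()):
        raise ValueError(
            f"invalid field {field!r}: expected {width} ASCII digits")
    return int(field)


# Well-formed field: converted as usual.
print(parse_fixed_digits('20200615', 4, 2))

# Malformed fields from the examples above: rejected instead of mis-parsed.
for text, start, width in [('2020+12', 4, 2), ('+020-06-15', 0, 4),
                           ('2020061', 6, 2), ('2020-W2', 6, 2)]:
    try:
        parse_fixed_digits(text, start, width)
    except ValueError as exc:
        print(f"{text!r}: {exc}")
```

Checking isascii() before isdigit() matters because str.isdigit() alone also returns True for some non-ASCII characters, such as superscript digits.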

Fixes #152204.


Prepared with AI assistance (Claude Code) and verified by hand against a debug build, against both the C and pure-Python implementations.

The pure-Python _parse_isoformat_date read each fixed-width field with
int() on a slice, which silently accepts a leading sign or whitespace, or
a short slice that runs off the end of the string.  Malformed basic-format
inputs such as '2020+12' or '2020061' were therefore parsed into a
wrong-but-plausible date instead of raising, while the C accelerator
rejects them via parse_digits().  Validate that each field slice is exactly
N ASCII digits before converting.

Successfully merging this pull request may close these issues.

Pure-Python date.fromisoformat silently mis-parses malformed basic-format dates
