
zoneinfo: pure-Python parser accepts a POSIX TZ footer with no STD offset (C rejects) #152212

Description

@tonghuaroot

Bug description

A POSIX TZ string requires an offset after the std abbreviation (e.g. EST5). When the
offset is missing and the std field is just an abbreviation (AAA, A, AA, B, ...),
the two zoneinfo implementations disagree:

  • The C accelerator (_zoneinfo) raises ValueError: Invalid STD offset.
  • The pure-Python parser (zoneinfo._zoneinfo) silently accepts it and builds a
    fixed-offset-0 zone named after the abbreviation.

Reproduced by embedding the footer in a minimal TZif v2 file and loading it through
ZoneInfo.from_file against both implementations:

>>> import io, struct, datetime as dt
>>> import _zoneinfo                 # C accelerator
>>> import zoneinfo._zoneinfo        # pure-Python reference
>>>
>>> def tzif(footer):               # minimal TZif v2: 0 transitions, 1 ttinfo (UTC), footer
...     def block():
...         return (b"TZif" + b"\x32" + b"\x00" * 15
...                 + struct.pack(">6l", 0, 0, 0, 0, 1, 4)
...                 + struct.pack(">lbb", 0, 0, 0) + b"UTC\x00")
...     return block() + block() + b"\n" + footer.encode() + b"\n"
...
>>> # C accelerator: rejects the offset-less std field
>>> _zoneinfo.ZoneInfo.from_file(io.BytesIO(tzif("AAA")), key="AAA")
Traceback (most recent call last):
  ...
ValueError: Invalid STD offset in b'AAA'
>>>
>>> # pure-Python: accepts it as a fixed offset-0 zone
>>> zi = zoneinfo._zoneinfo.ZoneInfo.from_file(io.BytesIO(tzif("AAA")), key="AAA")
>>> zi.utcoffset(dt.datetime(2025, 1, 15, 12))
datetime.timedelta(0)
>>> zi.tzname(dt.datetime(2025, 1, 15, 12))
'AAA'

The same divergence occurs for A, AA, B, and any other bare std abbreviation.

Root cause

Lib/zoneinfo/_zoneinfo.py, in _parse_tz_str, the std-offset branch (around L669-675 on
main):

    if std_offset := m.group("stdoff"):
        try:
            std_offset = _parse_tz_delta(std_offset)
        except ValueError as e:
            raise ValueError(f"Invalid STD offset in {tz_str}") from e
    else:
        std_offset = 0          # <-- treats a missing std offset as 0

When the regex captures a std abbreviation but no stdoff group, the else branch
defaults the offset to 0 instead of raising. The C accelerator has no such default and
raises Invalid STD offset when the std offset is absent.
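The snippet below shows how a missing offset reaches that else branch. It uses a simplified, hypothetical pattern, not the exact regex in Lib/zoneinfo/_zoneinfo.py. As in the real parser (which the AAA repro shows must also match a bare abbreviation), the stdoff group is optional. When the string is only an abbreviation, the group does not participate in the match and m.group("stdoff") returns None.

```python
import re

# Hypothetical, simplified stand-in for the zoneinfo pattern: an abbreviation
# (bare form or <quoted> form), optionally followed by a numeric offset.
TZ_RE = re.compile(
    r"(?P<std>[^<0-9:.+-]+|<[a-zA-Z0-9+-]+>)"
    r"(?P<stdoff>[+-]?\d{1,3}(?::\d{2}(?::\d{2})?)?)?",
    re.ASCII,
)


def std_offset_group(tz_str):
    """Return the raw stdoff capture, or None when no offset follows std."""
    m = TZ_RE.fullmatch(tz_str)
    if m is None:
        raise ValueError(f"{tz_str} is not a valid TZ string")
    return m.group("stdoff")


for s in ("EST5", "AAA"):
    print(s, repr(std_offset_group(s)))
# EST5 '5'
# AAA None
```

In `if std_offset := m.group("stdoff"):`, that None is falsy, so control falls into the else branch and the offset silently becomes 0.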

Spec

POSIX.1-2024 (Issue 8), §8.3 "Other Environment Variables" (the TZ rule format) gives the
expanded format as stdoffset[dst[offset][,start[/time],end[/time]]] and states that "the
offset following std shall be required"; only the dst offset is optional (if it is
omitted, DST defaults to one hour ahead of std).
RFC 8536 §3.3 ("TZif Footer"), which governs the TZ-string footer embedded in a TZif file,
specifies that the footer uses the expanded POSIX TZ format from Base Definitions §8.3. Its
extensions (§3.3.1, for version 3 and later files) only let the hours of the rule's
transition times range from -167 to 167 and define how year-round DST is expressed; neither
relaxes the required std offset. So a footer like AAA is not a valid POSIX TZ string, and
the C accelerator's rejection is the correct behavior. (For reference, macOS libc likewise
treats TZ=AAA as invalid and falls back to UTC: tzname=('UTC', 'UTC'), offset 0.)
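A minimal sketch of the "std offset" prefix as POSIX describes it, with the offset treated as mandatory. The helper name parse_std and its error messages are hypothetical and do not come from either zoneinfo implementation. It also shows the POSIX sign convention: the offset is the amount added to local time to reach UTC, so EST5 means UTC-5.

```python
import re

# Hypothetical helper: parse only the leading "std offset" part of a POSIX
# TZ string. Both the abbreviation and the offset are required.
# Length limits on the abbreviation are deliberately not checked here.
_STD_RE = re.compile(r"(?P<std>[A-Za-z]+|<[A-Za-z0-9+-]+>)")
_OFF_RE = re.compile(r"(?P<sign>[+-]?)(?P<h>\d{1,2})(?::(?P<m>\d{2})(?::(?P<s>\d{2}))?)?")


def parse_std(tz_str):
    """Return (abbreviation, utc_offset_seconds) for the std part of tz_str."""
    m = _STD_RE.match(tz_str)
    if m is None:
        raise ValueError(f"Invalid STD abbreviation in {tz_str!r}")
    abbr = m.group("std").strip("<>")

    # The offset must start right after the abbreviation.
    o = _OFF_RE.match(tz_str, m.end())
    if o is None:
        # A bare abbreviation such as "AAA" is rejected here.
        raise ValueError(f"Invalid STD offset in {tz_str!r}")

    h, mi, s = (int(v or 0) for v in o.group("h", "m", "s"))
    if h > 24 or mi > 59 or s > 59:
        raise ValueError(f"Invalid STD offset in {tz_str!r}")
    seconds = h * 3600 + mi * 60 + s

    # POSIX counts west of Greenwich as positive, so flip the sign to get
    # the conventional UTC offset (EST5 -> -18000 seconds).
    utc_offset = seconds if o.group("sign") == "-" else -seconds
    return abbr, utc_offset
```

Only the std prefix is examined; anything after the offset (a dst abbreviation or a rule) is ignored by this sketch.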

Suggested fix

Make the pure-Python parser reject a missing std offset, matching the C accelerator. The
else branch becomes a raise with the same message wording the accelerator uses:

    else:
        raise ValueError(f"Invalid STD offset in {tz_str}")

This is non-breaking for real data: across the full IANA database (598 zones loaded
through the pure parser) no zone is newly rejected, and well-formed strings such as EST5,
<ABC>5, and AAA5 continue to parse identically on both implementations. A PR with the
one-line fix and a regression test (covering AAA, A, AA, B in test_invalid_tzstr,
which runs against both TZStrTest and CTZStrTest) follows.
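A standalone parity check in the spirit of that regression test, reusing the minimal TZif builder from the repro. It assumes the _zoneinfo C extension is importable; IMPLS and load_all are hypothetical names, not part of the CPython test suite. Per the report, the bare-abbreviation footers should print DIFFERENT on an unpatched interpreter, and every row should print same once the fix is applied.

```python
import datetime as dt
import io
import struct

import _zoneinfo                 # C accelerator
import zoneinfo._zoneinfo        # pure-Python implementation


def tzif(footer):
    """Minimal TZif v2: 0 transitions, 1 ttinfo (UTC), then the given footer."""
    block = (b"TZif" + b"2" + b"\x00" * 15
             + struct.pack(">6l", 0, 0, 0, 0, 1, 4)
             + struct.pack(">lbb", 0, 0, 0) + b"UTC\x00")
    return block + block + b"\n" + footer.encode() + b"\n"


IMPLS = {"C": _zoneinfo.ZoneInfo, "py": zoneinfo._zoneinfo.ZoneInfo}
WHEN = dt.datetime(2025, 1, 15, 12)


def load_all(footer):
    """Map implementation name -> (utcoffset, tzname), or "rejected"."""
    results = {}
    for name, cls in IMPLS.items():
        try:
            zi = cls.from_file(io.BytesIO(tzif(footer)), key=footer)
            results[name] = (zi.utcoffset(WHEN), zi.tzname(WHEN))
        except ValueError:
            # Compare acceptance only; the two error messages differ in repr.
            results[name] = "rejected"
    return results


for footer in ("EST5", "<ABC>5", "AAA5", "AAA", "A", "AA", "B"):
    r = load_all(footer)
    status = "same" if r["C"] == r["py"] else "DIFFERENT"
    print(f"{footer:8} {status:9} {r}")
```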

Environment

  • Reproduced on main (3.16.0a0).
  • The pure-Python parser is used whenever the _zoneinfo C extension is unavailable, and
    is also reachable directly via zoneinfo._zoneinfo.
  • The same else: std_offset = 0 code is present on 3.13, 3.14, and 3.15, which are
    likewise affected.

This is a correctness/parity issue, not a security issue.
