Skip to content

pgn: accept tag pairs with leading whitespace (fixes #1115)#1195

Open
gaoflow wants to merge 1 commit into
niklasf:masterfrom
gaoflow:fix/1115-pgn-whitespace-headers
Open

pgn: accept tag pairs with leading whitespace (fixes #1115)#1195
gaoflow wants to merge 1 commit into
niklasf:masterfrom
gaoflow:fix/1115-pgn-whitespace-headers

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 25, 2026

Copy link
Copy Markdown

Summary

read_game() (and therefore read_headers()) failed to parse game
headers that are preceded by horizontal whitespace, e.g. a PGN string
that begins with " [Event ...]".

The PGN standard (section 2.1) specifies that white space is not
significant, so a tag pair indented with spaces or tabs should be
accepted.

Root cause

Two checks in the header-parsing loop used the raw line string:

  1. if not line.startswith("["): — breaks out of the header loop for any
    line that doesn't begin immediately with [.
  2. TAG_REGEX.match(line) — the regex is anchored at ^ and will not
    match a line with leading whitespace.

Together these caused the parser to treat an indented first tag as the
start of the movetext section, so no headers were ever parsed and the
game was returned with default header values.

Fix

Strip leading whitespace into a local stripped variable before the
startswith guard and the regex match. The original line value is
preserved for the movetext tokeniser that runs after the header loop.

Reproducer (from #1115)

import chess.pgn, io

pgn = io.StringIO(' [Event "Test"]\n[White "Alice"]\n[Black "Bob"]\n[Result "*"]\n\n1. e4 e5 *')
game = chess.pgn.read_game(pgn)
assert game.headers["Event"] == "Test"   # previously returned "?"
assert game.headers["White"] == "Alice"  # previously returned "?"

This pull request was prepared with the assistance of AI, under my direction and review.

The PGN standard specifies that whitespace is not significant.
`read_game()` was checking `line.startswith("[")` and matching
`TAG_REGEX` against the raw line, both of which fail when a tag pair
is preceded by horizontal whitespace (e.g. a file that begins with
" [Event ...]").

Strip leading whitespace before the `startswith` guard and the regex
match so that indented tag pairs are recognised correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant