pgn: accept tag pairs with leading whitespace (fixes #1115) #1195
Open
gaoflow wants to merge 1 commit into
The PGN standard specifies that whitespace is not significant.
`read_game()` was checking `line.startswith("[")` and matching
`TAG_REGEX` against the raw line, both of which fail when a tag pair
is preceded by horizontal whitespace (e.g. a file that begins with
" [Event ...]").
Strip leading whitespace before the `startswith` guard and the regex
match so that indented tag pairs are recognised correctly.
Summary
`read_game()` (and therefore `read_headers()`) failed to parse game
headers that are preceded by horizontal whitespace, e.g. a PGN string
that begins with `" [Event ...]"`.

The PGN standard (section 2.1) specifies that white space is not
significant, so a tag pair indented with spaces or tabs should be
accepted.
Root cause
Two checks in the header-parsing loop used the raw `line` string:

- `if not line.startswith("["):` breaks out of the header loop for any
  line that doesn't begin immediately with `[`.
- `TAG_REGEX.match(line)`: the regex is anchored at `^` and will not
  match a line with leading whitespace.

Together these caused the parser to treat an indented first tag as the
start of the movetext section, so no headers were ever parsed and the
game was returned with default header values.
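
The two failing checks can be illustrated in isolation. This is a minimal sketch using only the standard library; `TAG_PATTERN` is a simplified stand-in for the library's `TAG_REGEX`, not the actual pattern, and the sample line is made up for illustration.

```python
import re

# Simplified stand-in for TAG_REGEX (an assumption, not the library's real pattern).
TAG_PATTERN = re.compile(r'^\[([A-Za-z0-9_]+)\s+"(.*)"\]\s*$')

# Hypothetical indented tag pair, similar to the case described above.
line = ' [Event "Test"]'

# Check 1: the raw line does not start with "[", so a header loop
# guarded by startswith("[") would exit immediately.
guard_raw = line.startswith("[")

# Check 2: the pattern is anchored at the start, so the leading space
# prevents a match on the raw line as well.
match_raw = TAG_PATTERN.match(line)

# Both checks pass once leading whitespace is removed.
stripped = line.lstrip()
guard_stripped = stripped.startswith("[")
match_stripped = TAG_PATTERN.match(stripped)

print(guard_raw, match_raw is not None)
print(guard_stripped, match_stripped.groups())
```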
Fix
Strip leading whitespace into a local `stripped` variable before the
`startswith` guard and the regex match. The original `line` value is
preserved for the movetext tokeniser that runs after the header loop.
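
The shape of the fix can be sketched roughly as follows. This is not the patched `read_game()` itself: `parse_headers` and `TAG_PATTERN` are hypothetical names introduced for illustration, and the real loop handles more cases (comments, blank lines, and so on) than this sketch does.

```python
import re
from typing import Dict, List, Tuple

# Simplified stand-in for TAG_REGEX (an assumption, not the library's real pattern).
TAG_PATTERN = re.compile(r'^\[([A-Za-z0-9_]+)\s+"(.*)"\]\s*$')


def parse_headers(lines: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: collect leading tag pairs, return the remaining lines."""
    headers: Dict[str, str] = {}
    index = 0
    while index < len(lines):
        line = lines[index]
        # Strip into a separate variable; `line` itself is left untouched
        # so later stages still see the original text.
        stripped = line.lstrip()
        if not stripped.startswith("["):
            break
        match = TAG_PATTERN.match(stripped)
        if match is None:
            break
        headers[match.group(1)] = match.group(2)
        index += 1
    # Everything from the first non-tag line onward is returned unmodified.
    return headers, lines[index:]


headers, rest = parse_headers(
    [' [Event "Test"]', '\t[Site "?"]', "", "1. e4 e5 *"]
)
print(headers)
print(rest)
```

Keeping `stripped` separate from `line` means only the two header checks change behaviour; the remaining lines are passed on exactly as they were read.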
Reproducer (from #1115)
This pull request was prepared with the assistance of AI, under my direction and review.