GH-3632: Fix record-level notIn evaluation by wgtmac · Pull Request #3631 · apache/parquet-java

wgtmac · 2026-06-23T08:00:33Z

Rationale for this change

The generated record-level filter for notIn can accept a row after checking only the first non-matching value in the literal set. For example, notIn(col, {1.0, 3.0}) should reject 3.0, but the current logic can accept it after seeing 3.0 != 1.0.

What changes are included in this PR?

Evaluate notIn by checking whether any literal equals the row value before deciding the result.
Add a regression test for multi-value notIn record-level filtering.
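
To make the failure mode concrete, here is a minimal sketch that contrasts the two evaluation shapes. It is not the generated parquet-java code: the class `NotInSketch` and the methods `buggyNotIn` and `fixedNotIn` are hypothetical names used only for illustration, and a plain `Comparator` stands in for the column comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NotInSketch {
    // Illustrative buggy shape: the row is accepted as soon as one literal differs.
    static <T> boolean buggyNotIn(T value, List<T> literals, Comparator<T> cmp) {
        for (T literal : literals) {
            if (cmp.compare(value, literal) != 0) {
                return true; // decided too early, later literals are never checked
            }
        }
        return false;
    }

    // Fixed shape: scan the whole set and reject if any literal equals the value.
    static <T> boolean fixedNotIn(T value, List<T> literals, Comparator<T> cmp) {
        for (T literal : literals) {
            if (cmp.compare(value, literal) == 0) {
                return false; // value is in the set, so notIn rejects the row
            }
        }
        return true; // no literal matched, so notIn keeps the row
    }

    public static void main(String[] args) {
        List<Double> literals = Arrays.asList(1.0, 3.0);
        Comparator<Double> cmp = Comparator.naturalOrder();
        System.out.println(buggyNotIn(3.0, literals, cmp)); // true (wrong: 3.0 is in the set)
        System.out.println(fixedNotIn(3.0, literals, cmp)); // false
        System.out.println(fixedNotIn(2.0, literals, cmp)); // true
    }
}
```

The fixed variant only returns `true` after every literal has been compared, which matches the description above of checking whether any literal equals the row value before deciding the result.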

Are these changes tested?

./mvnw -q -Drat.skip=true -DskipTests -pl parquet-generator,parquet-column install
./mvnw -q -Drat.skip=true -pl parquet-hadoop -Dtest=TestRecordLevelFilters -Dsurefire.failIfNoSpecifiedTests=false test

Are there any user-facing changes?

This fixes record-level filtering for notIn predicates with multiple literal values.

Closes #3632

wgtmac · 2026-06-23T08:57:46Z

Just caught this bug while working on #3393. Could you please take a look at this? @gszadovszky @Fokko

gszadovszky · 2026-06-23T11:26:54Z

This branch still looks incorrect for notIn sets that contain null plus non-null values. For example, notIn(col, {null, 3.0}) will emit setResult(true) for any non-null value, so a row where col == 3.0 is still kept.

Fixed in the latest push. The generated evaluator now handles null membership only in updateNull(), and update(value) still scans the non-null values with the column comparator. Added a regression test for in/notIn with {null, 3.0}; notIn now rejects col == 3.0 and keeps only the non-matching non-null value.
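
The split described in this reply can be sketched roughly as follows. This is an assumption-laden illustration, not the generated evaluator: the class `NotInEvaluator`, its fields, and `getResult()` are hypothetical, and only the method names `updateNull()` and `update(value)` come from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullAwareNotInSketch {
    // Hypothetical evaluator: null membership is decided only in updateNull(),
    // while update(value) scans the non-null literals with a comparator.
    static final class NotInEvaluator<T> {
        private final boolean setContainsNull;
        private final List<T> nonNullLiterals = new ArrayList<>();
        private final Comparator<T> comparator;
        private boolean result;

        NotInEvaluator(List<T> literals, Comparator<T> comparator) {
            boolean sawNull = false;
            for (T literal : literals) {
                if (literal == null) {
                    sawNull = true;
                } else {
                    nonNullLiterals.add(literal);
                }
            }
            this.setContainsNull = sawNull;
            this.comparator = comparator;
        }

        // Called for a null row value: keep the row only if null is not in the set.
        void updateNull() {
            result = !setContainsNull;
        }

        // Called for a non-null row value: reject if any non-null literal matches.
        void update(T value) {
            for (T literal : nonNullLiterals) {
                if (comparator.compare(value, literal) == 0) {
                    result = false;
                    return;
                }
            }
            result = true;
        }

        boolean getResult() {
            return result;
        }
    }

    public static void main(String[] args) {
        List<Double> literals = Arrays.asList(null, 3.0);
        NotInEvaluator<Double> notIn = new NotInEvaluator<>(literals, Comparator.naturalOrder());
        notIn.update(3.0);
        System.out.println(notIn.getResult()); // false: 3.0 is in the set
        notIn.update(2.0);
        System.out.println(notIn.getResult()); // true: non-matching non-null value
        notIn.updateNull();
        System.out.println(notIn.getResult()); // false: null is in the set
    }
}
```

Separating the null literal from the non-null ones keeps the comparator from ever seeing null, and it means the presence of null in the set cannot short-circuit the scan for a non-null row value.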

gszadovszky

Thanks @wgtmac for fixing this!
LGTM.

wgtmac · 2026-06-23T13:44:20Z

Thank you, @gszadovszky!

wgtmac changed the title ~~Fix record-level notIn evaluation~~ GH-3632: Fix record-level notIn evaluation Jun 23, 2026

gszadovszky reviewed Jun 23, 2026

Fix record-level notIn evaluation

91862f8

wgtmac force-pushed the codex/fix-record-level-notin branch from 4fbc73e to 91862f8 Compare June 23, 2026 12:39

gszadovszky approved these changes Jun 23, 2026

wgtmac merged commit 393727e into apache:master Jun 24, 2026
3 checks passed

wgtmac merged 1 commit into apache:master from wgtmac:codex/fix-record-level-notin
