Resolve Parquet shard count via bucket index to optimize storage calls #7648

Open
SungJin1212 wants to merge 2 commits into cortexproject:master from SungJin1212:parquet-shard-count-from-bucket-index

Conversation

@SungJin1212

Member

What this PR does:
This PR updates the Parquet shard resolution logic to utilize the bucket index, reducing the number of object storage calls.
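
For illustration only, the idea can be sketched as follows: look up shard counts in an already-loaded index first, and fall back to a per-block storage lookup only for blocks the index cannot answer. The names `indexEntry`, `resolveShardCounts`, and `fetch` are hypothetical stand-ins, not the types or functions used in this PR.

```go
package main

import "fmt"

// indexEntry is a hypothetical, simplified stand-in for a bucket index entry.
type indexEntry struct {
	BlockID string
	Shards  int // 0 means the index carries no shard information for this block
}

// resolveShardCounts returns the shard count for each requested block and the
// number of fallback lookups it performed. fetch represents a per-block object
// storage call and is only invoked when the index has no answer.
func resolveShardCounts(blockIDs []string, index []indexEntry, fetch func(string) (int, error)) (map[string]int, int, error) {
	byID := make(map[string]int, len(index))
	for _, e := range index {
		byID[e.BlockID] = e.Shards
	}

	counts := make(map[string]int, len(blockIDs))
	fallbacks := 0
	for _, id := range blockIDs {
		if n, ok := byID[id]; ok && n > 0 {
			counts[id] = n // answered from the index, no storage call
			continue
		}
		n, err := fetch(id)
		if err != nil {
			return nil, fallbacks, fmt.Errorf("resolve shards for block %s: %w", id, err)
		}
		fallbacks++
		counts[id] = n
	}
	return counts, fallbacks, nil
}

func main() {
	index := []indexEntry{{"block-a", 2}, {"block-b", 4}}
	fetch := func(id string) (int, error) { return 1, nil } // pretend storage lookup
	counts, fallbacks, err := resolveShardCounts([]string{"block-a", "block-b", "block-c"}, index, fetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts, "fallback lookups:", fallbacks)
}
```

In this sketch only `block-c` triggers a storage lookup; the other two blocks are answered from the index.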

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

shardCounts := make(map[string]int, len(blockIDs))

if p.bucketIndexEnabled {
	idx, err := bucketindex.ReadIndex(ctx, p.indexBucket, p.userID, p.limits, p.logger)

Contributor

Should this be cached with a TTL instead? Alternatively, a separate goroutine could sync the bucket index periodically, rather than reading it on every query.

Contributor

If we use bucketindex.Loader, we get its built-in caching.

SungJin1212 force-pushed the parquet-shard-count-from-bucket-index branch from 0fdab66 to 641b69d on June 29, 2026 00:54
…e calls

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
SungJin1212 force-pushed the parquet-shard-count-from-bucket-index branch from 6656f19 to 2f7ecc9 on June 29, 2026 02:21
@SungJin1212

Member Author

@yeya24
I adapted bucketindex.Loader to the parquet store gateway.

  • In InitialSync, I start the indexLoader.
  • Its metrics are tagged with component="store-gateway" so they don't collide with the querier's loader in single-binary mode.
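
As a rough illustration of why the extra label matters: a metrics registry typically rejects a second registration with the same name and label values, so two loaders in one process need something to tell them apart. The sketch below uses a tiny stand-in registry built only on the standard library, not the Prometheus client; the `registry` type and the metric name `example_index_loads_total` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// registry is a hypothetical stand-in for a metrics registry that rejects
// duplicate registrations of the same metric name and constant labels.
type registry struct {
	seen map[string]bool
}

func newRegistry() *registry { return &registry{seen: map[string]bool{}} }

// register builds an identity from the name plus the sorted constant labels
// and fails if that identity was registered before.
func (r *registry) register(name string, constLabels map[string]string) error {
	keys := make([]string, 0, len(constLabels))
	for k := range constLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%q", k, constLabels[k])
	}
	id := b.String()

	if r.seen[id] {
		return fmt.Errorf("duplicate metric registration: %s", id)
	}
	r.seen[id] = true
	return nil
}

func main() {
	reg := newRegistry() // one shared registry, as in single-binary mode

	// Two loaders with distinct component labels register without conflict.
	fmt.Println(reg.register("example_index_loads_total", map[string]string{"component": "querier"}))
	fmt.Println(reg.register("example_index_loads_total", map[string]string{"component": "store-gateway"}))

	// A second loader reusing the same label value is rejected.
	fmt.Println(reg.register("example_index_loads_total", map[string]string{"component": "querier"}))
}
```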

@yeya24

yeya24 commented Jun 29, 2026

Contributor

I wonder how this bucket index integration will work with the store-gateway hybrid mode mentioned in #7140.

Do we imagine the hybrid mode as part of the TSDB bucket store, part of the Parquet bucket store, or a new federated store?

@SungJin1212

SungJin1212 commented Jun 29, 2026

Member Author

@yeya24
I haven't thought deeply about hybrid mode yet, but my preference would be a new federated store that holds both the TSDB and Parquet bucket stores and splits blocks between them using the bucket index.
