Skip to content

Parquet add no convert marker#7625

Open
siddarth2810 wants to merge 5 commits into
cortexproject:masterfrom
siddarth2810:parquet-add-no-convert-marker
Open

Parquet add no convert marker#7625
siddarth2810 wants to merge 5 commits into
cortexproject:masterfrom
siddarth2810:parquet-add-no-convert-marker

Conversation

@siddarth2810

@siddarth2810 siddarth2810 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

What this PR does:
If a TSDB block exceeds a configurable threshold of distinct label names, the converter writes a parquet-no-convert-mark.json marker and skips the block.

  • Added no-convert marker with read/write logic
  • Added parquet-converter.max-block-label-names limit

Which issue(s) this PR fixes:
Fixes #7195

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

Notes from the Previous PR review

  • Removed the incorrect log under cortex_parquet.ValidConverterMarkVersion
  • Embedded LabelNamesCount and Threshold into Reason field
  • Added buffer for system columns and generated data columns
  • Cleaned up skipped-block metrics when deleting tenant metrics

@siddarth2810 siddarth2810 marked this pull request as ready for review June 16, 2026 12:19
Comment thread pkg/parquetconverter/converter.go Outdated
Comment on lines +410 to +413
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) {
level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
continue

@SungJin1212 SungJin1212 Jun 22, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If noConvertMark is upgraded (v1 -> v2), is it correct that the noConvertMark file is overwritten to v2?
  2. When the user increases this limit (1000 -> 2000), I think we should convert the previous unconverted blocks. (ex. 1500)
    ㄴ We can track the current applied limit in noConvertMark and utilize it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only v1 version right now. I'm not quite sure in which cases the noConvertMark would be upgraded.

My implementation is based on comparing the current limit and label count. By tracking the applied limit, do you mean comparing the limits ?

Like, if current limit > old noConvertMark limit -> Retry conversion

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the configured limit changes, I think we should re-evaluate the block against the new limit, instead of unconditionally skipping it just because a noConvertMark already exists.

Comment thread pkg/parquetconverter/converter.go Outdated
Comment on lines +412 to +415
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) && noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
continue

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt?

Suggested change
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) && noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
continue
if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) {
level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
continue
}
if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
level.Debug(logger).Log(
"msg", "skipping block because label count still exceeds current limit",
"block", b.ULID.String(),
"label_names_count", noConvertMark.LabelNamesCount,
"current_limit", maxBlockLabelNames,
)
c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc()
continue
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change to this way, it means that if the no converter marker already exists we won't re-evaluate the no convert marker check?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I mean

if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) {
	if noConvertMark.Reason != cortex_parquet.NoConvertReasonTooManyLabels {
		level.Debug(logger).Log(
			"msg", "skipping block, no-convert marker already exists",
			"block", b.ULID.String(),
		)
		c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
		continue
	}

	if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
		level.Debug(logger).Log(
			"msg", "skipping block because label count still exceeds current limit",
			"block", b.ULID.String(),
			"label_names_count", noConvertMark.LabelNamesCount,
			"current_limit", maxBlockLabelNames,
		)
		c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc()
		continue
	}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, if we change this way, then every no converter marker will be skipped immediately. ShouldSkipBlock(maxBlockLabelNames) is never be checked

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I mean

if cortex_parquet.ValidNoConvertMarkVersion(noConvertMark.Version) {
	if noConvertMark.Reason != cortex_parquet.NoConvertReasonTooManyLabels {
		level.Debug(logger).Log(
			"msg", "skipping block, no-convert marker already exists",
			"block", b.ULID.String(),
		)
		c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonMarkerExists).Inc()
		continue
	}

	if noConvertMark.ShouldSkipBlock(maxBlockLabelNames) {
		level.Debug(logger).Log(
			"msg", "skipping block because label count still exceeds current limit",
			"block", b.ULID.String(),
			"label_names_count", noConvertMark.LabelNamesCount,
			"current_limit", maxBlockLabelNames,
		)
		c.metrics.skippedBlocks.WithLabelValues(userID, cortex_parquet.NoConvertReasonTooManyLabels).Inc()
		continue
	}
}

This makes sense. I made the same check here:

func (m NoConvertMark) ShouldSkipBlock(currentMaxBlockLabelNamesLimit int) bool {
	// Manual no-convert marks are not tied to the label-name limit
	if m.Reason != NoConvertReasonTooManyLabels {
		return true
	}

But, I think writing in the way you suggested make it easier to understand on the first read.

Changes:
- Add parquet no-convert marker and read/write logic
- Add max-block-label-names limit, blocks exceeding it get a no-convert marker instead of being converted.
- Add parquet_converter_max_block_label_names to exporter test
- Add integration test for parquet no-convert marker

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
The converter only read no-convert markers when the label-name limit was
enabled, so manually marked blocks were still converted when the limit was 0.
Read the marker unconditionally before conversion so these blocks stay skipped.

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Retry conversion if the current limit has increased beyond the label count stored
in the old no-convert mark.

Update converter tests for the new marker fields and retry behavior

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Update tests to check skip with lower current limit

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
@siddarth2810 siddarth2810 force-pushed the parquet-add-no-convert-marker branch from ee56065 to 6299415 Compare June 29, 2026 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Parquet] Stop converting TSDB block to parquet if it has too many labels

3 participants