Add mtree vs table-diff performance benchmark (not run in CI) by mason-sharp · Pull Request #127 · pgEdge/ace

mason-sharp · 2026-06-25T18:44:52Z

Add TestMtreeVsTableDiffPerformance, an integration benchmark comparing the wall-clock cost of mtree table-diff (DiffMtree) against brute-force table-diff on a 1M-row table as divergence grows. It validates that the concurrent CDC drain and prompt LSN-based stop keep mtree competitive on large tables.

The test builds a merkle tree on synced data, then runs three rounds of deterministic, evenly-scattered divergence (0.01% / 0.1% / 1%, selected via (index * 2654435761) % 1000000 < threshold). For each it times both diff paths and logs the speedup. Only the 0.01% round asserts mtree is faster; 0.1% and 1% are informational, since scattered divergence saturates most leaf blocks at higher rates.

This test is intentionally NOT wired into .github/workflows/test.yml: it is a benchmark whose pass/fail at low divergence rides on a narrow timing margin (~1.15x), which would be flaky on shared CI runners. Run it on demand with:

go test -count=1 -v ./tests/integration -run 'TestMtreeVsTableDiffPerformance$'

It is gated by testing.Short() so it never runs in short-mode passes.

Add TestMtreeVsTableDiffPerformance, an integration benchmark comparing the wall-clock cost of mtree table-diff (DiffMtree) against brute-force table-diff on a 1M-row table as divergence grows. It validates that the concurrent CDC drain and prompt LSN-based stop keep mtree competitive on large tables. The test builds a merkle tree on synced data, then runs three rounds of deterministic, evenly-scattered divergence (0.01% / 0.1% / 1%, selected via (index * 2654435761) % 1000000 < threshold). For each it times both diff paths and logs the speedup. Only the 0.01% round asserts mtree is faster; 0.1% and 1% are informational, since scattered divergence saturates most leaf blocks at higher rates. This test is intentionally NOT wired into .github/workflows/test.yml: it is a benchmark whose pass/fail at low divergence rides on a narrow timing margin (~1.15x), which would be flaky on shared CI runners. Run it on demand with: go test -count=1 -v ./tests/integration -run 'TestMtreeVsTableDiffPerformance$' It is gated by testing.Short() so it never runs in short-mode passes.

coderabbitai · 2026-06-25T18:45:06Z

📝 Walkthrough

Walkthrough

Added TestMtreeVsTableDiffPerformance, which builds a baseline mtree on synced node data, injects deterministic node2-only divergence at three thresholds, and measures DiffMtree() against brute-force table diff while checking detected row counts.

Changes

Mtree diff performance test

Layer / File(s)	Summary
Harness setup and baseline mtree `tests/integration/mtree_perf_test.go`	Adds the test entrypoint, shared performance constants, node/table setup, baseline data population, one-time mtree build timing, and cleanup for the mtree, table, and generated diff artifacts.
Comparison helpers `tests/integration/mtree_perf_test.go`	Adds helper logic for canonical node-pair lookup, divergent row counting, deterministic node2-only updates under repair mode, and timing `DiffMtree()` against a fresh table-diff task.
Threshold loop and assertions `tests/integration/mtree_perf_test.go`	Runs the 0.01%, 0.1%, and 1% divergence cases, compares both diff paths after each update, asserts divergence detection, and logs completion.

Poem

I hopped through rows with timing bright,
The mtree flashed through data light.
The table-diff went thump, thump, thud,
Yet both found splashes in the mud.
🐇 Baseline set; the burrow sings.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the new mtree vs table-diff performance benchmark and its CI exclusion.
Description check	✅ Passed	The description is detailed and directly matches the benchmark test, timing behavior, and CI omission in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ace-192-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codacy-production · 2026-06-25T18:49:23Z

Not up to standards ⛔

🔴 Issues 3 critical · 2 medium

Alerts:
⚠ 3 issues (≤ 0 issues of at least critical severity)
⚠ 3 issues (≤ 0 issues of at least minor severity)

Results:
5 new issues

Category Results

Security 3 critical (3 false positives)

Complexity 2 medium

View in Codacy

🟢 Metrics 19 complexity · 2 duplication

Metric Results

Complexity 19

Duplication 2

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/integration/mtree_perf_test.go (1)
65-76: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Tie scatterMod directly to targetRows.

The comments say scatterMod must match targetRows, but the constants can drift independently and silently invalidate the deterministic row-count assumption.
Proposed change
-		scatterMod  = 1000000
+		scatterMod  = targetRows
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/mtree_perf_test.go` around lines 65 - 76, The deterministic
row-selection setup in the performance test can drift because scatterMod is
hardcoded separately from targetRows. Update the constants in mtree_perf_test.go
so scatterMod is derived directly from targetRows in the same const block, and
keep the existing scatterMult/scatterMod comment aligned with targetRows to
preserve the exact row-count assumption.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/mtree_perf_test.go`:
- Around line 222-224: The current assertions in runComparison only check that
mtree and brute-force detect some divergence, but they do not verify the
expected number of differing rows; update the test to assert the exact expected
diff counts for both mc and bc in mtree_perf_test.go using runComparison and the
r.label-driven predicate, so the test fails if either implementation misses most
divergent rows.
- Around line 120-123: The cleanup in the mtree performance test is too broad
and can delete unrelated diff reports. Update the cleanup logic in the test
around the filepath.Glob and os.Remove loop to only target diff files created by
this schema/table, or better, use the exact report paths returned by the diff
writer functions so the test only removes its own artifacts.
- Around line 88-111: Register cleanup before any setup that can fail in
mtree_perf_test: the current cleanup for mtreeTask is added only after table
creation, inserts, RunChecks, and MtreeInit, so failures can leave
customers_perf_test or partial merkle-tree state behind. Move the MtreeTeardown
registration to immediately after newMerkleTreeTask creates mtreeTask, before
calling RunChecks or MtreeInit, so the task is always cleaned up even if setup
aborts early.

---

Nitpick comments:
In `@tests/integration/mtree_perf_test.go`:
- Around line 65-76: The deterministic row-selection setup in the performance
test can drift because scatterMod is hardcoded separately from targetRows.
Update the constants in mtree_perf_test.go so scatterMod is derived directly
from targetRows in the same const block, and keep the existing
scatterMult/scatterMod comment aligned with targetRows to preserve the exact
row-count assumption.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e930132-d08f-42be-8d2e-e393e274a34b

📥 Commits

Reviewing files that changed from the base of the PR and between 5fb45b6 and 49735ff.

📒 Files selected for processing (1)

tests/integration/mtree_perf_test.go

coderabbitai · 2026-06-25T18:50:20Z

+	for i, pool := range []*pgxpool.Pool{env.N1Pool, env.N2Pool} {
+		nodeName := pgCluster.ClusterNodes[i]["Name"].(string)
+		if err := createTestTable(ctx, pool, testSchema, tableName); err != nil {
+			t.Fatalf("failed to create table on %s: %v", nodeName, err)
+		}
+		_, err := pool.Exec(ctx, fmt.Sprintf(
+			`INSERT INTO "%s"."%s" SELECT * FROM "%s"."customers_1M" WHERE index <= %d`,
+			testSchema, tableName, testSchema, targetRows,
+		))
+		if err != nil {
+			t.Fatalf("failed to populate table on %s: %v", nodeName, err)
+		}
+		log.Printf("Populated %s with %d rows on %s", qualifiedTableName, targetRows, nodeName)
+	}
+
+	// --- Build the merkle tree on the synced data (one-time setup) ----------
+	mtreeTask := env.newMerkleTreeTask(t, qualifiedTableName, nodes)
+	mtreeTask.BlockSize = blockSize
+	mtreeTask.MaxCpuRatio = 1.0 // parallelise build/recompute across CPUs
+
+	require.NoError(t, mtreeTask.RunChecks(false), "mtree RunChecks should succeed")
+	require.NoError(t, mtreeTask.MtreeInit(), "MtreeInit should succeed")
+
+	t.Cleanup(func() {


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Register cleanup before setup can fail.

Cleanup is registered only after table creation, bulk inserts, RunChecks, and MtreeInit. Any earlier failure can leave customers_perf_test or partial mtree state behind and poison later on-demand runs.

Suggested structure

tableName := "customers_perf_test" qualifiedTableName := fmt.Sprintf("%s.%s", testSchema, tableName) nodes := []string{env.ServiceN1, env.ServiceN2} + + t.Cleanup(func() { + for _, pool := range []*pgxpool.Pool{env.N1Pool, env.N2Pool} { + if _, err := pool.Exec(ctx, fmt.Sprintf(`DROP TABLE IF EXISTS "%s"."%s" CASCADE`, testSchema, tableName)); err != nil { + t.Logf("cleanup: failed to drop table: %v", err) + } + } + }) // --- Seed both nodes with identical data -------------------------------

Then register MtreeTeardown immediately after mtreeTask is created, before RunChecks/MtreeInit.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/mtree_perf_test.go` around lines 88 - 111, Register cleanup before any setup that can fail in mtree_perf_test: the current cleanup for mtreeTask is added only after table creation, inserts, RunChecks, and MtreeInit, so failures can leave customers_perf_test or partial merkle-tree state behind. Move the MtreeTeardown registration to immediately after newMerkleTreeTask creates mtreeTask, before calling RunChecks or MtreeInit, so the task is always cleaned up even if setup aborts early.

coderabbitai · 2026-06-25T18:50:21Z

+		files, _ := filepath.Glob("*_diffs-*.json")
+		for _, f := range files {
+			os.Remove(f)
+		}


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Scope diff-file cleanup to this test.

*_diffs-*.json can remove unrelated diff reports from the working directory. Limit the glob to this schema/table, or track the exact report paths returned by the diff writers.

Example scoped glob

- files, _ := filepath.Glob("*_diffs-*.json") + files, _ := filepath.Glob(fmt.Sprintf("*%s*%s*_diffs-*.json", testSchema, tableName))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

files, _ := filepath.Glob("*_diffs-*.json")

for _, f := range files {

os.Remove(f)

}

files, _ := filepath.Glob(fmt.Sprintf("*%s*%s*_diffs-*.json", testSchema, tableName))

for _, f := range files {

os.Remove(f)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/mtree_perf_test.go` around lines 120 - 123, The cleanup in the mtree performance test is too broad and can delete unrelated diff reports. Update the cleanup logic in the test around the filepath.Glob and os.Remove loop to only target diff files created by this schema/table, or better, use the exact report paths returned by the diff writer functions so the test only removes its own artifacts.

coderabbitai · 2026-06-25T18:50:21Z

+		m, b, mc, bc := runComparison(r.label)
+		require.Greater(t, mc, 0, "[%s] mtree should detect the divergence", r.label)
+		require.Greater(t, bc, 0, "[%s] brute-force should detect the divergence", r.label)


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Assert exact diff counts, not just non-zero detection.

A diff path could miss most divergent rows and still pass. Since the predicate is deterministic for a 1M-row table, assert the expected threshold for both implementations.

Proposed assertion tightening

- require.Greater(t, mc, 0, "[%s] mtree should detect the divergence", r.label) - require.Greater(t, bc, 0, "[%s] brute-force should detect the divergence", r.label) + require.Equal(t, r.threshold, mc, "[%s] mtree should detect every divergent row", r.label) + require.Equal(t, r.threshold, bc, "[%s] brute-force should detect every divergent row", r.label)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

m, b, mc, bc := runComparison(r.label)

require.Greater(t, mc, 0, "[%s] mtree should detect the divergence", r.label)

require.Greater(t, bc, 0, "[%s] brute-force should detect the divergence", r.label)

m, b, mc, bc := runComparison(r.label)

require.Equal(t, r.threshold, mc, "[%s] mtree should detect every divergent row", r.label)

require.Equal(t, r.threshold, bc, "[%s] brute-force should detect every divergent row", r.label)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/mtree_perf_test.go` around lines 222 - 224, The current assertions in runComparison only check that mtree and brute-force detect some divergence, but they do not verify the expected number of differing rows; update the test to assert the exact expected diff counts for both mc and bc in mtree_perf_test.go using runComparison and the r.label-driven predicate, so the test fails if either implementation misses most divergent rows.

mason-sharp requested a review from danolivo June 25, 2026 18:45

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

danolivo approved these changes Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add mtree vs table-diff performance benchmark (not run in CI)#127

Add mtree vs table-diff performance benchmark (not run in CI)#127
mason-sharp wants to merge 1 commit into
mainfrom
ace-192-test

mason-sharp commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Walkthrough

Changes

Poem

Uh oh!

codacy-production Bot commented Jun 25, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Uh oh!

coderabbitai Bot Jun 25, 2026

Uh oh!

coderabbitai Bot Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mason-sharp commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

Uh oh!

codacy-production Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Not up to standards ⛔

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

codacy-production Bot commented Jun 25, 2026 •

edited

Loading