ci(sdk-regression): auto-run SDK regression on PRs via run-sdk-regression label #2322

Open
pranavz28 wants to merge 14 commits into master from PER-9772_sdk-regression-pr-gate

pranavz28 commented Jun 26, 2026

Contributor

What

One trigger (RUN_REGRESSION comment or run-sdk-regression label) runs the whole fleet against the PR's CLI branch:

  1. regression (web SDKs, GitHub Actions): links the CLI branch into each web SDK, runs its suite, and reports per-SDK commit statuses.
  2. app-poa-regression (App Percy + POA, Buildkite): creates builds directly via the Buildkite REST API on app-percy-sdk-regression-suite and poa-sdk-regression-suite (CLI built from this branch), waits for them to finish, then upserts a single marker comment on the PR containing a per-SDK pass/fail table, and fails the GitHub job if any Buildkite job failed.

comment/label → web SDKs (Actions, commit statuses)
              → App Percy + POA (Buildkite, direct trigger) → wait → PR comment table

No percy-automation repo hop — Buildkite is triggered directly.

Why

Cross-SDK regression was manual, developer-dependent, and not a gate. This PR makes it a single trigger across all three surfaces (web, App Percy, POA), with a consolidated result on the PR. Part of PER-9772.

Safety

  • Internal-only: comment path requires write/admin; label path is GitHub-gated.
  • Untrusted head.ref flows via env: and is regex-validated before any Buildkite payload / downstream use.
  • Buildkite payload built with jq --arg; PR comment body via jq --rawfile + --input (no shell interpolation of content).
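
The validate-then-serialize pattern in the bullets above can be sketched as follows. This is a minimal illustration in Python rather than the workflow's own shell and jq; the allow-list regex, the `build_payload` helper, and the payload fields other than `CLI_BRANCH_NAME` are assumptions, since the PR's actual regex and request body are not shown here.

```python
import json
import re

# Hypothetical allow-list; the workflow's real regex is not shown in this PR text.
BRANCH_RE = re.compile(r"[A-Za-z0-9._/-]{1,200}")


def build_payload(cli_branch: str) -> str:
    """Validate an untrusted branch name, then serialize it as JSON data.

    The branch is never spliced into a command string; it only ever
    appears as a JSON string value (the role jq --arg plays in the workflow).
    """
    if not BRANCH_RE.fullmatch(cli_branch) or ".." in cli_branch:
        raise ValueError(f"refusing suspicious branch name: {cli_branch!r}")
    payload = {
        "commit": "HEAD",      # assumed: build the suite repo's latest commit
        "branch": "master",    # assumed: the suite repo's own branch
        "env": {"CLI_BRANCH_NAME": cli_branch},
    }
    return json.dumps(payload)


print(build_payload("PER-9772_sdk-regression-pr-gate"))
```

Validation rejects the value before any request is built, and serialization quotes whatever passes, so neither step relies on the other being perfect.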

Setup required before merge

  • BUILDKITE_API_TOKEN secret on this repo — scope write_builds + read_builds on the percy org. The job fails fast with a clear error if missing.
  • (built-in) GITHUB_TOKEN with issues: write / pull-requests: write — already declared in the job's permissions.

Note: this supersedes the percy-automation repository_dispatch approach (PR #19) for the auto path — #19 remains only as an optional manual workflow_dispatch entry.

Tunables

  • MAX_WAIT_MIN (default 90) — cap on how long the job waits for Buildkite; on timeout it posts a partial table rather than hanging.
  • POLL_INTERVAL (default 30s).
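
A bounded wait using these two tunables might look like the sketch below. It is written in Python for illustration; `wait_for_builds`, `fetch_state`, and the set of terminal states are assumptions, not the job's actual script.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

# Assumed subset of Buildkite build states that mean "done".
TERMINAL = {"passed", "failed", "canceled"}


def wait_for_builds(
    fetch_state: Callable[[str], str],
    build_ids: Iterable[str],
    max_wait_min: int = 90,
    poll_interval: int = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, str], bool]:
    """Poll until every build is terminal or the deadline passes.

    Returns (last seen states, timed_out). On timeout the caller still
    gets the partial states, so it can post a partial table.
    """
    ids = list(build_ids)
    deadline = clock() + max_wait_min * 60
    while True:
        states = {build_id: fetch_state(build_id) for build_id in ids}
        if all(state in TERMINAL for state in states.values()):
            return states, False
        if clock() >= deadline:
            return states, True
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in a workflow they would simply be the defaults.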

Scope / deferred

  • Web-SDK suites are mocked (--testing); real-build coverage is a separate ticket.
  • The web matrix is missing 5 injection-capable SDKs (detox, playwright-python, robotframework, playwright-java, playwright-dotnet), and 2 existing entries are broken (appium-js typo, nightmare); normalizing the matrix is a follow-up.
  • The PR table currently covers App/POA (Buildkite); web results remain commit statuses — can be folded into the same comment later.

Test plan

  • Label a test PR → web matrix runs; Buildkite app/poa builds created with CLI_BRANCH_NAME; job waits; a result-table comment appears and updates in place.
  • A failing SDK job → table shows ❌ and the GH job fails.
  • Legacy RUN_REGRESSION comment still works.

🤖 Generated with Claude Code

Make SDK regression runnable as an automatic PR check, not only via a
manual `RUN_REGRESSION` comment. Adds a `pull_request` trigger gated by
the `run-sdk-regression` label; resolves the PR head ref/sha from either
event; keeps the comment path and its write/admin permission guard intact.

Untrusted head.ref is passed via env (not interpolated into the shell) and
is still validated by the existing regex-match step before any downstream
workflow is triggered.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pranavz28

Contributor Author

RUN_regression

@pranavz28

Contributor Author

RUN_REGRESSION

pranavz28 and others added 10 commits June 29, 2026 01:08
The same regression trigger (RUN_REGRESSION comment or run-sdk-regression
label) now also fires the App Percy + POA suites, which run on Buildkite
(real BrowserStack devices/browsers). A new trigger-app-poa job
repository_dispatches to percy/percy-automation, whose workflow creates the
Buildkite builds against this CLI branch. percy-automation remains the single
owner of App/POA-on-Buildkite; this is just the trigger.

Internal-only guard (write/admin or label) and env-based, regex-validated
branch handling mirror the web job. Requires a PERCY_AUTOMATION_DISPATCH_TOKEN
secret.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the percy-automation repository_dispatch hop with a direct Buildkite
REST call: the app-poa-regression job creates builds on the app-percy and poa
SDK regression suites (CLI built from this branch), polls them to completion,
and upserts a per-SDK pass/fail table comment on the PR.

- Direct Buildkite trigger (BUILDKITE_API_TOKEN in this repo) — no extra repo hop.
- Waits for the builds (bounded by MAX_WAIT_MIN), then posts/edits a marker
  comment with each suite's per-job result + build links; fails the job if any
  job failed/canceled.
- Internal-only guard + env-based, regex-validated branch handling unchanged.
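
The post-or-edit decision for the marker comment can be sketched like this. It is a hedged Python illustration: the marker string and the `plan_upsert` helper are hypothetical, and the actual job does this with shell and jq.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hidden marker; the PR's real marker text is not shown.
MARKER = "<!-- sdk-regression-results -->"


def plan_upsert(
    existing_comments: List[Dict], table_md: str
) -> Tuple[str, Optional[int], str]:
    """Decide whether to edit the existing marker comment or create one.

    existing_comments is assumed to be the list of PR comments, each a
    dict with at least "id" and "body" keys.
    """
    body = f"{MARKER}\n{table_md}"
    for comment in existing_comments:
        if MARKER in (comment.get("body") or ""):
            return ("update", comment["id"], body)
    return ("create", None, body)
```

An "update" result would map to editing that comment by id and a "create" result to posting a new one, which is what keeps the PR at a single results comment across reruns.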

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grounded against real builds of app-percy-sdk-regression-suite: the matrix
jobs are named per SDK+device (e.g. 'Python-Android [...]'), but the build
also has the bootstrap upload step ('App-Percy-SDK-tests'/'POA-SDK-tests')
and an unnamed wait job. Exclude both from the per-SDK pass/fail table and the
failure check.
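
As a rough Python sketch of that exclusion (not the job's actual jq filter): the `summarize` helper, the job dict shape, and the ⏳ icon for unfinished jobs are assumptions; the two bootstrap step names come from the text above.

```python
from typing import Dict, List, Optional, Tuple

BOOTSTRAP_STEPS = {"App-Percy-SDK-tests", "POA-SDK-tests"}
FAILED_STATES = {"failed", "canceled"}


def summarize(jobs: List[Dict[str, Optional[str]]]) -> Tuple[str, bool]:
    """Build a per-job result table, skipping bootstrap and unnamed jobs.

    Returns (markdown table, any_failed). Skipped jobs affect neither
    the table nor the failure flag.
    """
    lines = ["| Job | Result |", "| --- | --- |"]
    any_failed = False
    for job in jobs:
        name = job.get("name")
        if not name or name in BOOTSTRAP_STEPS:
            continue  # bootstrap upload step or unnamed wait job
        state = job.get("state")
        if state == "passed":
            icon = "✅"
        elif state in FAILED_STATES:
            icon = "❌"
            any_failed = True
        else:
            icon = "⏳"  # assumed marker for jobs still running at timeout
        lines.append(f"| {name} | {icon} |")
    return "\n".join(lines), any_failed
```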

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detox, playwright-python, robotframework, playwright-java, playwright-dotnet
all support CLI-branch injection in their test.yml but were missing from the
matrix, so a CLI change silently skipped them. Added as @main (their default
branch) since the split default ref is master.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…emand

Lets the SDK regression matrix be triggered manually (and on the PR's own
branch) against a chosen CLI branch, without a comment/label. The Buildkite
App/POA job stays comment/label-only, so a dispatch tests the web fan-out only.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A regression fan-out must report every SDK's result; with default fail-fast
the first SDK failure cancels all other matrix jobs, hiding the rest.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
storybook has no test.yml — it uses test-storybook-vN.yml — so the fan-out
silently failed to trigger it. Dispatch test-storybook-v10.yml for storybook.

Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both now support CLI-branch injection (appium-dotnet#403, styleguidist#25).
Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds appium-python/java/wd/ruby, maestro-web/app, react-native-app,
tosca-dotnet, uipath, xcui-swift. react-native-app uses storybook-rn-ci.yml
(per-repo workflow filename). Skips puppeteer/ember (per decision) and
katalon/espresso (infeasible / needs emulator). Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Their tests assert on the PER-7348 readiness-gate contract and fail against an
ahead-of-release cli@master until they adapt + bump @percy/sdk-utils. Skip per
decision. Part of PER-9772.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pranavz28 marked this pull request as ready for review June 29, 2026 11:48
pranavz28 requested a review from a team as a code owner June 29, 2026 11:48
pranavz28 and others added 3 commits June 29, 2026 20:11
The fan-out matrix has grown past 30 jobs (currently 35). The
`Get Current Job Log URL` step (Tiryoh/gha-jobid-action) defaults to
per_page=30, so every job on page 2 fails to find itself, resolves
job_id to null, and exits 1 *before* dispatching the SDK workflow —
producing false reds (appium-js, maestro-app, maestro-web,
selenium-ruby) that never actually ran a regression.

Set per_page=100 (jobs API max) to cover the whole matrix, and mark the
step continue-on-error since it only feeds the commit-status target_url
and must never gate the regression itself.
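
The failure mode is easy to reproduce with a simplified model of a lookup that reads only the first page of results. This Python sketch is illustrative only; `find_job_index` and the `sdk-N` names are hypothetical and are not the action's code.

```python
from typing import List, Optional


def find_job_index(all_jobs: List[str], target: str, per_page: int) -> Optional[int]:
    """Look for a job on the first page only, as a single-page lookup would."""
    first_page = all_jobs[:per_page]
    return first_page.index(target) if target in first_page else None


# A 35-job matrix, matching the size described above (names are placeholders).
matrix = [f"sdk-{i}" for i in range(1, 36)]

print(find_job_index(matrix, "sdk-33", per_page=30))   # job is beyond page 1
print(find_job_index(matrix, "sdk-33", per_page=100))  # whole matrix fits on one page
```

With a page size of 30, any job past position 30 is never seen, which is why raising `per_page` to 100 covers the current matrix without adding pagination logic.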

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both were excluded because their tests assert on the PER-7348
readiness-gate contract and go red against an ahead-of-release cli@master.
Re-adding them as-is: both have a workflow_dispatch trigger on their
default branch, so they dispatch and run. They are expected to go red on
master until they adapt to the two-call readiness contract and bump
@percy/sdk-utils; we'll fix the reds as they surface.

Matrix is now 36 SDKs (job-id lookup already paginated to per_page=100).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
percy-tosca-dotnet and percy-uipath have no test.yml — their @percy/cli
inject step lives in ci.yml (workflow_dispatch + branch input + "Set up
@percy/cli from git" cloning the injected branch are all present there).
The orchestrator was dispatching test.yml, so both 404'd at the trigger
step — previously mis-attributed to a token-access gap. Map both to
ci.yml in the workflow_file_name selector so the fan-out reaches their
real (correct) inject workflow.
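
The per-repo workflow filename selection described across these commits can be summarized as a lookup with a default. This Python sketch only restates the mappings mentioned in the commit messages; the `workflow_file_name` function and the use of short matrix names as keys are assumptions about how the selector is keyed.

```python
# Overrides named in the commit messages; every other SDK uses test.yml.
WORKFLOW_OVERRIDES = {
    "storybook": "test-storybook-v10.yml",
    "react-native-app": "storybook-rn-ci.yml",
    "tosca-dotnet": "ci.yml",
    "uipath": "ci.yml",
}


def workflow_file_name(sdk: str) -> str:
    """Return the workflow file to dispatch for a given SDK matrix entry."""
    return WORKFLOW_OVERRIDES.get(sdk, "test.yml")


for sdk in ("uipath", "storybook", "selenium-ruby"):
    print(sdk, "->", workflow_file_name(sdk))
```

Keeping the exceptions in one table makes a missing override show up as a dispatch to `test.yml`, which is the 404 symptom described above.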

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>