ci(sdk-regression): auto-run SDK regression on PRs via run-sdk-regression label #2322
Open
pranavz28 wants to merge 14 commits into
Make SDK regression runnable as an automatic PR check, not only via a manual `RUN_REGRESSION` comment. Adds a `pull_request` trigger gated by the `run-sdk-regression` label; resolves the PR head ref/sha from either event; keeps the comment path and its write/admin permission guard intact. Untrusted head.ref is passed via env (not interpolated into the shell) and is still validated by the existing regex-match step before any downstream workflow is triggered. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
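The env-plus-validation pattern described in this commit can be sketched as follows. This is an illustration in Python rather than the workflow's own shell step; the `HEAD_REF` variable name and the allow-list pattern are assumptions, since the workflow's actual regex is not shown in this conversation.

```python
import os
import re

# Hypothetical allow-list; the workflow's real regex is not shown in this PR text.
SAFE_REF = re.compile(r"[A-Za-z0-9._/-]{1,200}")


def validated_ref(env=os.environ):
    """Read the branch name from the environment and reject anything unexpected.

    The value arrives as data (an environment variable), never as text
    spliced into a command line, and is checked before any further use.
    """
    ref = env.get("HEAD_REF", "")  # HEAD_REF is an assumed variable name
    if not SAFE_REF.fullmatch(ref) or ".." in ref or ref.startswith("-"):
        raise ValueError(f"refusing unsafe branch name: {ref!r}")
    return ref
```

A name such as `feature/per-9772` passes, while values containing shell metacharacters (`;`, `$`, spaces) are rejected before any downstream trigger would run.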
RUN_regression
RUN_REGRESSION
The same regression trigger (RUN_REGRESSION comment or run-sdk-regression label) now also fires the App Percy + POA suites, which run on Buildkite (real BrowserStack devices/browsers). A new trigger-app-poa job repository_dispatches to percy/percy-automation, whose workflow creates the Buildkite builds against this CLI branch. percy-automation remains the single owner of App/POA-on-Buildkite; this is just the trigger. Internal-only guard (write/admin or label) and env-based, regex-validated branch handling mirror the web job. Requires a PERCY_AUTOMATION_DISPATCH_TOKEN secret. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the percy-automation repository_dispatch hop with a direct Buildkite REST call: the app-poa-regression job creates builds on the app-percy and poa SDK regression suites (CLI built from this branch), polls them to completion, and upserts a per-SDK pass/fail table comment on the PR. - Direct Buildkite trigger (BUILDKITE_API_TOKEN in this repo) — no extra repo hop. - Waits for the builds (bounded by MAX_WAIT_MIN), then posts/edits a marker comment with each suite's per-job result + build links; fails the job if any job failed/canceled. - Internal-only guard + env-based, regex-validated branch handling unchanged. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
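The bounded wait described here can be sketched as a polling loop with a deadline. This is a minimal Python illustration, not the job's actual script: `fetch_state` is a hypothetical callable standing in for the Buildkite REST call, and the set of terminal states is an assumption. The PR description lists defaults of 90 for `MAX_WAIT_MIN` and 30s for `POLL_INTERVAL`.

```python
import time

# Assumed set of build states that mean "nothing more will happen".
TERMINAL_STATES = {"passed", "failed", "canceled", "skipped", "not_run"}


def wait_for_build(fetch_state, max_wait_s, poll_interval_s,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the build reaches a terminal state or the deadline passes.

    fetch_state is a hypothetical callable returning the current state string.
    Returns the terminal state, or "timeout" when the wait budget is spent.
    """
    deadline = clock() + max_wait_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            return "timeout"
        sleep(poll_interval_s)
```

Returning a distinct `"timeout"` value, rather than raising, lets the caller still post a partial results table, which matches the behavior the PR description gives for the timeout case.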
Grounded against real builds of app-percy-sdk-regression-suite: the matrix
jobs are named per SDK+device (e.g. 'Python-Android [...]'), but the build
also has the bootstrap upload step ('App-Percy-SDK-tests'/'POA-SDK-tests')
and an unnamed wait job. Exclude both from the per-SDK pass/fail table and the
failure check.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
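The exclusion rule in this commit can be sketched as a filter over the build's job list. This is a Python illustration under assumptions: each job is modeled as a dict with `name` and `state` keys, which mirrors the general shape of a Buildkite build response but is not taken from the workflow itself.

```python
# Bootstrap upload step names, taken from the commit message above.
BOOTSTRAP_STEPS = {"App-Percy-SDK-tests", "POA-SDK-tests"}


def sdk_jobs(jobs):
    """Keep only per-SDK matrix jobs.

    Drops jobs without a name (such as the wait job) and the bootstrap
    upload steps, so neither appears in the results table.
    """
    return [j for j in jobs if j.get("name") and j["name"] not in BOOTSTRAP_STEPS]


def any_failed(jobs):
    """True if any per-SDK job failed or was canceled; excluded jobs are ignored."""
    return any(j.get("state") in {"failed", "canceled"} for j in sdk_jobs(jobs))
```

Applying the same filter to both the table and the failure check keeps the two in agreement about which jobs count.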
detox, playwright-python, robotframework, playwright-java, playwright-dotnet all support CLI-branch injection in their test.yml but were missing from the matrix, so a CLI change silently skipped them. Added as @main (their default branch) since the split default ref is master. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…emand Lets the SDK regression matrix be triggered manually (and on the PR's own branch) against a chosen CLI branch, without a comment/label. The Buildkite App/POA job stays comment/label-only, so a dispatch tests the web fan-out only. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A regression fan-out must report every SDK's result; with default fail-fast the first SDK failure cancels all other matrix jobs, hiding the rest. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
storybook has no test.yml — it uses test-storybook-vN.yml — so the fan-out silently failed to trigger it. Dispatch test-storybook-v10.yml for storybook. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both now support CLI-branch injection (appium-dotnet#403, styleguidist#25). Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds appium-python/java/wd/ruby, maestro-web/app, react-native-app, tosca-dotnet, uipath, xcui-swift. react-native-app uses storybook-rn-ci.yml (per-repo workflow filename). Skips puppeteer/ember (per decision) and katalon/espresso (infeasible / needs emulator). Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Their tests assert on the PER-7348 readiness-gate contract and fail against an ahead-of-release cli@master until they adapt + bump @percy/sdk-utils. Skip per decision. Part of PER-9772. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fan-out matrix has grown past 30 jobs (currently 35). The `Get Current Job Log URL` step (Tiryoh/gha-jobid-action) defaults to per_page=30, so every job on page 2 fails to find itself, resolves job_id to null, and exits 1 *before* dispatching the SDK workflow — producing false reds (appium-js, maestro-app, maestro-web, selenium-ruby) that never actually ran a regression. Set per_page=100 (jobs API max) to cover the whole matrix, and mark the step continue-on-error since it only feeds the commit-status target_url and must never gate the regression itself. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
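The failure mode is easy to reproduce in isolation. The Python sketch below uses hypothetical job names and ids and models a lookup that only ever sees the first page of results; it is not the action's code.

```python
def find_job_id(jobs, name, per_page):
    """Search for a job by name, looking only at the first page of results."""
    first_page = jobs[:per_page]
    for job in first_page:
        if job["name"] == name:
            return job["id"]
    return None  # the job exists, but not on the page that was fetched


# 35 hypothetical matrix jobs, matching the count quoted in the commit message.
jobs = [{"id": i, "name": f"sdk-{i}"} for i in range(1, 36)]
```

With `per_page=30`, `find_job_id(jobs, "sdk-31", 30)` finds nothing even though the job exists; raising the page size to 100 brings all 35 jobs onto the first page.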
Both were excluded because their tests assert on the PER-7348 readiness-gate contract and red against an ahead-of-release cli@master. Re-adding them as-is: both have a workflow_dispatch trigger on their default branch, so they dispatch and run. Expected to red on master until they adapt to the two-call readiness contract + bump @percy/sdk-utils; we'll fix the reds as they surface. Matrix is now 36 SDKs (job-id lookup already paginated to per_page=100). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
percy-tosca-dotnet and percy-uipath have no test.yml — their @percy/cli inject step lives in ci.yml (workflow_dispatch + branch input + "Set up @percy/cli from git" cloning the injected branch are all present there). The orchestrator was dispatching test.yml, so both 404'd at the trigger step — previously mis-attributed to a token-access gap. Map both to ci.yml in the workflow_file_name selector so the fan-out reaches their real (correct) inject workflow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
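Taken together, the per-repo workflow filenames mentioned in these commits amount to a lookup with a default. The Python sketch below illustrates that selector; the matrix keys are assumptions based on the SDK names used in the commit messages, and the real selector lives in the workflow's `workflow_file_name` expression, which is not shown here.

```python
# Overrides named in the commit messages; every other SDK uses test.yml.
WORKFLOW_OVERRIDES = {
    "storybook": "test-storybook-v10.yml",
    "react-native-app": "storybook-rn-ci.yml",
    "tosca-dotnet": "ci.yml",
    "uipath": "ci.yml",
}


def workflow_file_name(sdk):
    """Return the workflow file to dispatch for a given SDK matrix entry."""
    return WORKFLOW_OVERRIDES.get(sdk, "test.yml")
```

Keeping the exceptions in one table makes a missing entry show up as a 404 at the trigger step for that one SDK, which is the symptom the last commit describes.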
What
One trigger (`RUN_REGRESSION` comment or `run-sdk-regression` label) runs the whole fleet against the PR's CLI branch:
- `regression` (web SDKs, GitHub Actions): links the CLI branch into each web SDK, runs its suite, reports per-SDK commit statuses.
- `app-poa-regression` (App Percy + POA, Buildkite): creates builds directly via the Buildkite REST API on `app-percy-sdk-regression-suite` + `poa-sdk-regression-suite` (CLI built from this branch), waits for them to finish, then posts a per-SDK pass/fail table comment on the PR (upserts a single marker comment) and fails the job if any job failed.
No percy-automation repo hop: Buildkite is triggered directly.
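The "upsert a single marker comment" behavior can be sketched as two small steps: render the table with a hidden marker, then decide whether to edit an existing comment or create a new one. This Python illustration makes assumptions throughout: the marker string, the table columns, and the comment dict shape (`id`, `body`) are hypothetical, and the actual job does this in shell.

```python
MARKER = "<!-- sdk-regression-results -->"  # hypothetical marker string


def render_table(results):
    """Render {job name: state} as a Markdown table that starts with the marker."""
    lines = [MARKER, "| SDK | Result |", "| --- | --- |"]
    for name, state in sorted(results.items()):
        lines.append(f"| {name} | {state} |")
    return "\n".join(lines)


def upsert_action(comments):
    """Decide how to publish the table.

    Returns ("update", comment_id) when a comment carrying the marker already
    exists, otherwise ("create", None).
    """
    for comment in comments:
        if MARKER in comment.get("body", ""):
            return ("update", comment["id"])
    return ("create", None)
```

Because the marker identifies the comment, repeated runs edit the same comment in place instead of appending a new one each time.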
Why
Cross-SDK regression was manual / dev-dependent / not a gate. This makes it one button across all three surfaces, with a consolidated result on the PR. Part of PER-9772.
Safety
- `head.ref` flows via `env:` and is regex-validated before any Buildkite payload / downstream use.
- `jq --arg`; PR comment body via `jq --rawfile` + `--input` (no shell interpolation of content).
Setup required before merge
- `BUILDKITE_API_TOKEN` secret on this repo: scope `write_builds` + `read_builds` on the `percy` org. The job fails fast with a clear error if missing.
- `GITHUB_TOKEN` with `issues: write` / `pull-requests: write`: already declared in the job's `permissions`.
Tunables
- `MAX_WAIT_MIN` (default 90): cap on how long the job waits for Buildkite; on timeout it posts a partial table rather than hanging.
- `POLL_INTERVAL` (default 30s).
Scope / deferred
- `--testing`); real-build coverage is a separate ticket.
Test plan
- `CLI_BRANCH_NAME`; job waits; a result-table comment appears and updates in place.
- `RUN_REGRESSION` comment still works.
🤖 Generated with Claude Code