Global linear-scan register allocation for ZJIT SSA values #7
Draft
Copilot wants to merge 4 commits into
Co-authored-by: tekknolagi <401167+tekknolagi@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Update register allocator to support global allocation
Implement global liveness-based register allocation for ZJIT
Feb 7, 2026
Owner
gave you more internet access. have fun
Co-authored-by: tekknolagi <401167+tekknolagi@users.noreply.github.com>
Copilot
AI
changed the title
Implement global liveness-based register allocation for ZJIT
Global linear-scan register allocation for ZJIT
Feb 7, 2026
Co-authored-by: tekknolagi <401167+tekknolagi@users.noreply.github.com>
Copilot
AI
changed the title
Global linear-scan register allocation for ZJIT
Global linear-scan register allocation for ZJIT SSA values
Feb 7, 2026
tekknolagi
added a commit
that referenced
this pull request
Mar 18, 2026
Format differences from capstone:
- movz/movn shown instead of mov alias
- Immediates in hex (#0x7 vs #7)
- Branch targets as relative decimal (+8 vs #0x10)
- Condition on mnemonic (bne vs b.ne)
- ldr/str with explicit #0 offset instead of ldur/stur
- 32-bit ops use mnemonic suffix (addw) instead of w-prefix registers
- Embedded data bytes show as "unknown" instead of fake instructions
tekknolagi
added a commit
that referenced
this pull request
Mar 18, 2026
- Small values (< 10) use decimal: #7, #0
- Larger values use hex: #0x20, #0x1000
- Signed negatives: #-8, #-0x10
- Branch conditions use b.cond format: b.ne, b.eq
- Branch targets as absolute hex: #0x400
- Memory offsets use same decimal/hex convention
- movk shift uses comma separator: , lsl #16
- All immediates have # prefix
tekknolagi
pushed a commit
that referenced
this pull request
Mar 31, 2026
Move compilation steps from the heaviest jobs to the lightest to reduce the critical path of the Compilations workflow.

Before: jobs ranged from 13-41 min (compile#12 had 4 steps, compile#3 had 10 clang versions).
After: jobs range from 7-9 steps each (excluding compile#1 which has the LTO build), bringing the estimated critical path from ~41 min to ~30 min.

Moves:
- clang 23, 22, 21 from #3 to #12 and #10
- GCC 8, 7 from #2 to #12
- `OPT_THREADED_CODE=1`, `OPT_THREADED_CODE=2` from #7 to #10
tekknolagi
pushed a commit
that referenced
this pull request
Jun 25, 2026
…ruby#17479)

When we introduced the inliner we also added repeated passes of the optimization pipeline. The idea being that we want to optimize the results of inlining and, because we only inline one level deep, allow us to perform inlining on the result of the last inlining operation. The optimization loop would exit if we couldn't inline any more. If we could inline more, there's an upper bound that kicks us out of the loop so we don't try to inline the world. However, if we exited the loop by hitting that upper bound, we didn't end up specializing the results of the last inlining pass. This PR rectifies that.

This is immediately visible in the 30k_methods benchmark, where performance roughly doubles.

Before:
```
❯ WARMUP_ITRS=0 MIN_BENCH_ITRS=10 MIN_BENCH_TIME=0 ./run_benchmarks.rb --chruby 'ruby-master --zjit-inline-threshold=30' 30k_methods
Running benchmark "30k_methods" (1/1)
+ /Users/nirvdrum/.rubies/ruby-master/bin/ruby --zjit-inline-threshold\=30 -I harness /Users/nirvdrum/dev/worktrees/ruby-bench/main/benchmarks/30k_methods.rb
ruby 4.1.0dev (2026-06-23T13:29:36Z master 13fe77d) +ZJIT dev +PRISM [arm64-darwin25]
itr: time
#1: 2689ms
#2: 33ms
#3: 32ms
#4: 32ms
#5: 32ms
#6: 32ms
#7: 32ms
#8: 35ms
#9: 33ms
#10: 33ms
```

After:
```
❯ WARMUP_ITRS=0 MIN_BENCH_ITRS=10 MIN_BENCH_TIME=0 ./run_benchmarks.rb --chruby 'ruby-zjit-opt-last-inline --zjit-inline-threshold=30' 30k_methods
Running benchmark "30k_methods" (1/1)
+ /Users/nirvdrum/.rubies/ruby-zjit-opt-last-inline/bin/ruby --zjit-inline-threshold\=30 -I harness /Users/nirvdrum/dev/worktrees/ruby-bench/main/benchmarks/30k_methods.rb
ruby 4.1.0dev (2026-06-25T13:56:41Z zjit-opt-last-inline 18ce64d) +ZJIT dev +PRISM [arm64-darwin25]
itr: time
#1: 2700ms
#2: 17ms
#3: 16ms
#4: 16ms
#5: 17ms
#6: 16ms
#7: 16ms
#8: 17ms
#9: 16ms
#10: 16ms
```

Fixes Shopify#998.
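ZJIT’s register allocator was block-local, so an SSA value could not stay allocated across basic blocks, even from a dominating block into the blocks it dominates, although the IR already has a CFG. This change makes allocation global, driven by CFG liveness analysis and live intervals, and preserves the existing split and out-of-SSA passes.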
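To make the CFG liveness idea concrete, here is a minimal sketch of backward liveness analysis over bit sets. It is not the code from this PR: the `BitSet` below is a hypothetical stand-in with union/subtract/iteration helpers, and `Block`, `live_in`, and `example_live_in` are names invented for illustration. It iterates `live_in(b) = uses(b) ∪ (live_out(b) − defs(b))` to a fixed point, where `live_out(b)` is the union of the successors' live-in sets.

```rust
/// Hypothetical fixed-size bit set (an assumption, not ZJIT's actual type).
#[derive(Clone, PartialEq, Debug)]
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn new(num_bits: usize) -> Self {
        BitSet { words: vec![0; (num_bits + 63) / 64] }
    }
    fn from_bits(num_bits: usize, bits: &[usize]) -> Self {
        let mut set = BitSet::new(num_bits);
        for &b in bits {
            set.words[b / 64] |= 1u64 << (b % 64);
        }
        set
    }
    fn contains(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
    /// self |= other; returns true if any bit was newly set.
    fn union_with(&mut self, other: &BitSet) -> bool {
        let mut changed = false;
        for (w, o) in self.words.iter_mut().zip(&other.words) {
            let merged = *w | *o;
            changed |= merged != *w;
            *w = merged;
        }
        changed
    }
    /// self &= !other
    fn subtract(&mut self, other: &BitSet) {
        for (w, o) in self.words.iter_mut().zip(&other.words) {
            *w &= !*o;
        }
    }
    /// Iterates over the indices of set bits in ascending order.
    fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        (0..self.words.len() * 64).filter(move |&i| self.contains(i))
    }
}

/// Hypothetical basic-block summary: value ids defined in the block,
/// value ids used before any definition in the block, and successors.
struct Block {
    defs: BitSet,
    uses: BitSet,
    succs: Vec<usize>,
}

/// Computes the live-in set of every block by iterating to a fixed point.
fn live_in(blocks: &[Block], num_values: usize) -> Vec<BitSet> {
    let mut live_in = vec![BitSet::new(num_values); blocks.len()];
    let mut changed = true;
    while changed {
        changed = false;
        // Visiting blocks in reverse tends to converge faster for a backward analysis.
        for b in (0..blocks.len()).rev() {
            let mut live = BitSet::new(num_values); // starts as live_out(b)
            for &s in &blocks[b].succs {
                live.union_with(&live_in[s]);
            }
            live.subtract(&blocks[b].defs);
            live.union_with(&blocks[b].uses);
            changed |= live_in[b].union_with(&live);
        }
    }
    live_in
}

/// b0 defines v0; b1 defines v1; b2 uses both. v0 must stay live through b1.
fn example_live_in() -> Vec<Vec<usize>> {
    let n = 2;
    let blocks = vec![
        Block { defs: BitSet::from_bits(n, &[0]), uses: BitSet::new(n), succs: vec![1] },
        Block { defs: BitSet::from_bits(n, &[1]), uses: BitSet::new(n), succs: vec![2] },
        Block { defs: BitSet::new(n), uses: BitSet::from_bits(n, &[0, 1]), succs: vec![] },
    ];
    live_in(&blocks, n).iter().map(|s| s.iter().collect()).collect()
}
```

In the example, `v0` is live into `b1` even though `b1` never touches it, which is exactly the case a block-local allocator cannot see.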
Global liveness + intervals
BitSet with union/subtract/iteration utilities to support the analysis.
Out-of-SSA parameter handling
ParallelMove expansion.
Validation + tests
validate() while keeping the helper for tests.
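For readers unfamiliar with parallel-move expansion, the following is a generic sketch of how a set of simultaneous register moves can be turned into a safe sequence. It is not taken from this PR and does not reflect ZJIT's `ParallelMove` representation: registers are plain `usize` indices, `sequentialize` and `apply` are hypothetical helpers, and it assumes destinations are distinct and that the scratch register appears in none of the moves.

```rust
/// Turns parallel moves `(dst, src)` into an equivalent sequential list.
/// Assumptions: all `dst` are distinct and `scratch` is not used by any move.
fn sequentialize(moves: &[(usize, usize)], scratch: usize) -> Vec<(usize, usize)> {
    // Self-moves are no-ops and would otherwise look like one-element cycles.
    let mut pending: Vec<(usize, usize)> =
        moves.iter().copied().filter(|&(d, s)| d != s).collect();
    let mut out = Vec::new();
    while !pending.is_empty() {
        // A move is safe to emit when no other pending move still reads its destination.
        let ready = pending
            .iter()
            .position(|&(d, _)| !pending.iter().any(|&(_, s)| s == d));
        if let Some(i) = ready {
            out.push(pending.remove(i));
        } else {
            // Only cycles remain: save one source in the scratch register
            // and redirect its readers there, which unblocks the cycle.
            let (_, s) = pending[0];
            out.push((scratch, s));
            for m in pending.iter_mut() {
                if m.1 == s {
                    m.1 = scratch;
                }
            }
        }
    }
    out
}

/// Executes sequential moves on a toy register file, for checking the result.
fn apply(regs: &mut [i64], seq: &[(usize, usize)]) {
    for &(d, s) in seq {
        regs[d] = regs[s];
    }
}
```

For instance, swapping registers 0 and 1 while also copying register 0 into register 2 first emits the copy into 2, then routes one side of the swap through the scratch register.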