Split-explicit acoustic substepping for low-Mach time integration#1619
Split-explicit acoustic substepping for low-Mach time integration#1619sbryngelson wants to merge 18 commits into
Conversation
…ipation) for split mode
…ort, CFL example cases
…less, CFL-stable)
…eps overflow, fix cfl_const_dt restart
… (GPU acc-routine-seq portability)
…s (hcid 183/264/304)
…e EOS, velocity, slow alpha advection)
…zen snapshot The forward sweep advanced partial densities and total energy in place while the centered flux stencil read those same neighbours, so the flux divergence mixed pre- and post-update values, did not telescope, and lost per-species mass at sharp interfaces (~0.9% drift). Snapshot the transported conserved vars (q_snap) at the start of each microstep and compute every flux divergence from that frozen state, then apply. Restores machine-precision species-mass conservation for any num_fluids. Single-fluid energy transport shifts from Gauss-Seidel to Jacobi; golden 7F8ED027 regenerated.
… acoustic_substepping)
Claude Code ReviewHead SHA: cd928ed Files changed:
Findings:
The new acoustic-substepping regression test sets: "patch_icpp(1)%pres": "1.0 + 1e-3 * exp(-((x - xc) / 0.05)**2)",
"patch_icpp(1)%pres": f"1.0 + 1e-3 * exp(-((x - {0.5}) / 0.05)**2)",or switch to |
Summary
Adds a split-explicit acoustic-substepping time integrator (
acoustic_substepping) that removes the acoustic-CFL bottleneck at low Mach number without any global solve.Explicit compressible solvers are limited by
dt ~ dx/(|u|+c). Whenc >> |u|(low Mach), that step is roughly1/Msmaller than the advective scale, so almost every step is spent resolving fast acoustics that carry little of the actual dynamics. Pressure-projection / implicit-acoustic methods remove the restriction but require a global elliptic pressure solve — global reductions and sparse solves that scale poorly on GPU clusters. This mode keeps the solver fully explicit and nearest-neighbor: the slow (advective) terms advance at the advective CFL, and the fast (acoustic) terms are subcycled with a cheap forward–backward scheme.This is the Klemp–Wicker–Skamarock split-explicit approach used in atmospheric dynamical cores, adapted to MFC's Godunov finite-volume framework (cf. Nazari & Nair, JAMES 2017).
How it works
Per SSP-RK3 stage, the governing equations are split into a slow part advanced once at the large step
dt ~ dx/|u|, and a fast part subcycled onns ≈ (|u|+c)/|u|micro-steps of sizedtau = dt/ns:The expensive WENO+Riemann work therefore runs at the advective rate, while each acoustic micro-step is a low-order stencil — that asymmetry is where the speedup comes from.
Slow flux. Rather than a new flux scheme, the slow flux reuses HLLC with two changes (gated on
acoustic_substepping): the pressure term is removed from the momentum/energy flux (it moves to the substep), and the wave-speed dissipation is capped to the contact speeds_S, so the convective flux is|u|-stable at the advective CFL instead of carrying the|u|+cdissipation that would be unstable there. MFC's existinglow_Machcorrection handles low-Mach accuracy.Acoustic substep (
m_acoustic_substep.fpp). A 2nd-order centered forward–backward update of mass/energy transport and the pressure gradient, with the slow forcing added as a frozendtau-scaled source each micro-step. Stabilized by grad–div divergence damping whose discrete operator is rank-one (-v vᵀsymbol), so it damps only compressive/acoustic modes and leaves divergence-free vortical flow untouched. The forward sweep computes flux divergences from a frozen snapshot of the conserved state and applies them afterward, which keeps species mass conservative (an in-place stencil update is not).Time step.
s_compute_dtcomputesdtfrom the advective CFL andn_substepsfrom a domain-max reduction of(|u|+c)/|u|; the only global collective is that existing timestep reduction.Multifluid. The substep uses the stiffened-gas mixture EOS and mixture velocity, and advects volume fractions on the slow step. For
num_fluids == 1the path reduces exactly to the single-fluid expressions.GPU. All device regions use the backend-agnostic
GPU_*Fypp macros, so the same source runs on OpenACC and OpenMP target offload.What's implemented
acoustic_substepping,n_acoustic_substeps,acoustic_div_dampparameters (+ checker/validator constraints).m_acoustic_substep.fpp: forward–backward subcycle kernel (EOS, grad–div damping, snapshot transport, mixture support).m_time_steppers.fpp:s_split_explicit_rkorchestration and the two-CFLs_compute_dt.m_riemann_solver_hllc.fpp: the slow-flux variant (5-equation block).m_sim_helpers.fpp: advective-CFL helper.hcid183/264/304) so benchmark/example cases sweep grid and Mach number without recompiling.Validation
Speedup vs. the standard explicit scheme to the same physical time, at M = 0.01 (flat to ~5× at M = 0.1; the win is bounded by the substep cost, not by M):
num_fluids == 1is bit-identical to the pre-feature result on the regression golden.Scope and limitations
model_eqns = 2only; first-order-in-time (operator splitting; spatial order unchanged).Draft — opening for visibility and review of the approach.