Skip to content

Split-explicit acoustic substepping for low-Mach time integration#1619

Draft
sbryngelson wants to merge 18 commits into
MFlowCode:masterfrom
sbryngelson:lomach-upstream
Draft

Split-explicit acoustic substepping for low-Mach time integration#1619
sbryngelson wants to merge 18 commits into
MFlowCode:masterfrom
sbryngelson:lomach-upstream

Conversation

@sbryngelson

Copy link
Copy Markdown
Member

Summary

Adds a split-explicit acoustic-substepping time integrator (acoustic_substepping) that removes the acoustic-CFL bottleneck at low Mach number without any global solve.

Explicit compressible solvers are limited by dt ~ dx/(|u|+c). When c >> |u| (low Mach), that step is roughly 1/M smaller than the advective scale, so almost every step is spent resolving fast acoustics that carry little of the actual dynamics. Pressure-projection / implicit-acoustic methods remove the restriction but require a global elliptic pressure solve — global reductions and sparse solves that scale poorly on GPU clusters. This mode keeps the solver fully explicit and nearest-neighbor: the slow (advective) terms advance at the advective CFL, and the fast (acoustic) terms are subcycled with a cheap forward–backward scheme.

This is the Klemp–Wicker–Skamarock split-explicit approach used in atmospheric dynamical cores, adapted to MFC's Godunov finite-volume framework (cf. Nazari & Nair, JAMES 2017).

How it works

Per SSP-RK3 stage, the governing equations are split into a slow part advanced once at the large step dt ~ dx/|u|, and a fast part subcycled on ns ≈ (|u|+c)/|u| micro-steps of size dtau = dt/ns:

  • Slow (one expensive evaluation per stage, frozen during the subcycle): momentum advection, volume-fraction advection, viscous/source terms.
  • Fast (cheap, subcycled): mass transport, energy transport, and the pressure gradient in the momentum equation.

The expensive WENO+Riemann work therefore runs at the advective rate, while each acoustic micro-step is a low-order stencil — that asymmetry is where the speedup comes from.

Slow flux. Rather than a new flux scheme, the slow flux reuses HLLC with two changes (gated on acoustic_substepping): the pressure term is removed from the momentum/energy flux (it moves to the substep), and the wave-speed dissipation is capped to the contact speed s_S, so the convective flux is |u|-stable at the advective CFL instead of carrying the |u|+c dissipation that would be unstable there. MFC's existing low_Mach correction handles low-Mach accuracy.

Acoustic substep (m_acoustic_substep.fpp). A 2nd-order centered forward–backward update of mass/energy transport and the pressure gradient, with the slow forcing added as a frozen dtau-scaled source each micro-step. Stabilized by grad–div divergence damping whose discrete operator is rank-one (-v vᵀ symbol), so it damps only compressive/acoustic modes and leaves divergence-free vortical flow untouched. The forward sweep computes flux divergences from a frozen snapshot of the conserved state and applies them afterward, which keeps species mass conservative (an in-place stencil update is not).

Time step. s_compute_dt computes dt from the advective CFL and n_substeps from a domain-max reduction of (|u|+c)/|u|; the only global collective is that existing timestep reduction.

Multifluid. The substep uses the stiffened-gas mixture EOS and mixture velocity, and advects volume fractions on the slow step. For num_fluids == 1 the path reduces exactly to the single-fluid expressions.

GPU. All device regions use the backend-agnostic GPU_* Fypp macros, so the same source runs on OpenACC and OpenMP target offload.

What's implemented

  • acoustic_substepping, n_acoustic_substeps, acoustic_div_damp parameters (+ checker/validator constraints).
  • m_acoustic_substep.fpp: forward–backward subcycle kernel (EOS, grad–div damping, snapshot transport, mixture support).
  • m_time_steppers.fpp: s_split_explicit_rk orchestration and the two-CFL s_compute_dt.
  • m_riemann_solver_hllc.fpp: the slow-flux variant (5-equation block).
  • m_sim_helpers.fpp: advective-CFL helper.
  • Parameterized low-Mach acoustic-pulse hardcoded ICs (hcid 183/264/304) so benchmark/example cases sweep grid and Mach number without recompiling.
  • Example cases and a regression golden test.

Validation

Speedup vs. the standard explicit scheme to the same physical time, at M = 0.01 (flat to ~5× at M = 0.1; the win is bounded by the substep cost, not by M):

1-D 2-D 3-D
CPU ~18× ~24× ~27×
GPU OpenACC ~64× ~54×
GPU OpenMP-offload ~43×
  • Mass conserved to ~1e-15; per-species mass at sharp material interfaces to ~1e-14.
  • 2-D Gresho vortex (M = 0.001) preserved to machine precision vs. ~18% speed decay in the standard scheme — the low-Mach accuracy benefit.
  • Speedup is flat across 1–8 MPI ranks (the per-substep narrow halo exchanges do not erode it).
  • num_fluids == 1 is bit-identical to the pre-feature result on the regression golden.

Scope and limitations

  • Targets smooth low-Mach flow: the acoustic substep is centered/non-upwinded, so embedded shocks are out of scope for now.
  • model_eqns = 2 only; first-order-in-time (operator splitting; spatial order unchanged).
  • GPU validated on a single V100 (ACC and OMP-offload). True multi-node / multi-GPU communication cost is not yet measured (intra-node MPI is flat).
  • Not yet done: second-order-in-time (Strang), shock/mixed-regime robustness, OpenMP-offload on Cray/AMD-flang.

Draft — opening for visibility and review of the approach.

…zen snapshot

The forward sweep advanced partial densities and total energy in place while the
centered flux stencil read those same neighbours, so the flux divergence mixed
pre- and post-update values, did not telescope, and lost per-species mass at sharp
interfaces (~0.9% drift). Snapshot the transported conserved vars (q_snap) at the
start of each microstep and compute every flux divergence from that frozen state,
then apply. Restores machine-precision species-mass conservation for any num_fluids.
Single-fluid energy transport shifts from Gauss-Seidel to Jacobi; golden 7F8ED027
regenerated.
@github-actions

Copy link
Copy Markdown

Claude Code Review

Head SHA: cd928ed

Files changed:

  • 23
  • src/simulation/m_acoustic_substep.fpp
  • src/simulation/m_time_steppers.fpp
  • src/simulation/m_sim_helpers.fpp
  • src/simulation/m_riemann_solver_hllc.fpp
  • src/simulation/m_checker.fpp
  • src/simulation/m_start_up.fpp
  • src/simulation/m_data_output.fpp
  • src/simulation/m_global_parameters.fpp
  • toolchain/mfc/test/cases.py
  • toolchain/mfc/case_validator.py

Findings:

toolchain/mfc/test/cases.py — undefined variable xc in analytic IC expression

The new acoustic-substepping regression test sets:

"patch_icpp(1)%pres": "1.0 + 1e-3 * exp(-((x - xc) / 0.05)**2)",

xc is not a recognized MFC IC expression variable. MFC's analytic-IC system validates expressions at case load and rejects unknown identifiers as immediate, named errors (per CLAUDE.md: "syntax errors and unknown variables are immediate, named errors"). The analogous example cases in this PR avoid this correctly — 1D_acoustic_lowmach/case.py uses a Python f-string to substitute the centroid numerically, and 2D_twofluid_lowmach/case.py routes through hcid 264. The test case should use one of those approaches:

"patch_icpp(1)%pres": f"1.0 + 1e-3 * exp(-((x - {0.5}) / 0.05)**2)",

or switch to hcid=183 (the new hardcoded IC added in this PR for exactly this 1D Gaussian-pulse pattern). As written, the test fails at case validation before any solver code runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant