STILL IN DEVELOPMENT For now it nothing is guaranteed, the example monte carlo works, but cleanup is needed
nb2slurm seamlessly makes your notebook workflow ready for SLURM upscaling.
It takes a notebook workflow that runs for one subject (one catchment, one region, one number...) and generates everything needed to run it for many subjects on a SLURM HPC — driven entirely from a notebook, no command line.
This is the reusable generalisation of the eWaterCycle
CCI-analysis-seamless
project (its hardcoded cci.py + run_cci.slurm + submit_*.sh).
nb2slurm is a package and a scaffolder:
- Package (logic): the importable
nb2slurm— theWorkflowclass plus helpers for settings, done-tracking, and an SSH transport. - Scaffolder (templates): bundled Jinja2 templates that
Workflow.build()renders into a concretescripts/directory for your project.
The project starter (notebooks + a control notebook) is intended to live in a separate template repository that you clone and fill in; it imports this package.
pip install -e ".[dev]" # from a cloneDependencies: papermill, filelock, jinja2, paramiko.
import nb2slurm
wf = nb2slurm.Workflow(
name="myproject",
notebooks=[
"notebooks/0_settings.ipynb", # first nb writes settings.json
"notebooks/1_analysis.ipynb", # later nbs read settings.json
],
kernel="myenv",
varying=["region_id", "country"], # what changes per job
resources=dict(nodes=1, cpus=2, time="04:00:00"),
conda_env="myenv", # activated in the SLURM job
mounts=[ # optional rclone mounts
{"remote": "dcache:/climate-data/caravan", "mountpoint": "/scratch/caravan"},
],
concurrency=3, # max jobs running at once per submit
# output_dir="output", # where per-subject outputs go (root-relative by default)
# done_csv="done/done.csv", # idempotency ledger (root-relative by default)
)
wf.build() # render scripts/ into the project
# prove it works on one subject locally first:
# python scripts/run_workflow.py 123 NL
cfg = nb2slurm.SSHConfig(host="spider.surf.nl", user="me",
remote_dir="/home/me/myproject", key_filename="~/.ssh/id_ed25519")
wf.check(ssh=cfg) # preflight: ready to submit?
wf.submit([("123", "NL"), ("456", "DE")], ssh=cfg) # sbatch one job per subject
wf.status(ssh=cfg) # parsed squeue
wf.cancel(ssh=cfg) # scancel what we submittedThe SLURM job does conda activate <env> and papermill needs a registered
Jupyter kernel — both must exist on the cluster first. nb2slurm can build them for
you so you never touch conda or the command line:
import nb2slurm
env = nb2slurm.Environment(
name="myenv",
kernel="myenv", # must match Workflow(kernel=...)
conda_packages=["xarray", "numpy"],
pip_packages=["nb2slurm", "ewatercycle"],
)
wf = nb2slurm.Workflow(name="myproject", notebooks=[...], kernel="myenv",
varying=["region_id"], environment=env)
wf.create_environment(ssh=cfg) # one-time: env + kernel on the HPC (writes environment.yml)Passing environment=env to Workflow keeps the names in sync (it errors if
kernel/conda_env disagree) and makes build() also write environment.yml.
create_environment() uses mamba when available, falls back to conda, and
registers the kernel via ipykernel. Omit ssh= to build the env locally instead.
environment is optional. Many clusters already provide Python via a module
system or a shared environment. In that case skip Environment entirely and
either point at an existing conda env or run setup commands yourself:
# existing conda env on the cluster
wf = nb2slurm.Workflow(..., kernel="hydro_kernel", conda_env="hydro")
# module-based cluster (no conda): raw shell lines run before the job
wf = nb2slurm.Workflow(..., kernel="hydro_kernel",
setup=["module load 2023", "source /opt/envs/hydro/bin/activate"])setup lines are emitted at the top of the SLURM script (before mounts and the
runner). With no environment/conda_env, no conda activate is generated —
the job just uses whatever Python your setup puts on the PATH. The only hard
requirement is that kernel names a Jupyter kernel that exists on the cluster.
Most notebooks share one kernel, but a few may need another (e.g. a calibration
step). Set per-notebook overrides with kernels (everything not listed uses the
default kernel), and list any extra environments to create with
extra_environments:
env1 = nb2slurm.Environment(name="myenv1", kernel="myenv1", conda_packages=["xarray"])
env2 = nb2slurm.Environment(name="myenv2", kernel="myenv2", pip_packages=["sceua"])
wf = nb2slurm.Workflow(
name="proj", notebooks=nbs, kernel="myenv1", varying=["region"],
environment=env1, # default for most notebooks
kernels={"notebooks/step_8.ipynb": "myenv2"}, # this one runs under myenv2
extra_environments=[env2], # so myenv2 is created too
)
wf.create_environment(ssh=cfg) # builds BOTH envs + registers BOTH kernelsThe runner picks kernels.get(notebook, kernel) per notebook. Drop
extra_environments if myenv2 already exists on the cluster (or is a provided
module) — then you only need the kernels mapping.
New to HPC/SLURM/conda? See docs/hpc-for-beginners.md for a plain-language primer (no Linux required).
submit also accepts dry_run=True to print the exact sbatch commands without
running them, and works without ssh= (local subprocess) when run on a cluster
login/Jupyter node.
check(ssh=cfg) is an optional preflight: it verifies that remote_dir, your
notebooks, the built scripts/, the conda env and the Jupyter kernel all exist,
printing an OK/FAIL line per check and raising on the first failure (pass
raise_on_error=False to get the full report back as a list instead). Run it once
before your first submit to turn a cryptic SLURM failure into a clear message.
Into <project>/scripts/:
| file | role |
|---|---|
run_workflow.py |
papermill driver: skip-if-done → run nb 0 (makes settings.json) → run the rest → mark done |
job.slurm |
#SBATCH resources, conda activate, rclone mounts, then runs the driver |
submit_batch.sh |
CLI fallback: submit every job at once (simple, no concurrency — easy to read) |
submit_jobs.sh |
CLI fallback: same, but throttles how many run concurrently |
cancel_jobs.sh |
CLI fallback: cancel jobs by name |
jobs.txt |
flat one-job-per-line list generated from jobs.json (what the bash scripts read) |
structure.json |
the resolved config used to render everything |
The notebook (wf.submit(...)) is the primary path; the bash scripts are a
fallback for when you're SSH'd into the cluster. They never parse JSON — nb2slurm
flattens jobs.json into jobs.txt for them, so they stay short and readable.
- The first notebook has a papermill
parameterscell; nb2slurm injects thevaryingvalues plusoutdir. It should callnb2slurm.Settings.write(outdir, {...}). - Every later notebook has a
parameterscell withsettings_path, and starts withsettings = nb2slurm.Settings.load(settings_path).
This keeps per-run details in one place (settings.json) and means only the first
notebook is parameterised.
For paths that genuinely differ between your laptop and the cluster (shared data
dirs, etc.), branch on nb2slurm.on_hpc() — it detects a batch run via the SLURM
environment (and an NB2SLURM sentinel), so it works for any user, unlike
grepping Path.home():
import nb2slurm
data_dir = "/project/ewater/Data" if nb2slurm.on_hpc() else "/data/shared"Converting an existing local-only workflow? See
docs/setup_notebooks.ipynb — a step-by-step guide
(with before/after snippets) covering the seven in-notebook changes that make your
notebooks run both locally and on SLURM.
Instead of a flat subject list, the jobs to run live in a nested JSON file. Each
root-to-leaf path is one SLURM job, and the levels line up with varying. The
output directory mirrors that hierarchy, so the job list and the folder tree can
never drift apart.
{
"NL": { "123": ["ssp126", "ssp245"] },
"DE": { "789": ["ssp585"] }
}With varying=["country", "region", "scenario"] this means:
jobs: (NL,123,ssp126) (NL,123,ssp245) (DE,789,ssp585)
dirs: output/NL/123/ssp126 output/NL/123/ssp245 output/DE/789/ssp585
Format rules (so you can generate the file however you like — a literal dict, a comprehension, from a CSV, ...):
- a dict nests one more level (its keys are the values of the next
varyingdimension); - a list at the bottom means several jobs sharing the same parent path;
null/[]/{}ends the path there (a job/leaf with no deeper level).
It's just JSON, so build it any way that suits you and write it to jobs.json.
For example, in Python:
import json
countries = {"NL": ["123", "456"], "DE": ["789"]}
scenarios = ["ssp126", "ssp245", "ssp585"]
jobs = {c: {r: scenarios for r in regions} for c, regions in countries.items()}
json.dump(jobs, open("jobs.json", "w"), indent=2)submit() reads this file by default (no arguments needed):
wf = nb2slurm.Workflow(..., varying=["country","region","scenario"], jobs_json="jobs.json")
wf.build_outputs() # optional: pre-create the whole output/... tree from the JSON
wf.submit(ssh=cfg) # reads jobs.json, one job per leaf path
wf.submit([("NL","123","ssp126")], ssh=cfg) # override: run an explicit subset instead
wf.submit(ssh=cfg, jobs_json="rerun.json") # override: use a different fileBecause each job's output dir is built from the JSON, your first notebook never
builds folders — it just receives outdir and writes settings.json. The
underlying parser is exposed as nb2slurm.Structure if you want it directly
(Structure.from_json(path).jobs() / .build(base)).
nb2slurm wraps rsync (so you need rsync available locally) with two
deliberately one-directional helpers:
wf.push(ssh=cfg) # local project -> cluster: notebooks/, scripts/, jobs.json, ...
wf.pull(ssh=cfg) # cluster results -> local: output/ and done/ onlyThe split is the safety mechanism:
pushnever uploadsoutput//done/— re-uploading your latest notebook edits can't wipe results already produced on the cluster.pullnever fetchesnotebooks//scripts//jobs.json— syncing results back can't overwrite a notebook you changed locally while jobs were running.
So the normal loop after editing a notebook is: push the change, submit again
(finished work is skipped via done.csv), then pull results when ready.
For real use the control surface is split into four notebooks (see
docs/control/):
| notebook | does |
|---|---|
0_config.ipynb |
guided settings; saves control_config.json + jobs.json |
1_build.ipynb |
wf.build() → wf.push() → wf.create_environment() → wf.check() |
2_submit.ipynb |
wf.submit() / wf.status() / wf.cancel() |
3_sync.ipynb |
wf.pull() — results only, never your notebooks |
You only edit 0_config.ipynb. It builds the objects from your settings and
saves them with nb2slurm.save_config(...); the other three start with
wf, cfg = nb2slurm.load_config("control_config.json"), so they always share one
source of truth and never duplicate settings.
docs/walkthrough.ipynb is the single-notebook narrative overview of the whole
flow; the four above are the practical, modular version.
python -m pytest -q