Package for EIR (Entomological Inoculation Rate) estimation using machine learning.
It estimates EIR from prevalence, converts between EIR and human biting rate (including the effect of changes in mosquito density), and turns a bednet specification (net type and resistance level) into the dn0 killing parameter.
pip install estimint # core: inference only (numpy, pandas, xgboost, scipy)
Optional extras, by use case:
pip install "estimint[train]" # data prep + model training (duckdb, scikit-learn, pyarrow)
pip install "estimint[viz]" # plotting (matplotlib)
pip install "estimint[scenarios]" # run_scenarios pipeline (stateMINT emulator)
pip install "estimint[all]"
pip install "estimint[dev]" # test/lint/type-check toolchain
The run_scenarios pipeline also needs the stateMINT emulator (Python 3.12+). For now it
comes from the mamba2-train branch. With uv this is handled for you:
uv sync --extra scenarios
With plain pip, install stateMINT from the branch yourself, then estiMINT:
pip install "git+https://github.com/mrc-ide/stateMINT.git@mamba2-train"
pip install estimint
For local development with uv:
uv sync --all-extras --dev
All training data lives in datasets/estimint_simulations_y9.parquet. Two model folders
derive their views from it and train:
datasets/        # training data (see datasets/README.md)
models/
  prevalence/    # prev_y9 -> EIR (estiMINT_model.pkl)
  hbr/           # HBR<->EIR sub-models (estiMINT_HBR_model.pkl, estiMINT_EIR_to_HBR_model.pkl)
Retrain a model end-to-end, e.g. the prevalence model:
python models/prevalence/prepare.py # derive the training view from the parquet
python models/prevalence/train.py # train -> estiMINT_model.pkl + metrics/ + plots/
The deployed models shipped with the package live in src/estimint/data/ and are loaded by
name (prevalence, hbr, eir_to_hbr). This is independent of the training pipeline above.
from estimint import load_xgb_model, run_xgb_model
import pandas as pd
# Load a bundled model by name: "prevalence", "hbr", or "eir_to_hbr"
model = load_xgb_model("prevalence")
# Prepare input data
new_data = pd.DataFrame({
    "dn0_use": [0.5],
    "Q0": [0.3],
    "phi_bednets": [0.6],
    "seasonal": [1],
    "itn_use": [0.7],
    "irs_use": [0.2],
    "prev_y9": [0.15],  # or "prevalence"
})
# Run prediction
eir_predictions = run_xgb_model(new_data, model)
print(f"Predicted EIR: {eir_predictions[0]:.2f}")
from estimint import load_xgb_model, run_xgb_model, set_global_model
# Set global model once
model = load_xgb_model("prevalence")
set_global_model(model)
# Run predictions without passing model
predictions = run_xgb_model(new_data) # Uses global model
Turn a bednet specification (a mix of net types and an insecticide resistance level) into
the dn0 covariate (the probability that a mosquito dies on contact), along with total ITN usage.
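To make "weighted dn0" and "total net usage" concrete, here is a minimal sketch of one plausible reading: dn0 as the usage-weighted mean of per-net-type killing probabilities, and ITN usage as the sum of the usage shares. The function weighted_dn0 and the per-net dn0 values are hypothetical and invented for illustration. estiMINT's calculate_dn0 works from the net types and resistance level and may combine them differently.

```python
def weighted_dn0(usage_by_net, dn0_by_net):
    """Return (usage-weighted mean dn0, total ITN usage).

    Hypothetical helper, not part of estiMINT.
    """
    total_use = sum(usage_by_net.values())
    if total_use == 0:
        return 0.0, 0.0  # no nets in use, so there is nothing to weight
    weighted_sum = sum(
        share * dn0_by_net[net] for net, share in usage_by_net.items()
    )
    return weighted_sum / total_use, total_use


# Usage shares per net type (same mix as the calculate_dn0 example).
usage = {
    "pyrethroid_only": 0.4,
    "pyrethroid_pbo": 0.3,
    "pyrethroid_pyrrole": 0.2,
    "pyrethroid_ppf": 0.1,
}
# Placeholder per-net dn0 values, made up for this sketch only.
placeholder_dn0 = {
    "pyrethroid_only": 0.30,
    "pyrethroid_pbo": 0.40,
    "pyrethroid_pyrrole": 0.45,
    "pyrethroid_ppf": 0.30,
}

dn0, itn_use = weighted_dn0(usage, placeholder_dn0)
print(f"dn0={dn0:.3f}, itn_use={itn_use:.2f}")
```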
from estimint import calculate_dn0, net_types
net_types() # ['pyrethroid_only', 'pyrethroid_pbo', 'pyrethroid_ppf', 'pyrethroid_pyrrole']
res = calculate_dn0(0.5, py_only=0.4, py_pbo=0.3, py_pyrrole=0.2, py_ppf=0.1)
res.dn0, res.itn_use # weighted dn0, total net usage
run_scenarios runs the whole pipeline in one call. You give it a list of scenarios and
get back a DataFrame. For each scenario it works out the bednet killing effect, estimates
the EIR (from prevalence, from biting rate, or taken directly), optionally adjusts for a
change in mosquito density, then runs the stateMINT emulator forward to the prevalence and
cases trajectories.
This needs the stateMINT package installed alongside estiMINT. estiMINT imports it only
when you call run_scenarios, and the model weights are downloaded from Hugging Face.
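Loading an optional dependency only at call time is a common pattern. The sketch below shows the general idea using importlib from the standard library. The helper names require_optional and run_pipeline_sketch are hypothetical, and json stands in for the optional package so the example runs anywhere. It is not estiMINT's actual loading code.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import a module on demand, with a readable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. {install_hint}"
        ) from exc


def run_pipeline_sketch():
    # The import happens here, when the function is called,
    # not when the enclosing package is first imported.
    json_mod = require_optional("json", "It ships with Python.")
    return json_mod.dumps({"status": "ok"})


print(run_pipeline_sketch())
```

With this layout, users who never call the function never pay for (or need) the optional dependency.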
from estimint import run_scenarios
from estimint.scenarios import Scenario, EirTarget
scenarios = [
    Scenario(
        name="PBO nets, prevalence input, 60% more mosquitoes",
        eir_target=EirTarget(0.30, "prevalence"),
        res_use=0.55, py_pbo=0.85,
        Q0=0.90, phi=0.85, seasonal=1, irs=0.40, lsm=0.0,
        mosquito_delta=0.60,
    ),
    Scenario(
        name="Biting rate input, mixed nets",
        eir_target=EirTarget(250000.0, "hbr"),
        res_use=0.45, py_only=0.30, py_ppf=0.20,
        Q0=0.80, phi=0.82, seasonal=0, irs=0.0,
    ),
    Scenario(
        name="EIR supplied directly, no nets",
        eir_target=EirTarget(20.0, "eir"),
        res_use=0.0,
        Q0=0.88, phi=0.78, seasonal=1, irs=0.60,
    ),
]
df = run_scenarios(scenarios)
print(df[["name", "eir_baseline", "eir_final", "prev_y9", "cases_endline"]])
Every scenario is a Scenario and needs name, res_use, eir_target, Q0, phi, seasonal and irs.
lsm, routine and irs_future default to 0. Note that irs_future does not default to irs, so
set it explicitly if you want IRS to continue.
Current nets: give a net-type usage mix (py_only, py_pbo, py_pyrrole, py_ppf shares), or
leave the net keys out for no nets. The current and future legs share the same res_use.
Future nets: give net_type_future and itn_future to switch net type. If you omit
net_type_future, the future leg is zeroed (it does not carry the current mix forward). Set
itn_future=0 to remove nets explicitly.
mosquito_delta only applies when eir_target.input_mode is "prevalence".
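The default rules above can be easy to misread, so here is a small sketch that encodes them. FutureLegSketch and applied_mosquito_delta are hypothetical stand-ins written for this illustration and are not estiMINT classes or functions. In particular, treating mosquito_delta as silently ignored for non-prevalence inputs is an assumption; the real Scenario may warn or raise instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FutureLegSketch:
    """Hypothetical mirror of the future-leg defaults described above."""
    irs: float
    irs_future: float = 0.0               # does NOT inherit from irs
    net_type_future: Optional[str] = None
    itn_future: float = 0.0

    def effective_itn_future(self) -> float:
        # Without a future net type the future leg is zeroed;
        # the current net mix is not carried forward.
        if self.net_type_future is None:
            return 0.0
        return self.itn_future


def applied_mosquito_delta(input_mode: str, mosquito_delta: float) -> float:
    # Assumption: the delta is ignored unless the EIR target is a prevalence.
    return mosquito_delta if input_mode == "prevalence" else 0.0


leg = FutureLegSketch(irs=0.4)
print(leg.irs_future, leg.effective_itn_future())
```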
The returned DataFrame has one row per scenario. Alongside the inputs it gives the
estimated EIR (eir_baseline, and eir_final after any mosquito-density change) and the
stateMINT output. That output is year-9 prevalence (prev_y9), endline prevalence and
cases, and the full 157-step prevalence and cases series. What you do with it is up to
you.
The estimint.scenarios module is also where the simulation-based inference and experiment
code will go.
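The regression metrics used in the example that follows have standard textbook definitions. As a reference point, this is a pure-Python sketch of MAE, RMSE and R² under those definitions. The *_sketch function names are hypothetical, and estiMINT's own r2, rmse and mae may differ in details such as input handling.

```python
import math


def mae_sketch(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_sketch(y_true, y_pred):
    """Root mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 2.2, 2.9, 4.1, 4.8]
print(f"MAE:  {mae_sketch(y_true, y_pred):.4f}")
print(f"RMSE: {rmse_sketch(y_true, y_pred):.4f}")
print(f"R2:   {r2_sketch(y_true, y_pred):.4f}")
```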
from estimint import (
r2, rmse, mse, mae, median_ae, mae_rel, rmsle, smape,
fit_qmap_w, predict_qmap_w, scale_pos
)
# Calculate metrics
y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 2.2, 2.9, 4.1, 4.8]
print(f"R²: {r2(y_true, y_pred):.4f}")
print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"MAE: {mae(y_true, y_pred):.4f}")
# Quantile mapping calibration
cal = fit_qmap_w(y_pred, y_true)
y_calibrated = predict_qmap_w(y_pred, cal)
The data preparation helpers below need the training extras. Install them with
pip install "estimint[train]", which adds duckdb, scikit-learn and pyarrow.
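One of the helpers in the example that follows, make_value_weights, is described as creating inverse-frequency weights. The sketch below illustrates that idea in pure Python: values are rounded to a fixed number of digits, and each observation is weighted by the reciprocal of how often its rounded value occurs. The function inverse_frequency_weights is hypothetical, and the normalisation to a mean weight of 1 is an assumption, not something the package documents.

```python
from collections import Counter


def inverse_frequency_weights(values, digits=3):
    """Weight each value by 1 / (count of its rounded value), mean-normalised."""
    keys = [round(v, digits) for v in values]
    counts = Counter(keys)
    raw = [1.0 / counts[k] for k in keys]
    mean_raw = sum(raw) / len(raw)
    # Assumption: rescale so the weights average to 1.
    return [w / mean_raw for w in raw]


# The value 1.0 appears twice, so each copy gets half the weight of 2.0.
weights = inverse_frequency_weights([1.0, 1.0, 2.0])
print(weights)
```

Weights like these let rare target values count as much in training as common ones.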
import numpy as np
from estimint import load_and_filter, make_value_weights, strata_and_split
# Load and filter parquet data
result = load_and_filter("data.parquet", thr_lo=0.02, thr_hi=0.95)
df = result["DT"]
df_excluded = result["DT_excluded"]
# Create inverse-frequency weights
weights = make_value_weights(df["eir"].values, digits=3)
# Stratified split
df["eir_log10"] = np.log10(df["eir"])
df = strata_and_split(df, k_strata=16, seed=42)
uv sync --extra dev # or: pip install -e ".[dev]"
uv run pytest # or: pytest
The tests cover the metric and utility helpers, the EIR estimators (prevalence, HBR and direct EIR), the mosquito-density HBR pipeline, and the bednet calculation.
The test suite runs on every push and pull request across Python 3.10 to 3.14, defined in
.github/workflows/tests.yml.
Releases publish to PyPI from .github/workflows/publish.yml.
It builds with uv build and uploads with uv publish using PyPI trusted publishing, so no
token is stored. To cut a release, bump version in pyproject.toml and publish a GitHub
Release. The first time, register this repository as a trusted publisher in the PyPI
project settings.
- File format: Models saved as .pkl (pickle) instead of .rds
- Data handling: Uses pandas instead of data.table
- Plotting: Uses matplotlib instead of ggplot2
- Global model: Use set_global_model() / get_global_model() instead of .GlobalEnv
MIT License