Skip to content

Build quarterly model retraining pipeline with drift detection #7

Description

@djimrastephane

Motivation

Notebook 08 confirmed data drift: median elapsed_days at k=5 halved from 39d (2017Q1) to 16d (2018Q4). Without retraining, the model's probability estimates will drift from reality.

Proposed pipeline

  1. Drift trigger: monitor median elapsed_days at k=5 monthly. Alert if it shifts >5 days from the trailing 3-month baseline.
  2. Retraining: rolling 18-month window, exclude elapsed_days (leaky), use temporal CV for evaluation.
  3. Gate: only deploy if new model AUC ≥ 0.780 on the most recent quarter's hold-out. Otherwise keep previous model and page on-call.
  4. Logging: record Brier score, AUC, and feature importances for each retrain. Flag if SHAP rank correlation vs. previous model < 0.7 (concept drift).

Key decisions from Notebook 10

  • elapsed_days must remain excluded (temporal leakage confirmed)
  • Platt scaling is not needed (raw probabilities are well-calibrated, Brier 0.066)
  • Minimum training size for 95% of peak AUC at k=5: ~1,000–1,500 cases

References

  • Notebook 08: notebooks/08_temporal_cv.ipynb
  • Notebook 10: notebooks/10_leakage_calibration.ipynb

Metadata

Metadata

Assignees

No one assigned

    Labels

    deploymentProduction deployment and servingenhancementNew feature or requestmodel-qualityModel correctness, leakage, calibration

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions