Motivation
Notebook 08 confirmed data drift: median elapsed_days at k=5 halved from 39d (2017Q1) to 16d (2018Q4). Without retraining, the model's probability estimates will drift from reality.
Proposed pipeline
- Drift trigger: monitor median elapsed_days at k=5 monthly. Alert if it shifts >5 days from the trailing 3-month baseline.
- Retraining: rolling 18-month window, exclude elapsed_days (leaky), use temporal CV for evaluation.
- Gate: only deploy if new model AUC ≥ 0.780 on the most recent quarter's hold-out. Otherwise keep previous model and page on-call.
- Logging: record Brier score, AUC, and feature importances for each retrain. Flag if SHAP rank correlation vs. previous model < 0.7 (concept drift).
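
A minimal sketch of the drift trigger and the deployment gate described above, assuming the monthly medians are already computed and passed in chronological order. The function names, the return strings, and the choice to take the baseline as the mean of the three preceding monthly medians are illustrative assumptions; the proposal does not specify how the baseline is aggregated.

```python
from statistics import mean

DRIFT_THRESHOLD_DAYS = 5.0   # from the proposal: alert on a shift of more than 5 days
BASELINE_MONTHS = 3          # from the proposal: trailing 3-month baseline
AUC_GATE = 0.780             # from the proposal: minimum hold-out AUC to deploy


def drift_alert(monthly_medians):
    """Return True if the latest monthly median departs from the baseline.

    monthly_medians: chronological list of median elapsed_days at k=5,
    one value per month, latest month last.
    Assumption: the baseline is the mean of the three preceding months.
    """
    if len(monthly_medians) < BASELINE_MONTHS + 1:
        return False  # not enough history to form a baseline
    baseline = mean(monthly_medians[-(BASELINE_MONTHS + 1):-1])
    return abs(monthly_medians[-1] - baseline) > DRIFT_THRESHOLD_DAYS


def deploy_decision(new_model_auc):
    """Apply the gate to the new model's AUC on the latest quarter's hold-out."""
    if new_model_auc >= AUC_GATE:
        return "deploy"
    # Hypothetical label: the caller keeps the previous model and pages on-call.
    return "keep_previous_and_page"
```

For example, with medians of 30, 29, 31 and then 24 days, the baseline is 30 and the 6-day shift triggers an alert; a retrained model with AUC 0.779 would be held back by the gate.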
Key decisions from Notebook 10
- elapsed_days must remain excluded (temporal leakage confirmed)
- Platt scaling is not needed (raw probabilities are well-calibrated, Brier 0.066)
- Minimum training size for 95% of peak AUC at k=5: ~1,000–1,500 cases
References
- Notebook 08: notebooks/08_temporal_cv.ipynb
- Notebook 10: notebooks/10_leakage_calibration.ipynb