Releases: djimrastephane/ProcessPath_AI
v1.0.0 — Initial release
ProcessPath_AI v1.0.0
Full process mining pipeline on the BPI Challenge 2020 Travel Permit dataset (TU/e) — 7,065 cases · 86,581 events · 18 months.
Live demo
https://processpathai-fejiac7urktgbcvbbwylhd.streamlit.app
What's included
13 notebooks covering the complete analysis:
| Notebook | Topic |
|---|---|
| 01 | Initial exploration — case/event stats, variant analysis |
| 02 | Process structure — DFG, transition matrix, happy path |
| 03 | Process discovery — Inductive Miner, Heuristics Miner |
| 04 | Bottleneck analysis — waiting/service time, stuck cases |
| 05 | Conformance analysis — token replay, violation detection |
| 06 | Predictive analytics — XGBoost/RF/LogReg (AUC 0.974, retrospective) |
| 07 | SHAP + prefix features — early warning curve (k=1–20) |
| 08 | Temporal cross-validation — drift, bias quantification |
| 09 | Final report — dashboard, priority matrix, findings, recommendations |
| 10 | Leakage & calibration audit — ablation study, Brier score, reliability diagram |
| 11 | Remaining time prediction — XGBoost regression, P10/P50/P90 quantile intervals, temporal CV, SHAP |
| 12 | Survival analysis — Kaplan-Meier, log-rank tests, Cox PH model, risk-group curves |
| 13 | Violation root cause — decision tree rules, XGBoost, SHAP, department risk exposure |
58+ figures · 40+ tables (pre-computed, committed)
Streamlit app — 7-page interactive dashboard (app/app.py)
Deployed models:
- `app/model/prefix_k8.joblib`: early warning classifier (XGBoost k=8, AUC 0.810)
- `app/model/remaining_time_k8.joblib`: remaining time regressor (XGBoost k=8, MAE 12.4d, P10–P90 coverage 80.8%)
- `app/model/survival_cox_k8.joblib`: Cox Proportional Hazards model (concordance 0.814, all 7,065 cases)
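For readers unfamiliar with the regressor metrics quoted above, here is a minimal sketch of how MAE and P10–P90 interval coverage are commonly computed. The function names and the sample numbers are made up for illustration; this is not the code from Notebook 11.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def interval_coverage(y_true, lower, upper):
    """Fraction of observed values inside [lower, upper], bounds inclusive."""
    hits = [lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


# Made-up remaining times in days, for illustration only.
observed = [10.0, 25.0, 40.0, 5.0, 60.0]
p10 = [4.0, 15.0, 30.0, 6.0, 35.0]
p50 = [9.0, 22.0, 45.0, 12.0, 50.0]
p90 = [20.0, 35.0, 55.0, 20.0, 58.0]

print("MAE (days):", mae(observed, p50))
print("P10-P90 coverage:", interval_coverage(observed, p10, p90))
```

A P10–P90 interval has a nominal coverage of 80%, so the reported 80.8% sits close to the target.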
Key findings
- 991 cases (14%) permanently stuck on `Send Reminder`: 134d median vs 63d resolved
- 17.1% travel-ordering violations: 746 Type A (departed before submission), 583 Type B (departed before approval)
- 44.9% conformance violations (fitness < 1.0 via token replay): XGBoost predicts with AUC 0.956; department, duration, and event count are dominant drivers (Notebook 13)
- Scheduling dominates duration: 69% of case time is voluntary employee scheduling, not admin delay
- Early warning model at k=8 events: AUC 0.810 (leakage-free, `elapsed_days` excluded)
- Data drift confirmed: `elapsed_days` halved 2017Q1→2018Q4; standard k-fold overstates AUC by +0.048
- Temporal leakage identified & corrected: `elapsed_days` alone achieves AUC 0.833; excluded from deployed classifier (Notebook 10)
- Remaining time prediction at k=8 events: MAE 12.4 days, R² 0.42, P10–P90 interval coverage 80.8% (Notebook 11)
- Survival analysis on all cases: KM median survival 72.4d; Cox concordance 0.814; rejections significantly slow completion (log-rank p ≈ 0) (Notebook 12)
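The gap between standard k-fold and temporal validation comes from training on cases that start after the test cases. A minimal sketch of an expanding-window split, which avoids that, is shown below. The helper name, the fold count, and the sample dates are hypothetical, and Notebook 08 may use a different scheme.

```python
from datetime import date


def expanding_window_splits(start_dates, n_folds):
    """Yield (train_idx, test_idx) pairs in which no training case
    starts later than any test case."""
    order = sorted(range(len(start_dates)), key=lambda i: start_dates[i])
    block = len(order) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough cases for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * block]
        # The last fold absorbs any remainder so no case is dropped.
        end = (k + 1) * block if k < n_folds else len(order)
        test = order[k * block : end]
        yield train, test


# Made-up case start dates, for illustration only.
starts = [
    date(2017, 1, 5), date(2018, 3, 1), date(2017, 6, 20), date(2017, 2, 14),
    date(2018, 9, 9), date(2017, 11, 2), date(2018, 1, 15), date(2018, 6, 30),
]

for train_idx, test_idx in expanding_window_splits(starts, n_folds=3):
    print("train:", train_idx, "test:", test_idx)
```

Each fold trains on everything before its test block, so when a feature such as `elapsed_days` drifts over time, the score reflects how the model would have performed on unseen future cases.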
Leakage note
The initial k=8 AUC of 0.967 was inflated by temporal leakage: the elapsed_days feature alone achieves AUC 0.833. Notebook 10 confirmed this via ablation (AUC 0.967 → 0.810 once elapsed_days is removed). The deployed classifier excludes elapsed_days. Calibration is good: Brier score 0.066, skill score 0.70.
For the remaining time regressor and Cox model, elapsed_days is legitimately included, because predicting how much time is left requires knowing how fast the case has moved so far.
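The two checks behind this note can be sketched in plain Python: the AUC of a single feature used directly as a score, and a Brier skill score. The skill score here is assumed to be measured against a reference that always predicts the base rate; the release does not state which reference Notebook 10 uses. All data below is made up, including the assumption that larger `elapsed_days` goes with the positive class.

```python
def rank_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs, labels):
    """1 - BS / BS_ref, where the reference always predicts the base rate."""
    base_rate = sum(labels) / len(labels)
    reference = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / reference


# Made-up labels, feature values, and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
elapsed_days = [120, 95, 40, 60, 30, 20, 45, 10]
probs = [0.9, 0.8, 0.4, 0.3, 0.1, 0.1, 0.2, 0.1]

print("single-feature AUC:", rank_auc(elapsed_days, labels))
print("Brier score:", brier_score(probs, labels))
print("Brier skill score:", brier_skill_score(probs, labels))
```

A single raw feature that scores a high AUC on its own is a signal worth auditing, which is the pattern the ablation above follows up on.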
Dataset
van Dongen, Boudewijn (2020): BPI Challenge 2020: Travel Permit Data. Version 1. 4TU.ResearchData. dataset.
https://doi.org/10.4121/uuid:ea03d361-a7cd-4f5e-83d8-5fbdf0362550
The raw log (PermitLog.xes, 33 MB) is not bundled in this repo. Download from the link above and place at data/raw/PermitLog.xes.
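Once the file is in place, it can be loaded along these lines. This is a sketch: `load_permit_log` is a hypothetical helper, not part of the repo, and it relies on `pm4py.read_xes`, which in pm4py 2.7 returns a pandas DataFrame.

```python
from pathlib import Path

RAW_LOG = Path("data/raw/PermitLog.xes")


def load_permit_log(path=RAW_LOG):
    """Read the XES log with pm4py, failing early if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download PermitLog.xes from the DOI link "
            "and place it at this path"
        )
    # Imported lazily so the path check works even without pm4py installed.
    import pm4py

    return pm4py.read_xes(str(path))


# Usage, after the download step:
# log = load_permit_log()
# print(log["case:concept:name"].nunique())  # number of cases in the log
```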
Commits
| SHA | Description |
|---|---|
| eae8a7f | Add violation root cause analysis (Notebook 13) |
| 083e668 | Add survival analysis (Notebook 12) and Streamlit page |
| 8b9c1e5 | Update README for Notebook 11 and remaining time model |
| 2a1928c | Add remaining time prediction (Notebook 11) |
| 2935207 | Add live Streamlit app URL to README |
| d33cf07 | Prepare for Streamlit Community Cloud deployment |
| bb2dca4 | Add dataset citation (van Dongen 2020, 4TU.ResearchData) |
| 8e38528 | Correct AUC across all artifacts after leakage audit |
| 110cbe5 | Add Notebook 10: leakage and calibration audit |
| d153cf5 | Add Streamlit app with 4-page process mining dashboard |
Stack
Python 3.13 · pm4py 2.7.22.5 · XGBoost 3.x · SHAP 0.52 · lifelines 0.30 · scikit-learn · pandas · matplotlib · Streamlit
Run locally
pip install -r requirements.txt
streamlit run app/app.py
See README for full setup instructions.