Releases: djimrastephane/ProcessPath_AI

v1.0.0 — Initial release

26 Jun 19:18

ProcessPath_AI v1.0.0

Full process mining pipeline on the BPI Challenge 2020 Travel Permit dataset (TU/e) — 7,065 cases · 86,581 events · 18 months.

Live demo

https://processpathai-fejiac7urktgbcvbbwylhd.streamlit.app

What's included

13 notebooks covering the complete analysis:

| Notebook | Topic | Contents |
|---|---|---|
| 01 | Initial exploration | case/event stats, variant analysis |
| 02 | Process structure | DFG, transition matrix, happy path |
| 03 | Process discovery | Inductive Miner, Heuristics Miner |
| 04 | Bottleneck analysis | waiting/service time, stuck cases |
| 05 | Conformance analysis | token replay, violation detection |
| 06 | Predictive analytics | XGBoost/RF/LogReg (AUC 0.974, retrospective) |
| 07 | SHAP + prefix features | early warning curve (k=1–20) |
| 08 | Temporal cross-validation | drift, bias quantification |
| 09 | Final report | dashboard, priority matrix, findings, recommendations |
| 10 | Leakage & calibration audit | ablation study, Brier score, reliability diagram |
| 11 | Remaining time prediction | XGBoost regression, P10/P50/P90 quantile intervals, temporal CV, SHAP |
| 12 | Survival analysis | Kaplan-Meier, log-rank tests, Cox PH model, risk-group curves |
| 13 | Violation root cause | decision tree rules, XGBoost, SHAP, department risk exposure |
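
For readers new to the terms in Notebook 02, a directly-follows graph (DFG) counts how often one activity is immediately followed by another within a case, and a transition matrix normalises those counts per source activity. The following is a minimal plain-Python sketch of that idea, not the repository's code; the traces and activity names are made up for illustration and are not taken from PermitLog.xes.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def transition_matrix(dfg):
    """Normalise outgoing counts per source activity into probabilities."""
    totals = Counter()
    for (a, _), n in dfg.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in dfg.items()}


# Toy traces with hypothetical activity names (illustration only).
traces = [
    ["submit", "approve", "travel", "declare"],
    ["submit", "reject", "submit", "approve", "travel", "declare"],
    ["submit", "approve", "travel"],
]

dfg = directly_follows(traces)
probs = transition_matrix(dfg)

for (a, b), p in sorted(probs.items()):
    print(f"{a} -> {b}: count={dfg[(a, b)]}, p={p:.2f}")
```

The most frequent path through such a graph is one common way to read off a "happy path"; how the notebook defines it may differ.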

58+ figures · 40+ tables (pre-computed, committed)

Streamlit app — 7-page interactive dashboard (app/app.py)

Deployed models:

  • app/model/prefix_k8.joblib — early warning classifier (XGBoost k=8, AUC 0.810)
  • app/model/remaining_time_k8.joblib — remaining time regressor (XGBoost k=8, MAE 12.4d, P10–P90 coverage 80.8%)
  • app/model/survival_cox_k8.joblib — Cox Proportional Hazards model (concordance 0.814, all 7,065 cases)

Key findings

  • 991 cases (14%) permanently stuck on Send Reminder — 134d median vs 63d resolved
  • 17.1% travel-ordering violations — 746 Type A (departed before submission), 583 Type B (departed before approval)
  • 44.9% conformance violations (fitness < 1.0 via token replay) — XGBoost predicts with AUC 0.956; department, duration, and event count are dominant drivers (Notebook 13)
  • Scheduling dominates duration — 69% of case time is voluntary employee scheduling, not admin delay
  • Early warning model at k=8 events — AUC 0.810 (leakage-free, elapsed_days excluded)
  • Data drift confirmed: elapsed_days halved 2017Q1→2018Q4; standard k-fold overstates AUC by +0.048
  • Temporal leakage identified & corrected: elapsed_days alone achieves AUC 0.833; excluded from deployed classifier (Notebook 10)
  • Remaining time prediction at k=8 events — MAE 12.4 days, R² 0.42, P10–P90 interval coverage 80.8% (Notebook 11)
  • Survival analysis on all cases — KM median survival 72.4d; Cox concordance 0.814; rejections significantly slow completion (log-rank p ≈ 0) (Notebook 12)
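
The P10–P90 interval coverage quoted above can be read as the share of cases whose actual remaining time falls between the predicted 10th and 90th percentiles. A well-calibrated P10–P90 interval has a nominal coverage of 80%, so the reported 80.8% is close to nominal. The sketch below shows the calculation under that reading; the function name and the numbers are illustrative assumptions, not values from Notebook 11.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper], inclusive."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


# Made-up remaining times in days (illustration only).
actual = [5.0, 12.0, 30.0, 8.0, 20.0]
p10 = [2.0, 6.0, 10.0, 9.0, 11.0]
p90 = [9.0, 20.0, 25.0, 15.0, 40.0]

coverage = interval_coverage(actual, p10, p90)
print(f"P10-P90 coverage: {coverage:.1%}")
```

Coverage well below 80% would suggest intervals that are too narrow; coverage well above it would suggest intervals that are too wide to be useful.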

Leakage note

The initial k=8 AUC of 0.967 was inflated by temporal leakage: elapsed_days alone achieves AUC 0.833. Notebook 10 confirmed this via ablation (AUC drops from 0.967 to 0.810 when the feature is removed). The deployed classifier excludes elapsed_days. Calibration is good: Brier score 0.066, skill score 0.70.
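
As a reference for the calibration figures, here is a small sketch of how a Brier score and a skill score can be computed. It assumes that "skill score" means the Brier skill score measured against a reference forecast that always predicts the observed base rate; the release notes do not state which reference was used, and the labels and probabilities below are made up.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob):
    """1 - BS / BS_ref, where the reference always predicts the base rate.

    Assumption: base-rate reference; other references are possible.
    """
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    if reference == 0:
        raise ValueError("skill score is undefined when all outcomes are identical")
    return 1.0 - brier_score(y_true, y_prob) / reference


# Made-up labels and predicted probabilities (illustration only).
y_true = [1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.1, 0.7]

bs = brier_score(y_true, y_prob)
bss = brier_skill_score(y_true, y_prob)
print(f"Brier score: {bs:.4f}, skill score: {bss:.2f}")
```

A lower Brier score is better, while a skill score of 0 means no improvement over the base-rate forecast and 1 means perfect probabilities.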

For the remaining time regressor and Cox model, elapsed_days is legitimately included — predicting how much time is left requires knowing how fast the case has moved so far.
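
The distinction can be sketched as a single prefix-feature builder with a switch for elapsed_days: off for the early warning classifier, on for the remaining time regressor and Cox model. This is a hypothetical illustration only. Apart from elapsed_days, the feature names, the function name, and the event format are assumptions; the repository's actual feature set is not listed in these notes.

```python
from datetime import datetime


def prefix_features(events, k=8, include_elapsed=True):
    """Build features from the first k events of one case.

    events: list of (activity, timestamp) tuples, sorted by timestamp.
    Feature names other than elapsed_days are hypothetical.
    """
    prefix = events[:k]
    if not prefix:
        raise ValueError("case has no events")
    features = {
        "n_events": len(prefix),
        "n_distinct_activities": len({activity for activity, _ in prefix}),
        "last_activity": prefix[-1][0],
    }
    if include_elapsed:
        # Days between the first and the last event of the prefix.
        elapsed = prefix[-1][1] - prefix[0][1]
        features["elapsed_days"] = elapsed.total_seconds() / 86400
    return features


# Made-up case with hypothetical activity names (illustration only).
events = [
    ("submit", datetime(2018, 1, 1)),
    ("approve", datetime(2018, 1, 2)),
    ("travel", datetime(2018, 1, 4)),
]

clf_features = prefix_features(events, k=8, include_elapsed=False)
reg_features = prefix_features(events, k=8, include_elapsed=True)
print(clf_features)
print(reg_features)
```

Keeping one builder with an explicit flag makes it harder for the leaky feature to slip back into the classifier's inputs unnoticed.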

Dataset

van Dongen, Boudewijn (2020): BPI Challenge 2020: Travel Permit Data. Version 1. 4TU.ResearchData. dataset.
https://doi.org/10.4121/uuid:ea03d361-a7cd-4f5e-83d8-5fbdf0362550

The raw log (PermitLog.xes, 33 MB) is not bundled in this repo. Download from the link above and place at data/raw/PermitLog.xes.
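
A quick way to confirm the log is in the expected place before running the notebooks is sketched below. The helper names are hypothetical; pm4py.read_xes is pm4py's XES reader, imported lazily here so that the path check also works in an environment where pm4py is not yet installed.

```python
from pathlib import Path

# Location stated above, relative to the repository root.
EXPECTED_LOG = Path("data/raw/PermitLog.xes")


def locate_log(repo_root="."):
    """Return the path to the raw log, or raise with a helpful message."""
    path = Path(repo_root) / EXPECTED_LOG
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download PermitLog.xes from the dataset DOI "
            "and place it at data/raw/PermitLog.xes"
        )
    return path


def load_log(repo_root="."):
    """Read the XES log with pm4py (requires pm4py to be installed)."""
    import pm4py  # lazy import: only needed when actually loading the log

    return pm4py.read_xes(str(locate_log(repo_root)))


if __name__ == "__main__":
    try:
        print(f"Found log at {locate_log()}")
    except FileNotFoundError as err:
        print(err)
```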

Commits

| SHA | Description |
|---|---|
| eae8a7f | Add violation root cause analysis (Notebook 13) |
| 083e668 | Add survival analysis (Notebook 12) and Streamlit page |
| 8b9c1e5 | Update README for Notebook 11 and remaining time model |
| 2a1928c | Add remaining time prediction (Notebook 11) |
| 2935207 | Add live Streamlit app URL to README |
| d33cf07 | Prepare for Streamlit Community Cloud deployment |
| bb2dca4 | Add dataset citation (van Dongen 2020, 4TU.ResearchData) |
| 8e38528 | Correct AUC across all artifacts after leakage audit |
| 110cbe5 | Add Notebook 10: leakage and calibration audit |
| d153cf5 | Add Streamlit app with 4-page process mining dashboard |

Stack

Python 3.13 · pm4py 2.7.22.5 · XGBoost 3.x · SHAP 0.52 · lifelines 0.30 · scikit-learn · pandas · matplotlib · Streamlit

Run locally

pip install -r requirements.txt
streamlit run app/app.py

See README for full setup instructions.