class AlanMonsonChacko:
    role = "AI / ML Engineer"
    company = "PropMarker, UK (Remote)"
    location = "Kerala, India 🇮🇳"

    def what_i_build(self):
        return [
            "Production RAG chatbots",
            "Gradient-boosted ML pipelines",
            "Agentic LLM systems",
            "Enterprise NLP engines",
        ]

    def looking_for(self):
        return "Remote AI/ML role, available NOW"

| | Metric | Value | System |
|---|---|---|---|
| 🏆 | Holdout ROC-AUC | 0.8628 | UK Sale Propensity Model |
| ✅ | QA pass rate | 76.9% → 100% | Enterprise RAG Chatbot |
| 🛡️ | Runtime crashes | 0% | NLP Tagging Engine (prod) |
| ⚡ | SHAP inference | < 100ms | FastAPI explainability endpoint |
| 📦 | Records processed | 149,000+ | Land Registry + ONS + EPC + IMD |
| 🔒 | Hallucination rate | 0% | RAG chatbot (strict prompt masks) |
| 🧩 | Semantic chunks | 452 | From 42 proprietary documents |
| 🏷️ | Tags extracted | 50+ | Per property listing, structured JSON |
🏠 UK Real Estate Sale Propensity Platform: ROC-AUC 0.8628
Problem: PropMarker needed to rank 149,000+ UK properties by 12-month sale likelihood — manual scoring was impossible at scale.
What I built:
- 🎯 Gradient-boosted ensemble pipeline (XGBoost + LightGBM + CatBoost + Random Forest)
- 🔢 Feature engineering across 6 heterogeneous datasets, including Land Registry (149k transactions), ONS Census, EPC ratings, and IMD deprivation indices
- 🔍 Temporal cross-validation strategy with zero future-data leakage
- ⚡ SHAP explainability served via FastAPI — feature contributions in <100ms
- 🤖 Optuna Bayesian HPO — training in <10 seconds on 20k-row downsampled sets
- 📊 Index-adjusted ECV algorithm projecting current property values from HPI records
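
The temporal cross-validation bullet above can be illustrated with an expanding-window split, where each fold trains only on rows dated before its validation window. This is a minimal stdlib sketch and not the PropMarker pipeline itself; the helper name `expanding_window_splits` and the toy sale dates are assumptions made for illustration.

```python
from datetime import date


def expanding_window_splits(dates, n_folds):
    """Yield (train_idx, valid_idx) pairs in which every training row
    is dated strictly before every validation row (hypothetical helper)."""
    # Sort row indices chronologically so folds follow calendar order.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    fold_size = len(order) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * fold_size]
        valid = order[k * fold_size : (k + 1) * fold_size]
        # Drop validation rows that share a date with the last training row,
        # so the train/validation boundary never falls inside a single day.
        cutoff = dates[train[-1]]
        valid = [i for i in valid if dates[i] > cutoff]
        yield train, valid


# Toy data: one sale per month in 2020 (illustrative only).
sale_dates = [date(2020, m, 1) for m in range(1, 13)]
splits = list(expanding_window_splits(sale_dates, n_folds=3))
```

Each successive fold sees a longer training history, which mirrors how a model retrained over time would be used: it never scores a period it has already seen.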
Stack: Python · XGBoost · LightGBM · CatBoost · Optuna · SHAP · FastAPI · SQLite · scikit-learn · pandas
| Metric | Value |
|---|---|
| Holdout ROC-AUC | 0.8628 |
| Training time (Optuna) | < 10 seconds |
| SHAP inference | < 100ms / prediction |
| Data leakage | 0% |
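
For the index-adjusted ECV step listed under "What I built", one common way to index-adjust a past sale price is to scale it by the ratio of the current price index to the index at the time of sale. Whether the ECV algorithm uses exactly this form is an assumption; `index_adjusted_value` and the index figures below are hypothetical.

```python
def index_adjusted_value(sale_price, hpi_at_sale, hpi_now):
    """Scale a historical sale price by the change in a price index."""
    if hpi_at_sale <= 0:
        raise ValueError("index at sale must be positive")
    return sale_price * (hpi_now / hpi_at_sale)


# Illustrative numbers only: a 200,000 sale made when the index stood at 100,
# projected to a period in which the index stands at 125.
estimate = index_adjusted_value(200_000, hpi_at_sale=100.0, hpi_now=125.0)
```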
🤖 Enterprise RAG Conversational Chatbot: 100% QA pass rate
Problem: 42 proprietary platform documents were unsearchable — support staff wasted hours finding answers manually.
What I built:
- 📄 Document ingestion pipeline — 452 semantic chunks via RecursiveCharacterTextSplitter with custom Markdown separators
- 🧠 Local BAAI/bge-m3 embeddings (1024-dim) on CPU/GPU: document text is embedded locally, with no third-party embedding API cost
- 🔄 History-aware pre-retriever reformulating follow-up questions from conversation history
- 🚫 Strict negative-control prompt masks — absolute context reliance, zero hallucinations
- 📎 Automated citation compiler — appends exact file sources and text snippets to every answer
- 📱 Deployed as Streamlit + Gradio interfaces with full auditability
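
The citation compiler in the list above can be sketched as a small post-processing step that appends each retrieved chunk's source file and a text snippet to the generated answer. This is an illustrative sketch only: the chunk keys `"source"` and `"text"`, the function name `compile_citations`, and the file names are assumptions, not the chatbot's actual data model.

```python
def compile_citations(answer, retrieved_chunks, snippet_chars=80):
    """Append a numbered source list to an answer.

    Each chunk is a dict with hypothetical keys "source" and "text".
    """
    lines = [answer.rstrip(), "", "Sources:"]
    seen = set()
    for chunk in retrieved_chunks:
        key = (chunk["source"], chunk["text"])
        if key in seen:  # skip exact duplicates returned by the retriever
            continue
        seen.add(key)
        # Collapse internal whitespace and truncate to a short snippet.
        snippet = " ".join(chunk["text"].split())[:snippet_chars]
        lines.append(f'[{len(seen)}] {chunk["source"]}: "{snippet}"')
    return "\n".join(lines)


# Hypothetical retrieved chunks; the second is an exact duplicate of the first.
chunks = [
    {"source": "docs/billing.md", "text": "Invoices are issued\nmonthly."},
    {"source": "docs/billing.md", "text": "Invoices are issued\nmonthly."},
    {"source": "docs/accounts.md", "text": "Admins can add users."},
]
cited = compile_citations("Invoices are issued monthly.", chunks)
```

Building the citation list from the retriever output, rather than asking the model to write it, keeps the cited sources tied to what was actually retrieved.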
Stack: LangChain · ChromaDB · HuggingFace · BAAI/bge-m3 · OpenAI API · Streamlit · Gradio · Python
| Metric | Value |
|---|---|
| QA pass rate | 76.9% → 100% |
| Hallucination rate | 0% |
| Documents indexed | 42 (452 chunks) |
| Data privacy | 100% local embeddings |
🏷️ LLM Semantic NLP Tagging Engine: 50+ tags · 0% crashes
Problem: UK property listings contained unstructured text that needed 50+ structured tags extracted reliably at scale.
What I built:
- 🏗️ Type-safe Pydantic schema mapping LLM output to binary feature flags (0/1) with citation strings
- 🛡️ Rate-limit handling + graceful all-zero fallback — 0% runtime crashes in continuous production
- 🎯 Prompt-level disambiguation guardrails — e.g. distinguishing Notice of Offer vs In Receipt of Offer
- 📊 Visual evaluation dashboard — TP/FP/FN colour-coding, live API cost tracking (USD)
- 🔄 Dual-model architecture — GPT-4o-mini + Gemini 2.0 Flash for cost/performance tradeoffs
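
The schema validation and all-zero fallback described above can be sketched as follows. The engine itself uses Pydantic; this sketch swaps in stdlib `json` and manual checks so that it runs without dependencies, and it covers only the fallback path, not rate-limit handling. The tag names and the helpers `parse_tags` and `all_zero` are hypothetical.

```python
import json

TAGS = ("garden", "garage", "chain_free")  # hypothetical tag names


def all_zero():
    """Fallback record: every flag 0 and an empty citation."""
    return {tag: {"flag": 0, "citation": ""} for tag in TAGS}


def parse_tags(raw_llm_output):
    """Validate raw model output; return the all-zero record on any problem."""
    try:
        data = json.loads(raw_llm_output)
        result = {}
        for tag in TAGS:
            entry = data[tag]
            flag = entry["flag"]
            # Accept only the integers 0 and 1 (JSON true/false is rejected).
            if isinstance(flag, bool) or flag not in (0, 1):
                raise ValueError(f"{tag}: flag must be 0 or 1")
            result[tag] = {"flag": flag, "citation": str(entry.get("citation", ""))}
        return result
    except (KeyError, TypeError, ValueError, AttributeError):
        # json.JSONDecodeError is a ValueError subclass, so malformed JSON
        # also lands here instead of crashing the caller.
        return all_zero()


good = parse_tags(
    '{"garden": {"flag": 1, "citation": "south-facing garden"}, '
    '"garage": {"flag": 0}, '
    '"chain_free": {"flag": 1, "citation": "no onward chain"}}'
)
bad = parse_tags("Sorry, I cannot help with that.")
```

Returning a complete all-zero record keeps the output shape stable, so downstream consumers never need a special case for failed extractions.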
Stack: LangChain · Pydantic · GPT-4o-mini · Gemini 2.0 Flash · Python · HTML/CSS
🔍 AI Job Search Automation Tool: Claude API · 4-stage agentic pipeline
Problem: Job seekers waste hours manually tailoring resumes, writing cover letters, and preparing for interviews.
What I built:
- 📊 Stage 1 — ATS Scorer: Keyword match analysis, gap identification, score 0–100
- ✍️ Stage 2 — Resume Tailorer: Rewrites bullets and summary using JD language
- 📝 Stage 3 — Cover Letter Generator: Personalised with candidate metrics + company context
- 🎤 Stage 4 — Interview Coach: Mock Q&A graded 1–10 with actionable improvement feedback
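
To make the Stage 1 idea concrete, here is a deterministic stand-in for keyword matching and gap identification. The real tool performs this step through the Claude API; the set-overlap scoring below is a simplification chosen for illustration, and `keywords`, `ats_score`, and the stopword list are assumptions.

```python
import re

STOPWORDS = {"and", "or", "the", "a", "an", "to", "of", "in", "with", "for"}


def keywords(text):
    """Lowercased word tokens minus a small stopword list."""
    return {w for w in re.findall(r"[a-z0-9+#]+", text.lower()) if w not in STOPWORDS}


def ats_score(resume, job_description):
    """Return (score 0-100, sorted JD keywords missing from the resume)."""
    wanted = keywords(job_description)
    if not wanted:
        return 0, []
    have = keywords(resume)
    matched = wanted & have
    score = round(100 * len(matched) / len(wanted))
    return score, sorted(wanted - have)


score, gaps = ats_score(
    resume="Built RAG chatbots with LangChain and Python",
    job_description="Python engineer with LangChain, Docker and FastAPI",
)
```

Here two of the five job-description keywords appear in the resume, so `score` is 40 and `gaps` lists the three missing terms.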
Stack: Claude API · React · Anthropic · Structured JSON · Prompt Engineering
alan = {
    "🔭 working_on": "Production AI systems @ PropMarker UK",
    "🌱 learning": ["FastAPI + Docker deployment", "MCP servers", "vLLM"],
    "👯 open_to": "Remote AI/ML collaborations",
    "💬 ask_me": "LangChain · RAG · XGBoost · Prompt Engineering",
    "📫 reach_me": "alanmonson44@gmail.com",
    "⚡ fun_fact": "My RAG chatbot has a 0% hallucination rate in prod 🎯",
    "🚀 available": True,  # hire me!
}

┌─────────────────────────────────────────────────┐
│                                                 │
│  "If it's not in production, it doesn't count"  │
│                                                 │
│  Every project I build ships to real users,     │
│  handles real data, and has real metrics.       │
│                                                 │
└─────────────────────────────────────────────────┘
⭐ Star a repo if it helped you • 🤝 Open to collabs & remote roles • 📧 alanmonson44@gmail.com