An AI-powered e-commerce recommendation engine built with Flask, React, and scikit-learn
CortexCart is a full-stack multi-modal product recommendation system that uses machine learning to find similar products from a catalog of 93,000+ luxury products sourced from JomaShop. It combines TF-IDF text analysis, price normalization, and brand encoding into a hybrid content-based recommendation engine, served through a Flask REST API and consumed by a premium React frontend with glassmorphism UI design.
The system supports three recommendation modes:
- Product-based — Select any product to find visually and contextually similar items
- Query-based — Type a natural language search with optional brand filtering
- Batch processing — Upload a CSV of queries and get bulk recommendations
The /insights page exposes the full PDF-aligned ML pipeline described in the
project report (Chapter 3 — System Design). It fuses three modalities
into a sparse feature matrix and feeds it through a stacking ensemble:
text ──► HashingVectorizer (2^18 features, ngram (1,2), stop_words="english",
alternate_sign=False, norm="l2")
brand ──► LabelEncoder ──► one-hot (top-200 brands + "Other")
price ──► StandardScaler(with_mean=False) on [finalPrice, discount_pct]
│
▼
sparse hstack ──► Stacking Ensemble
├── GradientBoostingClassifier (base #1, dense projection)
├── LightGBM (base #2, fallback: RandomForest)
├── NGBoost (base #3, fallback: ExtraTrees)
└── meta: CalibratedClassifierCV(LinearSVC, sigmoid, cv=3)
Implementation: backend/ml_pipeline.py Training script: scripts/train_classifier.py
| Endpoint | Purpose |
|---|---|
GET /api/ml/architecture |
Machine-readable pipeline spec (renders the diagram) |
GET /api/ml/metrics |
Accuracy / precision / recall / F1 per base model + ensemble |
POST /api/ml/classify |
Live classification — returns per-model + ensemble probabilities |
A dedicated ML Insights page (/insights) renders the architecture
diagram, per-model evaluation charts, and an interactive classifier
playground that calls POST /api/ml/classify and visualizes top-K
probabilities for each base model alongside the stacking verdict.
# Default (sampled, fast):
python scripts/train_classifier.py
# Force retrain (after schema changes):
python scripts/train_classifier.py --force
# Train on the full 93K-row CSV:
python scripts/train_classifier.py --fullTrained artefacts are cached under models_cache/ml/ and re-loaded by the
backend on subsequent boots. LightGBM and NGBoost are optional — uncomment
them in backend/requirements.txt to enable native acceleration; otherwise
scikit-learn equivalents are used automatically.
- Username / Password auth with bcrypt-hashed passwords and JWT sessions (
PyJWT) - Role-based access control —
userandadminroles, withRequireAuth/RequireAdminroute guards on the React side and@require_auth/@require_admindecorators on the Flask side - Supabase-backed when
SUPABASE_URL/SUPABASE_KEYare set, with a local JSON file fallback (backend/.local_users.json) for instant dev use - Admin dashboard at
/admin— overview KPIs, signup/activity time series, top products, recommendation source breakdown, search logs, recommendation logs, user management (promote/demote, enable/disable), and a one-click embedding re-seed runner - Analytics tracking — every search, recommendation request (realtime, smart, personalized, ai_chat), and recommendation click is logged for the dashboard
- Default admin bootstrap — set
DEFAULT_ADMIN_USERNAMEandDEFAULT_ADMIN_PASSWORDin.envto auto-create an admin on first boot
- Add to your
.env:JWT_SECRET=please-change-me-to-a-long-random-string JWT_TTL_HOURS=24 DEFAULT_ADMIN_USERNAME=admin DEFAULT_ADMIN_PASSWORD=ChangeThisStrongPassword! - (Optional) Apply the new tables to Supabase: re-run
supabase/schema.sql(it now includesapp_users,auth_events,recommendation_logs,recommendation_clicks,search_logs, and analytics RPC functions). - Install new deps:
pip install -r backend/requirements.txt cd frontend && npm install - Start the backend and frontend as usual. Visit
/signupto create your first user, or sign in as the bootstrapped admin and open/admin.
- Hybrid ML Pipeline — Combines TF-IDF text features (20,000 dimensions, bigrams, sublinear TF), MinMaxScaler price normalization, and LabelEncoder brand one-hot encoding
- Weighted Feature Fusion — Text (1.0), Price (0.3), Brand (0.5) weights for balanced recommendations
- Cosine Similarity Search — Fast similarity computation across 93,000+ products
- Model Caching — Trained models are serialized with joblib for fast restarts
- RESTful API — Clean Flask endpoints with pagination, search, and error handling
- Batch Processing — Upload CSV files for bulk recommendation generation
- Production Ready — Gunicorn server with configurable workers and timeout
- Premium Dark Theme — Glassmorphism design with frosted glass effects and gradient accents
- Animated UI — Smooth page transitions and card animations with Framer Motion
- Responsive Design — Fully responsive grid layouts that adapt to any screen size
- Real-time Search — Instant product catalog search with debouncing
- Paginated Catalog — Browse 93,000+ products with smooth page navigation
- Drag & Drop — CSV file upload with drag-and-drop support for batch mode
- Similarity Scores — Visual display of match percentages on recommended products
- Discount Badges — Automatic calculation and display of discount percentages
┌─────────────────────────────────────────────────────┐
│ React Frontend │
│ (Vite + React 18 + Framer Motion) │
│ │
│ ┌───────────┐ ┌───────────┐ ┌──────────────────┐ │
│ │ HomePage │ │ Catalog │ │ Recommendations │ │
│ │ │ │ Page │ │ Page │ │
│ └───────────┘ └───────────┘ └──────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ BatchPage (CSV Upload) │ │
│ └───────────────────────────────────────────────┘ │
└──────────────────────┬──────────────────────────────┘
│ HTTP (axios)
▼
┌─────────────────────────────────────────────────────┐
│ Flask REST API │
│ │
│ /api/health GET Health check │
│ /api/products GET Paginated catalog │
│ /api/products/:id GET Single product │
│ /api/recommend/ POST Real-time recs │
│ realtime │
│ /api/recommend/ POST Batch CSV recs │
│ batch │
└──────────────────────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Recommendation Engine (ML) │
│ │
│ ┌─────────────┐ ┌────────────┐ ┌─────────────┐ │
│ │ TF-IDF │ │ Price │ │ Brand │ │
│ │ Vectorizer │ │ Scaler │ │ Encoder │ │
│ │ (20K feat) │ │ (MinMax) │ │ (Top 200) │ │
│ └──────┬──────┘ └─────┬──────┘ └──────┬──────┘ │
│ │ weight=1.0 │ weight=0.3 │ w=0.5 │
│ └───────────┬────┴────────────────┘ │
│ ▼ │
│ Sparse Feature Matrix (hstack) │
│ │ │
│ ▼ │
│ Cosine Similarity Search │
└─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ JomaShop Dataset (CSV) │
│ 93,931 products · 1,436 brands │
└─────────────────────────────────────────────────────┘
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 | UI component library |
| Build Tool | Vite 6 | Fast dev server & production bundler |
| Routing | React Router DOM 6 | Client-side SPA routing |
| Animation | Framer Motion 11 | Page transitions & micro-interactions |
| Icons | Lucide React | SVG icon library |
| HTTP Client | Axios | API communication |
| Backend | Flask 3.1 | REST API framework |
| CORS | Flask-CORS | Cross-origin request handling |
| ML - Text | TfidfVectorizer (scikit-learn) | Text feature extraction |
| ML - Numeric | MinMaxScaler (scikit-learn) | Price normalization |
| ML - Categorical | LabelEncoder (scikit-learn) | Brand encoding |
| ML - Similarity | cosine_similarity (scikit-learn) | Product matching |
| Data | Pandas, NumPy | Data processing & manipulation |
| Serialization | Joblib | Model persistence |
| Sparse Matrix | SciPy | Efficient feature storage |
| Production Server | Gunicorn | WSGI HTTP server |
The project uses the JomaShop Products Dataset containing 93,931 luxury products across watches, jewelry, accessories, and more.
| Column | Description |
|---|---|
product_type |
SimpleProduct, ConfigurableProduct, GroupedProduct |
name |
Product name |
brandName |
Brand (1,436 unique brands) |
stockStatus |
In stock / Out of stock |
description.short |
Brief product description |
description.complete |
Full product description |
genderLabel |
Target gender |
department |
Product department/category |
pricing.regularPrice.value |
Regular price |
pricing.finalPrice.value |
Sale/final price |
pricing.retailPrice.value |
Retail (MSRP) price |
The engine (backend/recommendation_engine.py) implements a hybrid content-based filtering approach that fuses three modalities:
- Combines
name,description.short, anddescription.completeinto a single text blob - Applies TF-IDF Vectorization with:
- 20,000 max features
- Unigrams and bigrams (
ngram_range=(1,2)) - Sublinear TF scaling (
sublinear_tf=True) - English stop word removal
- Normalizes
finalPriceand computeddiscount_pctusing MinMaxScaler - Discount percentage is derived as:
(retailPrice - finalPrice) / retailPrice × 100 - Ensures price-similar products rank higher
- Encodes the top 200 brands using one-hot encoding
- Remaining brands are grouped as "Other"
- Provides brand affinity in recommendations
All features are combined into a single sparse matrix using scipy.sparse.hstack, with each modality weighted according to its importance. Cosine similarity is then computed between the query vector and all products to rank results.
For query-based recommendations, an additional 0.15 score boost is applied to products matching the requested brand, ensuring brand-relevant results surface higher.
Health check endpoint.
{ "status": "ok" }Paginated product catalog with optional search.
{
"products": [...],
"total": 93931,
"page": 1,
"per_page": 20,
"total_pages": 4697
}Single product by index.
{
"id": 42,
"name": "Omega Speedmaster Professional",
"brandName": "Omega",
"finalPrice": 5350,
"retailPrice": 7150,
"discount_pct": 25.2,
...
}Get recommendations by product ID or text query.
By product ID:
{
"product_id": 42,
"top_n": 10
}By query + brand:
{
"query": "luxury diving watch",
"brand": "Omega",
"top_n": 10
}Response:
{
"recommendations": [
{
"id": 156,
"name": "Omega Seamaster Planet Ocean",
"similarity_score": 0.8743,
...
}
]
}Upload a CSV with query and optional brand columns. Returns JSON or downloadable CSV.
Content-Type: multipart/form-data
file: queries.csv
top_n: 5
format: json|csv
CortexCart/
├── .gitignore
├── build.sh # Render build script
├── render.yaml # Render deployment blueprint
├── main.py # Original ML exploration script
├── sample_batch_queries.csv # Sample CSV for batch testing
│
├── backend/
│ ├── app.py # Flask API server
│ ├── recommendation_engine.py # ML recommendation engine
│ └── requirements.txt # Python dependencies
│
├── frontend/
│ ├── index.html # HTML entry point
│ ├── package.json # Node.js dependencies
│ ├── vite.config.js # Vite configuration (proxy, port)
│ └── src/
│ ├── main.jsx # React entry + BrowserRouter
│ ├── App.jsx # Route definitions
│ ├── styles/
│ │ └── global.css # Global dark theme + CSS variables
│ ├── services/
│ │ └── api.js # Axios API service layer
│ ├── components/
│ │ ├── Navbar.jsx # Navigation bar (glassmorphism)
│ │ ├── Navbar.css
│ │ ├── ProductCard.jsx # Animated product card
│ │ ├── ProductCard.css
│ │ ├── Loader.jsx # Loading spinner animation
│ │ └── Loader.css
│ └── pages/
│ ├── HomePage.jsx # Landing page (hero + stats)
│ ├── HomePage.css
│ ├── CatalogPage.jsx # Product grid + search + pagination
│ ├── CatalogPage.css
│ ├── BatchPage.jsx # CSV drag-and-drop upload
│ ├── BatchPage.css
│ ├── RecommendationsPage.jsx # Similar products display
│ └── RecommendationsPage.css
│
├── Dataset/
│ ├── JomaShop Products Data.csv # Main dataset (93,931 products)
│ └── testdata.csv # Test dataset
│
├── models/ # Pre-trained ML model artifacts
│ ├── Untitled.ipynb # Jupyter notebook (EDA + classification)
│ └── *.pkl # Serialized model files
│
└── BG_image/
└── background.png # Background image asset
- Python 3.11+
- Node.js 18+
- npm or yarn
git clone https://github.com/Premkumar1845/CortexCart.git
cd CortexCartpip install -r backend/requirements.txtcd backend
python app.pyThe engine will take about a minute to build the TF-IDF model on first run. Subsequent starts are faster due to model caching.
Server runs at
http://localhost:5000
cd frontend
npm installnpm run devFrontend runs at
http://localhost:3000with API proxy to Flask
Navigate to http://localhost:3000 in your browser.
A sample file (sample_batch_queries.csv) is included for testing batch recommendations:
query,brand
omega speedmaster,Omega
rolex submariner,Rolex
diamond necklace,
gucci handbag,Gucci
seiko automatic watch,Seiko
ray ban sunglasses,Ray-Ban
mont blanc pen,Montblanc
casio g-shock,Casio
tissot dress watch,Tissot
gold bracelet,Upload this file on the Batch Upload page to test bulk recommendation generation.
This project is for educational and portfolio purposes.