Intelligent Video Memory Extraction for Surveillance & Monitoring
Extract the most important moments from video footage automatically. MemoryMap uses motion detection, object recognition, and AI-powered analysis to identify and summarize key scenes, which makes it well suited to security footage, time-lapse analysis, and video highlights.
- Smart Motion Detection - Adaptive motion-based event segmentation using K-sigma thresholds
- Scene Segmentation - Automatic detection and isolation of distinct scenes
- Object Recognition - YOLOv8-powered object detection (persons, vehicles, etc.)
- Memory Selection - Intelligent importance scoring to select top moments
- Auto-Explanations - Natural language descriptions of why each moment matters
- Timeline Generation - Visual and JSON output with memory metadata
- CCTV-Optimized - Designed for static surveillance camera footage
# Clone repository
git clone <repository-url>
cd memorymap
# Install dependencies
pip install -r requirements.txt
python main.py input_video.mp4 output_folder/

This will:
- Extract frames from your video
- Detect motion events
- Analyze objects and context
- Generate memory timeline
- Save results to output_folder/
from pipeline import MemoryMapPipeline
pipeline = MemoryMapPipeline("video.mp4", "output/")
memories = pipeline.run(
    sample_interval=1.0,  # Frame sampling interval (seconds)
    keep_ratio=0.2,       # Keep top 20% of scenes
    adaptive_k=2.5        # Motion sensitivity (higher = stricter)
)

| Parameter | Default | Description |
|---|---|---|
| sample_interval | 1.0 | Seconds between sampled frames (lower = more frames) |
| keep_ratio | 0.2 | Fraction of scenes to save as memories (0.0-1.0) |
| adaptive_k | 2.5 | Motion detection sensitivity (σ multiplier). Higher = fewer events detected |
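To make the effect of sample_interval concrete, here is a minimal sketch of how an interval in seconds could translate into frame indices. The helper name `sampled_frame_indices` is hypothetical and is not taken from frame_sampling.py; the actual module may sample differently (for example by timestamp rather than by index).

```python
from typing import List


def sampled_frame_indices(total_frames: int, fps: float,
                          sample_interval: float = 1.0) -> List[int]:
    """Return indices of frames taken roughly every `sample_interval` seconds.

    Hypothetical helper for illustration only; not part of MemoryMap.
    """
    if fps <= 0 or sample_interval <= 0:
        raise ValueError("fps and sample_interval must be positive")
    # Number of source frames between two sampled frames (at least 1).
    step = max(1, round(fps * sample_interval))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 FPS: halving the interval doubles the sampled frames.
once_per_second = sampled_frame_indices(300, 30.0, sample_interval=1.0)
twice_per_second = sampled_frame_indices(300, 30.0, sample_interval=0.5)
print(len(once_per_second), len(twice_per_second))
```

Under this assumption, processing cost grows roughly in inverse proportion to sample_interval, which is why raising it is the first thing to try on long recordings.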
Input Video
↓
1. Video Loading → Extract metadata (resolution, FPS, duration)
↓
2. Frame Sampling → Sample frames at regular intervals
↓
3. Motion Detection → Detect motion bursts as events
↓
4. Representative Frames → Select key frame for each scene
↓
5. Emotion Analysis → Calculate visual intensity scores
↓
6. Object Detection → Identify persons, vehicles, etc. (YOLO)
↓
7. Semantic Analysis → Classify event type (activity level)
↓
8. Importance Scoring → Calculate importance score for each scene
↓
9. Memory Selection → Select top-K memories by importance
↓
10. Timeline Generation → Generate JSON, images, and report
↓
Output Files
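
The Memory Selection step keeps only the highest-scoring scenes. The sketch below shows one way ratio-based top-K selection could work, assuming each scene is a `(start_seconds, importance)` tuple. The function name and the rounding rule (round up, keep at least one scene) are assumptions for illustration, not the documented behavior of memory_selection.py.

```python
import math
from typing import List, Tuple

Scene = Tuple[float, float]  # (start_seconds, importance), an assumed shape


def select_memories(scenes: List[Scene], keep_ratio: float = 0.2) -> List[Scene]:
    """Keep the top `keep_ratio` fraction of scenes, returned in time order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0.0, 1.0]")
    if not scenes:
        return []
    # Assumed rounding rule: round up and always keep at least one scene.
    k = max(1, math.ceil(len(scenes) * keep_ratio))
    top = sorted(scenes, key=lambda s: s[1], reverse=True)[:k]
    # Restore chronological order so the result reads as a timeline.
    return sorted(top, key=lambda s: s[0])


scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.6, 0.7, 0.5]
scenes = [(float(t), s) for t, s in enumerate(scores)]
print(select_memories(scenes, keep_ratio=0.2))
```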
memorymap/
├── main.py                              # Entry point
├── pipeline.py                          # Main orchestration
├── requirements.txt                     # Dependencies
├── modules/
│   ├── data_structures.py               # Scene, Frame dataclasses
│   ├── video_ingestion.py               # Video loading & metadata
│   ├── frame_sampling.py                # Frame extraction
│   ├── motion_event_segmentation.py     # Motion-based event detection
│   ├── motion_analysis.py               # Motion intensity calculation
│   ├── representative_frames.py         # Key frame selection
│   ├── emotion_analysis.py              # Visual intensity scoring
│   ├── object_context.py                # YOLO object detection
│   ├── semantic_analyzer.py             # Event classification
│   ├── importance_scoring.py            # Memory importance calculation
│   ├── memory_selection.py              # Top-K memory selection
│   ├── explanation_generator.py         # Natural language generation
│   ├── memory_timeline.py               # Output generation
│   ├── utils.py                         # Helper functions
│   └── scene_segmentation_dl.py         # (Optional) PySceneDetect
└── memory_output/                       # Default output directory
    ├── timeline.json                    # Memory metadata
    ├── memory_report.txt                # Text summary
    └── memory_*.jpg                     # Representative images
{
  "total_memories": 5,
  "memories": [
    {
      "index": 0,
      "timestamp": "00:23",
      "seconds": 23.45,
      "importance": 0.856,
      "explanation": "This 4.2s moment is important because significant motion or activity detected and a new object appeared in the scene.",
      "image": "memory_00.jpg"
    }
  ]
}

memory_report.txt: Text summary of all memories with timestamps, importance scores, and explanations.

memory_*.jpg: Representative images from each important moment.
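
As a usage sketch, the timeline file can be read back with the Python standard library. The field names (`memories`, `importance`, `timestamp`, `explanation`, `image`) come from the sample output above; the helper name `load_memories` and the `min_importance` filter are hypothetical additions, not part of MemoryMap.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_memories(timeline_path: Union[str, Path],
                  min_importance: float = 0.0) -> List[Dict[str, Any]]:
    """Load timeline.json and return memories sorted by importance (highest first)."""
    data = json.loads(Path(timeline_path).read_text(encoding="utf-8"))
    memories = [m for m in data.get("memories", [])
                if m.get("importance", 0.0) >= min_importance]
    return sorted(memories, key=lambda m: m["importance"], reverse=True)


# Example: list the strongest memories if a previous run produced a timeline.
timeline = Path("output_folder") / "timeline.json"
if timeline.exists():
    for memory in load_memories(timeline, min_importance=0.5):
        print(memory["timestamp"], memory["importance"], memory["image"])
```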
For more memories (keep more scenes):
keep_ratio=0.5 # Keep top 50% instead of 20%

For stricter motion detection:
adaptive_k=3.5 # Only detect very obvious motion

For more granular frame sampling:
sample_interval=0.5 # Sample every 0.5s instead of 1.0s

- Converts frames to grayscale and computes frame-to-frame differences
- Maintains an adaptive baseline of motion history
- Detects motion "spikes" above the mean + k×σ threshold
- Groups consecutive motion frames into events
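
The motion detection steps above can be sketched as follows. This is a minimal illustration, not the code in motion_event_segmentation.py: it assumes each sampled frame has already been reduced to a single motion score (for example the mean absolute grayscale difference to the previous frame), and the `history`, `warmup`, and `min_sigma` parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List, Optional, Sequence, Tuple


def detect_motion_events(motion_scores: Sequence[float],
                         adaptive_k: float = 2.5,
                         history: int = 30,
                         warmup: int = 5,
                         min_sigma: float = 1e-6) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of consecutive motion samples."""
    baseline: Deque[float] = deque(maxlen=history)  # recent quiet samples
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(motion_scores):
        is_motion = False
        if len(baseline) >= warmup:
            # K-sigma rule: a spike exceeds mean + k * standard deviation.
            sigma = max(pstdev(baseline), min_sigma)
            is_motion = score > mean(baseline) + adaptive_k * sigma
        if is_motion:
            if start is None:
                start = i  # a new event begins
        else:
            if start is not None:
                events.append((start, i - 1))  # close the running event
                start = None
            baseline.append(score)  # only quiet samples update the baseline
    if start is not None:
        events.append((start, len(motion_scores) - 1))
    return events


scores = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 8.0, 9.0, 1.0, 1.0]
print(detect_motion_events(scores, adaptive_k=2.5))
```

Keeping motion samples out of the baseline is one design choice among several; it stops a long event from raising its own threshold, at the cost of adapting more slowly to permanent scene changes.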
importance = 0.40 × motion_score
           + 0.30 × object_change
           + 0.20 × duration_score
           + 0.10 × suddenness_score
           × semantic_multiplier
Semantic Multipliers:
- Idle scene: 0.2× (less important)
- Minor activity: 0.6×
- Significant activity: 1.1× (more important)
- Critical activity: 1.3× (highest priority)
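
A minimal sketch of the scoring formula and multipliers above. It assumes the semantic multiplier scales the whole weighted sum and that the four component scores are already normalized to [0, 1]. The function name and the activity keys are hypothetical and may not match importance_scoring.py.

```python
# Multipliers from the list above; the dictionary keys are hypothetical labels.
SEMANTIC_MULTIPLIERS = {
    "idle": 0.2,
    "minor": 0.6,
    "significant": 1.1,
    "critical": 1.3,
}


def importance_score(motion_score: float, object_change: float,
                     duration_score: float, suddenness_score: float,
                     activity: str = "minor") -> float:
    """Weighted sum of component scores, scaled by the semantic multiplier."""
    base = (0.40 * motion_score
            + 0.30 * object_change
            + 0.20 * duration_score
            + 0.10 * suddenness_score)
    # Assumption: the multiplier applies to the whole sum, not just one term.
    return base * SEMANTIC_MULTIPLIERS[activity]


print(importance_score(0.8, 0.5, 0.5, 0.2, activity="significant"))
```

With multipliers above 1.0 the result can exceed 1.0 for very active scenes; whether the project clamps the final value is not stated here.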
The visual intensity score combines:
- Contrast (60%): Standard deviation of grayscale values
- Edge Density (40%): Amount of edges detected (indicates structure)
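
The 60/40 combination can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code in emotion_analysis.py: contrast is normalized by 127.5 (the largest possible standard deviation for 8-bit values), and a simple gradient-magnitude threshold stands in for whatever edge detector the project uses. The function name and `edge_threshold` are hypothetical.

```python
import numpy as np


def visual_intensity(gray: np.ndarray, edge_threshold: float = 30.0) -> float:
    """Combine contrast (60%) and edge density (40%) into a score in [0, 1]."""
    pixels = gray.astype(np.float64)
    # Contrast: standard deviation of grayscale values, scaled to [0, 1].
    contrast = min(float(pixels.std()) / 127.5, 1.0)
    # Edge density: fraction of pixels whose gradient magnitude is large.
    # Stand-in for a real edge detector; the threshold is an assumption.
    grad_rows, grad_cols = np.gradient(pixels)
    edge_density = float(np.mean(np.hypot(grad_cols, grad_rows) > edge_threshold))
    return 0.6 * contrast + 0.4 * edge_density


flat = np.full((8, 8), 128, dtype=np.uint8)   # uniform frame, no structure
split = np.zeros((8, 8), dtype=np.uint8)
split[:, 4:] = 255                            # hard vertical edge
print(visual_intensity(flat), visual_intensity(split))
```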
- Python: 3.8+
- RAM: 4GB minimum (8GB+ recommended)
- GPU: Optional (YOLO inference will be slower on CPU)
- OS: Linux, macOS, Windows
opencv-python>=4.8.0
numpy>=1.21.0
PyAV>=10.0.0
ultralytics>=8.0.0 # YOLOv8
scenedetect>=0.6.1 # Optional
pip install -r requirements.txt
pip install opencv-python numpy PyAV ultralytics scenedetect
# For CUDA GPU acceleration
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

- Security Footage Analysis - Highlight important events in surveillance videos
- Time-Lapse Summarization - Extract key moments from long recordings
- Construction Monitoring - Track project progress and identify issues
- Wildlife Monitoring - Detect and extract animal activity
- Traffic Analysis - Identify traffic incidents and congestion
- Event Recording - Automatically create highlight reels
Solution: Lower adaptive_k value (try 2.0 instead of 2.5)
Solution: Install ultralytics: pip install ultralytics
Solution: Increase sample_interval (e.g., 2.0 instead of 1.0)
Solution: Try converting video with FFmpeg first:
ffmpeg -i input.mp4 -c:v libx264 -c:a aac output.mp4

- Reduce frame sampling for faster processing:
  sample_interval=2.0 # Every 2 seconds instead of 1
- Use lower resolution video:
  ffmpeg -i input.mp4 -vf scale=640:480 output.mp4
- Process only specific duration:
  - Edit video_ingestion.py to limit duration
- Disable YOLO if objects not needed:
  - Comment out object analysis in pipeline.py
Contributions welcome! Areas for improvement:
- Add multi-object tracking
- Implement face detection & recognition
- Add audio analysis
- Create web UI for visualization
- Add parallel processing
- Improve motion detection robustness
MIT License - See LICENSE file for details
- Avijit Roy
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check troubleshooting section above
- Review pipeline logs for detailed errors
- ✅ Core motion detection pipeline
- ✅ Object recognition (YOLOv8)
- ✅ Importance scoring
- ✅ JSON & image output
- ✅ Emotion analysis integration
- 🚧 Multi-object tracking
- 🚧 Web UI dashboard
- 🚧 Audio analysis
- 🚧 Parallel processing
- 🚧 Face detection
- Pipeline Architecture - See pipeline.py
- Module Documentation - See docstrings in each module
- Data Structures - See data_structures.py
- Configuration - See parameter tables above
Built with: