RL4SE: Reinforcement Learning for AIGAME (LunarLander-v3)

RL4SE stands for Reinforcement Learning for Software Engineering. This project trains an agent with the Proximal Policy Optimization (PPO) algorithm on the LunarLander-v3 environment from Gymnasium (the Farama Foundation's successor to OpenAI Gym). It serves as a worked example of applying reinforcement learning to a simulated control task, with experiment tracking and reproducible tooling around the training loop.

📝 Table of Contents

📈 Training Demo
🔍 Description
✨ Features
🚀 Installation
🎮 Usage
- 🔄 Training the PPO Model
- 📹 Viewing the Training Video
📊 Experiment Tracking
- WandB Dashboard
- Key Performance Metrics
⚙️ Configuration
📚 Dependencies
🤝 Contributing
📜 License
🙏 Acknowledgements
📬 Contact

📈 Training Demo

Watch the RL4SE agent successfully land on the lunar surface using the PPO algorithm.

📊 Key Performance Metrics

Eval 1: Initial evaluation metric showing baseline performance.
Eval 2: Secondary evaluation metric indicating progress.
Rollout 2: Rollout metrics during training phases.
Train Entropy Loss: Measures the randomness of the policy over time.
Train Learning Rate: Adaptive learning rate schedule during training.
Train Loss: Overall training loss decreasing over epochs.
Train Value Loss: Value loss metric showing accurate value estimation.
Train Approx KL: Approximate KL divergence during training.
Train Clip Fraction: Fraction of policy updates that were clipped.
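
For readers unfamiliar with the Approx KL and Clip Fraction diagnostics, the sketch below shows one common way to compute them from per-sample log-probabilities under the old and the updated policy. The helper name ppo_diagnostics, the sample values, and the clip range of 0.2 are illustrative assumptions; none of them are taken from this project's code or logs.

```python
import math


def ppo_diagnostics(old_log_probs, new_log_probs, clip_range=0.2):
    """Return (approx_kl, clip_fraction) for one batch of sampled actions.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    if len(old_log_probs) != len(new_log_probs) or not old_log_probs:
        raise ValueError("expected two non-empty sequences of equal length")

    kl_sum = 0.0
    clipped = 0
    for old_lp, new_lp in zip(old_log_probs, new_log_probs):
        log_ratio = new_lp - old_lp
        ratio = math.exp(log_ratio)
        # Low-variance KL estimator: (r - 1) - log(r), which is never negative.
        kl_sum += (ratio - 1.0) - log_ratio
        # A sample counts as clipped when its ratio leaves [1 - eps, 1 + eps].
        if abs(ratio - 1.0) > clip_range:
            clipped += 1

    n = len(old_log_probs)
    return kl_sum / n, clipped / n


if __name__ == "__main__":
    # Made-up log-probabilities, only to exercise the function.
    old = [-1.20, -0.70, -2.10, -0.30]
    new = [-1.10, -0.70, -1.60, -0.35]
    approx_kl, clip_fraction = ppo_diagnostics(old, new)
    print(f"approx_kl={approx_kl:.4f} clip_fraction={clip_fraction:.2f}")
```

A rising clip fraction together with a rising approximate KL usually means the policy is changing a lot per update, which is a hint to lower the learning rate or the number of epochs per rollout.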

🔍 Description

RL4SE is a reinforcement learning project that applies the Proximal Policy Optimization (PPO) algorithm to the LunarLander-v3 environment from Gymnasium. It uses Stable Baselines3 for the model implementation and Weights & Biases (WandB) for experiment tracking and visualization. It also uses Git Large File Storage (LFS) to manage large video recordings of agent performance.

Key Objectives:

Demonstrate the effectiveness of PPO in complex environments.
Track and visualize training metrics using WandB.
Manage large media files efficiently with Git LFS.
Provide a modular and scalable codebase for future enhancements.
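
As a rough picture of how these objectives fit together, here is a minimal training sketch that combines Stable Baselines3 PPO with WandB logging. It is not the project's actual training script: the CONFIG values, the WandB project name "RL4SE", the output path, and the linear learning-rate schedule are all assumptions made for illustration. Running train() requires gymnasium with the Box2D extra, stable-baselines3, tensorboard, and a logged-in wandb installation.

```python
def linear_schedule(initial_lr):
    """Return a schedule that decays linearly from initial_lr to 0.

    Stable Baselines3 calls a schedule with progress_remaining, which goes
    from 1.0 at the start of training to 0.0 at the end.
    """
    def schedule(progress_remaining):
        return progress_remaining * initial_lr

    return schedule


# Illustrative values only; the project's real hyperparameters are not
# stated in this README.
CONFIG = {
    "env_id": "LunarLander-v3",
    "policy": "MlpPolicy",
    "total_timesteps": 500_000,
    "initial_lr": 3e-4,
}


def train(config=None, model_path="ppo_lunarlander"):
    """Train PPO and log the run to WandB (sketch under the assumptions above)."""
    # Imported lazily so the helpers above work without the RL stack installed.
    import gymnasium as gym
    import wandb
    from stable_baselines3 import PPO
    from wandb.integration.sb3 import WandbCallback

    config = dict(CONFIG if config is None else config)

    # sync_tensorboard forwards SB3's TensorBoard scalars to WandB;
    # save_code asks WandB to keep a snapshot of the launching script.
    run = wandb.init(
        project="RL4SE",  # assumed project name
        config=config,
        sync_tensorboard=True,
        save_code=True,
    )

    env = gym.make(config["env_id"])
    model = PPO(
        config["policy"],
        env,
        learning_rate=linear_schedule(config["initial_lr"]),
        tensorboard_log=f"runs/{run.id}",
        verbose=1,
    )
    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(verbose=2),
    )
    model.save(model_path)  # Stable Baselines3 appends the .zip extension
    env.close()
    run.finish()
    return model


if __name__ == "__main__":
    train()
```

Keeping the hyperparameters in one dictionary means the same values are passed to PPO and recorded in the WandB run, so a dashboard entry can always be traced back to its settings.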

✨ Features

Standard PPO Implementation: Utilizes the PPO algorithm from Stable Baselines3 for training agents.
Experiment Tracking: Integrates with WandB to monitor training progress, visualize metrics, and save code snapshots.
Video Recording: Records and displays videos of the trained agent's performance.
Model Saving: Saves trained models for future use and evaluation.
Git LFS Integration: Manages large video files efficiently using Git Large File Storage.
Modular Code Structure: Organized scripts and utilities for maintainability and scalability.
Configuration Flexibility: Easily adjustable hyperparameters and environment settings.
Comprehensive Documentation: Detailed instructions and explanations for ease of use.
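
The video recording and model saving features can be pictured with the sketch below, which loads a saved PPO model and records evaluation episodes with Gymnasium's RecordVideo wrapper. The function names, the model path "ppo_lunarlander", the "videos" folder, and the file prefix are hypothetical; the project's own scripts may be organized differently. RecordVideo needs the moviepy package in addition to gymnasium with the Box2D extra.

```python
def make_episode_trigger(every_n):
    """Return a trigger that records episode 0 and every every_n-th one after."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")

    def trigger(episode_id):
        return episode_id % every_n == 0

    return trigger


def record_agent(model_path="ppo_lunarlander", video_folder="videos", episodes=3):
    """Run a saved agent and write one video per recorded episode.

    Sketch only: paths and names are placeholders, not this repository's layout.
    """
    # Imported lazily so make_episode_trigger works without the RL stack.
    import gymnasium as gym
    from gymnasium.wrappers import RecordVideo
    from stable_baselines3 import PPO

    # RecordVideo captures frames, so the environment must render to rgb_array.
    env = gym.make("LunarLander-v3", render_mode="rgb_array")
    env = RecordVideo(
        env,
        video_folder=video_folder,
        episode_trigger=make_episode_trigger(1),  # record every episode
        name_prefix="ppo-lunarlander",
    )
    model = PPO.load(model_path)

    episode_returns = []
    for _ in range(episodes):
        obs, _info = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _info = env.step(int(action))
            total_reward += float(reward)
            done = terminated or truncated
        episode_returns.append(total_reward)

    env.close()  # closing the wrapper finalizes the last video file
    return episode_returns


if __name__ == "__main__":
    print(record_agent())
```

Recording only during evaluation, rather than during training, keeps the number of large files small, which matters when the videos are committed through Git LFS.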

🚀 Installation

🔧 Prerequisites

Before you begin, ensure you have met the following requirements:

Operating System: Windows, macOS, or Linux
Python: Version 3.8 or higher (LunarLander-v3 ships with Gymnasium 1.0 and later, which does not support Python 3.7)
Git: Installed on your system
Git LFS: Installed and configured (Installation Guide)
Gymnasium environment: Installed as part of the dependencies (LunarLander needs the Box2D extra, gymnasium[box2d])
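
Before cloning, the prerequisites above can be checked with a short standard-library script such as the one below. It is a convenience sketch, not a script shipped with the repository. The default minimum of Python 3.8 is an assumption based on the requirement of Gymnasium 1.0; pass a different tuple if your dependency versions call for it.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict mapping each prerequisite name to True or False.

    min_python is an assumed minimum; adjust it to what your dependencies need.
    """
    return {
        "python": sys.version_info[:2] >= tuple(min_python),
        "git": shutil.which("git") is not None,
        # Git LFS installs an executable named git-lfs on the PATH.
        "git-lfs": shutil.which("git-lfs") is not None,
    }


if __name__ == "__main__":
    results = check_prerequisites()
    for name, ok in results.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    # A non-zero exit code makes the check usable in setup scripts.
    sys.exit(0 if all(results.values()) else 1)
```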

📥 Clone the Repository

git clone https://github.com/evansnyanney/RL4SE.git
cd RL4SE
