Overview
This project introduces a modular multi-agent system designed to enhance code generation, debugging, and explanation through specialized agent roles, integrating Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). A Streamlit UI provides interactive user engagement and collects structured feedback.
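As an illustration of how such a feedback UI might be wired up, here is a minimal Streamlit sketch; the widget layout, session keys, and the stubbed agent result are hypothetical placeholders, not the project's actual interface.

```python
import streamlit as st

st.title("Multi-Agent Code Assistant")

query = st.text_area("Describe your coding task")
if st.button("Run agents") and query:
    # Hypothetical stub standing in for a call into the compiled agent graph.
    st.session_state["result"] = {"code": "# generated code", "explanation": "..."}

if "result" in st.session_state:
    st.code(st.session_state["result"]["code"], language="python")
    st.write(st.session_state["result"]["explanation"])
    # A form keeps the rating and comment together in one submit event.
    with st.form("feedback"):
        rating = st.slider("Rate this answer", 1, 5, 3)
        comment = st.text_input("Optional comment")
        if st.form_submit_button("Submit"):
            # Structured feedback records like these feed the RLHF pipeline.
            st.session_state.setdefault("feedback", []).append(
                {"query": query, "rating": rating, "comment": comment}
            )
            st.success("Feedback recorded")
```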
Technical Highlights
The system leverages a hierarchical agent architecture:
- Planner Agent: Routes and classifies user queries.
- Chain-of-Thought (CoT) Agent: Produces structured reasoning and proposes algorithmic strategies.
- Developer Agent: Generates executable Python code.
- Debugger Agent: Identifies and corrects code issues.
- Explainer Agent: Offers concise, user-friendly code explanations.

Agent communication is managed via a LangGraph workflow, ensuring well-defined state transitions and a clear data flow between agents.
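A minimal sketch of what such a LangGraph workflow could look like is below; the state fields, node functions, and routing labels are illustrative assumptions rather than the project's actual graph.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

# Hypothetical shared state passed between agents (field names are assumptions).
class AgentState(TypedDict):
    query: str
    route: str
    plan: str
    code: str
    explanation: str

def planner(state: AgentState) -> AgentState:
    # Toy classification; the real planner would call its underlying model.
    route = "debug" if "error" in state["query"].lower() else "generate"
    return {**state, "route": route}

def cot_agent(state: AgentState) -> AgentState:
    return {**state, "plan": "step-by-step strategy for: " + state["query"]}

def developer(state: AgentState) -> AgentState:
    return {**state, "code": "# generated code based on the plan\n"}

def debugger(state: AgentState) -> AgentState:
    return {**state, "code": "# corrected code\n"}

def explainer(state: AgentState) -> AgentState:
    return {**state, "explanation": "plain-language summary of the code"}

graph = StateGraph(AgentState)
for name, fn in [("planner", planner), ("cot", cot_agent),
                 ("developer", developer), ("debugger", debugger),
                 ("explainer", explainer)]:
    graph.add_node(name, fn)

graph.set_entry_point("planner")
# Route new requests through CoT reasoning; send bug reports to the debugger.
graph.add_conditional_edges("planner", lambda s: s["route"],
                            {"generate": "cot", "debug": "debugger"})
graph.add_edge("cot", "developer")
graph.add_edge("developer", "explainer")
graph.add_edge("debugger", "explainer")
graph.add_edge("explainer", END)

app = graph.compile()
result = app.invoke({"query": "Write a function to merge two sorted lists"})
```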
Challenges and Solutions
- Agent Coordination: Addressed by implementing a LangGraph-based state machine for efficient inter-agent communication.
- Model Efficiency: Employed knowledge distillation to fine-tune lightweight student models using larger teacher models, balancing computational cost and performance (a sketch of the objective follows this list).
- Feedback Integration: Implemented PPO-based fine-tuning driven by both human and AI feedback, iteratively improving agent performance.
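The distillation objective referenced above is typically a blend of a softened KL term against the teacher's logits and a hard-label cross-entropy term; the PyTorch sketch below assumes that standard formulation, with illustrative temperature and mixing-weight values rather than the project's tuned hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL (teacher guidance) with hard-label cross-entropy.

    temperature and alpha are illustrative hyperparameters.
    """
    # Soften both distributions so the student learns inter-class structure.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce
```

For token-level language-model distillation, the logits would be flattened over the vocabulary dimension before applying the same loss.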
Technologies and Tools
- Python, PyTorch, Hugging Face Transformers
- LangGraph, Streamlit
- Reinforcement Learning (RLHF, RLAIF), PPO
My Role
Designed and implemented RLHF and RLAIF frameworks, developed specialized reward models, integrated PPO for agent optimization, and managed feedback-driven enhancements.
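To make the PPO feedback loop concrete, below is a minimal sketch using trl's classic PPOTrainer interface (pre-0.12 API); the base model, hyperparameters, and the toy reward_fn standing in for the learned reward models are all assumptions.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Hypothetical base model; the project's actual checkpoints may differ.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
config = PPOConfig(model_name=model_name, batch_size=4, mini_batch_size=2)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

def reward_fn(prompt: str, response: str) -> torch.Tensor:
    # Toy stand-in for the learned reward model: in practice this would score
    # responses using human (RLHF) or AI (RLAIF) preference signals.
    return torch.tensor(1.0 if "def " in response else -1.0)

prompts = ["Write a Python function that reverses a string."] * config.batch_size
query_tensors = [tokenizer.encode(p, return_tensors="pt").squeeze(0)
                 for p in prompts]

# One PPO iteration: sample responses, score them, then update the policy.
response_tensors = [
    ppo_trainer.generate(q, return_prompt=False, max_new_tokens=64).squeeze(0)
    for q in query_tensors
]
responses = [tokenizer.decode(r, skip_special_tokens=True)
             for r in response_tensors]
rewards = [reward_fn(p, r) for p, r in zip(prompts, responses)]
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```

In the real system, reward_fn would be replaced by the specialized reward models trained on human and AI feedback, and this step would run over many batches of user queries.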
Key Results
The system demonstrated performance comparable to that of significantly larger models, notably improving code accuracy, robustness, and explanation clarity. Detailed evaluations, feedback loops, and results are included in the project report (at the top of the page).