Overview
This project developed an autonomous parking system using the Option-Critic reinforcement learning framework. An existing Unity-based parking environment was adapted, adding dynamic obstacle arrangements and simulated ray-perception sensors for spatial awareness. The hierarchical reinforcement learning (HRL) architecture let the agent execute complex parking tasks by decomposing them into high-level options and low-level maneuvers.
Technical Highlights
Implemented the Option-Critic architecture, featuring the following components (a minimal sketch follows the list):
- Policy Over Options: High-level decision-making based on simplified states.
- Intra-Option Policies: Low-level actions controlling torque and steering.
- Termination Policies: Determined when the active option should end so the agent could switch tasks efficiently.
- State-Option Critic Network: Estimated state-option values, Q(s, ω), to guide high-level learning.
- State-Option-Action Critic Network: Estimated state-option-action values, Q(s, ω, a), to guide low-level learning.
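A minimal PyTorch sketch of how these pieces can fit together, assuming a flattened ray-sensor state and a two-dimensional continuous action (torque, steering); the class names, dimensions, and layer sizes here are illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn

class OptionCriticNets(nn.Module):
    """Policy over options, intra-option policies, and terminations."""

    def __init__(self, state_dim=32, action_dim=2, num_options=4, hidden=64):
        super().__init__()
        self.num_options = num_options
        self.action_dim = action_dim
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # State-option critic Q(s, w); its argmax doubles as the policy over options.
        self.q_omega = nn.Linear(hidden, num_options)
        # One action head per option (torque, steering squashed into [-1, 1]).
        self.intra_mu = nn.Linear(hidden, num_options * action_dim)
        # Per-option termination probabilities.
        self.beta = nn.Linear(hidden, num_options)

    def forward(self, state):
        h = self.encoder(state)
        q_omega = self.q_omega(h)
        mu = torch.tanh(self.intra_mu(h)).view(-1, self.num_options, self.action_dim)
        beta = torch.sigmoid(self.beta(h))
        return q_omega, mu, beta

class StateOptionActionCritic(nn.Module):
    """State-option-action critic Q(s, w, a) for continuous actions."""

    def __init__(self, state_dim=32, action_dim=2, num_options=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_options),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Example forward pass on a dummy sensor state.
nets, q_u_net = OptionCriticNets(), StateOptionActionCritic()
s = torch.randn(1, 32)
q_omega, mu, beta = nets(s)
option = q_omega.argmax(dim=-1)   # greedy high-level choice (epsilon-greedy in training)
action = mu[0, option]            # torque/steering from the active option's policy
q_u = q_u_net(s, action)          # per-option Q(s, w, a) estimates
```

Using an epsilon-greedy argmax over Q(s, ω) as the policy over options is the standard choice in the original Option-Critic formulation.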
Targeted hierarchical reward signals improved stability and performance by balancing exploration, obstacle avoidance, and parking accuracy (a hypothetical decomposition is sketched below).
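To make that decomposition concrete, here is a hypothetical shaping function covering those three concerns; every weight, threshold, and input name is invented for illustration and does not reflect the project's actual reward design:

```python
import math

def parking_reward(dist_to_slot, heading_error, min_obstacle_dist, collided):
    r_progress = -0.1 * dist_to_slot                  # pull toward the target slot
    r_heading = -0.05 * abs(heading_error)            # align with the slot orientation
    r_obstacle = -1.0 if min_obstacle_dist < 0.5 else 0.0  # ray-sensor safety margin
    r_collision = -10.0 if collided else 0.0          # hard penalty on contact
    parked = dist_to_slot < 0.2 and abs(heading_error) < math.radians(10)
    r_success = 10.0 if parked else 0.0               # terminal bonus for accurate parking
    return r_progress + r_heading + r_obstacle + r_collision + r_success
```

Dense shaping terms like these, combined with a terminal success bonus, are one common way to keep exploration productive while penalizing collisions.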
Challenges and Solutions
- Exploding Gradients: Managed by assigning separate reward signals to the hierarchical components, which stabilized training (see the sketch after this list).
- Reward Balancing: Carefully structured rewards prevented biases and ensured effective policy learning.
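A rough sketch of that stabilization idea: each hierarchical head is trained on its own TD error built from its own reward signal, with targets detached so errors in one head do not compound through the others. All tensor names and coefficients are placeholders, not the project's code:

```python
import torch

def component_losses(q_omega, q_u, logp_action, beta_next, q_omega_next, v_next,
                     r_high, r_low, gamma=0.99):
    # Separate TD targets from component-specific rewards; detach() keeps
    # each head's gradient from flowing through the others.
    td_high = (r_high + gamma * v_next).detach() - q_omega  # state-option critic error
    td_low = (r_low + gamma * v_next).detach() - q_u        # state-option-action critic error
    critic_loss = td_high.pow(2).mean() + td_low.pow(2).mean()
    # Intra-option policy gradient weighted by the low-level TD error.
    actor_loss = -(logp_action * td_low.detach()).mean()
    # Termination gradient: push beta up where the running option's
    # next-state advantage Q(s', w) - V(s') is negative.
    termination_loss = (beta_next * (q_omega_next - v_next).detach()).mean()
    return critic_loss, actor_loss, termination_loss
```

Gradient clipping (torch.nn.utils.clip_grad_norm_) is a common complementary safeguard against exploding gradients.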
Technologies and Tools
- Unity, Unity ML-Agents
- Python, C#, PyTorch
My Role
I developed the Option-Critic RL architecture, designed the hierarchical rewards, and managed training and simulations.
Key Results
Due to computational constraints, I was able to run the simulations for only a limited number of episodes. Reward, value-function, and loss curves are plotted and included in the detailed report; for a comprehensive picture, please download the full report (top of the page).