Overview
This project developed an autonomous parking system using the Option-Critic reinforcement learning framework. An existing Unity-based parking environment was adapted, adding dynamic obstacle arrangements and simulated ray-perception sensors for spatial awareness. The hierarchical reinforcement learning (HRL) architecture let the agent execute complex parking tasks by decomposing them into high-level options and low-level maneuvers.
Technical Highlights
Implemented the Option-Critic architecture, featuring the following components (a minimal sketch follows the list):
- Policy Over Options: High-level decision-making based on simplified states.
- Intra-Option Policies: Low-level actions controlling torque and steering.
- Termination Policies: Determined when the active option should end so the agent could switch tasks efficiently.
- State-Option Critic Network: Estimated state-option values, Q(s, ω), to guide high-level learning.
- State-Option-Action Critic Network: Estimated state-option-action values, Q(s, ω, a), to guide low-level learning.
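A minimal PyTorch sketch of how these pieces can fit together, assuming a flattened ray-sensor state and a two-dimensional continuous action (torque, steering); the class names, dimensions, and layer sizes here are illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn

class OptionCriticNets(nn.Module):
    """Policy over options, intra-option policies, and terminations."""

    def __init__(self, state_dim=32, action_dim=2, num_options=4, hidden=64):
        super().__init__()
        self.num_options = num_options
        self.action_dim = action_dim
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # State-option critic Q(s, w); its argmax doubles as the policy over options.
        self.q_omega = nn.Linear(hidden, num_options)
        # One action head per option (torque, steering squashed into [-1, 1]).
        self.intra_mu = nn.Linear(hidden, num_options * action_dim)
        # Per-option termination probabilities.
        self.beta = nn.Linear(hidden, num_options)

    def forward(self, state):
        h = self.encoder(state)
        q_omega = self.q_omega(h)
        mu = torch.tanh(self.intra_mu(h)).view(-1, self.num_options, self.action_dim)
        beta = torch.sigmoid(self.beta(h))
        return q_omega, mu, beta

class StateOptionActionCritic(nn.Module):
    """State-option-action critic Q(s, w, a) for continuous actions."""

    def __init__(self, state_dim=32, action_dim=2, num_options=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_options),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Example forward pass on a dummy sensor state.
nets, q_u_net = OptionCriticNets(), StateOptionActionCritic()
s = torch.randn(1, 32)
q_omega, mu, beta = nets(s)
option = q_omega.argmax(dim=-1)   # greedy high-level choice (epsilon-greedy in training)
action = mu[0, option]            # torque/steering from the active option's policy
q_u = q_u_net(s, action)          # per-option Q(s, w, a) estimates
```

Using an epsilon-greedy argmax over Q(s, ω) as the policy over options is the standard choice in the original Option-Critic formulation.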
Targeted hierarchical reward signals improved stability and performance by balancing exploration, obstacle avoidance, and parking accuracy (a hypothetical decomposition is sketched below).
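To make that decomposition concrete, here is a hypothetical shaping function covering those three concerns; every weight, threshold, and input name is invented for illustration and does not reflect the project's actual reward design:

```python
import math

def parking_reward(dist_to_slot, heading_error, min_obstacle_dist, collided):
    r_progress = -0.1 * dist_to_slot                  # pull toward the target slot
    r_heading = -0.05 * abs(heading_error)            # align with the slot orientation
    r_obstacle = -1.0 if min_obstacle_dist < 0.5 else 0.0  # ray-sensor safety margin
    r_collision = -10.0 if collided else 0.0          # hard penalty on contact
    parked = dist_to_slot < 0.2 and abs(heading_error) < math.radians(10)
    r_success = 10.0 if parked else 0.0               # terminal bonus for accurate parking
    return r_progress + r_heading + r_obstacle + r_collision + r_success
```

Dense shaping terms like these, combined with a terminal success bonus, are one common way to keep exploration productive while penalizing collisions.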
Challenges and Solutions
- Exploding Gradients: Managed by assigning separate reward signals to the hierarchical components, which stabilized training (see the sketch after this list).
- Reward Balancing: Carefully structured rewards prevented biases and ensured effective policy learning.
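A rough sketch of that stabilization idea: each hierarchical head is trained on its own TD error built from its own reward signal, with targets detached so errors in one head do not compound through the others. All tensor names and coefficients are placeholders, not the project's code:

```python
import torch

def component_losses(q_omega, q_u, logp_action, beta_next, q_omega_next, v_next,
                     r_high, r_low, gamma=0.99):
    # Separate TD targets from component-specific rewards; detach() keeps
    # each head's gradient from flowing through the others.
    td_high = (r_high + gamma * v_next).detach() - q_omega  # state-option critic error
    td_low = (r_low + gamma * v_next).detach() - q_u        # state-option-action critic error
    critic_loss = td_high.pow(2).mean() + td_low.pow(2).mean()
    # Intra-option policy gradient weighted by the low-level TD error.
    actor_loss = -(logp_action * td_low.detach()).mean()
    # Termination gradient: push beta up where the running option's
    # next-state advantage Q(s', w) - V(s') is negative.
    termination_loss = (beta_next * (q_omega_next - v_next).detach()).mean()
    return critic_loss, actor_loss, termination_loss
```

Gradient clipping (torch.nn.utils.clip_grad_norm_) is a common complementary safeguard against exploding gradients.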
Technologies and Tools
- Unity, Unity ML-Agents
- Python, C#, PyTorch
My Role
I developed the Option-Critic RL architecture, designed the hierarchical rewards, and managed training and simulations.
Key Results
Due to computational constraints, I was able to run the simulations for only a limited number of episodes. Reward, value-function, and loss curves are plotted and included in the detailed report; for a comprehensive picture, please download the full report (top of the page).