Introduction to Reinforcement Learning
Welcome to the repository for the Introduction to Reinforcement Learning course at Leiden University (4032IRLRN). This repository contains all assignment reports, each exploring a different aspect of reinforcement learning: from bandit problems and dynamic programming to model-free and model-based learning. The full code for each report is available in the GitHub repository linked below.
Authors
- Adrien Joon-Ha Im
- Bence Válint
Table of Contents
Assignment 1A – Exploration Strategies in Bandits
Comparative Analysis of Exploration Techniques in Multi-Armed Bandits
In this assignment, we explored the exploration-exploitation trade-off, a fundamental challenge in reinforcement learning, through multi-armed bandit problems. We implemented and evaluated three key strategies: ε-Greedy, Optimistic Initialization, and Upper Confidence Bound (UCB). Each method was tested across a range of parameter settings to assess its effectiveness in maximizing cumulative reward. Our experiments showed that while ε-Greedy is a simple and effective baseline, Optimistic Initialization and UCB performed better, converging faster and yielding higher long-term returns. We also highlighted the importance of parameter tuning for learning efficiency.
Grade: 8.0 / 10.0
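As a rough illustration of the strategies compared in this report, the sketch below shows ε-greedy and UCB action selection on a small Bernoulli bandit. The arm probabilities, ε, and the exploration constant c are arbitrary placeholder values for the example, not the settings used in the assignment.

```python
import numpy as np

def epsilon_greedy(Q, epsilon, rng):
    # With probability epsilon pick a random arm, otherwise the greedy arm.
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(np.argmax(Q))

def ucb(Q, counts, t, c):
    # Upper Confidence Bound: add an exploration bonus that shrinks
    # as an arm is pulled more often.
    bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-9))
    return int(np.argmax(Q + bonus))

def run_bandit(select, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    p_true = np.array([0.2, 0.5, 0.7])   # hypothetical Bernoulli arm means
    Q = np.zeros(len(p_true))            # estimated action values
    counts = np.zeros(len(p_true))       # pull counts per arm
    total = 0.0
    for t in range(n_steps):
        a = select(Q, counts, t, rng)
        r = float(rng.random() < p_true[a])
        counts[a] += 1
        Q[a] += (r - Q[a]) / counts[a]    # incremental sample-average update
        total += r
    return total

eg = run_bandit(lambda Q, counts, t, rng: epsilon_greedy(Q, 0.1, rng))
ub = run_bandit(lambda Q, counts, t, rng: ucb(Q, counts, t, c=1.0))
print(f"epsilon-greedy return: {eg:.0f}, UCB return: {ub:.0f}")
```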
Assignment 1B – Dynamic Programming
Solving Markov Decision Processes with Policy and Value Iteration
This assignment focused on solving Markov Decision Processes (MDPs) with Dynamic Programming (DP) in the Windy Gridworld environment. We implemented and compared two fundamental DP algorithms: Policy Iteration and Value Iteration. Both methods use full knowledge of the environment to derive optimal policies through iterative evaluation and improvement of value functions. Our results demonstrated the strength of DP in small, deterministic environments and showed how the discount factor shapes the agent's behavior. While DP guarantees convergence, we also discussed its limited scalability and its restricted applicability to real-world problems.
Grade: 8.0 / 10.0
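For illustration, here is a minimal sketch of Value Iteration on a toy MDP. The transition table, rewards, and discount factor are made-up placeholders, not the Windy Gridworld setup used in the report.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-6):
    """Value Iteration on a tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a]
    is the expected immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:   # stop once the value function has converged
            break
    # Derive the greedy (optimal) policy from the converged values.
    policy = [int(np.argmax([R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                             for a in range(len(P[s]))]))
              for s in range(n_states)]
    return V, policy

# Tiny 3-state chain: action 0 stays, action 1 moves right; state 2 is terminal.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
R = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
V, pi = value_iteration(P, R)
print("V:", np.round(V, 3), "policy:", pi)
```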
Assignment 2 – Model-Free Reinforcement Learning
Comparative Study of Model-Free Algorithms in Grid-Based RL
In this assignment, we implemented and compared four fundamental model-free reinforcement learning algorithms: Q-Learning, SARSA, Expected SARSA, and n-step SARSA. Each algorithm was studied in a stochastic grid environment called the ShortCut Environment. Our results highlighted key differences between on-policy and off-policy methods: Q-Learning consistently found riskier but optimal trajectories, while SARSA and Expected SARSA preferred safer paths. n-step SARSA offered a flexible middle ground, with its performance depending heavily on the chosen step size n. This assignment deepened our understanding of exploration trade-offs and of robustness in dynamic environments.
Grade: 8.9 / 10.0
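As a quick reference, the sketch below shows the tabular Q-Learning, SARSA, and Expected SARSA update rules side by side. The function names and hyperparameters are illustrative assumptions and are not taken from the ShortCut Environment code.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in the next state.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action actually selected next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def expected_sarsa_update(Q, s, a, r, s_next, epsilon=0.1, alpha=0.1, gamma=0.99):
    # On-policy, but bootstrap from the expectation under an epsilon-greedy policy.
    n_actions = Q.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    target = r + gamma * np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Example: apply a single update to a 5-state, 2-action Q-table.
Q = np.zeros((5, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
print(Q[0])
```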
Assignment 3 – Model-Based Reinforcement Learning
Planning with Dyna and Prioritized Sweeping in Stochastic Gridworlds
In this final assignment, we investigated two model-based reinforcement learning algorithms: Dyna and Prioritized Sweeping (PS). Both agents use simulated experience to improve learning efficiency, updating Q-values with an internal model of state transitions and rewards. We compared the two algorithms under varying levels of environment stochasticity and planning budgets, with Q-learning included as a model-free baseline. Our results showed that both Dyna and PS learned faster than Q-learning, especially in deterministic settings: PS learned faster initially thanks to its prioritized value propagation, while Dyna eventually reached higher episode returns. The assignment helped us understand the trade-off between sample efficiency and computational cost, as well as the limitations of model-based planning under uncertainty.
Grade: 8.0 / 10.0
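To make the planning idea concrete, here is a minimal sketch of a Dyna-Q style step: one real update followed by several simulated updates drawn from a learned deterministic model. The model representation and the number of planning updates are illustrative assumptions, not the exact implementation used in the report.

```python
import random
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next, n_planning=5, alpha=0.1, gamma=0.99):
    """One real Q-learning update followed by n_planning simulated updates.

    `model` maps (state, action) -> (reward, next_state), i.e. a simple
    deterministic model built from observed transitions.
    """
    # 1. Direct RL update from the real transition.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    # 2. Record the observed transition in the model.
    model[(s, a)] = (r, s_next)
    # 3. Planning: replay randomly sampled remembered transitions.
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_next]) - Q[ps, pa])

# Example: feed two transitions through the agent and inspect the Q-table.
Q = np.zeros((4, 2))
model = {}
dyna_q_step(Q, model, s=0, a=1, r=0.0, s_next=1)
dyna_q_step(Q, model, s=1, a=1, r=1.0, s_next=2)
print(np.round(Q, 3))
```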