Generalized Policy Iteration

Reinforcement Learning 101: Policy Iteration & Value Iteration

Model-based RL: MDPs, value functions, action-value functions, Bellman equations, and contraction mappings, with full mathematical proofs of convergence for Policy Iteration and Value Iteration.