Stochastic Optimization Methods for Policy Evaluation in Reinforcement Learning

This monograph introduces various value-based approaches for solving the policy evaluation problem in the online reinforcement learning (RL) setting, where the goal is to learn the value function associated with a specific policy in a single Markov decision process (MDP). The approaches differ depending on whether they are implemented on-policy or off-policy. In the on-policy setting, where the policy is evaluated using data generated by that same policy, classical techniques such as TD(0), TD(λ), and their extensions with function approximation or v...
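To make the on-policy setting concrete, here is a minimal sketch of tabular TD(0) policy evaluation. It is an illustration under stated assumptions, not the monograph's own code. The environment is assumed to be the classic five-state random walk: the evaluated policy steps left or right with probability 1/2, the reward is 1 only when the walk exits on the right, and there is no discounting. Because the transitions are generated by the very policy being evaluated, the update is on-policy. The per-state step size n^(-0.7) is a hypothetical Robbins-Monro choice (the step sizes sum to infinity, their squares do not). The helper names `random_walk_episode` and `td0_evaluate` are also hypothetical.

```python
import random


def random_walk_episode(rng, n_states=5):
    """Yield (state, reward, next_state) transitions for one episode.

    Assumed environment: non-terminal states 0..n_states-1, start in the
    middle, move left/right with probability 1/2 each. next_state is None
    when the walk terminates; reward is 1.0 only on a right-side exit.
    """
    s = n_states // 2
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:                 # left terminal, reward 0
            yield s, 0.0, None
            return
        if s_next >= n_states:         # right terminal, reward 1
            yield s, 1.0, None
            return
        yield s, 0.0, s_next
        s = s_next


def td0_evaluate(n_episodes=20000, n_states=5, gamma=1.0, seed=0):
    """Tabular TD(0): V(s) += alpha * (r + gamma * V(s') - V(s))."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    V = [0.5] * n_states               # arbitrary initial estimates
    visits = [0] * n_states
    for _ in range(n_episodes):
        for s, r, s_next in random_walk_episode(rng, n_states):
            visits[s] += 1
            # Hypothetical decaying step size (Robbins-Monro conditions hold).
            alpha = visits[s] ** -0.7
            # The value of a terminal state is taken to be 0.
            bootstrap = gamma * V[s_next] if s_next is not None else 0.0
            td_error = r + bootstrap - V[s]
            V[s] += alpha * td_error
    return V


if __name__ == "__main__":
    estimates = td0_evaluate()
    true_values = [(i + 1) / 6 for i in range(5)]
    for i, (v, t) in enumerate(zip(estimates, true_values)):
        print(f"state {i}: TD(0) estimate {v:.3f}, true value {t:.3f}")
```

For this chain the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should land close to them. Replacing the table `V` with a linear function of state features gives the function-approximation variant mentioned above, and adding eligibility traces yields TD(λ).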