... maximized.$^{1}$
$^{1}$ Most RLP formulations maximize a discounted expected reward with discount factor $\beta$. Our formulation is recovered in the limit $\beta \to 1$; we find it more intuitive.
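As a rough illustration of the limit $\beta \to 1$, the sketch below assumes (the text here does not state this) that the undiscounted objective is the long-run average reward per step. It compares that average with the discounted sum normalized by $(1-\beta)$ for a reward stream that repeats a fixed cycle forever. The function name `normalized_discounted_value` and the example cycle are hypothetical, chosen only for this illustration.

```python
def normalized_discounted_value(cycle, beta):
    """Return (1 - beta) * sum_t beta**t * r_t for a reward stream that
    repeats `cycle` forever (illustrative assumption, not from the paper).

    For a cycle of length n the infinite sum has the closed form
        sum_t beta**t * r_t = (sum_{k<n} beta**k * r_k) / (1 - beta**n),
    and (1 - beta) / (1 - beta**n) = 1 / sum_{k<n} beta**k.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if len(cycle) == 0:
        raise ValueError("cycle must contain at least one reward")
    numerator = sum(r * beta**k for k, r in enumerate(cycle))
    denominator = sum(beta**k for k in range(len(cycle)))
    return numerator / denominator


def average_reward(cycle):
    """Long-run average reward per step of the same periodic stream."""
    return sum(cycle) / len(cycle)


if __name__ == "__main__":
    cycle = [1.0, 0.0, 0.0, 2.0]  # hypothetical rewards, repeated forever
    for beta in (0.5, 0.9, 0.99, 0.999):
        print(beta, normalized_discounted_value(cycle, beta))
    print("average:", average_reward(cycle))
```

Under these assumptions the normalized discounted value moves toward the plain average as $\beta$ approaches 1, because the weights $\beta^k$ within one cycle become nearly equal.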