Footnotes

... maximized.¹

Most RLP formulations maximize a discounted expected reward with discount factor $\beta$. Our formulation is obtained in the limit $\beta \to 1$, and we find it more intuitive.
