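Off topic: I used to run into DP all the time when solving problems, but our team had two DP masters, so I rarely worked on DP problems myself. I didn't expect to run into DP again while reading a book recently... DP really does demand brains, it's pretty hopeless...
UPD: I used to think value iteration would be less stable than policy iteration, but after asking why, I was told that both can in fact oscillate (be unstable).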
Policy Evaluation (Prediction)
Policy evaluation (the prediction problem): We consider how to compute the state-value function v_π for an arbitrary policy π.
For our purposes, iterative solution methods are most suitable. The initial approximation, v_0, is chosen arbitrarily (except that the terminal state, if any, must be given value 0), and each successive ...
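The iterative idea above can be sketched in a few lines of Python. The excerpt is cut off before it states the update rule, so this sketch assumes the standard expected update, v_{k+1}(s) = Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ v_k(s')]. The function name `policy_evaluation`, the dictionary layout of `transitions` and `policy`, the stopping threshold `theta`, and the three-state toy chain are all made up for illustration and are not from the original post.

```python
def policy_evaluation(policy, transitions, gamma=1.0, theta=1e-8):
    """Estimate v_pi by repeated synchronous sweeps (illustrative sketch).

    policy[s][a]      -> probability of taking action a in state s
    transitions[s][a] -> list of (prob, next_state, reward) outcomes
    A state with no actions is treated as terminal and keeps value 0.
    """
    # v_0: arbitrary initial guess; here 0 everywhere (terminal must be 0).
    v = {s: 0.0 for s in transitions}
    while True:
        new_v = dict(v)
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            # Expected one-step return under the policy, using the old values v.
            new_v[s] = sum(
                policy[s][a] * sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)
                for a, outcomes in actions.items()
            )
            delta = max(delta, abs(new_v[s] - v[s]))
        v = new_v
        if delta < theta:  # stop once a full sweep changes nothing noticeable
            return v


# Hypothetical toy chain: A -> B -> T (terminal), reward -1 per step.
transitions = {
    "A": {"go": [(1.0, "B", -1.0)]},
    "B": {"go": [(1.0, "T", -1.0)]},
    "T": {},
}
policy = {"A": {"go": 1.0}, "B": {"go": 1.0}}

values = policy_evaluation(policy, transitions)
print(values)
```

On this toy chain the values settle after three sweeps, with A worth -2 (two steps from the end) and B worth -1. The sketch builds a fresh table on every sweep; updating the values in place is a common variant that needs only one table.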
Read full article from Dynamic Programming | 酷狗的小窝