Offering institution: Nord Europe Lille (北欧里尔公司)

1
Online Markov Decision Processes under Bandit Feedback
  Gergely Neu (Nord Europe Lille). We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious advers...
Popularity: 59