We propose a formal model of efficient reinforcement learning based on
Valiant's Probably Approximately Correct (PAC) learning framework and
describe the first polynomial-time PAC algorithm for the general
reinforcement learning problem. We show that an active and directed
exploration of its environment by the learning agent is necessary and
sufficient to obtain efficient learning for that problem. We also
consider the trade-off between exploration and exploitation in
reinforcement learning algorithms and show how in general a PAC algorithm
can be converted into an efficient on-line algorithm that adequately
balances exploration and exploitation.
Date: Thurs., October 8; Time: 4:15-5:30PM; Place: Gates 104
Return to seminar schedule.