Apprenticeship Learning via Inverse Reinforcement Learning
Pieter Abbeel
Artificial Intelligence Laboratory
Computer Science Department
Stanford University
In typical decision-making and optimal control problems, the goal is to find a policy that performs well under the given dynamics and reward function. In this talk, I will first give a brief introduction to Markov decision processes and then consider learning in this framework when the reward function is not given explicitly, but we can instead observe an expert demonstrating the task at hand. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. The expert can be seen as trying to maximize a reward function that is expressible as a linear combination of known features, and I will give an algorithm for learning the task demonstrated by the expert. The algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. I will show that the proposed algorithm terminates in a small number of iterations and that, even though it may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where performance is measured with respect to the expert's unknown reward function.
This talk presents work done jointly with Andrew Ng.
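As a rough illustration of why the abstract's linear-reward assumption is useful, the sketch below (not the algorithm from the talk) computes empirical discounted feature expectations from demonstrated trajectories. If the reward is w · phi(s), a policy's expected discounted return is w · mu, where mu is its feature expectation vector, so by the Cauchy-Schwarz inequality a policy whose mu is close to the expert's performs nearly as well under the unknown w. The trajectories, the discount factor gamma, the weights w, and all function names here are hypothetical, chosen only for illustration.

```python
from math import sqrt


def feature_expectations(trajectories, gamma):
    """Average over trajectories of sum_t gamma^t * phi(s_t).

    Each trajectory is a list of feature vectors phi(s_t), one per time step.
    """
    k = len(trajectories[0][0])
    mu = [0.0] * k
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(k):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return sqrt(dot(u, u))


# Hypothetical demonstrations: two expert trajectories, one learner trajectory.
expert_trajs = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
]
learner_trajs = [[[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]]
gamma = 0.9        # assumed discount factor
w = [0.6, -0.8]    # assumed "unknown" reward weights, used only to check the bound

mu_expert = feature_expectations(expert_trajs, gamma)
mu_learner = feature_expectations(learner_trajs, gamma)

# Difference in expected discounted return under reward w . phi(s).
value_gap = abs(dot(w, mu_expert) - dot(w, mu_learner))
# Distance between the two feature expectation vectors.
feature_gap = norm([a - b for a, b in zip(mu_expert, mu_learner)])

# Cauchy-Schwarz: the value gap never exceeds ||w|| times the feature gap.
assert value_gap <= norm(w) * feature_gap + 1e-12
```

The check at the end holds for any choice of w, which is why a learner can aim to match feature expectations without ever identifying the expert's true reward weights.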
Date: Wednesday, November 3, 2004
Time: 4:15-5:30PM
Place: Gates 104