Kaelbling’s paper is a survey of the algorithms and approaches for solving reinforcement learning problems. It identifies two main strategies: the first is to search the space of behaviors for one that performs well in the particular environment, and the second is to use statistical techniques and dynamic programming methods to estimate the utility of taking certain actions in particular states of the world. The survey itself covers the second class of solutions. What is interesting about the survey is that the different approaches to reinforcement learning draw on psychology as well as mathematical decision theory.
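
To make the second strategy concrete, here is a minimal value-iteration sketch in Python. The three-state MDP, its transition tensor `P`, and the reward table `R` are made-up placeholders for illustration, not anything taken from the survey; this is one standard dynamic programming method for estimating the utility of actions in states, assuming the model of the world is known.

```python
import numpy as np

# Minimal value-iteration sketch on a toy MDP (hypothetical 3-state,
# 2-action example; transitions and rewards are illustrative only).
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # expected reward R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * V(s')
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the values have converged
        break
    V = V_new
print("Estimated optimal state values:", V)
```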

The survey covers numerous types of solutions, but the section on Q-learning is the one most relevant to Smart’s paper, in which he discussed the Q-learning algorithm and its convergence to the optimal value function. The difficulties with this approach lie in generalizing over large state and/or action spaces, and it may converge only slowly to a good policy. These were the issues Smart tried to address by seeding the control policy and bootstrapping the results into the value-function approximation. This is a good way to mitigate the cost of the trial-and-error learning inherent in an autonomous agent’s interactions with a dynamic environment. The other interesting approach to solving reinforcement learning problems is to look at video-game artificial intelligence and understand how those agents make decisions in their worlds.
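
To show the algorithm both the survey and Smart build on, here is a minimal tabular Q-learning sketch in Python. The function names, hyperparameters, and toy states are illustrative assumptions; Smart’s actual work used function approximation over continuous spaces, which this sketch does not attempt.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One Q-learning backup toward the optimal value function:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example usage on a hypothetical two-state toy problem:
actions = ["left", "right"]
a = choose_action("s0", actions)
q_update("s0", a, reward=1.0, next_state="s1", actions=actions)
```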

This would be key to finding more efficient ways of dealing with dynamic worlds, because video games are about interacting with a constantly changing environment and adapting to its changes. There are other problems (separate from a dynamic environment) that need to be solved, such as the k-armed bandit problem, a model problem common to all reinforcement learning solutions; a small sketch of it follows below. The bandit problem crosses disciplinary boundaries, touching not only computer science but also statistics and applied mathematics. It could serve as a common benchmark, except that each algorithm in the survey is solving a different problem. What seems to be needed is an analysis of the space of reinforcement learning problems that reduces what each algorithm is trying to solve to a core issue. From reading this paper, there appears to be a great deal of research in many different niche areas, but no one general problem that these solutions are trying to address, and that makes it difficult to measure the effectiveness of any of them. Even if the solutions are purely academic, reducing redundant research in the field would be invaluable.
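
As a concrete picture of the k-armed bandit, here is a small epsilon-greedy sketch in Python. The arm payoff probabilities and parameter values are invented purely for illustration; epsilon-greedy is just one of several bandit strategies the survey discusses.

```python
import random

# Epsilon-greedy on a k-armed bandit with hidden Bernoulli payoff rates.
k, epsilon, pulls = 5, 0.1, 10_000
payoff = [random.random() for _ in range(k)]  # hidden success rate per arm
estimates, counts = [0.0] * k, [0] * k

for _ in range(pulls):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        arm = random.randrange(k)
    else:
        arm = max(range(k), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < payoff[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print("True rates:", [round(p, 2) for p in payoff])
print("Estimates: ", [round(e, 2) for e in estimates])
```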

In conclusion, this paper presents a general overview of the varied techniques for solving reinforcement learning problems. There is no single best solution to these problems, but there are many good ones. In the end, studying the artificial intelligence behind non-player characters in video games could help shed some light on decision problems in reinforcement learning.

Reference:

  1. Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. “Reinforcement Learning: A Survey.” Journal of Artificial Intelligence Research 4 (1996): 237–285. <http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a.pdf>