Smart and Kaelbling’s paper proposes having the robot learn how to accomplish a task rather than being told explicitly how to accomplish it. The paper introduces a framework for implementing this type of model and describes experiments that used the framework to accomplish small tasks. This paradigm of reinforcement learning is interesting because it seems to extend the behavioral model mentioned in previous papers such as Braitenberg’s “Vehicles”. It also extends Gat’s architecture described in “On Three-Layer Architectures”, because reinforcement learning depends on maintaining state. The three-layer architecture was designed to balance two extremes: maintaining no state at all (leaving the robot dependent on unreliable sensors for information) and saving all state (time-consuming, computationally expensive, and easy to fall out of sync with the external world). The reinforcement-learning framework assumes that the world can be described by a set of states and that the robot has a discrete set of actions available in any given state. Each action is weighted by how “good” it is, so in essence the higher the weight, the better the decision.
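To make the state/action/weight description above concrete, here is a minimal tabular Q-learning sketch in Python. It is only an illustration of the general value-function idea: the names (`NUM_STATES`, `choose_action`, `update`, and so on) and the lookup-table representation are my own assumptions, and the authors’ actual system works with continuous robot sensor data rather than a small discrete table.

```python
# Minimal tabular Q-learning sketch (illustrative only; the paper's system
# handles continuous states, so this lookup table is a simplification).
# All names and constants here are hypothetical.
import random

NUM_STATES = 100      # assumed number of discrete world states
NUM_ACTIONS = 4       # assumed number of actions available in each state
ALPHA = 0.1           # learning rate
GAMMA = 0.9           # discount factor
EPSILON = 0.1         # exploration probability

# Q[s][a] is the learned "weight" (expected discounted reward) of taking
# action a in state s; a higher value means a better decision.
Q = [[0.0] * NUM_ACTIONS for _ in range(NUM_STATES)]

def choose_action(state):
    """Pick the highest-weighted action most of the time, explore occasionally."""
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    return max(range(NUM_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """Standard Q-learning update: move the weight toward the observed reward
    plus the best value reachable from the next state."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```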

This system can work in theory in controlled environments, but without prior knowledge of the environment the robot is forced to learn about and plan around the external world through trial and error, which is time-consuming and error-prone as the environment changes. The paper addresses this issue by seeding the robot’s initial learning: a human controls the robot for a set amount of time, and the recorded experience is used to bootstrap the value function described earlier. This gives the robot a head start toward reward-giving states instead of relying on pure trial and error. After thirty-five training runs, the performance of the learned policy was reportedly indistinguishable from the best that one of the authors could achieve by directly controlling the robot with a joystick. This is fascinating because it demonstrates that robots can simulate learning with a reward system, similar to the feedback humans get as they develop and learn about their surroundings. An interesting direction for research in this field would be to have one generation of robots seed the next, so that in essence the previous generation “teaches” the next generation its reward structure.
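The bootstrapping step might be sketched as follows, reusing the `update` function from the block above. The `demo_log` format and the number of replay passes are my own assumptions, not the authors’ data structures; the point is only that human-supplied trajectories are fed through the same learning rule the robot later applies on its own.

```python
# Continuing the tabular sketch above: transitions logged while a human
# drives the robot with a joystick are replayed through the same Q update
# before autonomous learning begins.
def bootstrap_from_demonstration(demo_log, passes=10):
    """demo_log: list of (state, action, reward, next_state) tuples recorded
    during joystick control. Replaying them several times seeds the value
    function with reward-giving paths before the first autonomous trial."""
    for _ in range(passes):
        for state, action, reward, next_state in demo_log:
            update(state, action, reward, next_state)  # Q update from the sketch above
```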

This paper does contribute to the understanding of how a robot can learn, but I would prefer an approach that relies less on human interaction. The problem with a reward-weighted system seeded this way is that the information is predetermined and does not entirely allow the robot to experiment and determine its own reward policies. It would be an interesting experiment to implement a three-layer architecture that uses the reinforcement-learning framework for its state-keeping: the robot would gauge which behaviors allow it to perform a given task and reuse them later when given a similar task. This technique would essentially allow the robot itself to assign the weights to the behaviors that let it succeed.
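As a rough sketch of that proposed experiment (entirely my own speculation, not something from the paper), the sequencing layer of a three-layer architecture could keep a per-task weight for each behavior and adjust it according to whether the behavior actually succeeded, so the robot assigns its own reward values rather than using predetermined ones. The task and behavior names and the update rule below are hypothetical.

```python
# Hypothetical sketch: a sequencer that learns its own behavior weights.
from collections import defaultdict

behavior_weight = defaultdict(lambda: 1.0)   # keyed by (task, behavior)

def pick_behavior(task, behaviors):
    """The sequencing layer chooses the behavior with the highest learned weight."""
    return max(behaviors, key=lambda b: behavior_weight[(task, b)])

def report_outcome(task, behavior, succeeded, step=0.2):
    """After a trial, the robot itself raises or lowers the behavior's weight."""
    behavior_weight[(task, behavior)] += step if succeeded else -step
```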

Reference:

  1. Smart, William D., and Leslie Pack Kaelbling. “Effective Reinforcement Learning for Mobile Robots.” Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002. <http://people.csail.mit.edu/lpk/papers/2002/SmartKaelbling-ICRA2002.pdf>