Maes’s paper proposes an algorithm that lets a behavior-based robot learn, from positive and negative feedback, when to activate its behaviors. The architecture is decomposed into task-achieving modules, and the algorithm learns to control the activation of these behaviors through experience. This is a different approach from the one posed in Smart’s “Effective Reinforcement Learning for Mobile Robots,” which had the robot learn from a reward system tied to specific states. The algorithm posed by Maes reduces the complexity of a weighted reward system to simple positive or negative feedback, and this simplicity makes the activation decisions more efficient.
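
As a rough sketch of this idea (the class and method names below are my own, not the paper’s), a behavior module might simply tally the binary feedback it receives after each activation and use those tallies to judge how reliably the behavior earns positive feedback:

```python
# Minimal sketch, not Maes's exact formulation: a behavior module that tallies
# the binary feedback received immediately after each of its activations.
class Behavior:
    def __init__(self, name):
        self.name = name
        self.positive = 0   # activations followed by positive feedback
        self.negative = 0   # activations followed by negative feedback

    def record_feedback(self, is_positive):
        """Record the immediate binary feedback for the latest activation."""
        if is_positive:
            self.positive += 1
        else:
            self.negative += 1

    def reliability(self):
        """Fraction of activations that earned positive feedback (0 to 1)."""
        total = self.positive + self.negative
        return self.positive / total if total else 0.5  # no experience yet
```

A controller could then prefer activating the modules with the highest reliability while still occasionally trying the others.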

The model makes a few basic assumptions: the probabilities of positive and negative feedback associated with a behavior’s preconditions fall within the range of 0 to 1; the feedback itself is immediate and does not involve action sequences; and the robot can actually carry out the experiments in an environment that does not pose too much risk to it. For the purpose of analyzing an algorithm, a controlled environment does not preclude effectiveness in, or application to, the real world, so these assumptions are reasonable given the conditions under which the robot is tested. The experiment with the six-legged robot “Genghis” demonstrated how this algorithm can produce learning by having the robot “learn” to walk forward. The experiment successfully demonstrated that learning can be achieved in real time through action and immediate feedback rather than through explicit state.
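
To make the “immediate feedback, no action sequences” assumption concrete, here is a hypothetical experiment loop in the spirit of the Genghis trials; the behavior names and the moved_forward() check are placeholders I made up, not details taken from the paper:

```python
import random

# Hypothetical experiment loop: after each activation the robot immediately
# checks a binary feedback signal, so no credit has to be assigned across
# sequences of actions.
behaviors = ["swing_leg_forward", "swing_leg_backward", "lift_leg"]
stats = {name: {"pos": 0, "neg": 0} for name in behaviors}

def moved_forward():
    # Placeholder for a real sensor reading that reports whether the robot
    # just moved forward; simulated here with a coin flip.
    return random.random() < 0.5

for _ in range(1000):
    name = random.choice(behaviors)   # real code would bias toward reliable behaviors
    # ... command the robot to run the chosen behavior here ...
    if moved_forward():               # feedback arrives immediately
        stats[name]["pos"] += 1
    else:
        stats[name]["neg"] += 1
```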

The experiment I proposed in a previous critical summary was to implement a three-tier architecture that uses the reinforcement-learning framework for its state-saving algorithms. The robot would be able to identify behaviors that allow it to perform a given task and reuse them later when given a similar task. The algorithm modeled in this paper essentially accomplishes those goals: it first tries to find out which conditions maximize positive feedback and minimize negative feedback, and then measures how relevant each behavior is to the global task. This approach allows the robot itself to learn the best path to take toward a goal, and the results may be surprising, unlike those of a robot following a reward system seeded by human intervention. The goal-oriented solution in the article demonstrates that a simpler, task-based behavior scheme can produce learning as well as a complex knowledge-based learning algorithm.
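
One simple way to picture “measuring how relevant a behavior is to the global task” (an illustration in my own terms, not the statistic used in the paper) is to compare how often positive feedback arrives when the behavior is active versus when it is not:

```python
def relevance(pos_while_active, times_active, pos_while_inactive, times_inactive):
    """Illustrative relevance score: the difference between the observed rate
    of positive feedback when the behavior is active and when it is inactive.
    Values near zero suggest the behavior is irrelevant to the global task;
    clearly positive values suggest activating it helps."""
    rate_active = pos_while_active / times_active if times_active else 0.0
    rate_inactive = pos_while_inactive / times_inactive if times_inactive else 0.0
    return rate_active - rate_inactive

# Example: positive feedback followed 40 of 50 activations, but only 5 of the
# 50 time steps in which the behavior was inactive.
print(relevance(40, 50, 5, 50))  # 0.7 -> strongly relevant to the task
```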

In conclusion, an effective, simple solution to the learning problem can be extremely powerful. The search space for “Genghis” was 8748 nodes; even for an algorithm whose decision values are limited to “on,” “off,” and “don’t care,” the search space is large. Compare this with a state-based learning system, where the complexity is an order of magnitude larger and does not seem scalable to a real environment with many variables to consider. The solution lies in the simplicity.
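
The combinatorics behind a figure like that can be illustrated as follows; the specific counts of behaviors and conditions below are assumptions chosen only because they reproduce the quoted number, not a breakdown taken from the paper:

```python
# Illustrative arithmetic only: with "on", "off", and "don't care" as the
# possible values, a precondition list over k binary conditions has 3**k
# candidate settings, so the space a behavior searches grows exponentially
# in the number of conditions it monitors.
def condition_search_space(num_conditions):
    return 3 ** num_conditions

# Hypothetical breakdown; the paper's actual counts may differ.
num_behaviors = 12
conditions_per_behavior = 6
print(num_behaviors * condition_search_space(conditions_per_behavior))  # 8748
```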

Reference:

  1. Maes, Pattie. “Learning to Coordinate Behaviors.” Proceedings of AAAI-90, 1990. <http://www.aaai.org/Papers/AAAI/1990/AAAI90-119.pdf>