%A Mark Humphrys
%T Action Selection in a hypothetical house robot: Using those RL numbers
%X Reinforcement Learning (RL) methods, in contrast to many forms of machine learning, build up value functions for actions. That is, an agent not only knows `what' it wants to do, it also knows `how much' it wants to do it. Traditionally, the latter numbers are used to produce the former and are then ignored, since the agent is assumed to act alone. But these numbers contain useful information - they tell us how much the agent will suffer if its action is not executed (perhaps not much). They tell us which actions the agent can compromise on and which it cannot. It is clear that many interesting systems possess multiple parallel and conflicting goals, all demanding attention, and none of which can be fully satisfied except at the expense of others. Animals are the prime example of such systems. In [Humphrys, 1995], I introduced the W-learning algorithms, showing one method of resolving competition among behaviors automatically by reference to their RL values. The scheme has the unusual feature that behaviors are at all times in selfish pursuit of their own goals and have no explicit concept of cooperation, despite residing in the same body. In this paper, I apply W-learning to the world of a hypothetical house robot, which doubles as family toy, mobile security camera, mobile smoke alarm and occasional vacuum cleaner. I show how a W-learning community of behaviors inside the robot will support a robust behavior pattern, capable of opportunistic behavior, avoiding dithering, and allowing for the concept of default behavior and expression of low-priority goals.
%K reactive systems, action selection, reinforcement learning, multi-behavior learning
%P 216-222
%E Peter G. Anderson
%E Kevin Warwick
%D 1996
%I ICSC Academic Press
%L cogprints448