W-learning: Competition among selfish Q-learners

Humphrys, Mark (1995) W-learning: Competition among selfish Q-learners. [Departmental Technical Report]

Full text available as:

Postscript


W-learning is a self-organising action-selection scheme for systems with multiple parallel goals, such as autonomous mobile robots. It uses ideas drawn from the subsumption architecture for mobile robots (Brooks), implementing them with the Q-learning algorithm from reinforcement learning (Watkins). Brooks explores the idea of multiple sensing-and-acting agents within a single robot, more than one of which is capable of controlling the robot on its own if allowed. I introduce a model where the agents are not only autonomous, but are in fact engaged in direct competition with each other for control of the robot. Interesting robots are ones where no agent achieves total victory, but rather the state-space is fragmented among different agents. Having the agents operate by Q-learning proves to be a way to implement this, leading to a local, incremental algorithm (W-learning) to resolve competition. I present a sketch proof that this algorithm converges when the world is a discrete, finite Markov decision process. For each state, competition is resolved so that the most likely winner of the state is the agent that is most likely to suffer the most if it does not win. In this way, W-learning can be viewed as a 'fair' resolution of competition. In the empirical section, I show how W-learning may be used to define spaces of agent-collections whose action selection is learnt rather than hand-designed. This is the kind of solution-space that may be searched with a genetic algorithm.
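The competition rule described in the abstract can be illustrated with a short Python sketch. It assumes the per-state update usually given for W-learning: each agent i keeps its own Q-table Q_i(x, a) and a scalar W_i(x) per state; in state x the agent with the highest W_i(x) wins and its suggested action is executed; every agent does ordinary Q-learning on that executed action; and each loser moves W_i(x) toward the loss it just suffered, Q_i(x, a_i) - (r_i + gamma * max_b Q_i(y, b)). The class and function names, the epsilon-greedy exploration, the learning parameters and the five-state toy world are all illustrative assumptions, not taken from the report.

```python
import random
from collections import defaultdict


class SelfishAgent:
    """One selfish Q-learner: its own reward, its own Q-table, and a W-value per state."""

    def __init__(self, name, reward_fn, actions, alpha=0.1, gamma=0.9):
        self.name = name
        self.reward_fn = reward_fn      # hypothetical signature: r(x, a, y) -> float
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.Q = defaultdict(float)     # Q[(state, action)], starts at 0
        self.W = defaultdict(float)     # W[state], starts at 0

    def best_action(self, x):
        return max(self.actions, key=lambda a: self.Q[(x, a)])

    def best_value(self, x):
        return max(self.Q[(x, a)] for a in self.actions)

    def update_w(self, x, r, y):
        # Loss from not being obeyed: value of this agent's preferred action
        # minus the one-step estimate of what actually happened.
        loss = self.best_value(x) - (r + self.gamma * self.best_value(y))
        self.W[x] += self.alpha * (loss - self.W[x])

    def update_q(self, x, a, r, y):
        # Ordinary one-step Q-learning on the executed action. Q-learning is
        # off-policy, so an agent can learn from actions chosen by other agents.
        target = r + self.gamma * self.best_value(y)
        self.Q[(x, a)] += self.alpha * (target - self.Q[(x, a)])


def w_learning_step(agents, x, transition, rng, epsilon=0.1):
    """One round of competition in state x; returns (winner name, action, next state)."""
    # The agent with the highest W-value for x wins (ties broken at random).
    winner = max(agents, key=lambda ag: (ag.W[x], rng.random()))
    a = winner.best_action(x)
    if rng.random() < epsilon:              # assumed epsilon-greedy exploration
        a = rng.choice(winner.actions)
    y = transition(x, a)
    rewards = {ag.name: ag.reward_fn(x, a, y) for ag in agents}
    for ag in agents:                       # only losers update W, using pre-update Q
        if ag is not winner:
            ag.update_w(x, rewards[ag.name], y)
    for ag in agents:                       # every agent updates its own Q
        ag.update_q(x, a, rewards[ag.name], y)
    return winner.name, a, y


# Hypothetical toy world: states 0..4 on a line, actions -1/+1.
# Agent "right" is rewarded for reaching 4, agent "left" for reaching 0.
STATES, ACTIONS = range(5), (-1, 1)


def transition(x, a):
    return min(max(x + a, 0), 4)


agents = [
    SelfishAgent("right", lambda x, a, y: 1.0 if y == 4 else 0.0, ACTIONS),
    SelfishAgent("left", lambda x, a, y: 1.0 if y == 0 else 0.0, ACTIONS),
]
rng = random.Random(0)
x = 2
for _ in range(20000):
    _, _, x = w_learning_step(agents, x, transition, rng)
    if rng.random() < 0.05:                 # occasional restart so every state is visited
        x = rng.choice(STATES)

# Which agent currently holds each state (highest W-value).
owner = {s: max(agents, key=lambda ag: ag.W[s]).name for s in STATES}
print(owner)
```

Because every agent keeps learning Q from whatever action is actually executed, the losers keep refining their estimate of what obedience would have been worth, and W_i(x) tracks the loss they suffer when they are not obeyed. How the toy world ends up divided between the two agents depends on the parameters; the sketch makes no claim about that outcome.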

Item Type: Departmental Technical Report
Keywords: mobile robots, subsumption architecture, action selection, reinforcement learning, Q-learning, multi-module learning, genetic algorithms
Subjects: Biology > Animal Behavior
Biology > Ethology
Computer Science > Artificial Intelligence
Computer Science > Dynamical Systems
Computer Science > Machine Learning
Computer Science > Robotics
ID Code: 452
Deposited By: Humphrys, Mark
Deposited On: 09 Jun 1998
Last Modified: 11 Mar 2011 08:53

