Humphrys, Mark (1995) W-learning: Competition among selfish Q-learners. [Departmental Technical Report]
Full text available as: Postscript (391Kb)
Abstract
W-learning is a self-organising action-selection scheme for systems with multiple parallel goals, such as autonomous mobile robots. It uses ideas drawn from the subsumption architecture for mobile robots (Brooks), implementing them with the Q-learning algorithm from reinforcement learning (Watkins). Brooks explores the idea of multiple sensing-and-acting agents within a single robot, more than one of which is capable of controlling the robot on its own if allowed. I introduce a model where the agents are not only autonomous, but are in fact engaged in direct competition with each other for control of the robot. Interesting robots are ones where no agent achieves total victory, but rather the state-space is fragmented among different agents. Having the agents operate by Q-learning proves to be a way to implement this, leading to a local, incremental algorithm (W-learning) to resolve competition. I present a sketch proof that this algorithm converges when the world is a discrete, finite Markov decision process. For each state, competition is resolved with the most likely winner of the state being the agent that is most likely to suffer the most if it does not win. In this way, W-learning can be viewed as 'fair' resolution of competition. In the empirical section, I show how W-learning may be used to define spaces of agent-collections whose action selection is learnt rather than hand-designed. This is the kind of solution-space that may be searched with a genetic algorithm.
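The competition described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so the form used here is an assumption: each agent keeps its own Q-table and a per-state W-value, the agent with the highest W-value in the current state gets its preferred action executed, and each losing agent moves its W-value toward the loss it suffered by not winning. The names `Agent`, `w_learning_step`, `reward_fn`, and `transition` are hypothetical and do not come from the report.

```python
class Agent:
    """One selfish Q-learner with its own reward function (hypothetical structure)."""

    def __init__(self, n_states, n_actions, reward_fn):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # Q(state, action)
        self.w = [0.0] * n_states                              # W(state)
        self.reward_fn = reward_fn                             # (s, a, s') -> float

    def preferred_action(self, state):
        """Action this agent would take if it controlled the robot."""
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)


def w_learning_step(agents, state, transition, alpha=0.1, gamma=0.9):
    """Run one step of competition; returns (winner index, action, next state).

    Assumed rule: a losing agent's W-value tracks the gap between what it
    expected from its own preferred action and what it actually received.
    """
    # The agent with the highest W-value in this state wins control.
    winner = max(range(len(agents)), key=lambda i: agents[i].w[state])
    action = agents[winner].preferred_action(state)
    next_state = transition(state, action)

    for i, agent in enumerate(agents):
        reward = agent.reward_fn(state, action, next_state)
        target = reward + gamma * max(agent.q[next_state])
        if i != winner:
            # Loss suffered by not being obeyed in this state.
            own = agent.preferred_action(state)
            loss = agent.q[state][own] - target
            agent.w[state] += alpha * (loss - agent.w[state])
        # Every agent still does ordinary Q-learning on the executed action.
        agent.q[state][action] += alpha * (target - agent.q[state][action])

    return winner, action, next_state
```

In this sketch an agent that loses a great deal when ignored accumulates a large W-value and eventually takes over the state, which matches the abstract's description of the likely winner as the agent that would suffer most if it did not win. The sketch assumes the Q-values are already informative; with all-zero Q-tables no agent registers a loss.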
Item Type: | Departmental Technical Report |
---|---|
Keywords: | mobile robots, subsumption architecture, action selection, reinforcement learning, Q-learning, multi-module learning, genetic algorithms |
Subjects: | Biology > Animal Behavior; Biology > Ethology; Computer Science > Artificial Intelligence; Computer Science > Dynamical Systems; Computer Science > Machine Learning; Computer Science > Robotics |
ID Code: | 452 |
Deposited By: | Humphrys, Mark |
Deposited On: | 09 Jun 1998 |
Last Modified: | 11 Mar 2011 08:53 |