Multi-agent Learning

Multi-agent Reinforcement Learning

Abstract
We address the problem of how autonomous agents that sense and act in their environment can learn to choose optimal actions to achieve their goals. We use the Q-learning algorithm, which learns optimal control strategies from delayed rewards even when agents have no prior knowledge of how their actions affect the environment. We experiment with two ε-greedy strategies for choosing actions: one that explores by choosing actions at random, and one that chooses actions with probability weighted by their estimated value.
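To make the abstract concrete, here is a minimal sketch of tabular Q-learning with random-choice ε-greedy action selection. It is not the project's actual code: the function names, the list-of-lists Q-table, and the default values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one.

    q_values: list of estimated values, one per action (hypothetical layout).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Two states, two actions, all estimates initially zero.
    q = [[0.0, 0.0], [0.0, 0.0]]
    action = epsilon_greedy(q[0], epsilon=0.1)
    q_update(q, state=0, action=action, reward=1.0, next_state=1)
    print(q)
```

Because the update only needs the observed reward and the next state, the agent does not need a model of how its actions affect the environment, which is the property the abstract relies on.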

Figure: Joint Policy (random choice)

The graph above (random choice) converges more slowly than the graph below (probabilistic choice), which uses Boltzmann-like annealing. The code was written in Python. Download the multi-agent learning project report.

Figure: Joint Policy (probabilistic choice)
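The probabilistic choice can be sketched as Boltzmann (softmax) selection over the estimated action values, with a temperature that is lowered over time. This is an illustration under assumptions, not the project's implementation: the function names, the geometric cooling schedule, and the constants `t_start`, `t_end`, and `decay` are all hypothetical.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature)."""
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_choice(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def annealed_temperature(step, t_start=1.0, t_end=0.05, decay=0.99):
    """Geometric cooling schedule (an assumed schedule, for illustration)."""
    return max(t_end, t_start * decay ** step)


if __name__ == "__main__":
    q_values = [0.2, 1.0, 0.5]
    for step in (0, 100, 1000):
        t = annealed_temperature(step)
        print(step, round(t, 3), boltzmann_probabilities(q_values, t))
```

At high temperature the probabilities are close to uniform, so the agent explores widely; as the temperature falls, the choice concentrates on the highest-valued action. Unlike uniform random exploration, low-valued actions are tried less often throughout, which is one plausible reason for faster convergence.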

This was a joint project with Davide Modolo.
