AAAI 2021

2 minute read


This is the second post in a series of retrospectives on previous work that shaped my PhD and is related to my future research goals. If you would like to read the paper, you can find it on my ResearchGate.

Although this paper is relatively recent, I thought I would use it as my second post because it is closely related to a project I began soon after starting my AI Researcher position with IBM in early 2019. I actually began that project before I presented the previous project at RLDM 2019. This earlier project, which eventually became an arXiv paper, shaped my interest in robotics, machine learning, and applications of reinforcement learning.

After this earlier project applying reinforcement learning to a robotics simulation environment, I extended the general idea into the project and accompanying paper, “Capacity-Limited Decentralized Actor-Critic for Multi-Agent Games”. This was a fortunate paper, as it allowed us to publish work on mutual-information-regularized reinforcement learning, which was also the core idea behind the earlier method applied to robotics. Essentially, we want to make the behaviour learned by robots as informationally simple as possible. This is inspired by human behaviour, which we believe strives to be similarly informationally simple where possible.
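To make this concrete, the usual way to formalize ‘informationally simple’ behaviour is as a mutual-information penalty on the reward objective. The formulation below is a standard sketch of that idea in my own notation ($\beta$, $\pi_0$), not necessarily the exact objective from the paper:

$$
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_t r(s_t, a_t)\Big] - \beta \, I(S; A),
\qquad
I(S; A) = \mathbb{E}_{s}\big[ D_{\mathrm{KL}}\big(\pi(\cdot \mid s) \,\|\, \pi_0(\cdot)\big)\big],
$$

where $\pi_0$ is the marginal action distribution and $\beta$ trades off reward against behavioural complexity: at $\beta = 0$ we recover standard RL, while larger $\beta$ forces the policy to depend less on the state.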

We applied this main idea to a set of complex multi-agent environments in which agents use continuous control and, in some environments, a degree of communication. Our experiments showed that penalizing agent behavioural complexity during learning can improve both learning speed and the agent's final performance. Compared to agents without this complexity constraint, our agents learned more generalizable behaviour that was simultaneously less informationally complex.
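As a rough illustration of what such a penalty looks like inside an actor update, here is a minimal PyTorch sketch. It assumes a discrete action space for clarity (the paper's environments use continuous control), and every name here (`beta`, `marginal_logits`, the function itself) is my own for illustration, not taken from the paper's code:

```python
import torch
import torch.nn.functional as F

def capacity_limited_policy_loss(logits, actions, advantages,
                                 marginal_logits, beta=0.1):
    """Actor loss with a mutual-information penalty (a sketch, not the paper's code).

    logits:          [batch, n_actions] policy logits for pi(a|s)
    actions:         [batch] sampled action indices (long tensor)
    advantages:      [batch] advantage estimates from a critic
    marginal_logits: [n_actions] logits of a marginal action distribution
                     pi_0(a), e.g. a running average of the policy
    beta:            strength of the information penalty
    """
    log_probs = F.log_softmax(logits, dim=-1)
    chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Standard policy-gradient term: raise probability of high-advantage actions.
    pg_loss = -(chosen_log_probs * advantages).mean()

    # E_s[ KL(pi(.|s) || pi_0) ] equals I(S; A) when pi_0 is the true marginal
    # (and upper-bounds it otherwise), so penalizing it pushes the policy
    # toward informationally simpler, less state-dependent behaviour.
    probs = log_probs.exp()
    marginal_log_probs = F.log_softmax(marginal_logits, dim=-1)
    kl = (probs * (log_probs - marginal_log_probs)).sum(dim=-1).mean()

    return pg_loss + beta * kl
```

The single knob `beta` then controls the trade-off described above: higher values yield simpler but potentially lower-reward behaviour.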

This was a great result for our ideas on human-inspired reinforcement learning, and I will probably write a separate blog post in the future explaining those ideas. Since this project, our ideas on human-inspired RL have shifted slightly, but this was an important first step. In our recent work applying lessons from how humans learn complex tasks to reinforcement learning, we have pursued this ‘information constraint’ view in different ways, but in spirit it is the same concept as in these earlier papers.