What’s in a Representation?

4 minute read

As I begin the final stages of my PhD thesis, I have been surprised by one topic that turned out to be far more relevant than I expected when I began work on the project.

Broadly, the topic sits in the realm of cognitive modelling, specifically modelling and predicting the behaviour of humans performing a learning and decision-making task based on visual information. Traditional approaches to predicting how humans learn and make decisions have abstracted away much of the complexity of the task presented to participants, modelling their behaviour as some simple function of the task's features, such as a softmax distribution over the utilities associated with the options presented.
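
To make that concrete, here is a minimal sketch of the kind of simple choice model I mean: a softmax distribution over hand-specified option utilities. The utilities and temperature below are purely illustrative, not values from any particular study.

```python
import numpy as np

def softmax_choice_probs(utilities, temperature=1.0):
    """Probability of choosing each option as a softmax over its utility."""
    scaled = np.asarray(utilities, dtype=float) / temperature
    scaled -= scaled.max()              # subtract the max for numerical stability
    exp_u = np.exp(scaled)
    return exp_u / exp_u.sum()

# Illustrative example: two options, the higher-utility one is chosen more often
print(softmax_choice_probs([1.0, 0.5], temperature=0.5))
```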

For context, when I began my PhD I was interested in two main areas, cognitive modelling and deep learning, and I hoped to connect them throughout my time in grad school. Combining the two was not a wholly new approach to modelling human behaviour in complex tasks. However, the choice of how best to model cognition should not be motivated by individual interests alone: motivating the use of deep learning in modelling human cognition required a connection to biological perception, specifically vision.

One commonly cited and long-standing connection between human visual perception and deep learning is between the activations of convolutional neural networks and the early processing of visual information in biological brains. More recently, research by Higgins et al. has investigated how visual information is represented in highly specialized areas of the brain, such as the primate face area, which processes visual information related to faces. Analysis of the activity of single neurons in the face area revealed a so-called disentangled property: varying a single dimension of the input stimulus, such as changing the age alone or the hair colour alone, altered the activity of individual neurons. This is closely connected to the behaviour of beta-Variational Autoencoders (beta-VAEs), which are trained to form low-dimensional, information-constrained representations of stimuli that can reconstruct the original stimuli as accurately as possible.
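
For reference, here is a minimal sketch of the beta-VAE objective written as a loss function, assuming a diagonal Gaussian posterior and a unit Gaussian prior; the mean-squared-error reconstruction term and the value of beta are illustrative choices, not the exact setup from the paper.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction error plus a beta-weighted KL term that constrains the latent code."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between the diagonal Gaussian posterior q(z|x) and the N(0, I) prior
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + beta * kl
```

Setting beta greater than 1 tightens the information constraint on the latent code, which is what encourages the disentangled dimensions described above.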

The similarity between this highly specialized brain region and the properties of beta-VAE models raises an important question about the structure of information as it is processed by the reinforcement learning faculties of the human brain. Specifically, the human midbrain has long been analyzed and understood as the centre of RL in biological agents, with dopamine activity serving as an analogue of the reward prediction error used in RL to update the predicted values associated with certain stimuli. To function properly, RL must make predictions of the reward associated with stimuli, or states of an environment, and update these predictions based on experience.
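
As a minimal sketch of what such an update looks like, here is a tabular TD(0) value update, where the prediction error delta plays the role that dopamine activity is thought to play; the learning rate and discount factor are illustrative placeholders.

```python
def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.95):
    """One TD(0) update: delta is the reward prediction error used to adjust the state's value."""
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta

# Illustrative usage: an unexpected reward produces a positive prediction error
V = {"cue": 0.0, "outcome": 0.0}
rpe = td_update(V, "cue", "outcome", reward=1.0)
print(rpe, V["cue"])
```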

In deep RL this is typically done with a CNN or another deep neural network that processes the visual input to form a state representation, and predictions of reward are made from that representation. However, there are many issues with applying CNN-based deep RL methods to predicting human behaviour, and much of my research so far has been aimed at improving the generalization and robustness of deep RL methods when predicting human behaviour. For my final thesis project, I will be applying beta-VAE models to predicting learning and decision making in humans based on visual information. The main thrust of this research is analyzing the representations formed by deep RL models when they are trained to predict human behaviour. Hopefully, this analysis will reveal the source of some common biases in human learning and decision making, as well as the good qualities of human learning such as robustness and generalization.
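
Schematically, the kind of model I have in mind looks like the sketch below: a hypothetical encoder, for example the mean network of a trained beta-VAE, maps the visual stimulus to a low-dimensional latent code, and a small value head predicts reward from that code. The class name, parameter names, and latent size are placeholders for illustration, not my actual thesis model.

```python
import torch.nn as nn

class LatentValueModel(nn.Module):
    """Sketch: an encoder forms a low-dimensional state representation; a linear head predicts value."""
    def __init__(self, encoder, latent_dim=10):
        super().__init__()
        self.encoder = encoder              # placeholder for a pretrained encoder (e.g. a beta-VAE mean network)
        self.value_head = nn.Linear(latent_dim, 1)

    def forward(self, x):
        z = self.encoder(x)                 # latent representation of the visual stimulus
        return self.value_head(z)           # predicted value / reward for that stimulus
```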

To tie this research back to the title of this blog post, I am interested in investigating what information is contained within cognitive neural representations as it flows from the visual processing regions of the brain into the reinforcement learning regions.

To hear more about this project, stay tuned for an update on my PhD thesis towards the end of the year!