Summer 2022 - Reflective Report

Here is my Summer 2022 Reflective Report. Taking the time to reflect on, distil and write up my summer experience was fun :)

What was the overall structure and methodology of your research project?
It is easiest to describe the structure of my research project through its three distinct phases: literature review, environment construction, and implementation of baselines. There should also have been two further phases, experiment implementation and iteration on results, which we unfortunately didn't have time to reach.
The first phase of my research project was a literature review in which we dug deeper into the papers on Cooperative AI and Multi-Agent Reinforcement Learning. I came into the Laidlaw programme off the back of another research project, which is where I got the idea for my Laidlaw research project; as a result, this literature review was mainly about understanding the papers that inspired my project on a deeper level. In practice, this meant emailing the main authors of those papers and asking them questions. This proved incredibly valuable. So much gets stripped out of a paper: under the current paradigms and incentives for getting published, some really useful information, like "what didn't work", doesn't always make it in... or, more simply, sometimes there just isn't room in the paper!
The second phase of my research project was environment construction. Unfortunately, DeepMind doesn't open-source the bulk of their code, including the environments I hoped to run my experiments in. Thankfully, an open-source version was available; however, it had not been maintained, and the libraries it depended on had since introduced breaking changes, so I had to reimplement the environments in which we would run our experiments. This proved very time-consuming but valuable: time-consuming because it was extremely hard and I had to learn many different frameworks, and valuable because I now understand the codebase of our environments far more deeply than I otherwise would have... I know what makes them work, and I know what makes them break!
The final phase of our research project was the implementation of baselines. Once our environments were reimplemented, we had to match the behaviour and performance of the plain vanilla baseline agents from the DeepMind papers, i.e. we had to get the simplest multi-agent version of our game working in our environments. Which we did! Unfortunately, this is where I must stop for now, as we didn't get any further in our project (though I am still working on it, just with less time). Reconstructing the environments proved to be a very hard and time-consuming task that set us back considerably.
As for methodology, our research project is an empirical one.

How does your finished project compare to the original proposal and objectives?
Interestingly, even though we haven't finished the original project, and without scaling back that original vision, the scope of our project has expanded since we started. We now want to introduce adversarial agents into our system down the line, as a means of further stress-testing the cooperativeness of our agents.

While I am still working on, and intend to finish, the original project proposal, I now also plan to extend the project to look at inverse reinforcement learning and adversarial attacks. These extensions came from conversations with my mentors.

What is the next step for this research in your view?
There are two things to discuss here: first, the main crux of our original research project, which I only became aware of through conversations with my mentors; and second, the extensions to our research project that I mentioned above.
Our research project was to develop a sort of empathy mechanism for AI agents through the use of counterfactuals. However, there are some hidden assumptions in using counterfactuals to assess how we affect the wellbeing (i.e. the rewards) of other agents... we are assuming that we have access to the value functions of other agents, i.e. that we can access their inner workings and directly observe the utility they gain (or, to put it in anthropomorphic terms, that we can directly see how the other agents are feeling). When we talk about empathy in human discourse, we usually mean two things:
1. Acting to increase the values/preferences of others
2. Understanding the values/preferences of others
Whilst our current project clearly addresses point 1, it does not address point 2! Future research would involve developing an algorithm or mechanism that enables AI agents to address both aspects of empathy as laid out here.
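To make that hidden assumption concrete, here is a minimal sketch of a counterfactual reward-influence bonus in a hypothetical one-step, two-agent game. It is an illustration, not our project's actual mechanism: the function names, the baseline (averaging over agent A's alternative actions with B's action held fixed), the `weight` parameter and the payoffs are all assumptions made for this example. The point is that `counterfactual_influence` has to call agent B's reward function directly, which is exactly the access to other agents' inner workings described above.

```python
from typing import Callable, Sequence

# Hypothetical toy setup: a one-step game where each agent picks an integer action.
# A reward function maps (action_of_A, action_of_B) to a reward.
RewardFn = Callable[[int, int], float]


def counterfactual_influence(
    reward_other: RewardFn,
    action_self: int,
    action_other: int,
    alternatives: Sequence[int],
) -> float:
    """How much A's chosen action changed B's reward, relative to the average
    over A's counterfactual alternatives (B's action held fixed).
    Assumes A can query B's reward function directly."""
    actual = reward_other(action_self, action_other)
    baseline = sum(reward_other(a, action_other) for a in alternatives) / len(alternatives)
    return actual - baseline


def empathetic_reward(
    reward_self: RewardFn,
    reward_other: RewardFn,
    action_self: int,
    action_other: int,
    alternatives: Sequence[int],
    weight: float = 0.5,  # hypothetical trade-off between own reward and helping B
) -> float:
    """A's own reward plus a weighted bonus for positively influencing B."""
    own = reward_self(action_self, action_other)
    influence = counterfactual_influence(reward_other, action_self, action_other, alternatives)
    return own + weight * influence


if __name__ == "__main__":
    # Hypothetical payoffs: A's action 1 costs A a little but helps B a lot.
    def reward_a(a: int, b: int) -> float:
        return 1.0 - 0.2 * a

    def reward_b(a: int, b: int) -> float:
        return 2.0 * a + b

    for a in (0, 1):
        print(a, empathetic_reward(reward_a, reward_b, a, 0, alternatives=(0, 1)))
```

This only covers the first sense of empathy. The second sense would mean replacing `reward_other` with an estimate inferred from B's behaviour, rather than a function A can simply query.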
One potential path for future research here would be to merge Inverse Reinforcement Learning (IRL) with our reward influence mechanism. IRL is concerned with learning the reward function (the objectives) of an agent from observations of its behaviour. An agent's reward function, i.e. what it is optimising for, is the most precise description of its objectives. Humans' reward functions or objectives, however, are diverse and complex; and even in more trivial scenarios with AI agents, IRL often faces an underspecification problem, where the observations simply don't contain enough information to pin down a single reward function. This makes it a very hard problem, but one that is certainly worth further research.
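The underspecification problem can be seen even in a tiny example. The sketch below uses a Bayesian-IRL-style posterior in a hypothetical one-step setting rather than full sequential IRL: it assumes a Boltzmann-rational (noisily optimal) agent choosing among three actions, a uniform prior over a few hand-written candidate reward functions, and an illustrative rationality parameter `beta`. All names and values are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def boltzmann_likelihood(rewards: Sequence[float], action: int, beta: float) -> float:
    """Probability that a noisily rational (Boltzmann) agent picks `action`."""
    weights = [math.exp(beta * r) for r in rewards]
    return weights[action] / sum(weights)


def reward_posterior(
    candidates: Dict[str, Sequence[float]],
    observed: List[int],
    beta: float = 2.0,  # hypothetical rationality parameter
) -> Dict[str, float]:
    """Posterior over hypothetical candidate reward functions, uniform prior."""
    unnormalised = {}
    for name, rewards in candidates.items():
        likelihood = 1.0
        for action in observed:
            likelihood *= boltzmann_likelihood(rewards, action, beta)
        unnormalised[name] = likelihood
    total = sum(unnormalised.values())
    return {name: p / total for name, p in unnormalised.items()}


if __name__ == "__main__":
    # Each candidate assigns a reward to each of three actions (hypothetical values).
    candidates = {
        "prefers_0": [1.0, 0.0, 0.0],
        "prefers_0_shifted": [3.0, 2.0, 2.0],  # same as above plus a constant 2.0
        "prefers_1": [0.0, 1.0, 0.0],
    }
    observed_actions = [0, 0, 0, 1, 0]
    for name, prob in reward_posterior(candidates, observed_actions).items():
        print(f"{name}: {prob:.3f}")
```

`prefers_0` and `prefers_0_shifted` always receive the same posterior mass, because adding a constant to every reward leaves the Boltzmann choice probabilities unchanged; no amount of this kind of data can separate them.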

What were the significant achievements and challenges of this research project?
The most significant achievement of this research project was simply getting our environments up and running, especially considering that I had little to no experience with Reinforcement Learning environments beforehand. We managed to reproduce complex multi-agent environments published by teams at DeepMind, refactoring and fixing an old existing implementation that had fallen out of date due to breaking changes in the libraries it depended on.
The most significant challenges of this research project were probably twofold. The first was, again, getting our environments up and running and matching public baselines. The other main challenge involved reaching out to the core researchers who had already done and published seminal work in this area, to better understand the flaws of their approaches (which perhaps weren't clearly published) and other nuances of the environment setup... or rather, just realising that I could do this! I would spend hours reading through the same paper, and then work up the courage to articulate my questions and thoughts on it and send them to the author(s). This proved really valuable! I learned so much that you just couldn't find in the paper itself, such as techniques that weren't explained explicitly and results that didn't work out.

What did you learn about yourself as a researcher?
There were three main things that I learnt about myself as a researcher:

1. You can learn just about anything (and naivety is a sort of saviour)
While I had quite a bit of experience in computer vision and AI before diving into this research project, I had no practical experience with Reinforcement Learning, let alone Multi-Agent Reinforcement Learning. I naively expected the learning curve not to be so steep... it was. Yet I managed!
I had to cover game theory, game design, new machine learning frameworks, and more. Each time, I generally only realised that I needed to learn something new when I came across it. Although I initially had a plan for how I expected the project to roll out, it quickly had to be refactored, and refactored, and refactored.
If I had had perfect foresight, and seen everything I would have to learn before I could even get stuck into the research, I think I would likely have chosen a different project, an easier or more swiftly tractable one. However, I don't think I would then have learned as much, nor been able to reap the rewards of working on a larger problem and, hopefully, making more useful progress.
Naivety can be a good thing; don't be intimidated by learning.

2. I really value curiosity and care about knowledge for the sake of knowledge (I enjoy
research!); I updated positively on doing a PhD
Alexander Pope once wrote, "A little learning is a dangerous thing; drink deep, or taste not the Pierian spring"... I can't remember where I first read this, but it has stuck with me for quite a while, and the older I get, the more I understand and appreciate it.
I really enjoy learning about topics that interest me, and I particularly enjoy going really deep into a single topic: bingeing through all the papers on it, reading forums and seeking out more interesting links. I just find it so satisfying and fulfilling.
One of the main reasons I chose to take part in Laidlaw was to better understand whether I should pursue a PhD, or graduate study/research in general. I have updated my probability of wanting to follow such a path upwards!

3. I really care about impact.
This final thing is something that I have found quite hard to articulate, but if I were to distil it into its simplest terms, it's that I really care about impact. Towards the end of the research programme, I had a lot of self-doubt about the impact my research was actually having on the world... "will it actually be useful and helpful?", "will this ever see the light of day?", "is this the most effective and useful work that I could be doing right now?". I still don't have good answers to these questions; however, I have found a way past them by viewing the research project, for better or worse, not as an end in itself but as a means: a facilitator of learning research, management and leadership skills that will help me do better work and spend my time better going forward. And in these doubts I saw a lot of what I really care about, which is using my time and effort to be maximally useful and have an impact: to maximise the area given by how much we help each person multiplied by how many people we help.

What did you learn about yourself as a leader, and your perspective on leadership,
during the summer?
The summer research project helped reinforce some things I already knew about myself with regard to leadership. Namely, that I can work well independently and take the initiative to just do things (I really value agency); that I can communicate well, although this needs consistent practice and attention; and that one of my main strengths is simply a strong sort of endurance.
On this last point, I am reminded of a quote from Ernest Shackleton: "By endurance we conquer". In Shackleton's case, he was talking about exploring where no one had been before, in the harshest of conditions; however, it has really struck me as a great piece of advice for both research and working on startups. It's obvious that you need a good idea and good execution; what's less obvious is that you need great endurance. Whatever you are working on, if it's a hard enough problem, things will go wrong, and sometimes everything will go wrong. More often than not, these are the most important times: times when a startup competitor might not push through to close that deal, or when other researchers working on the same or a similar problem decide to move on. By endurance, we conquer.
I learnt a lot from my mentors, especially about leadership. The first thing that stood out was the edge that comes from simply being smart and having very strong domain knowledge and expertise. On problems that I might have spent hours on, they were able to point me either towards a much faster solution or to the realisation that I should just leave it and move on. I found this latter point really quite important, not just for research but for any endeavour in life: knowing when it's the right time to stop and move on, and coming up with a strategy for acquiring the necessary information as quickly and painlessly as possible.
Perhaps the biggest takeaway on leadership from my mentors was simply the power of mentorship! You can have a much broader reach when you mentor people and help them do their own best work. There's only so much one person can accomplish; and while working in teams is another story altogether, simply offering someone a reasonably quick discussion once a week can really help them on their way to doing great work. My mentors supervised many PhD students and told me about the satisfaction they got from helping these mentees: it's both fulfilling and impactful! Going forward, I'm going to take more opportunities to mentor others, and make a conscious effort to do so!

How did your experience compare and contrast to the goals you set out in your PDP?
One of the largest goals I planned to work on this summer was my public speaking. Public speaking is something I have struggled with quite a bit in the past, but now, post-COVID and after the Zoom era, it was the perfect opportunity to really focus on improving this crucial skill, primarily through exposure and practice. Over the summer, I had the opportunity to speak in front of two reasonably sized audiences of 10-30 people (large for me, at least!). Although I still have some way to go in actually improving my public speaking, I am now a lot more comfortable and confident simply going on stage and speaking coherently, which is great!
Beyond this, I am very happy with my progress on my other PDP goals, which primarily centred on improving my personal research and management skills, both of which have visibly improved. I feel more confident tackling more technical research problems, not just from a technical standpoint, but also in being able to reach out to other researchers/mentors for advice or guidance! The high workload of the summer demanded strong personal management skills; I now manage my work competently using a suite of software tools found through trial and error, having whittled my personal management system and workflow down to the simplest and most efficient setup (this will also translate well into working in teams!).
