Developing artificial agents able to autonomously discover new goals, select them, and learn the related skills is an important challenge for robotics. This becomes even more crucial if we want robots to interact with real environments, where they have to face many unpredictable problems and where it is not clear which skills will be the most suitable to solve them. The ability to learn and store multiple skills in order to use them when required is one of the main characteristics of biological agents: forming ample repertoires of actions widens an agent's possibilities to adapt to different environments and improves its chances of survival and reproduction. Moreover, humans and other mammals explore the environment and learn new skills not only on the basis of reward-related stimuli but also on the basis of novel or unexpected neutral stimuli. The mechanisms underlying this kind of learning process have been studied under the heading of "Intrinsic Motivations" (IMs), and in the last decades the concept of IMs has been used in developmental and autonomous robotics to foster an artificial curiosity that can improve the autonomy and versatility of artificial agents.
In the research presented in this thesis I focus on the development of open-ended learning
robots able to autonomously discover interesting events in the environment and autonomously
learn the skills necessary to reproduce those events. In particular, this research focuses on
the role that IMs can play in fostering those processes and in improving the autonomy and
versatility of artificial agents. Taking inspiration from recent and past research in this field, I
tackle some of the interesting open challenges related to IMs and to the implementation of
intrinsically motivated robots.
I first focus on the neurophysiology underlying IM learning signals, and in particular on the
relations between IMs and phasic dopamine (DA). With the support of a first computational
model, I propose a new hypothesis that addresses the dispute over the nature and the functions
of phasic DA activations: reconciling two contrasting theories in the literature and taking
into account the different experimental data, I suggest that phasic DA can be considered as a reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). The results obtained with my computational model support this hypothesis, showing how such a learning signal can serve two important functions: driving both the discovery and acquisition of novel actions and the maximisation of rewards. Moreover, those results provide a first example of the power of IMs to guide artificial agents in the cumulative learning of complex behaviours that would not be learnt simply by providing a direct reward for the final task.
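To make the shape of this hypothesised learning signal concrete, the following minimal sketch (in Python; not the thesis model, and every name and parameter in it is an illustrative assumption) computes a standard temporal-difference prediction error in which the reinforcement is the sum of a permanent extrinsic reward and a temporary intrinsic one that decays as the triggering event becomes familiar:

```python
import numpy as np

class PhasicDASignal:
    """Toy TD learner whose prediction error is driven by both a
    permanent extrinsic reward and a temporary intrinsic reward that
    fades as an event becomes familiar (all parameters illustrative)."""

    def __init__(self, n_states, alpha=0.1, gamma=0.95, decay=0.05):
        self.V = np.zeros(n_states)        # state-value estimates
        self.novelty = np.ones(n_states)   # intrinsic value of each event
        self.alpha, self.gamma, self.decay = alpha, gamma, decay

    def step(self, s, s_next, extrinsic_reward):
        # Temporary intrinsic reinforcement: high for unexpected events,
        # decaying every time the event is experienced again.
        intrinsic = self.novelty[s_next]
        self.novelty[s_next] *= 1.0 - self.decay

        # Phasic-DA-like reinforcement prediction error combining the
        # intrinsic (temporary) and extrinsic (permanent) components.
        r = extrinsic_reward + intrinsic
        delta = r + self.gamma * self.V[s_next] - self.V[s]
        self.V[s] += self.alpha * delta
        return delta
```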
In a second work, I investigate the issues related to the implementation of IM signals in robots. Since the literature still lacks a specific analysis of which IM signal is best suited to drive skill acquisition, I compare different typologies of IMs in a robotic setup, as well as the different mechanisms used to implement them. The results provide two important contributions: 1) they show that IM signals based on the competence of the system provide better guidance for skill acquisition than signals based on the knowledge of the agent; 2) they identify a suitable mechanism to generate a competence-based IM signal, showing that the stronger the link between the IM signal and the competence of the system, the better the performance.
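As a rough illustration of the distinction at the core of this comparison (a sketch under assumed definitions, not the signals actually tested in the thesis): a knowledge-based IM rewards what the agent cannot yet predict, whereas a competence-based IM rewards measured improvement in performing a skill.

```python
def knowledge_based_im(prediction_error):
    """Knowledge-based IM: reward proportional to how poorly the agent
    predicts what happens (illustrative definition)."""
    return abs(prediction_error)

def competence_based_im(error_history, window=20):
    """Competence-based IM: reward proportional to the recent drop in
    performance error on a skill, i.e. competence improvement. The
    window length is an arbitrary illustrative choice."""
    if len(error_history) < 2 * window:
        return 0.0
    older = error_history[-2 * window:-window]
    recent = error_history[-window:]
    return max(0.0, sum(older) / window - sum(recent) / window)
```

In this toy formulation the second signal is tied directly to the agent's competence: it is high exactly while a skill is improving and vanishes once the skill is mastered or stalls, which mirrors the property found beneficial in the comparison.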
Pursuing the aim of widening the autonomy and versatility of artificial agents, in a third work I focus on improving the control architecture of the robot. I build a new 3-level architecture that allows the system to select the goals to pursue, to search for the best way to achieve them, and to acquire the related skills. I implement this architecture in a simulated iCub robot and test it in a 3D experimental scenario where the agent has to learn, on the basis of IMs, a reaching task in which it is not clear which arm of the robot is the most suitable to reach the different targets. The performance of the system is compared to that of my previous 2-level architecture, where tasks and computational resources are associated at design time. The better performance of the system endowed with the new 3-level architecture highlights the importance of developing robots with different levels of autonomy, in particular with autonomy both at the high level of goal selection and at the low level of motor control.
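A minimal sketch of such a selection hierarchy may help fix ideas (hypothetical Python; the names, the softmax selectors, and the stubbed motor learner are all assumptions rather than the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(values, tau=0.5):
    z = np.exp((values - values.max()) / tau)
    return z / z.sum()

# Stub for the low-level motor learner: each goal is reachable by only
# one arm, and practising with the right arm raises the success rate.
best_arm = [0, 1, 0, 1]
practice = np.zeros((4, 2))

def train_skill(goal, arm):
    if arm != best_arm[goal]:
        return False
    practice[goal, arm] += 1
    return rng.random() < min(1.0, 0.1 * practice[goal, arm])

n_goals, n_arms = 4, 2
goal_value = np.zeros(n_goals)           # value of pursuing each goal
arm_value = np.zeros((n_goals, n_arms))  # per-goal value of each arm

for t in range(2000):
    g = rng.choice(n_goals, p=softmax(goal_value))    # level 1: goal
    a = rng.choice(n_arms, p=softmax(arm_value[g]))   # level 2: resources
    im = 1.0 if train_skill(g, a) else 0.0            # level 3: skill
    goal_value[g] += 0.1 * (im - goal_value[g])
    arm_value[g, a] += 0.1 * (im - arm_value[g, a])
```

Here a plain achievement signal stands in for the IM signal; a competence-based signal would instead fade once a goal is mastered, pushing the goal selector towards goals that are still being learnt.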
Finally, I focus on a crucial issue for autonomous robotics: the development of a system that is able not only to select its own goals, but also to discover them through the interaction with the environment. In the last work I present GRAIL, a Goal-discovering Robotic Architecture for Intrinsically-motivated Learning. Building on the insights provided by my previous research, GRAIL is a 4-level hierarchical architecture that for the first time assembles in a single system the different features necessary for the development of truly autonomous robots. GRAIL is able to autonomously 1) discover new goals, 2) create and store representations of the events associated with those goals, 3) select the goal to pursue, 4) select the computational resources to learn to achieve the desired goal, and 5) self-generate its own learning signals on the basis of the achievement of the selected goals. I implement GRAIL in a simulated iCub and test it in three different 3D experimental setups, comparing its performance to that of my previous systems, showing its capacity to generate new goals in unknown scenarios, and testing its ability to cope with stochastic environments. The experiments highlight on the one hand the importance of an appropriate hierarchical architecture for supporting the development of autonomous robots, and on the other hand the crucial role that IMs (together with goals) can play in the autonomous learning of multiple skills.
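The five capacities can be pictured as a single control loop. The toy sketch below (hypothetical Python, far simpler than GRAIL's 4-level neural hierarchy; the toy world and all constants are assumptions) only fixes the order in which the operations occur:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(v, tau=0.3):
    v = np.asarray(v, dtype=float)
    z = np.exp((v - v.max()) / tau)
    return z / z.sum()

goals, goal_value, skills = [], [], []

for t in range(500):
    # The toy environment occasionally produces a never-seen event.
    event = int(rng.integers(0, 3))
    if event not in goals:
        goals.append(event)             # 1) discover a new goal and
        goal_value.append(1.0)          # 2) store its representation;
        skills.append(np.zeros(4))      #    novel goals start attractive
    g = int(rng.choice(len(goals), p=softmax(goal_value)))      # 3) select goal
    action = int(np.argmax(skills[g] + rng.normal(0, 0.1, 4)))  # 4) resources
    achieved = action == goals[g] % 4   # toy achievement test
    im = 1.0 if achieved else 0.0       # 5) self-generated learning signal
    skills[g][action] += 0.1 * (im - skills[g][action])
    goal_value[g] += 0.1 * (im - goal_value[g])
```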
Date of Award: 2016
Original language: English
Awarding Institution:
Supervisor: Marco Mirolli (Other Supervisor)
- Open-ended learning
- Intrinsic motivations
- Reinforcement learning
- Artificial Neural Networks
- Autonomous robotics
Autonomous learning of multiple skills through intrinsic motivations: A study with computational embodied models
Santucci, V. G. (Author). 2016
Student thesis: PhD