An evolved spiking neural controller for alternate object pushing by a simulated embodied agent
Răzvan V. Florian
Genuine, creative artificial intelligence can emerge only in embodied agents that are capable of cognitive development and of learning by interacting with their environment. Before the learning process starts, the agents need some innate (predefined) drives or reflexes that can induce the exploration of the environment.
In the experiment presented here, a basic reflex is evolved for a simple simulated agent, controlled by a spiking neural network. This reflex will be used in future work to bootstrap the ontogenetic cognitive development of the agent.
The agent and its environment are simulated using Thyrix, a simulator that provides a two-dimensional environment with simplified, quasi-static (Aristotelian) mechanics, and supports collision detection and resolution between the objects in the environment. The agent's morphology was chosen to be the simplest that would allow the agent to push the circular objects in its environment without the objects slipping on its surface. Maximum simplicity was desired in order to reduce the computational burden of evolution and to simplify the analysis of the evolved behavior. The agent is composed of two circles connected by a variable-length link. The agent can apply forces to each of its two body circles. Each body circle has two corresponding effectors, one commanding a forward-pushing force and one commanding a backward-pushing force. A fifth effector commands the length of the virtual link connecting the two body circles. The agent has 16 contact sensors equally distributed on the surface of its two body circles, 14 visual sensors forming an eye on each circle, and 5 proprioceptive sensors. The environment consisted of the agent and 6 circles ("balls") that the agent can move around. The spatial extension of the environment was not limited.
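To make the quasi-static (Aristotelian) mechanics concrete: in such a regime, velocity is proportional to the applied force (there is no inertia), so a body stops as soon as the force is removed. The following is a minimal illustrative sketch of such an update step; the function name and the damping constant are assumptions for illustration, not part of the Thyrix API.

```python
# Illustrative sketch of quasi-static (Aristotelian) mechanics:
# velocity is proportional to force (v = F / damping), rather than
# force producing acceleration (F = m * a) as in Newtonian mechanics.

def quasi_static_step(position, force, damping=1.0, dt=0.05):
    """Advance a body one time step; the body moves only while pushed."""
    velocity = tuple(f / damping for f in force)  # no inertia: v follows F directly
    return tuple(p + v * dt for p, v in zip(position, velocity))

# A circle pushed with a constant force moves at constant speed:
pos = (0.0, 0.0)
for _ in range(10):
    pos = quasi_static_step(pos, force=(2.0, 0.0))
# after 10 steps of 0.05 s at speed 2.0, pos is approximately (1.0, 0.0)
```

Under these dynamics, a ball keeps moving only while the agent keeps pushing it, which is why sustained pushing behavior must be evolved.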
The task of the agent was to move each of the balls in its environment alternately, over as long a distance as possible, in limited time. More specifically, the fitness of each agent was computed as the sum of the distances over which the balls were moved, with each ball's contribution capped at a constant threshold. Thus, to achieve maximum fitness the agent had to move all the balls, rather than just detect one ball and push it indefinitely.
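The fitness described above can be sketched as follows; the threshold value used here is an illustrative assumption, not the one from the experiment.

```python
def fitness(distances_moved, threshold=5.0):
    """Sum of per-ball displacements, each capped at a constant threshold.

    `distances_moved` holds the distance each of the 6 balls was moved;
    the threshold value is illustrative.
    """
    return sum(min(d, threshold) for d in distances_moved)

# Pushing a single ball indefinitely saturates that ball's capped term,
# so moving all 6 balls is required for maximum fitness:
one_ball = fitness([100.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # capped at 5.0
all_balls = fitness([5.0] * 6)                         # optimum: 30.0
```

The cap is what turns "push one ball forever" into a suboptimal strategy and rewards the alternating push-release-seek behavior the experiment targets.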
The controller of the agent consisted of a recurrent spiking neural network. We used this type of network because, among the classes of neural networks amenable to large-scale computer simulation, it seems best suited for the control of embodied agents. In some experiments, the neural network featured spike-timing-dependent plasticity (STDP) with directional damping of the synaptic efficacies; in other experiments, the synapses were static. The network was fully connected, with 70 input neurons, 55 integrate-and-fire hidden and motor neurons, and 6875 synapses. The genome directly encoded the static synaptic efficacies or the maximum efficacies of the plastic synapses.
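The synapse count is consistent with full connectivity: each of the 55 hidden and motor neurons receives connections from all 125 neurons (70 input plus 55 non-input), and 55 × 125 = 6875. A minimal sketch of the integrate-and-fire dynamics used by such neurons is given below; the parameter values and discretization are illustrative assumptions, not the settings used in the experiment.

```python
def lif_step(v, input_current, v_rest=0.0, v_thresh=1.0, tau=20.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire membrane.

    Returns the updated membrane potential and whether the neuron spiked.
    Parameter values are illustrative, not those of the evolved controller.
    """
    v += (dt / tau) * (v_rest - v) + input_current  # leak toward rest, add input
    if v >= v_thresh:
        return v_rest, True   # spike and reset to the resting potential
    return v, False

# Driving the neuron with a constant input current produces periodic spikes:
v, spikes = 0.0, 0
for _ in range(8):
    v, fired = lif_step(v, 0.3)
    spikes += fired
```

In the full network, `input_current` would be the weighted sum of spikes arriving through the neuron's 125 incoming synapses.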
Networks with both static and plastic synapses successfully evolved to solve the required task. The performance of the best individuals was very close to the maximum possible fitness. Plastic networks evolved much faster, in terms of generations, than networks with static synapses.
The evolved agents seek the closest ball in the environment and then push it along a curved trajectory until they detect another ball in front of them. They then release the ball they were pushing, approach the newly detected ball, and begin pushing it.
This evolved push-release-seek reflex will be used in future experiments to bootstrap more complex behaviors, such as arranging the balls in a particular pattern, sorting the balls by size, or categorizing different kinds of objects.