
Training Agents to Navigate a Virtual Environment Using the NEAT Algorithm

Overview

Agents in this research are equipped with sensors, which provide them with the ability to sense the unknown environment. The behavior of an agent is controlled by a neural network, which reads its onboard sensors and takes appropriate actions to explore the environment. The learning ability allows agents to optimize their sensors, avoid obstacles more effectively, and navigate more quickly. The functionality of these sensors, along with the learning procedure implemented to optimize them, is discussed in this article.

Watch the Demo of the AI program that Learns to Avoid Obstacles

Learning Ability of Agents

The agents developed in this research are trained to survey the environment, unveil its unknown regions, and find the location of resources. The environment in which the agents are placed is unknown to them, and they are equipped with the capability to learn to optimize their sensors. As these agents move around the simulated environment and encounter obstacles, they eventually learn to react more quickly and accurately, avoiding collisions with obstacles and exploring more efficiently. As a result of the learning process, agents evolve and are able to survey the environment more quickly.

In this research, the real-time NEAT (Neuro-Evolution of Augmenting Topologies) algorithm is used to train the agents so that their behaviour evolves and they navigate the unknown environment efficiently. This article explains the NEAT and real-time NEAT algorithms in detail. The structure of the Neural Network and the techniques used for evaluating the fitness of agents were inspired by the book AI Techniques for Game Programming by Buckland [1]. The following sections further discuss the structure of the Neural Network, its inputs and outputs, and how the real-time NEAT algorithm is implemented to endow agents with learning capabilities.

The Neural Network Structure

Each agent is controlled by a Neural Network, which acts as its brain. The Neural Network controls the actions of the agent based on the inputs it receives from the environment through onboard sensors. The figure below shows the structure of a simple Neural Network used as the brain of an agent. The inputs of the Neural Network are linked to the sensors and are fed with the sensor readings. The outputs are connected to the wheels of the agent and determine the force applied to each wheel, which in turn determines the direction and speed of movement.

Other than the sensors, two more inputs are fed into the Neural Network: a bias input and an input representing the collision flag. The collision flag indicates whether the agent has collided with an obstacle. It is important to standardize the inputs of the Neural Network so that each has a comparable effect on the results; the sensor readings and the other inputs are therefore scaled to fit between -1 and 1. At every update cycle, an agent reads its sensors and feeds the data into the Neural Network to determine the outputs and the actions to take.
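
As a rough illustration, the sketch below assembles the input vector under these conventions. The class and method names are hypothetical, not from the project's code, and the bias value of -1 follows Buckland's convention [1] rather than anything stated here.

```java
import java.util.ArrayList;
import java.util.List;

public class InputAssembly {
    /** Collects the sensor readings plus the collision flag and bias,
        all already scaled to the range [-1, 1]. */
    public static List<Double> readInputs(double[] objectDetectors,
                                          double[] presenceDetectors,
                                          boolean collided) {
        List<Double> inputs = new ArrayList<>();
        for (double d : objectDetectors) inputs.add(d);   // -1, or normalized distance in (0, 1]
        for (double f : presenceDetectors) inputs.add(f); // -1, or normalized ticks in (0, 1]
        inputs.add(collided ? 1.0 : 0.0);                 // collision flag
        inputs.add(-1.0);                                 // bias input (assumed value)
        return inputs;
    }
}
```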

The Neural Network structure generated for controlling an agent

Inputs of the Neural Network

Neural Network inputs receive data from the environment using the sensors of the agent. Agents are equipped with two types of sensors, which I refer to as object detection and presence detection sensors. Object detection sensors enable agents to avoid collisions with obstacles. Presence detection sensors allow agents to sense whether they have visited an area of the map before. These sensors are line segments of specific lengths spreading out from a central point of the agent. The object detectors are labelled d1 to d3, and the presence detectors are labelled f1 to f3. Each agent has three instances of each type of sensor: one at its front and one on each of its left and right sides at a 45-degree angle to the front, as shown in the figure below. In every update cycle, the sensors are checked and their readings are fed into the Neural Network.

Object Detection Sensors

The detectors are checked to see if they intersect with any obstacle. If a detector encounters no obstacle, it returns -1 as its output. Otherwise, the distance of the intersection point from the start of the detector is returned. This value is normalized by the length of the detector to lie between 0 and 1. Any value between 0 and 1 indicates that an obstacle has been detected within the range of the detector, but does not by itself indicate that a collision has occurred.

Agent with object detection sensors and two motors

In order to determine whether a collision has occurred, the calculated distance is checked against a predefined value, which is proportional to the scale of the agent and the length of the detector. If the distance is less than this predefined value, the obstacle is too close to the agent and a collision has occurred. In this case, the collision flag is set to 1; otherwise, it is set to 0. Feeding the detector values and the collision flag into the Neural Network helps the agent learn obstacle avoidance.
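
A minimal sketch of this reading and collision test is shown below. The threshold fraction and all names are illustrative assumptions, not values taken from the project.

```java
public class ObjectDetector {
    static final double COLLISION_FRACTION = 0.2; // assumed threshold, proportional to agent scale

    /** Returns -1 if the detector intersects nothing; otherwise the distance
        to the intersection normalized by the detector length (0..1). */
    public static double read(double intersectionDistance, double detectorLength) {
        if (intersectionDistance < 0) return -1.0;    // no obstacle within range
        return intersectionDistance / detectorLength; // normalized reading
    }

    /** Sets the collision flag when the obstacle is closer than the threshold. */
    public static int collisionFlag(double reading) {
        return (reading >= 0 && reading < COLLISION_FRACTION) ? 1 : 0;
    }
}
```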

In order to train an agent to avoid collisions with obstacles, it is rewarded based on the number of cells it visits. The reason for choosing this factor as the fitness function is that the more cells agents visit, the fewer collisions they have experienced. This reward therefore drives agents to evolve collision avoidance.

Presence Detection Sensors

Presence detection sensors are similar to object detection sensors in terms of their length and the angles at which they are placed, but their function is different. The tip of each presence detector is used as a reference to find the cells around the agent. As explained in Space Partitioning, a cell-based partitioning technique divides the virtual environment into a matrix of equal-sized, axis-aligned rectangular cells. A presence detector senses the area of a cell and thus detects which cell it is over. Each agent has a memory that records the number of ticks (update cycles) it has spent in each cell. The reading of a presence detector is the number of ticks the agent has spent in the particular cell that the detector currently points to.

Readings from presence detectors are normalized and fed into the Neural Network inputs. Since all other inputs of the Neural Network are standardized, these inputs also need to lie between -1 and 1. To achieve this, the following conditions are applied (a code sketch follows the list):

  • If the tip of a presence detector is on a cell that has not been visited before, the reading is -1.
  • If the agent has spent between 1 and 99 ticks in that cell, the tick count is divided by 100 before being fed to the Neural Network.
  • If the tick count exceeds 99, the reading of the presence detector is set to 1.
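
A minimal sketch of these three rules, with an assumed method name:

```java
public class PresenceDetector {
    /** Maps the tick count of the detected cell to a value in [-1, 1]. */
    public static double normalize(int ticksSpent) {
        if (ticksSpent == 0) return -1.0; // cell never visited before
        if (ticksSpent > 99) return 1.0;  // long stays are capped at 1
        return ticksSpent / 100.0;        // 1..99 ticks -> 0.01..0.99
    }
}
```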

The more cells an agent visits, the more data it can collect from the environment. It is desirable to have agents that can explore cells more rapidly in order to obtain more knowledge about the environment. Therefore, an agent is rewarded based on the number of cells it has visited in a given time so as to evolve agents that can navigate more cells.

Outputs of the Neural Network

The output of a neuron is derived by applying an activation function to the weighted sum of its inputs. As discussed in the Introduction to Artificial Neural Networks, the linear combination of the inputs of a neuron is calculated through the following equation:

a = w1*x1 + w2*x2 + ... + wn*xn

The activation function used in this case is the log-sigmoid function, given by the equation below:

y = 1 / (1 + e^(-a/p))

The activation (a) in this equation is the weighted sum of the inputs of a neuron. The activation response (p) sets the curvature of the sigmoid function and is initially set to 1.
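
Put together, a single neuron update could look like the following sketch (class and method names assumed):

```java
public class Neuron {
    /** Weighted sum of inputs: a = w1*x1 + ... + wn*xn. */
    public static double activation(double[] weights, double[] inputs) {
        double a = 0.0;
        for (int i = 0; i < weights.length; i++) a += weights[i] * inputs[i];
        return a;
    }

    /** Log-sigmoid with activation response p (curvature); initially p = 1. */
    public static double sigmoid(double a, double p) {
        return 1.0 / (1.0 + Math.exp(-a / p));
    }
}
```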

The left and right output neurons correspond to the left and right wheels of the agent, respectively (as in the figure showing the structure of the Neural Network). The outputs of the network control the direction and speed of movement by setting the force applied to each wheel. A difference in the force applied to the left and right wheels results in the rotation of the agent to the left or right: if a greater force is applied to the right wheel, the agent rotates towards the left, and vice versa.

Updating the Agent’s Position

The forces applied to the two wheels are subtracted from each other to find the resulting rotational force. At each update cycle, this force is checked to ensure it does not exceed the maximum-allowed rotational force, and is then added to the rotation accumulated in previous cycles. The direction of movement along the x- and y-axes is then calculated using -sine and cosine of the rotation, respectively [1]. The speed of movement equals the sum of the forces applied to both wheels. Multiplying the speed by the direction gives the amount of movement along the x- and y-axes. Finally, the position of the agent is determined by adding the movement in the current update cycle to its previous location.
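
The sketch below follows this description step by step. The clamp value and field names are assumptions, and the -sin/cos heading convention follows Buckland [1].

```java
public class Motion {
    double rotation;                    // accumulated rotation (heading angle)
    double x, y;                        // agent position
    static final double MAX_TURN = 0.3; // assumed maximum-allowed rotational force

    public void update(double leftForce, double rightForce) {
        double rotForce = leftForce - rightForce;                     // resulting rotational force
        rotForce = Math.max(-MAX_TURN, Math.min(MAX_TURN, rotForce)); // clamp to the maximum
        rotation += rotForce;                                         // add to previous rotation

        double speed = leftForce + rightForce; // speed is the sum of wheel forces
        double dirX = -Math.sin(rotation);     // direction along x
        double dirY =  Math.cos(rotation);     // direction along y
        x += speed * dirX;                     // movement along x this cycle
        y += speed * dirY;                     // movement along y this cycle
    }
}
```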

Implementing the Real-time Neuro-Evolution of Augmenting Topologies

The real-time NEAT algorithm is used to make the agents evolve and optimize their sensors. The figure below shows a UML diagram of the classes created to implement real-time NEAT.

UML diagram of the class implementation for the Neuroevolution of Augmenting Topologies (NEAT) algorithm

The main class that initiates real-time NEAT and controls its procedure is NEATController. The NEATController class uses the GeneticAlgorithm class, which contains all the functions required to perform the genetic algorithm operations, including crossover, mutation, and selection.

Initially, a population of dummy agents is created for training. The training population for this project was set to 50 due to CPU constraints. Training continues at run-time to further improve the agents' behavior.

Each agent consists of several components, each defined as a separate class: the body, brain, memory, and genome. The body of an agent is declared as the Organism class, its main component, which represents it in the virtual environment and encapsulates its other components by keeping references to the brain, memory, and genome classes. Organism is an extension of the Entity class (described in Creating a Physics Engine), which provides the agent with attributes such as mass, scale, velocity, position, and direction so that the physics laws of motion can be applied. Other parameters and functions are also included to test the onboard sensors and determine their position and movement.

The MemoryMap class represents the memory of an agent and is also created in the Organism class. It holds information about a two-dimensional matrix of cells, including the number of cells along the x- and y-axes and the size of each cell. It provides functions to check which cell the agent is located in, and whether that cell has previously been visited by the agent.
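
A possible shape for this class is sketched below; the constructor and method signatures are assumptions based on the description above, and the environment's origin is assumed to be at (0, 0).

```java
public class MemoryMap {
    private final int[][] ticks;   // ticks spent in each cell
    private final double cellSize; // size of one square cell

    public MemoryMap(int cellsX, int cellsY, double cellSize) {
        this.ticks = new int[cellsX][cellsY];
        this.cellSize = cellSize;
    }

    /** Returns true the first time a cell is entered, then records the tick. */
    public boolean visit(double worldX, double worldY) {
        int cx = (int) (worldX / cellSize); // cell index along x
        int cy = (int) (worldY / cellSize); // cell index along y
        boolean firstVisit = ticks[cx][cy] == 0;
        ticks[cx][cy]++;
        return firstVisit;
    }

    /** Number of ticks spent in the cell under the given point. */
    public int ticksAt(double worldX, double worldY) {
        return ticks[(int) (worldX / cellSize)][(int) (worldY / cellSize)];
    }
}
```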

The Genome class encodes the structure of the Neural Network in a genome. Each genome has a number of parameters, including a unique ID, a reference to the species it belongs to, a reference to the phenotype, and Vectors holding the lists of all neuron genes (NeuronGene class) and link genes (LinkGene class). It also has functions that perform the following operations: adding new link genes and neuron genes, mutating the weights and activation responses, providing a mapping from genotype to phenotype, and sorting the link genes by their innovation ID. A fitness attribute is also declared to measure the fitness of the agent that the genome belongs to.

The brain of an agent is a Neural Network, as defined by the NeuralNet class. When the agents are first created, Neural Networks with simple structures are defined as their brains. A genome (genotype) is also created to encode the structure of each Neural Network (phenotype). These are the first-generation genotypes and phenotypes. The initial phenotype has two layers with eight inputs and two outputs, where each input is linked directly to both output neurons (as shown earlier in the figure of the Neural Network structure). A random weight between -1 and 1 is assigned to each link, and the initial value of the activation response (p) is set to 1.
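
A sketch of building such a first-generation genotype follows. Representing links as plain arrays is a deliberate simplification of the Genome and LinkGene classes described above, and all names here are assumptions.

```java
import java.util.Random;
import java.util.Vector;

public class InitialGenome {
    /** Builds a fully-connected 8-input, 2-output link list with
        random weights in [-1, 1]; each link is {from, to, weight}. */
    public static Vector<double[]> create(Random rnd) {
        Vector<double[]> links = new Vector<>();
        for (int in = 0; in < 8; in++) {         // 6 sensors + collision flag + bias
            for (int out = 8; out < 10; out++) { // neurons 8 and 9: left and right wheels
                double weight = rnd.nextDouble() * 2.0 - 1.0; // uniform in [-1, 1]
                links.add(new double[] { in, out, weight });
            }
        }
        return links; // the activation response p starts at 1 for every neuron
    }
}
```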

Fitness Evaluation Process

For every generation of the real-time NEAT algorithm, the performance of the agents is evaluated to measure the fitness of their genomes. The steps below summarize the sequence of procedures performed by each agent in every update cycle in order to evaluate the fitness of each genome.

  1. Test the sensors to get their readings
  2. Check the object detection sensors to see if any obstacle exists within their range and whether a collision has occurred
  3. Check the presence detectors to find the amount of time the agent has spent in the nearby cells
  4. Feed the sensor readings into the Neural Network as inputs
  5. Calculate the direction and speed of movement based on the forces applied to the left and right wheels
  6. Update the agent's memory by incrementing the number of ticks spent in the cell currently being visited
  7. Increment the agent's fitness value if the currently visited cell has not been visited before

This procedure is part of the update() method of the Organism class, which briefly involves perceiving the environment by reading the sensors, deciding which actions to take using the Neural Network, finding the current position of the agent, and counting the number of cells visited.
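
In outline, the update() method might be sequenced as below. Every name here is an illustrative stub rather than the project's actual API, and MemoryMap refers to the sketch earlier in this article.

```java
public class OrganismSketch {
    NeuralNetStub brain; // the agent's Neural Network
    MemoryMap memory;    // cell memory, as sketched earlier
    double x, y;         // current position
    int fitness;         // number of first-time cell visits

    public void update() {
        double[] inputs = testSensors();             // steps 1-3: read all sensors
        double[] forces = brain.feedForward(inputs); // step 4: run the Neural Network
        move(forces[0], forces[1]);                  // step 5: direction and speed
        if (memory.visit(x, y)) fitness++;           // steps 6-7: record cell, reward
    }

    double[] testSensors() { return new double[8]; }              // stub
    void move(double left, double right) { /* see Motion sketch */ }
    interface NeuralNetStub { double[] feedForward(double[] inputs); }
}
```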

At every update cycle, the Organism class tests the sensors and feeds the readings into the Neural Network in order to update the outputs. The output of each neuron is calculated with the log-sigmoid function shown earlier. If the network has hidden neurons, the activations of the input neurons are fed into the hidden neurons; the same equation is then used to calculate the activations of the hidden neurons, which are fed into the neurons in the next layer. This procedure continues until the values of the output neurons are determined. These values represent the forces applied to the wheels.
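
Assuming a simple layered representation (consistent with the description above, though evolved NEAT topologies need not stay strictly layered), the propagation might be sketched as follows; the weight-array layout is an assumption:

```java
public class FeedForward {
    /** layers[l][j][i] holds the weight from neuron i of layer l
        to neuron j of layer l + 1. */
    public static double[] propagate(double[] inputs, double[][][] layers, double p) {
        double[] values = inputs;
        for (double[][] layer : layers) {
            double[] next = new double[layer.length];
            for (int j = 0; j < layer.length; j++) {
                double a = 0.0; // weighted sum for neuron j
                for (int i = 0; i < values.length; i++) {
                    a += layer[j][i] * values[i];
                }
                next[j] = 1.0 / (1.0 + Math.exp(-a / p)); // log-sigmoid activation
            }
            values = next; // feed into the next layer
        }
        return values; // output neuron values: the forces applied to the wheels
    }
}
```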

The difference in the forces applied to the left and right wheels results in the rotation of the agent. Using geometric calculations, the direction and speed of movement are obtained and the current position of the agent is determined, as mentioned previously. The algorithm then finds the cell where the agent is located and updates its memory, recording the number of ticks the agent spends in this cell. The agent also increments its count of cells visited for the first time.

The evaluation process runs for a predefined period to give the agents enough time to evolve and navigate the environment. The time between evolutions is calculated by the equation below:

n = m / (|P| * I)

where n is the number of ticks between evolutions, P is the size of the population, m is the minimum time the agent is alive, and I specifies what fraction of the population can be expected to be ineligible once evolution reaches a steady-state [2].
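
As a worked example with assumed values: for the population of 50 agents used here, taking a minimum lifetime of m = 500 ticks and I = 0.5 would give n = 500 / (50 * 0.5) = 20 ticks between replacements.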

At the end of each time interval, the real-time NEAT algorithm is executed to create a new agent by performing the procedures described in the article explaining real-time NEAT. All the genomes of the population that are eligible for mating are checked to find two of the fittest genomes. These two genomes are mated through crossover and mutation operations to create a new genome. A less fit genome is then selected and replaced with the newly-created genome. This process is repeated until the agents have evolved to a point where their behaviour is satisfactory. This can be judged by human observation: fewer collisions occur in later generations, and the number of cells visited by agents increases over generations.
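
One such replacement step might look like the sketch below. The Genome and GeneticAlgorithm types here are minimal stand-ins for the classes described earlier, with assumed method names.

```java
import java.util.List;

public class ReplacementStep {
    static class Genome { double fitness; } // minimal stand-in for the real Genome class
    interface GeneticAlgorithm {            // assumed operations of the real class
        Genome selectFit(List<Genome> pop);
        Genome crossover(Genome a, Genome b);
        void mutate(Genome g);
        Genome selectLeastFit(List<Genome> pop);
    }

    /** One rtNEAT step: breed two fit eligible genomes, replace a weak one. */
    public static void evolveOne(List<Genome> pop, GeneticAlgorithm ga) {
        Genome child = ga.crossover(ga.selectFit(pop), ga.selectFit(pop));
        ga.mutate(child);                   // structural and weight mutation
        pop.remove(ga.selectLeastFit(pop)); // drop a less fit genome
        pop.add(child);                     // the offspring takes its place
    }
}
```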

The two figures below show the top view of the environment during the training process, one at the second generation and the other at the seventh generation. They show how the agents' ability to explore the environment and avoid obstacles improves over five generations.

The second generation agent
The seventh generation agent

The movement path of the best agent in each generation is demonstrated. The shaded area shows the cells visited by the fittest agent during 1000 ticks; as the number of ticks spent in a cell increases, the shading becomes darker. It can be observed that the fittest agent in the second generation visited only a small number of cells. From the dark-colored cells, it can be seen that it wandered around certain areas on multiple occasions, collided with the wall, and eventually got stuck. The fittest agent in the seventh generation, on the other hand, visited more cells in 1000 ticks by navigating around the environment. From the light-colored cells it can be concluded that its navigation and collision avoidance greatly improved compared to the second-generation agent. It also evolved to navigate towards unvisited cells and avoid areas that had already been explored.

Fitness versus generation graph (over seven generations)

The figure above shows the graph of the agents' fitness versus the generation number over seven generations. The graph also indicates that five species were created. The black line indicates the fitness of the fittest agent in each generation, the blue line the average fitness of the agents, and the green line the lowest fitness. As can be observed from this graph, fitness increases over the generations: the best fitness started at 45 in the first generation and rose to 101 in the seventh generation. The average fitness also increases over the generations, showing an improvement throughout the population.

Conclusion

In this article, I discussed how the real-time NEAT algorithm was implemented to endow agents with learning capabilities, making their behavior evolve as they explore a virtual environment.

Agents in this research are trained to survey and discover the virtual environment. A Neural Network is used to control the behavior of each agent. The inputs of Neural Networks are linked to onboard sensors to sense the surroundings of the agent. Based on the sensor readings, the agent then decides how to explore the environment while avoiding obstacles.

The sensors are not tuned at the beginning because the initial Neural Networks have minimal topologies and randomly generated weights. NEAT grows and evolves the structure of the Neural Network in order to optimize the sensors. It works by adding neurons and links to the phenotype, and by defining historical markings (innovations) to keep track of structural changes. It checks the innovation database whenever a new structural change is about to occur and assigns the correct innovation ID to the newly-created neuron and link genes. This ensures that the genes carry historical markings of all changes to the phenotype and helps to find the origin of genes. Using these innovations, genes of the same origin can be lined up when performing the crossover operation, so disparate topologies can be crossed over in a meaningful manner. NEAT also protects innovations by allocating them to a new species, thereby reducing the competing conventions problem.

The NEAT algorithm evolves the entire genome population at once and is necessarily performed offline or at certain intervals. This results in large gaps in time between generations and makes the process appear discontinuous. Real-time NEAT, by contrast, is based on the NEAT algorithm but performs the evolution process on one or a few genomes at a time. This is particularly useful for dynamic environments and helps agents adapt to environmental changes; it also makes it possible to interact with evolving agents in real time. The real-time NEAT process is driven by the trainer agent, which creates a population of agents and trains them explicitly for the task of exploring.

References

  1. M. Buckland. AI Techniques for Game Programming. Premier Press, Cincinnati, OH, USA, 2002.
  2. K. O. Stanley, B. D. Bryant, and R. Miikkulainen. Real-time neuroevolution in the NERO video game. IEEE Transactions on Evolutionary Computation, 9:653–668, 2005.
