Optimization problems
In a genetic algorithm, a population of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosomes or genotypes) that can be mutated and altered. Traditionally, solutions are represented in binary as strings of zeros and ones, but other encodings are also possible.
Evolution usually starts from a population of randomly generated individuals and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; fitness is usually the value of the objective function in the optimization problem being solved. The fitter individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.
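The generational loop described above can be sketched as a minimal Python program. The fitness function here is OneMax (count the 1-bits), and the population size, genome length, tournament selection, and mutation rate are all illustrative assumptions, not part of any canonical algorithm.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

GENOME_LEN = 20
POP_SIZE = 30
GENERATIONS = 50
MUTATION_RATE = 0.01  # assumed per-bit flip probability

def fitness(genome):
    # OneMax: fitness is simply the number of 1-bits (illustrative choice).
    return sum(genome)

def random_genome():
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def select(population):
    # Stochastic, fitness-biased selection: a tournament of size 2.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover of two fixed-length parents.
    point = random.randint(1, GENOME_LEN - 1)
    return p1[:point] + p2[point:]

def mutate(genome):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

# Start from a random population and iterate generation by generation.
population = [random_genome() for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    best = max(population, key=fitness)
    if fitness(best) == GENOME_LEN:  # satisfactory fitness level reached
        break
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
```

The loop terminates either on the generation cap or as soon as a perfect OneMax genome appears, mirroring the two common termination conditions mentioned above.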
A typical genetic algorithm requires a genetic representation of the solution domain and a fitness function to evaluate it.
A standard representation of each candidate solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable-length representations can also be used, but implementing chromosome crossover is more complex in this case. Tree representations are explored in genetic programming and graph-like representations are explored in evolutionary programming. A mixture of both linear chromosomes and trees is explored in gene expression programming.
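The alignment property can be illustrated with single-point crossover on two fixed-length bit arrays: because both parents have the same length, one cut point yields two valid children. The genome length and parents below are arbitrary examples.

```python
import random

random.seed(1)

def one_point_crossover(a, b):
    # Fixed-length parents align position by position, so a single cut
    # point produces two children of the same, still-valid length.
    assert len(a) == len(b)
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

parent1 = [0] * 8
parent2 = [1] * 8
child1, child2 = one_point_crossover(parent1, parent2)
```

With complementary all-zero and all-one parents, every bit ends up in exactly one of the two children, so their 1-bit counts always sum to the genome length.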
Once the genetic representation and fitness function are defined, a GA proceeds to initialize a population of solutions and then improve it by repetitively applying the mutation, crossover, inversion, and selection operators.
The size of the population depends on the nature of the problem, but typically contains several hundred or thousands of possible solutions. Often the initial population is generated randomly, allowing for the full range of possible solutions (the search space). Occasionally, solutions may be "seeded" in areas where optimal solutions are likely to be found.
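Both initialization strategies mentioned above (fully random, and "seeded" with promising genomes) can be sketched as follows; the genome length, population size, and the "known good" seed genome are hypothetical.

```python
import random

random.seed(2)

GENOME_LEN = 16

def random_population(size):
    # Uniform random sampling allows the full search space to be covered.
    return [[random.randint(0, 1) for _ in range(GENOME_LEN)]
            for _ in range(size)]

def seeded_population(size, seeds):
    # Start from known promising genomes, then fill the rest randomly.
    pop = [list(s) for s in seeds]
    pop += random_population(size - len(pop))
    return pop

known_good = [[1] * GENOME_LEN]  # hypothetical promising solution
population = seeded_population(100, known_good)
```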
During each successive generation, a portion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Some selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as the former process may be very time-consuming.
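Tournament selection is one common scheme of the second kind: only a small random sample is evaluated per pick, avoiding a full ranking of the population. The placeholder fitness function and tournament size below are assumptions for illustration.

```python
import random

random.seed(3)

def fitness(genome):
    # Placeholder fitness: count of 1-bits (illustrative only).
    return sum(genome)

def tournament_select(population, k=3):
    # Evaluate only a random sample of k individuals and keep the best,
    # so no full ranking of the whole population is ever computed.
    sample = random.sample(population, k)
    return max(sample, key=fitness)

population = [[random.randint(0, 1) for _ in range(10)] for _ in range(50)]
winner = tournament_select(population)
```

Larger `k` increases selection pressure toward the fittest individuals; `k = 1` degenerates to uniform random selection.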
The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem-dependent. For example, in the knapsack problem we want to maximize the total value of the objects that can be put in a knapsack of some fixed capacity. A representation of a solution can be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, since the size of the objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise.
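This knapsack fitness function translates directly into code. The item values, weights, and capacity below are made-up example data.

```python
# Hypothetical example data: per-item values and weights, fixed capacity.
VALUES = [60, 100, 120, 30]
WEIGHTS = [10, 20, 30, 5]
CAPACITY = 50

def fitness(genome):
    # genome[i] == 1 means item i is packed into the knapsack.
    total_weight = sum(w for w, g in zip(WEIGHTS, genome) if g)
    if total_weight > CAPACITY:
        return 0  # invalid representation: exceeds the capacity
    return sum(v for v, g in zip(VALUES, genome) if g)
```

Packing the first three items weighs 60 and overflows the capacity of 50, scoring 0; dropping the first item gives a valid load of weight 50 and value 220.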
In some problems, it is hard or even impossible to define a fitness expression. In these cases, a simulation may be used to determine the fitness function value of a phenotype (for example, computational fluid dynamics is used to determine the air resistance of a vehicle whose shape is encoded as the phenotype), or even interactive genetic algorithms are used.
The next step is to generate a second generation population, from solutions selected through a combination of genetic operators: chromosomal crossing over (also called crossover or recombination) and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the crossover and mutation methods above, a new solution is created that typically shares many of the characteristics of its "parents". New parents are selected for each new child, and the process continues until a new population of solutions of appropriate size is generated. Although reproduction methods that are based on the use of two parents are more "biologically inspired", some research suggests that using more than two "parents" can generate higher-quality chromosomes.
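The two-parent breeding step can be sketched as crossover followed by mutation. The example parents and the per-bit mutation rate are arbitrary choices for illustration.

```python
import random

random.seed(4)

MUTATION_RATE = 0.05  # assumed per-bit flip probability

def crossover(p1, p2):
    # Single-point crossover: the child inherits a prefix from one
    # parent and the matching suffix from the other.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def mutate(genome):
    # Each bit flips independently with a small probability.
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

def breed(p1, p2):
    # One "child": crossover first, then random mutation.
    return mutate(crossover(p1, p2))

parent1 = [0, 0, 0, 0, 1, 1, 1, 1]
parent2 = [1, 1, 1, 1, 0, 0, 0, 0]
child = breed(parent1, parent2)
```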
These processes ultimately result in the next-generation population of chromosomes, which is different from the initial generation. Generally, this procedure increases the average fitness of the population, since mostly the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions. These less fit solutions ensure genetic diversity within the genetic pool of the parents and, therefore, ensure the genetic diversity of the subsequent generation of children.
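One way to assemble the next generation along these lines is elitism: carry a few of the fittest individuals over unchanged and fill the remainder by breeding, which keeps the best solutions while less fit parents still contribute diversity. The elite count, placeholder fitness, and uniform parent choice below are illustrative assumptions.

```python
import random

random.seed(5)

ELITE_COUNT = 2  # assumed number of individuals carried over unchanged

def fitness(genome):
    return sum(genome)  # placeholder fitness: count of 1-bits

def make_child(p1, p2):
    # Single-point crossover as a stand-in for the full breeding step.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def next_generation(population):
    # Keep the fittest individuals as-is (elitism), then fill the rest
    # with children bred from uniformly chosen parents, so less fit
    # individuals also get a chance to pass on genetic material.
    ranked = sorted(population, key=fitness, reverse=True)
    new_pop = ranked[:ELITE_COUNT]
    while len(new_pop) < len(population):
        p1, p2 = random.sample(population, 2)
        new_pop.append(make_child(p1, p2))
    return new_pop

population = [[random.randint(0, 1) for _ in range(12)] for _ in range(20)]
new_population = next_generation(population)
```

Because the elites are copied verbatim, the best fitness in the new population can never drop below the best fitness of the old one.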
Opinion is divided on the importance of crossover versus mutation. There are many references in Fogel (2006) that support the importance of mutation-based search.
Although crossover and mutation are known as the main genetic operators, it is possible to use other operators such as reassortment, colonization-extinction or migration in genetic algorithms.
It is worth tuning parameters such as the mutation probability, crossover probability, and population size to find reasonable settings for the class of problem being worked on. A very small mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate that is too high may lead to premature convergence of the genetic algorithm. A mutation rate that is too high may lead to the loss of good solutions, unless elitist selection is employed.
This generational process is repeated until a termination condition has been reached. Common terminating conditions include reaching a maximum number of generations or attaining a satisfactory fitness level for the population.
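These two common stopping criteria can be combined in a small predicate; the default generation cap and target fitness below are arbitrary example values.

```python
def should_terminate(generation, best_fitness, *,
                     max_generations=200, target_fitness=1.0):
    # Stop either when the generation cap is hit or once the best
    # individual reaches a satisfactory (target) fitness level.
    return generation >= max_generations or best_fitness >= target_fitness
```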