More About PyGAD

Multi-Objective Optimization

In PyGAD 3.2.0, the library supports multi-objective optimization using the non-dominated sorting genetic algorithm II (NSGA-II). The code is identical to the regular code used for single-objective optimization except for one difference: the return value of the fitness function.

In single-objective optimization, the fitness function returns a single numeric value. In this example, the variable fitness is expected to be a numeric value.

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness

But in multi-objective optimization, the fitness function returns any of these data types:

  1. list

  2. tuple

  3. numpy.ndarray

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return [fitness1, fitness2, ..., fitnessN]

Whenever the fitness function returns an iterable of these data types, then the problem is considered multi-objective. This holds even if there is a single element in the returned iterable.

Other than the fitness function, everything else could be the same in both single and multi-objective problems.

But it is recommended to use one of these 2 parent selection operators to solve multi-objective problems (a short usage sketch follows the list):

  1. nsga2: This selects the parents based on non-dominated sorting and crowding distance.

  2. tournament_nsga2: This selects the parents using tournament selection which uses non-dominated sorting and crowding distance to rank the solutions.
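
For instance, assuming a fitness function fitness_func that returns a list of objective values is already defined (the other values below are placeholders), either operator is selected through the parent_selection_type parameter. This is only a minimal sketch:

import pygad

ga_instance = pygad.GA(num_generations=100,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       num_genes=6,
                       fitness_func=fitness_func,
                       parent_selection_type='tournament_nsga2')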

This is a multi-objective optimization example that optimizes these 2 linear functions:

  1. y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

  2. y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12

Where:

  1. (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y1=50

  2. (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y2=30

The 2 functions use the same parameters (weights) w1 to w6.

The goal is to use PyGAD to find the optimal values for such weights that satisfy the 2 functions y1 and y2.

import pygad
import numpy

"""
Given these 2 functions:
    y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
    y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
    where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y1=50
    and   (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y2=30
What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions.
This is a multi-objective optimization problem.

PyGAD considers the problem as multi-objective if the fitness function returns:
    1) List.
    2) Or tuple.
    3) Or numpy.ndarray.
"""

function_inputs1 = [4,-2,3.5,5,-11,-4.7] # Function 1 inputs.
function_inputs2 = [-2,0.7,-9,1.4,3,5] # Function 2 inputs.
desired_output1 = 50 # Function 1 output.
desired_output2 = 30 # Function 2 output.

def fitness_func(ga_instance, solution, solution_idx):
    output1 = numpy.sum(solution*function_inputs1)
    output2 = numpy.sum(solution*function_inputs2)
    fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001)
    fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001)
    return [fitness1, fitness2]

num_generations = 100
num_parents_mating = 10

sol_per_pop = 20
num_genes = len(function_inputs1)

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       sol_per_pop=sol_per_pop,
                       num_genes=num_genes,
                       fitness_func=fitness_func,
                       parent_selection_type='nsga2')

ga_instance.run()

ga_instance.plot_fitness(label=['Obj 1', 'Obj 2'])

solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
print(f"Parameters of the best solution : {solution}")
print(f"Fitness value of the best solution = {solution_fitness}")

prediction = numpy.sum(numpy.array(function_inputs1)*solution)
print(f"Predicted output 1 based on the best solution : {prediction}")
prediction = numpy.sum(numpy.array(function_inputs2)*solution)
print(f"Predicted output 2 based on the best solution : {prediction}")

This is the result of the print statements. The predicted outputs are close to the desired outputs.

Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662  5.70539445 -2.02797016 -1.07243922]
Fitness value of the best solution = [  1.68090829 349.8591915 ]
Predicted output 1 based on the best solution : 50.59491545442283
Predicted output 2 based on the best solution : 29.99714270722312

This is the figure created by the plot_fitness() method. The fitness of the first objective is shown in green and the fitness of the second objective in blue.

Limit the Gene Value Range using the gene_space Parameter

In PyGAD 2.11.0, the gene_space parameter added a new feature that allows customizing the range of accepted values for each gene. Let's take a quick look at the gene_space parameter to build on it.

The gene_space parameter allows the user to feed the space of values of each gene. This way, the accepted values for each gene are restricted to the user-defined values. Assume there is a problem that has 3 genes where each gene has a different set of values as follows:

  1. Gene 1: [0.4, 12, -5, 21.2]

  2. Gene 2: [-2, 0.3]

  3. Gene 3: [1.2, 63.2, 7.4]

Then, the gene_space for this problem is as given below. Note that the order is very important.

gene_space = [[0.4, 12, -5, 21.2],
              [-2, 0.3],
              [1.2, 63.2, 7.4]]

In case all genes share the same set of values, then simply feed a single list to the gene_space parameter as follows. In this case, all genes can only take values from this list of 6 values.

gene_space = [33, 7, 0.5, 95, 6.3, 0.74]

The previous example restricts the gene values to a fixed set of discrete values. In case you want to use a range of discrete values for the gene, then you can use the range() function. For example, range(1, 7) means the set of allowed values for the gene is 1, 2, 3, 4, 5, and 6. You can also use the numpy.arange() or numpy.linspace() functions for the same purpose.
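
For example, any of the following assignments restricts the genes to a discrete set of values (a short illustration; the numbers are arbitrary):

import numpy

gene_space = range(1, 7)              # the discrete values 1, 2, 3, 4, 5, 6
gene_space = numpy.arange(1, 7)       # the same values as a NumPy array
gene_space = numpy.linspace(1, 6, 6)  # 6 evenly spaced values from 1 to 6 (inclusive)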

The previous discussion only covers discrete values, not continuous values. In PyGAD 2.11.0, the gene_space parameter can be assigned a dictionary that allows the gene to have values from a continuous range.

Assume you want to restrict the gene to the half-open range [1, 5) where 1 is included and 5 is not. Then simply create a dictionary with 2 items whose keys are:

  1. 'low': The minimum value in the range which is 1 in the example.

  2. 'high': The maximum value in the range which is 5 in the example.

The dictionary will look like this:

{'low': 1,
 'high': 5}

Apart from the optional 'step' key discussed later, it is not acceptable to add more items to the dictionary or to use keys other than 'low' and 'high'.

For a 3-gene problem, the next code creates a dictionary for each gene to restrict its values in a continuous range. For the first gene, it can take any floating-point value from the range that starts from 1 (inclusive) and ends at 5 (exclusive).

gene_space = [{'low': 1, 'high': 5}, {'low': 0.3, 'high': 1.4}, {'low': -0.2, 'high': 4.5}]

More about the gene_space Parameter

The gene_space parameter customizes the space of values of each gene.

Assuming that all genes have the same global space which includes the values 0.3, 5.2, -4, and 8, then those values can be assigned to the gene_space parameter as a list, tuple, or range. Here is a list assigned to this parameter. By doing that, the gene values are restricted to those assigned to the gene_space parameter.

gene_space = [0.3, 5.2, -4, 8]

If some genes have different spaces, then gene_space should be assigned a nested list or tuple. In this case, the elements could be:

  1. Number (of int, float, or NumPy data types): A single value to be assigned to the gene. This means this gene will have the same value across all generations.

  2. list, tuple, numpy.ndarray, or any range like range, numpy.arange(), or numpy.linspace(): It holds the space for each individual gene. But this space is usually discrete. That is, there is a finite set of values to select from.

  3. dict: To sample a value for a gene from a continuous range. The dictionary must have 2 mandatory keys which are "low" and "high" in addition to an optional key which is "step". A random value is returned between the values assigned to the items with "low" and "high" keys. If "step" exists, then this works like the previous options (i.e. a discrete set of values).

  4. None: A gene with its space set to None is initialized randomly from the range specified by the 2 parameters init_range_low and init_range_high. For mutation, its value is mutated based on a random value from the range specified by the 2 parameters random_mutation_min_val and random_mutation_max_val. If all elements in the gene_space parameter are None, the parameter will not have any effect.

Assume that a chromosome has 2 genes and each gene has a different value space. Then the gene_space could be assigned a nested list/tuple where each element determines the space of a gene.

According to the next code, the space of the first gene is [0.4, -5] which has 2 values and the space for the second gene is [0.5, -3.2, 8.8, -9] which has 4 values.

gene_space = [[0.4, -5], [0.5, -3.2, 8.8, -9]]

For a 2 gene chromosome, if the first gene space is restricted to the discrete values from 0 to 4 and the second gene is restricted to the values from 10 to 19, then it could be specified according to the next code.

gene_space = [range(5), range(10, 20)]

The gene_space can also be assigned a single range, as given below, where the values of all genes are sampled from the same range.

gene_space = numpy.arange(15)

The gene_space can be assigned a dictionary to sample a value from a continuous range.

gene_space = {"low": 4, "high": 30}

A step can also be assigned to the dictionary. This works as if a discrete range is used.

gene_space = {"low": 4, "high": 30, "step": 2.5}

Setting a dict like {"low": 0, "high": 10} in the gene_space means that random values from the continuous range [0, 10) are sampled. Note that 0 is included but 10 is not included while sampling. Thus, the maximum value that could be returned is less than 10 like 9.9999. But if the user decided to round the genes using, for example, [float, 2], then this value will become 10. So, the user should be careful to the inputs.

If a None is assigned to only a single gene, then its value will be randomly generated initially using the init_range_low and init_range_high parameters in the pygad.GA class's constructor. During mutation, the value is sampled from the range defined by the 2 parameters random_mutation_min_val and random_mutation_max_val. This is an example where the second gene is given a None value.

gene_space = [range(5), None, numpy.linspace(10, 20, 300)]

If the user does not assign the initial population to the initial_population parameter, the initial population is created randomly based on the gene_space parameter. Moreover, mutation is applied based on this parameter.
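
The following minimal sketch (with arbitrary values) combines the different element types in one gene_space and prints the randomly created initial population, which respects each gene's space:

import pygad
import numpy

def fitness_func(ga_instance, solution, solution_idx):
    return numpy.sum(solution)

gene_space = [range(5),               # discrete values 0 to 4
              None,                   # uses init_range_low/init_range_high
              {'low': 1, 'high': 5},  # continuous range [1, 5)
              [0.4, -5, 7.2]]         # fixed set of 3 values

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=2,
                       sol_per_pop=4,
                       num_genes=len(gene_space),
                       gene_space=gene_space,
                       fitness_func=fitness_func)

print(ga_instance.initial_population)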

How Mutation Works with the gene_space Parameter?

Mutation changes based on whether the gene_space has a continuous range or discrete set of values.

If a gene has its static/discrete space defined in the gene_space parameter, then mutation works by replacing the gene value by a value randomly selected from the gene space. This happens for both int and float data types.

For example, the following gene_space has the static space [1, 2, 3] defined for the first gene. So, this gene can only have a value out of these 3 values.

Gene space: [[1, 2, 3],
             None]
Solution: [1, 5]

For a solution like [1, 5], then mutation happens for the first gene by simply replacing its current value by a randomly selected value (other than its current value if possible). So, the value 1 will be replaced by either 2 or 3.

For the second gene, its space is set to None. So, traditional mutation happens for this gene by:

  1. Generating a random value from the range defined by the random_mutation_min_val and random_mutation_max_val parameters.

  2. Adding this random value to the current gene’s value.

If its current value is 5 and the random value is -0.5, then the new value is 4.5. If the gene type is integer, then the value will be rounded.

On the other hand, if a gene has a continuous space defined in the gene_space parameter, then mutation occurs by adding a random value to the current gene value.

For example, the following gene_space has the continuous space defined by the dictionary {'low': 1, 'high': 5}. This applies to all genes. So, mutation is applied to one or more selected genes by adding a random value to the current gene value.

Gene space: {'low': 1, 'high': 5}
Solution: [1.5, 3.4]

Assuming random_mutation_min_val=-1 and random_mutation_max_val=1, then a random value such as 0.3 can be added to the gene(s) participating in mutation. If only the first gene is mutated, then its new value changes from 1.5 to 1.5+0.3=1.8. Note that PyGAD verifies that the new value is within the range. In the worst scenarios, the value will be set to either boundary of the continuous range. For example, if the gene value is 1.5 and the random value is -0.55, then the new value is 0.95, which is smaller than the lower boundary 1. Thus, the gene value will be set to the boundary value 1.

If the dictionary has a step like the example below, then it is considered a discrete range and mutation occurs by randomly selecting a value from the set of values. In other words, no random value is added to the gene value.

Gene space: {'low': 1, 'high': 5, 'step': 0.5}

Stop at Any Generation

In PyGAD 2.4.0, it is possible to stop the genetic algorithm after any generation. All you need to do is to return the string "stop" in the callback function on_generation. When this callback function is implemented and assigned to the on_generation parameter in the constructor of the pygad.GA class, the algorithm stops immediately after completing its current generation. Let's discuss an example.

Assume that the user wants to stop the algorithm either after 100 generations or if a condition is met. The user may assign a value of 100 to the num_generations parameter of the pygad.GA class constructor.

The condition that stops the algorithm is written in a callback function like the one in the next code. If the fitness value of the best solution reaches or exceeds 70, then the string "stop" is returned.

def func_generation(ga_instance):
    if ga_instance.best_solution()[1] >= 70:
        return "stop"

Stop Criteria

In PyGAD 2.15.0, a new parameter named stop_criteria is added to the constructor of the pygad.GA class. It helps to stop the evolution based on some criteria. It can be assigned one or more criteria.

Each criterion is passed as a str that consists of 2 parts:

  1. Stop word.

  2. Number.

It takes this form:

"word_num"

The current 2 supported words are reach and saturate.

The reach word stops the run() method if the fitness value is equal to or greater than a given fitness value. An example for reach is "reach_40" which stops the evolution if the fitness is >= 40.

saturate stops the evolution if the fitness saturates for a given number of consecutive generations. An example for saturate is "saturate_7" which means stop the run() method if the fitness does not change for 7 consecutive generations.

Here is an example that stops the evolution if either the fitness value reaches 127.4 or the fitness saturates for 15 generations.

import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, 9, 4]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)

    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)

    return fitness

ga_instance = pygad.GA(num_generations=200,
                       sol_per_pop=10,
                       num_parents_mating=4,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       stop_criteria=["reach_127.4", "saturate_15"])

ga_instance.run()
print(f"Number of generations passed is {ga_instance.generations_completed}")

Elitism Selection

In PyGAD 2.18.0, a new parameter called keep_elitism is supported. It accepts an integer that defines the number of elite (i.e. best) solutions to keep in the next generation. This parameter defaults to 1 which means only the best solution is kept in the next generation.

In the next example, the keep_elitism parameter in the constructor of the pygad.GA class is set to 2. Thus, the best 2 solutions in each generation are kept in the next generation.

import numpy
import pygad

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / numpy.abs(output - desired_output)
    return fitness

ga_instance = pygad.GA(num_generations=2,
                       num_parents_mating=3,
                       fitness_func=fitness_func,
                       num_genes=6,
                       sol_per_pop=5,
                       keep_elitism=2)

ga_instance.run()

The value passed to the keep_elitism parameter must satisfy 2 conditions:

  1. It must be >= 0.

  2. It must be <= sol_per_pop. That is its value cannot exceed the number of solutions in the current population.

In the previous example, if the keep_elitism parameter is set equal to the value passed to the sol_per_pop parameter, which is 5, then there will be no evolution at all as in the next figure. This is because all the 5 solutions are used as elitism in the next generation and no offspring will be created.

...

ga_instance = pygad.GA(...,
                       sol_per_pop=5,
                       keep_elitism=5)

ga_instance.run()

Note that if the keep_elitism parameter is effective (i.e. assigned a positive integer, not zero), then the keep_parents parameter has no effect. Because the default value of the keep_elitism parameter is 1, the keep_parents parameter has no effect by default. The keep_parents parameter is only effective when keep_elitism=0.
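
For example, to make keep_parents effective again, keep_elitism has to be explicitly set to 0. This is a short sketch with arbitrary values:

ga_instance = pygad.GA(...,
                       keep_elitism=0,  # disable elitism
                       keep_parents=3)  # now 3 parents are kept in the next generation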

Random Seed

In PyGAD 2.18.0, a new parameter called random_seed is supported. Its value is used as a seed for the random number generators.

PyGAD uses random functions in these 2 libraries:

  1. NumPy

  2. random

The random_seed parameter defaults to None which means no seed is used. As a result, different random numbers are generated for each run of PyGAD.

If this parameter is assigned a proper seed, then the results will be reproducible. In the next example, the integer 2 is used as a random seed.

import numpy
import pygad

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / numpy.abs(output - desired_output)
    return fitness

ga_instance = pygad.GA(num_generations=2,
                       num_parents_mating=3,
                       fitness_func=fitness_func,
                       sol_per_pop=5,
                       num_genes=6,
                       random_seed=2)

ga_instance.run()
best_solution, best_solution_fitness, best_match_idx = ga_instance.best_solution()
print(best_solution)
print(best_solution_fitness)

This is the best solution found and its fitness value.

[ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972

After running the code again, it will find the same result.

[ 2.77249188 -4.06570662  0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972

Continue without Losing Progress

In PyGAD 2.18.0, and thanks to Felix Bernhard for opening this GitHub issue, the values of these 4 instance attributes are no longer reset after each call to the run() method.

  1. self.best_solutions

  2. self.best_solutions_fitness

  3. self.solutions

  4. self.solutions_fitness

This helps the user to continue where the last run stopped without losing the values of these 4 attributes.

Now, the user can save the model by calling the save() method.

import pygad

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness

ga_instance = pygad.GA(...)

ga_instance.run()

ga_instance.plot_fitness()

ga_instance.save("pygad_GA")

Then the saved model is loaded by calling the load() function. After calling the run() method on the loaded instance, the data in the previous 4 attributes is not reset but extended with the new data.

import pygad

def fitness_func(ga_instance, solution, solution_idx):
    ...
    return fitness

loaded_ga_instance = pygad.load("pygad_GA")

loaded_ga_instance.run()

loaded_ga_instance.plot_fitness()

The plot created by the plot_fitness() method will show the data collected from both runs.

Note that the 2 attributes (self.best_solutions and self.best_solutions_fitness) only work if the save_best_solutions parameter is set to True. Also, the 2 attributes (self.solutions and self.solutions_fitness) only work if the save_solutions parameter is True.
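
So, for both pairs of attributes to persist across runs, the instance can be created as in this short sketch (other parameters omitted):

ga_instance = pygad.GA(...,
                       save_best_solutions=True,
                       save_solutions=True)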

Change Population Size during Runtime

Starting from PyGAD 3.3.0, the population size can be changed during runtime. In other words, the number of solutions/chromosomes and the number of genes can be changed.

The user has to carefully arrange the list of parameters and instance attributes that have to be changed to keep the GA consistent before and after changing the population size. Generally, change everything that would be used during the GA evolution.

CAUTION: If the user fails to change a parameter or an instance attribute necessary to keep the GA running after the population size is changed, errors will arise.

These are examples of the parameters that the user should decide whether to change. The user should check the list of parameters and decide what to change.

  1. population: The population. It must be changed.

  2. num_offspring: The number of offspring to produce out of the crossover and mutation operations. Change this parameter if the number of offspring has to change to be consistent with the new population size.

  3. num_parents_mating: The number of solutions to select as parents. Change this parameter if the number of parents has to change to be consistent with the new population size.

  4. fitness_func: If the way of calculating the fitness changes with the new population size, then the fitness function has to be changed.

  5. sol_per_pop: The number of solutions per population. It is not critical to change it but it is recommended to keep this number consistent with the number of solutions in the population parameter.

These are examples of the instance attributes that might be changed. The user should check the list of instance attributes and decide what to change. A hedged code sketch follows the list.

  1. All the last_generation_* parameters

    1. last_generation_fitness: A 1D NumPy array of fitness values of the population.

    2. last_generation_parents and last_generation_parents_indices: Two NumPy arrays: 2D array representing the parents and 1D array of the parents indices.

    3. last_generation_elitism and last_generation_elitism_indices: Must be changed if keep_elitism != 0. The default value of keep_elitism is 1. Two NumPy arrays: 2D array representing the elitism and 1D array of the elitism indices.

  2. pop_size: The population size.
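
The following is a minimal, hedged sketch of growing the population inside the on_generation callback, not an official recipe. The scenario and all values are hypothetical, and depending on the configuration (elitism, kept parents, gene_space, ...), more attributes may need to be updated:

import numpy
import pygad

def fitness_func(ga_instance, solution, solution_idx):
    return numpy.sum(solution)

def on_generation(ga_instance):
    # Hypothetical scenario: grow the population from 10 to 20 solutions after generation 5.
    if ga_instance.generations_completed == 5:
        new_sol_per_pop = 20
        ga_instance.population = numpy.random.uniform(low=-4,
                                                      high=4,
                                                      size=(new_sol_per_pop, ga_instance.num_genes))
        ga_instance.sol_per_pop = new_sol_per_pop
        ga_instance.pop_size = (new_sol_per_pop, ga_instance.num_genes)
        ga_instance.num_parents_mating = 10
        ga_instance.num_offspring = new_sol_per_pop - ga_instance.keep_elitism
        # Recompute the fitness so that last_generation_fitness matches the new population.
        ga_instance.last_generation_fitness = ga_instance.cal_pop_fitness()

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=5,
                       sol_per_pop=10,
                       num_genes=6,
                       keep_elitism=1,
                       fitness_func=fitness_func,
                       on_generation=on_generation)

ga_instance.run()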

Prevent Duplicates in Gene Values

In PyGAD 2.13.0, a new bool parameter called allow_duplicate_genes is supported to control whether duplicate gene values are allowed in the chromosome or not. In other words, it controls whether 2 or more genes might have the same exact value.

If allow_duplicate_genes=True (which is the default case), genes may have the same value. If allow_duplicate_genes=False, then no 2 genes will have the same value given that there are enough unique values for the genes.

The next code gives an example to use the allow_duplicate_genes parameter. A callback generation function is implemented to print the population after each generation.

import pygad

def fitness_func(ga_instance, solution, solution_idx):
    return 0

def on_generation(ga):
    print("Generation", ga.generations_completed)
    print(ga.population)

ga_instance = pygad.GA(num_generations=5,
                       sol_per_pop=5,
                       num_genes=4,
                       mutation_num_genes=3,
                       random_mutation_min_val=-5,
                       random_mutation_max_val=5,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       gene_type=int,
                       on_generation=on_generation,
                       allow_duplicate_genes=False)
ga_instance.run()

Here are the populations after the 5 generations. Note how there are no duplicate values within any solution.

Generation 1
[[ 2 -2 -3  3]
 [ 0  1  2  3]
 [ 5 -3  6  3]
 [-3  1 -2  4]
 [-1  0 -2  3]]
Generation 2
[[-1  0 -2  3]
 [-3  1 -2  4]
 [ 0 -3 -2  6]
 [-3  0 -2  3]
 [ 1 -4  2  4]]
Generation 3
[[ 1 -4  2  4]
 [-3  0 -2  3]
 [ 4  0 -2  1]
 [-4  0 -2 -3]
 [-4  2  0  3]]
Generation 4
[[-4  2  0  3]
 [-4  0 -2 -3]
 [-2  5  4 -3]
 [-1  2 -4  4]
 [-4  2  0 -3]]
Generation 5
[[-4  2  0 -3]
 [-1  2 -4  4]
 [ 3  4 -4  0]
 [-1  0  2 -2]
 [-4  2 -1  1]]

The allow_duplicate_genes parameter also works with the gene_space parameter. Here is an example where each of the 4 genes has the same space of values consisting of 4 values (1, 2, 3, and 4).

import pygad

def fitness_func(ga_instance, solution, solution_idx):
    return 0

def on_generation(ga):
    print("Generation", ga.generations_completed)
    print(ga.population)

ga_instance = pygad.GA(num_generations=5,
                       sol_per_pop=5,
                       num_genes=4,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       gene_type=int,
                       gene_space=[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
                       on_generation=on_generation,
                       allow_duplicate_genes=False)
ga_instance.run()

Even though all the genes share the same space of values, no 2 genes within a solution have duplicate values, as shown in the next output.

Generation 1
[[2 3 1 4]
 [2 3 1 4]
 [2 4 1 3]
 [2 3 1 4]
 [1 3 2 4]]
Generation 2
[[1 3 2 4]
 [2 3 1 4]
 [1 3 2 4]
 [2 3 4 1]
 [1 3 4 2]]
Generation 3
[[1 3 4 2]
 [2 3 4 1]
 [1 3 4 2]
 [3 1 4 2]
 [3 2 4 1]]
Generation 4
[[3 2 4 1]
 [3 1 4 2]
 [3 2 4 1]
 [1 2 4 3]
 [1 3 4 2]]
Generation 5
[[1 3 4 2]
 [1 2 4 3]
 [2 1 4 3]
 [1 2 4 3]
 [1 2 4 3]]

You should take care to provide enough values for the genes so that PyGAD is able to find an alternative for a gene value in case it duplicates with another gene.

There might be 2 duplicate genes where changing either of the 2 duplicating genes will not solve the problem. For example, if gene_space=[[3, 0, 1], [4, 1, 2], [0, 2], [3, 2, 0]] and the solution is [3 2 0 0], then the values of the last 2 genes duplicate. There are no possible changes in the last 2 genes to solve the problem.

This problem can be solved by randomly changing one of the non-duplicating genes, which may make room for a unique value in one of the 2 duplicating genes. For example, by changing the second gene from 2 to 4, either of the last 2 genes can take the value 2 and solve the duplicates. The resultant solution is then [3, 4, 2, 0]. But this option is not yet supported in PyGAD.

Solve Duplicates using a Third Gene

When allow_duplicate_genes=False and a user-defined gene_space is used, it sometimes happens that there is no room to solve the duplicate between 2 genes by simply replacing the value of one gene with another. In PyGAD 3.1.0, the duplicates are solved by looking for a third gene that helps in solving the duplicates. The following examples explain how it works.

Example 1:

Let’s assume that this gene space is used and there is a solution with 2 duplicate genes with the same value 4.

Gene space: [[2, 3],
             [3, 4],
             [4, 5],
             [5, 6]]
Solution: [3, 4, 4, 5]

By checking the gene space, the second gene can have the values [3, 4] and the third gene can have the values [4, 5]. To solve the duplicate, we have to change the value of one of these 2 genes.

If the value of the second gene changes from 4 to 3, then it will duplicate with the first gene. If we change the value of the third gene from 4 to 5, then it will duplicate with the fourth gene. In conclusion, just selecting a different value for either the second or third gene will introduce new duplicate genes.

When there are 2 duplicate genes but there is no way to solve their duplicate directly, then the solution is to change a third gene that makes room to solve the duplicate between the 2 genes.

In our example, the duplicate between the second and third genes can be solved by, for example:

  • Changing the first gene from 3 to 2 then changing the second gene from 4 to 3.

  • Or changing the fourth gene from 5 to 6 then changing the third gene from 4 to 5.

Generally, this is how to solve such duplicates (a toy code sketch follows the examples below):

  1. For any duplicate gene GENE1, select another value.

  2. Check which other gene GENEX has duplicate with this new value.

  3. Find if GENEX can have another value that will not cause any more duplicates. If so, go to step 7.

  4. If all the other values of GENEX will cause duplicates, then try another gene GENEY.

  5. Repeat steps 3 and 4 until exploring all the genes.

  6. If there is no possibility to solve the duplicates, then there is no way to solve them and we have to keep the duplicate value.

  7. If a value for a gene GENEM is found that will not cause more duplicates, then use this value for the gene GENEM.

  8. Replace the value of the gene GENE1 by the old value of the gene GENEM. This solves the duplicates.

This is an example to solve the duplicate for the solution [3, 4, 4, 5]:

  1. Let’s use the second gene with value 4. Because the space of this gene is [3, 4], then the only other value we can select is 3.

  2. The first gene also has the value 3.

  3. The first gene has another value 2 that will not cause more duplicates in the solution. Then go to step 7.

  4. Skip.

  5. Skip.

  6. Skip.

  7. The value of the first gene 3 will be replaced by the new value 2. The new solution is [2, 4, 4, 5].

  8. Replace the value of the second gene 4 by the old value of the first gene which is 3. The new solution is [2, 3, 4, 5]. The duplicate is solved.

Example 2:

Gene space: [[0, 1],
             [1, 2],
             [2, 3],
             [3, 4]]
Solution: [1, 2, 2, 3]

The quick summary is:

  • Change the value of the first gene from 1 to 0. The solution becomes [0, 2, 2, 3].

  • Change the value of the second gene from 2 to 1. The solution becomes [0, 1, 2, 3]. The duplicate is solved.
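
As a toy illustration of the idea only (this is not PyGAD's actual code), the following sketch first tries a direct replacement from the duplicate gene's own space and, when every alternative collides with a third gene, moves that third gene to a free value so its old value can be reused:

def solve_duplicates(solution, gene_space):
    # Toy sketch: resolve duplicates within a solution given a per-gene space.
    solution = list(solution)
    for i in range(len(solution)):
        for j in range(i + 1, len(solution)):
            if solution[i] != solution[j]:
                continue
            for candidate in gene_space[j]:
                if candidate not in solution:
                    # A direct replacement solves the duplicate.
                    solution[j] = candidate
                    break
                # The candidate collides with another gene k: try to move gene k away.
                k = solution.index(candidate)
                for k_candidate in gene_space[k]:
                    if k_candidate not in solution:
                        solution[k] = k_candidate  # Free the value held by gene k.
                        solution[j] = candidate    # Reuse gene k's old value.
                        break
                else:
                    continue
                break
    return solution

gene_space = [[2, 3], [3, 4], [4, 5], [5, 6]]
print(solve_duplicates([3, 4, 4, 5], gene_space))  # [3, 4, 5, 6] with this toy ordering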

More about the gene_type Parameter

The gene_type parameter allows the user to control the data type for all genes at once or for each individual gene. In PyGAD 2.15.0, the gene_type parameter also supports customizing the precision for float data types. As a result, the gene_type parameter helps to:

  1. Select a data type for all genes with or without precision.

  2. Select a data type for each individual gene with or without precision.

Let’s discuss things by examples.

Data Type for All Genes without Precision

The data type for all genes can be specified by assigning the numeric data type directly to the gene_type parameter. This is an example to make all genes of int data types.

gene_type=int

Given that the supported numeric data types of PyGAD include Python’s int and float in addition to all numeric types of NumPy, then any of these types can be assigned to the gene_type parameter.

If no precision is specified for a float data type, then the complete floating-point number is kept.

The next code uses an int data type for all genes where the genes in the initial and final population are only integers.

import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=int)

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)

Initial Population
[[ 1 -1  2  0 -3]
 [ 0 -2  0 -3 -1]
 [ 0 -1 -1  2  0]
 [-2  3 -2  3  3]
 [ 0  0  2 -2 -2]]

Final Population
[[ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]
 [ 1 -1  2  2  0]]

Data Type for All Genes with Precision

A precision can only be specified for a float data type and cannot be specified for integers. Here is an example to use a precision of 3 for the float data type. In this case, all genes are of type float and their maximum precision is 3.

gene_type=[float, 3]

The next code prints the initial and final populations where the genes are of type float with precision 3.

import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)

    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[float, 3])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)

Initial Population
[[-2.417 -0.487  3.623  2.457 -2.362]
 [-1.231  0.079 -1.63   1.629 -2.637]
 [ 0.692 -2.098  0.705  0.914 -3.633]
 [ 2.637 -1.339 -1.107 -0.781 -3.896]
 [-1.495  1.378 -1.026  3.522  2.379]]

Final Population
[[ 1.714 -1.024  3.623  3.185 -2.362]
 [ 0.692 -1.024  3.623  3.185 -2.362]
 [ 0.692 -1.024  3.623  3.375 -2.362]
 [ 0.692 -1.024  4.041  3.185 -2.362]
 [ 1.714 -0.644  3.623  3.185 -2.362]]

Data Type for each Individual Gene without Precision

In PyGAD 2.14.0, the gene_type parameter allows customizing the gene type for each individual gene. This is by using a list/tuple/numpy.ndarray with number of elements equal to the number of genes. For each element, a type is specified for the corresponding gene.

This is an example for a 5-gene problem where different types are assigned to the genes.

gene_type=[int, float, numpy.float16, numpy.int8, float]

This is a complete code that prints the initial and final population for a custom-gene data type.

import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[int, float, numpy.float16, numpy.int8, float])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)

Initial Population
[[0 0.8615522360026828 0.7021484375 -2 3.5301821368185866]
 [-3 2.648189378595294 -3.830078125 1 -0.9586271572917742]
 [3 3.7729827570110714 1.2529296875 -3 1.395741994211889]
 [0 1.0490687178053282 1.51953125 -2 0.7243617940450235]
 [0 -0.6550158436937226 -2.861328125 -2 1.8212734549263097]]

Final Population
[[3 3.7729827570110714 2.055 0 0.7243617940450235]
 [3 3.7729827570110714 1.458 0 -0.14638754050305036]
 [3 3.7729827570110714 1.458 0 0.0869406120516778]
 [3 3.7729827570110714 1.458 0 0.7243617940450235]
 [3 3.7729827570110714 1.458 0 -0.14638754050305036]]

Data Type for each Individual Gene with Precision

The precision can also be specified for the float data types as in the next line, where the second gene's precision is 2 and the last gene's precision is 1.

gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]]

This is a complete example that prints the initial and final populations, where the genes comply with the specified data types and precisions.

import pygad
import numpy

equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=5,
                       num_parents_mating=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]])

print("Initial Population")
print(ga_instance.initial_population)

ga_instance.run()

print("Final Population")
print(ga_instance.population)

Initial Population
[[-2 -1.22 1.716796875 -1 0.2]
 [-1 -1.58 -3.091796875 0 -1.3]
 [3 3.35 -0.107421875 1 -3.3]
 [-2 -3.58 -1.779296875 0 0.6]
 [2 -3.73 2.65234375 3 -0.5]]

Final Population
[[2 -4.22 3.47 3 -1.3]
 [2 -3.73 3.47 3 -1.3]
 [2 -4.22 3.47 2 -1.3]
 [2 -4.58 3.47 3 -1.3]
 [2 -3.73 3.47 3 -1.3]]

Parallel Processing in PyGAD

Starting from PyGAD 2.17.0, parallel processing is supported. This section explains how to use parallel processing in PyGAD.

According to the PyGAD lifecycle, only 2 operations can be parallelized:

  1. Population fitness calculation.

  2. Mutation.

The reason is that the calculations in these 2 operations are independent (i.e. each solution/chromosome is handled independently from the others) and can be distributed across different processes or threads.

The mutation operation does not do intensive calculations on the CPU. Its calculations are simple, like flipping the values of some genes from 0 to 1 or adding a random value to some genes. So, it does not take much CPU processing time. Experiments showed that parallelizing the mutation operation across the solutions increases the time instead of reducing it, because running multiple processes or threads adds overhead to manage them. Thus, parallel processing is not applied to the mutation operation.

For the population fitness calculation, parallel processing can help make a difference and reduce the processing time. But this is conditional on the type of calculations done in the fitness function. If the fitness function makes intensive calculations and takes much processing time from the CPU, then parallel processing will probably help to cut down the overall time.

This section explains how parallel processing works in PyGAD and how to use it.

How to Use Parallel Processing in PyGAD

Starting from PyGAD 2.17.0, a new parameter called parallel_processing is added to the constructor of the pygad.GA class.

import pygad
...
ga_instance = pygad.GA(...,
                       parallel_processing=...)
...

This parameter allows the user to do the following:

  1. Enable parallel processing.

  2. Select whether processes or threads are used.

  3. Specify the number of processes or threads to be used.

These are 3 possible values for the parallel_processing parameter:

  1. None: (Default) It means no parallel processing is used.

  2. A positive integer referring to the number of threads to be used (i.e. threads, not processes, are used).

  3. list/tuple: If a list or a tuple of exactly 2 elements is assigned, then:

    1. The first element can be either 'process' or 'thread' to specify whether processes or threads are used, respectively.

    2. The second element can be:

      1. A positive integer to select the maximum number of processes or threads to be used

      2. 0 to indicate that 0 processes or threads are used. It means no parallel processing. This is identical to setting parallel_processing=None.

      3. None to use the default value as calculated by the concurrent.futures module.

These are examples of the values assigned to the parallel_processing parameter:

  • parallel_processing=4: Because the parameter is assigned a positive integer, this means parallel processing is activated where 4 threads are used.

  • parallel_processing=["thread", 5]: Use parallel processing with 5 threads. This is identical to parallel_processing=5.

  • parallel_processing=["process", 8]: Use parallel processing with 8 processes.

  • parallel_processing=["process", 0]: As the second element is given the value 0, this means do not use parallel processing. This is identical to parallel_processing=None.

Examples

The examples will help you understand the difference between using processes and threads. Moreover, they will give an idea of when parallel processing would make a difference and reduce the time. These are dummy examples where the fitness function always returns 0.

The first example uses 10 genes, 5 solutions in the population where only 3 solutions mate, and 9999 generations. The fitness function uses a for loop with 99 iterations just to have some calculations. In the constructor of the pygad.GA class, parallel_processing=None means no parallel processing is used.

import pygad
import time

def fitness_func(ga_instance, solution, solution_idx):
    for _ in range(99):
        pass
    return 0

ga_instance = pygad.GA(num_generations=9999,
                       num_parents_mating=3,
                       sol_per_pop=5,
                       num_genes=10,
                       fitness_func=fitness_func,
                       suppress_warnings=True,
                       parallel_processing=None)

if __name__ == '__main__':
    t1 = time.time()

    ga_instance.run()

    t2 = time.time()
    print("Time is", t2-t1)

When parallel processing is not used, the time it takes to run the genetic algorithm is 1.5 seconds.

For comparison, let's do a second experiment where parallel processing is used with 5 threads. In this case, it takes 5 seconds.

...
ga_instance = pygad.GA(...,
                       parallel_processing=5)
...

For the third experiment, processes instead of threads are used. Also, only 99 generations are used instead of 9999. The time it takes is 99 seconds.

...
ga_instance = pygad.GA(num_generations=99,
                       ...,
                       parallel_processing=["process", 5])
...

This is the summary of the 3 experiments:

  1. No parallel processing & 9999 generations: 1.5 seconds.

  2. Parallel processing with 5 threads & 9999 generations: 5 seconds

  3. Parallel processing with 5 processes & 99 generations: 99 seconds

Because the fitness function does not need much CPU time, the normal processing takes the least time. Running processes for this simple problem takes 99 seconds compared to only 5 seconds for threads because managing processes is much heavier than managing threads. Thus, most of the CPU time is spent swapping the processes instead of executing the code.

In the second example, the loop makes 99999999 iterations and only 5 generations are used. With no parallelization, it takes 22 seconds.

import pygad
import time

def fitness_func(ga_instance, solution, solution_idx):
    for _ in range(99999999):
        pass
    return 0

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=3,
                       sol_per_pop=5,
                       num_genes=10,
                       fitness_func=fitness_func,
                       suppress_warnings=True,
                       parallel_processing=None)

if __name__ == '__main__':
    t1 = time.time()
    ga_instance.run()
    t2 = time.time()
    print("Time is", t2-t1)

It takes 15 seconds when 10 processes are used.

...
ga_instance = pygad.GA(...,
                       parallel_processing=["process", 10])
...

This is compared to 20 seconds when 10 threads are used.

...
ga_instance = pygad.GA(...,
                       parallel_processing=["thread", 10])
...

Based on the second example, using parallel processing with 10 processes takes the least time because there is much CPU work done. Generally, processes are preferred over threads when most of the work is on the CPU. Threads are preferred over processes in some situations like doing input/output operations.

Before releasing PyGAD 2.17.0, László Fazekas wrote an article about parallelizing the fitness function with PyGAD. Check it out: How Genetic Algorithms Can Compete with Gradient Descent and Backprop.

Logging Outputs

In PyGAD 3.0.0, the print() statement is no longer used and the outputs are printed using the logging module. A new parameter called logger is supported to accept a user-defined logger.

import logging

logger = ...

ga_instance = pygad.GA(...,
                       logger=logger,
                       ...)

The default value for this parameter is None. If no logger is passed (i.e. logger=None), then a default logger is created to log the messages to the console exactly as the print() statement works.

Some advantages of using the logging module instead of the print() statement are:

  1. The user has more control over the printed messages, especially if there is a project that uses multiple modules where each module prints its messages. A logger can organize the outputs.

  2. Using the proper Handler, the user can log the output messages to files and is not restricted to printing them to the console. So, it is much easier to record the outputs.

  3. The format of the printed messages can be changed by customizing the Formatter assigned to the Logger.

This section gives some quick examples to use the logging module and then gives an example to use the logger with PyGAD.

Logging to the Console

This is an example to create a logger to log the messages to the console.

import logging

# Create a logger
logger = logging.getLogger(__name__)

# Set the logger level to debug so that all the messages are printed.
logger.setLevel(logging.DEBUG)

# Create a stream handler to log the messages to the console.
stream_handler = logging.StreamHandler()

# Set the handler level to debug.
stream_handler.setLevel(logging.DEBUG)

# Create a formatter
formatter = logging.Formatter('%(message)s')

# Add the formatter to handler.
stream_handler.setFormatter(formatter)

# Add the stream handler to the logger
logger.addHandler(stream_handler)

Now, we can log messages to the console with the format specified in the Formatter.

logger.debug('Debug message.')
logger.info('Info message.')
logger.warning('Warn message.')
logger.error('Error message.')
logger.critical('Critical message.')

The outputs are identical to those returned using the print() statement.

Debug message.
Info message.
Warn message.
Error message.
Critical message.

By changing the format of the output messages, we can have more information about each message.

formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')

This is a sample output.

2023-04-03 18:46:27 DEBUG: Debug message.
2023-04-03 18:46:27 INFO: Info message.
2023-04-03 18:46:27 WARNING: Warn message.
2023-04-03 18:46:27 ERROR: Error message.
2023-04-03 18:46:27 CRITICAL: Critical message.

Note that you may need to clear the handlers after finishing the execution. This is to make sure no cached handlers are used in the next run. If the cached handlers are not cleared, then a single output message may be repeated.

logger.handlers.clear()

Logging to a File

This is another example to log the messages to a file named logfile.txt. The formatter prints the following about each message:

  1. The date and time at which the message is logged.

  2. The log level.

  3. The message.

  4. The path of the file.

  5. The line number of the log message.

import logging

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name, 'a+', 'utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)

This is what the outputs look like.

2023-04-03 18:54:03 DEBUG: Debug message. - c:\users\agad069\desktop\logger\example2.py:46
2023-04-03 18:54:03 INFO: Info message. - c:\users\agad069\desktop\logger\example2.py:47
2023-04-03 18:54:03 WARNING: Warn message. - c:\users\agad069\desktop\logger\example2.py:48
2023-04-03 18:54:03 ERROR: Error message. - c:\users\agad069\desktop\logger\example2.py:49
2023-04-03 18:54:03 CRITICAL: Critical message. - c:\users\agad069\desktop\logger\example2.py:50

Consider clearing the handlers if necessary.

logger.handlers.clear()

Log to Both the Console and a File

This is an example to create a single Logger associated with 2 handlers:

  1. A file handler.

  2. A stream handler.

import logging

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name,'a+','utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)

When a log message is issued, it is both printed to the console and saved in logfile.txt.

Consider clearing the handlers if necessary.

logger.handlers.clear()

PyGAD Example

To use the logger in PyGAD, just create your custom logger and pass it to the logger parameter.

import logging
import pygad
import numpy

level = logging.DEBUG
name = 'logfile.txt'

logger = logging.getLogger(name)
logger.setLevel(level)

file_handler = logging.FileHandler(name,'a+','utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)

equation_inputs = [4, -2, 8]
desired_output = 2671.1234

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * equation_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

def on_generation(ga_instance):
    ga_instance.logger.info(f"Generation = {ga_instance.generations_completed}")
    ga_instance.logger.info(f"Fitness    = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")

ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=40,
                       num_parents_mating=2,
                       keep_parents=2,
                       num_genes=len(equation_inputs),
                       fitness_func=fitness_func,
                       on_generation=on_generation,
                       logger=logger)
ga_instance.run()

logger.handlers.clear()

By executing this code, the logged messages are printed to the console and also saved in the text file.

2023-04-03 19:04:27 INFO: Generation = 1
2023-04-03 19:04:27 INFO: Fitness    = 0.00038086960368076276
2023-04-03 19:04:27 INFO: Generation = 2
2023-04-03 19:04:27 INFO: Fitness    = 0.00038214871408010853
2023-04-03 19:04:27 INFO: Generation = 3
2023-04-03 19:04:27 INFO: Fitness    = 0.0003832795907974678
2023-04-03 19:04:27 INFO: Generation = 4
2023-04-03 19:04:27 INFO: Fitness    = 0.00038398612055017196
2023-04-03 19:04:27 INFO: Generation = 5
2023-04-03 19:04:27 INFO: Fitness    = 0.00038442348890867516
2023-04-03 19:04:27 INFO: Generation = 6
2023-04-03 19:04:27 INFO: Fitness    = 0.0003854406039137763
2023-04-03 19:04:27 INFO: Generation = 7
2023-04-03 19:04:27 INFO: Fitness    = 0.00038646083174063284
2023-04-03 19:04:27 INFO: Generation = 8
2023-04-03 19:04:27 INFO: Fitness    = 0.0003875169193024936
2023-04-03 19:04:27 INFO: Generation = 9
2023-04-03 19:04:27 INFO: Fitness    = 0.0003888816727311021
2023-04-03 19:04:27 INFO: Generation = 10
2023-04-03 19:04:27 INFO: Fitness    = 0.000389832593101348

Solve Non-Deterministic Problems

PyGAD can be used to solve both deterministic and non-deterministic problems. Deterministic problems are those that return the same fitness for the same solution. For non-deterministic problems, a different fitness value might be returned for the same solution.

By default, PyGAD's settings are set to solve deterministic problems. PyGAD can save the explored solutions and their fitness to reuse them in the future. These instance attributes can save the solutions:

  1. solutions: Exists if save_solutions=True.

  2. best_solutions: Exists if save_best_solutions=True.

  3. last_generation_elitism: Exists if keep_elitism > 0.

  4. last_generation_parents: Exists if keep_parents > 0 or keep_parents=-1.

To configure PyGAD for non-deterministic problems, we have to disable saving the previous solutions. This is done by setting these parameters:

  1. keep_elitism=0

  2. keep_parents=0

  3. save_solutions=False

  4. save_best_solutions=False

import pygad
...
ga_instance = pygad.GA(...,
                       keep_elitism=0,
                       keep_parents=0,
                       save_solutions=False,
                       save_best_solutions=False,
                       ...)

This way, PyGAD will not save any explored solution and thus the fitness function has to be called for each individual solution.

Reuse the Fitness instead of Calling the Fitness Function

It may happen that a previously explored solution in generation X is explored again in another generation Y (where Y > X). For some problems, calling the fitness function takes much time.

For deterministic problems, it is better not to call the fitness function for already explored solutions. Instead, reuse the fitness of the old solution. PyGAD supports some options to help you save the time of calling the fitness function for a previously explored solution.

The parameters explored in this section can be set in the constructor of the pygad.GA class.

The cal_pop_fitness() method of the pygad.GA class checks these parameters to see if there is a possibility of reusing the fitness instead of calling the fitness function.

1. save_solutions

It defaults to False. If set to True, then the population of each generation is saved into the solutions attribute of the pygad.GA instance. In other words, every single solution is saved in the solutions attribute.

2. save_best_solutions

It defaults to False. If True, then it only saves the best solution in every generation.

3. keep_elitism

It accepts an integer and defaults to 1. If set to a positive integer, then it keeps the elite solutions of one generation available in the next generation.

4. keep_parents

It accepts an integer and defaults to -1. If set to -1 or a positive integer, then it keeps the parents of one generation available in the next generation.
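
As a minimal sketch (with arbitrary values), the effect of reusing the fitness can be observed by counting how many times the fitness function is actually called:

import pygad
import numpy

function_inputs = [4, -2, 3.5, 5, -11, -4.7]
desired_output = 44
number_of_calls = 0

def fitness_func(ga_instance, solution, solution_idx):
    global number_of_calls
    number_of_calls = number_of_calls + 1
    output = numpy.sum(solution * function_inputs)
    return 1.0 / (numpy.abs(output - desired_output) + 0.000001)

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=len(function_inputs),
                       fitness_func=fitness_func,
                       keep_elitism=1,        # the kept elite solution reuses its fitness
                       save_solutions=True)   # previously explored solutions reuse their fitness

ga_instance.run()

# Expected to be lower than sol_per_pop * (num_generations + 1) because some
# solutions reuse previously computed fitness values.
print(number_of_calls)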

Why the Fitness Function is not Called for Solution at Index 0?

PyGAD has a parameter called keep_elitism which defaults to 1. This parameter defines the number of best solutions in generation X to keep in the next generation X+1. The best solutions are just copied from generation X to generation X+1 without making any change.

ga_instance = pygad.GA(...,
                       keep_elitism=1,
                       ...)

The best solutions are copied at the beginning of the population. If keep_elitism=1, this means the best solution in generation X is kept in the next generation X+1 at index 0 of the population. If keep_elitism=2, this means the 2 best solutions in generation X are kept in the next generation X+1 at indices 0 and 1 of the population of generation X+1.

Because the fitness values of these best solutions were already calculated in generation X, they are not recalculated in generation X+1 (i.e. the fitness function is not called for these solutions again). Instead, their fitness values are simply reused. This is why no solution with index 0 is passed to the fitness function.
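
As a small sketch to observe this (printing inside the fitness function is only for illustration), with keep_elitism left at its default value of 1, index 0 stops appearing among the evaluated solutions once the initial population has been evaluated:

import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

def fitness_func(ga_instance, solution, solution_idx):
    # Print the index of every solution that is actually evaluated.
    print("Evaluating solution at index", solution_idx)
    output = numpy.sum(solution*function_inputs)
    return 1.0 / (numpy.abs(output - desired_output) + 0.000001)

ga_instance = pygad.GA(num_generations=3,
                       num_parents_mating=5,
                       sol_per_pop=10,
                       num_genes=len(function_inputs),
                       fitness_func=fitness_func,
                       keep_elitism=1) # The default; index 0 holds the copied best solution.

ga_instance.run()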

To force calling the fitness function for every solution in every generation, consider setting keep_elitism and keep_parents to 0. Moreover, keep the 2 parameters save_solutions and save_best_solutions at their default value of False.

ga_instance = pygad.GA(...,
                       keep_elitism=0,
                       keep_parents=0,
                       save_solutions=False,
                       save_best_solutions=False,
                       ...)

Batch Fitness Calculation

In PyGAD 2.19.0, a new optional parameter called fitness_batch_size is supported to calculate the fitness in batches. Thanks to Linan Qiu for opening the GitHub issue #136.

Its values can be:

  • 1 or None: If the fitness_batch_size parameter is assigned the value 1 or None (default), then the normal flow is used where the fitness function is called for each individual solution. That is if there are 15 solutions, then the fitness function is called 15 times.

  • 1 < fitness_batch_size <= sol_per_pop: If the fitness_batch_size parameter is assigned a value satisfying this condition 1 < fitness_batch_size <= sol_per_pop, then the solutions are grouped into batches of size fitness_batch_size and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed.

Example without fitness_batch_size Parameter

This is an example where the fitness_batch_size parameter is given the value None (which is the default value). This is equivalent to using the value 1. In this case, the fitness function is called for each solution individually. This means the fitness function fitness_func receives only a single solution per call. This is an example of the arguments passed to the fitness function:

solution: [ 2.52860734, -0.94178795, 2.97545704, 0.84131987, -3.78447118, 2.41008358]
solution_idx: 3

The fitness function also must return a single numeric value as the fitness for the passed solution.

As we have a population of 20 solutions, the fitness function is called 20 times per generation. For 5 generations, that is 20*5 = 100 calls. In PyGAD, the fitness function is also called after the last generation, which adds another 20 calls. So, the total number of calls to the fitness function is 20*5 + 20 = 120.

Note that the keep_elitism and keep_parents parameters are set to 0 to make sure no fitness values are reused and to force calling the fitness function for each individual solution.

import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

number_of_calls = 0

def fitness_func(ga_instance, solution, solution_idx):
    global number_of_calls
    number_of_calls = number_of_calls + 1
    output = numpy.sum(solution*function_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       fitness_func=fitness_func,
                       fitness_batch_size=None,
                       # fitness_batch_size=1,
                       num_genes=len(function_inputs),
                       keep_elitism=0,
                       keep_parents=0)

ga_instance.run()
print(number_of_calls)
120

Example with fitness_batch_size Parameter

This is an example where the fitness_batch_size parameter is used and assigned the value 4. This means the solutions are grouped into batches of 4 solutions and the fitness function is called once for each batch (i.e. once for every 4 solutions).

This is an example of the arguments passed to it:

solutions:
    [[ 3.1129432  -0.69123589  1.93792414  2.23772968 -1.54616001 -0.53930799]
     [ 3.38508121  0.19890812  1.93792414  2.23095014 -3.08955597  3.10194128]
     [ 2.37079504 -0.88819803  2.97545704  1.41742256 -3.95594055  2.45028256]
     [ 2.52860734 -0.94178795  2.97545704  0.84131987 -3.78447118  2.41008358]]
solutions_indices:
    [16, 17, 18, 19]

As we have 20 solutions, there are 20/4 = 5 batches. As a result, the fitness function is called only 5 times per generation instead of 20. Each call to the fitness function receives a batch of 4 solutions.

As we have 5 generations, the fitness function is called 5*5 = 25 times. Adding the call after the last generation, the total number of calls is 5*5 + 5 = 30.

import pygad
import numpy

function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44

number_of_calls = 0

def fitness_func_batch(ga_instance, solutions, solutions_indices):
    global number_of_calls
    number_of_calls = number_of_calls + 1
    batch_fitness = []
    for solution in solutions:
        output = numpy.sum(solution*function_inputs)
        fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
        batch_fitness.append(fitness)
    return batch_fitness

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=10,
                       sol_per_pop=20,
                       fitness_func=fitness_func_batch,
                       fitness_batch_size=4,
                       num_genes=len(function_inputs),
                       keep_elitism=0,
                       keep_parents=0)

ga_instance.run()
print(number_of_calls)
30

Using batch fitness calculation therefore saves 120 - 30 = 90 calls to the fitness function.
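
Batching is most useful when a whole batch can be evaluated at once, for example with a single vectorized computation or a single request to an external model. As a hedged sketch (a variant of fitness_func_batch above, not part of the original example), the Python loop can be replaced with one NumPy matrix-vector product:

import numpy

function_inputs = numpy.array([4,-2,3.5,5,-11,-4.7])
desired_output = 44

def fitness_func_batch(ga_instance, solutions, solutions_indices):
    # solutions holds one row per solution in the batch.
    solutions = numpy.array(solutions)
    # One matrix-vector product evaluates the whole batch at once.
    outputs = solutions @ function_inputs
    # One fitness value per solution, returned as a numpy.ndarray.
    return 1.0 / (numpy.abs(outputs - desired_output) + 0.000001)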

Use Functions and Methods to Build Fitness and Callbacks

In PyGAD 2.19.0, it is possible to pass user-defined functions or methods to the following parameters:

  1. fitness_func

  2. on_start

  3. on_fitness

  4. on_parents

  5. on_crossover

  6. on_mutation

  7. on_generation

  8. on_stop

This section gives 2 examples where these parameters are assigned user-defined:

  1. Functions.

  2. Methods.

Assign Functions

This is a dummy example where the fitness function returns a random value. Note that the instance of the pygad.GA class is passed as the first parameter of all these functions.

import pygad
import numpy

def fitness_func(ga_instance, solution, solution_idx):
    return numpy.random.rand()

def on_start(ga_instance):
    print("on_start")

def on_fitness(ga_instance, last_gen_fitness):
    print("on_fitness")

def on_parents(ga_instance, last_gen_parents):
    print("on_parents")

def on_crossover(ga_instance, last_gen_offspring):
    print("on_crossover")

def on_mutation(ga_instance, last_gen_offspring):
    print("on_mutation")

def on_generation(ga_instance):
    print("on_generation\n")

def on_stop(ga_instance, last_gen_fitness):
    print("on_stop")

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=on_start,
                       on_fitness=on_fitness,
                       on_parents=on_parents,
                       on_crossover=on_crossover,
                       on_mutation=on_mutation,
                       on_generation=on_generation,
                       on_stop=on_stop,
                       fitness_func=fitness_func)

ga_instance.run()

Assign Methods

The next example has all the methods defined inside the class Test. Because they are instance methods, each one accepts an additional parameter representing the instance of the class Test.

All methods accept self as the first parameter and the instance of the pygad.GA class as the second parameter.

import pygad
import numpy

class Test:
    def fitness_func(self, ga_instance, solution, solution_idx):
        return numpy.random.rand()

    def on_start(self, ga_instance):
        print("on_start")

    def on_fitness(self, ga_instance, last_gen_fitness):
        print("on_fitness")

    def on_parents(self, ga_instance, last_gen_parents):
        print("on_parents")

    def on_crossover(self, ga_instance, last_gen_offspring):
        print("on_crossover")

    def on_mutation(self, ga_instance, last_gen_offspring):
        print("on_mutation")

    def on_generation(self, ga_instance):
        print("on_generation\n")

    def on_stop(self, ga_instance, last_gen_fitness):
        print("on_stop")

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=Test().on_start,
                       on_fitness=Test().on_fitness,
                       on_parents=Test().on_parents,
                       on_crossover=Test().on_crossover,
                       on_mutation=Test().on_mutation,
                       on_generation=Test().on_generation,
                       on_stop=Test().on_stop,
                       fitness_func=Test().fitness_func)

ga_instance.run()
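
Note that each Test() call above creates a separate instance. As a small follow-up sketch (test_obj is just an illustrative name, not part of the example above), a single instance can be created once and its bound methods passed instead, which is useful when the callbacks need to share state on the same object:

test_obj = Test()

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=test_obj.on_start,
                       on_fitness=test_obj.on_fitness,
                       on_parents=test_obj.on_parents,
                       on_crossover=test_obj.on_crossover,
                       on_mutation=test_obj.on_mutation,
                       on_generation=test_obj.on_generation,
                       on_stop=test_obj.on_stop,
                       fitness_func=test_obj.fitness_func)

ga_instance.run()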