More About PyGAD¶
Multi-Objective Optimization¶
In PyGAD 3.2.0, the library supports multi-objective optimization using the non-dominated sorting genetic algorithm II (NSGA-II). The code is almost identical to the regular code used for single-objective optimization, with one difference: the return value of the fitness function.
In single-objective optimization, the fitness function returns a single
numeric value. In this example, the variable fitness
is expected to
be a numeric value.
def fitness_func(ga_instance, solution, solution_idx):
...
return fitness
But in multi-objective optimization, the fitness function returns any of these data types:
list
tuple
numpy.ndarray
def fitness_func(ga_instance, solution, solution_idx):
...
return [fitness1, fitness2, ..., fitnessN]
Whenever the fitness function returns an iterable of these data types, then the problem is considered multi-objective. This holds even if there is a single element in the returned iterable.
Other than the fitness function, everything else could be the same in both single and multi-objective problems.
But it is recommended to use one of these 2 parent selection operators to solve multi-objective problems:
nsga2: This selects the parents based on non-dominated sorting and crowding distance.
tournament_nsga2: This selects the parents using tournament selection, which uses non-dominated sorting and crowding distance to rank the solutions.
This is a multi-objective optimization example that optimizes these 2 linear functions:
y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
Where:
(x1,x2,x3,x4,x5,x6) = (4,-2,3.5,5,-11,-4.7) and y1 = 50
(x7,x8,x9,x10,x11,x12) = (-2,0.7,-9,1.4,3,5) and y2 = 30
The 2 functions use the same parameters (weights) w1
to w6
.
The goal is to use PyGAD to find the optimal values for such weights
that satisfy the 2 functions y1
and y2
.
import pygad
import numpy
"""
Given these 2 functions:
y1 = f(w1:w6) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
y2 = f(w1:w6) = w1x7 + w2x8 + w3x9 + w4x10 + w5x11 + w6x12
where (x1,x2,x3,x4,x5,x6)=(4,-2,3.5,5,-11,-4.7) and y=50
and (x7,x8,x9,x10,x11,x12)=(-2,0.7,-9,1.4,3,5) and y=30
What are the best values for the 6 weights (w1 to w6)? We are going to use the genetic algorithm to optimize these 2 functions.
This is a multi-objective optimization problem.
PyGAD considers the problem as multi-objective if the fitness function returns:
1) List.
2) Or tuple.
3) Or numpy.ndarray.
"""
function_inputs1 = [4,-2,3.5,5,-11,-4.7] # Function 1 inputs.
function_inputs2 = [-2,0.7,-9,1.4,3,5] # Function 2 inputs.
desired_output1 = 50 # Function 1 output.
desired_output2 = 30 # Function 2 output.
def fitness_func(ga_instance, solution, solution_idx):
output1 = numpy.sum(solution*function_inputs1)
output2 = numpy.sum(solution*function_inputs2)
fitness1 = 1.0 / (numpy.abs(output1 - desired_output1) + 0.000001)
fitness2 = 1.0 / (numpy.abs(output2 - desired_output2) + 0.000001)
return [fitness1, fitness2]
num_generations = 100
num_parents_mating = 10
sol_per_pop = 20
num_genes = len(function_inputs1)
ga_instance = pygad.GA(num_generations=num_generations,
num_parents_mating=num_parents_mating,
sol_per_pop=sol_per_pop,
num_genes=num_genes,
fitness_func=fitness_func,
parent_selection_type='nsga2')
ga_instance.run()
ga_instance.plot_fitness(label=['Obj 1', 'Obj 2'])
solution, solution_fitness, solution_idx = ga_instance.best_solution(ga_instance.last_generation_fitness)
print(f"Parameters of the best solution : {solution}")
print(f"Fitness value of the best solution = {solution_fitness}")
prediction = numpy.sum(numpy.array(function_inputs1)*solution)
print(f"Predicted output 1 based on the best solution : {prediction}")
prediction = numpy.sum(numpy.array(function_inputs2)*solution)
print(f"Predicted output 2 based on the best solution : {prediction}")
This is the result of the print statements. The predicted outputs are close to the desired outputs.
Parameters of the best solution : [ 0.79676439 -2.98823386 -4.12677662 5.70539445 -2.02797016 -1.07243922]
Fitness value of the best solution = [ 1.68090829 349.8591915 ]
Predicted output 1 based on the best solution : 50.59491545442283
Predicted output 2 based on the best solution : 29.99714270722312
This is the figure created by the plot_fitness() method. The fitness of the first objective is shown in green and the fitness of the second objective in blue.
Limit the Gene Value Range using the gene_space Parameter¶
In PyGAD 2.11.0, the gene_space parameter gained a new feature that allows customizing the range of accepted values for each gene. Let's take a quick review of the gene_space parameter to build on it.
The gene_space parameter allows the user to feed the space of values of each gene. This way, the accepted values for each gene are restricted to the user-defined values. Assume there is a problem that has 3 genes where each gene has a different set of values, as follows:
Gene 1: [0.4, 12, -5, 21.2]
Gene 2: [-2, 0.3]
Gene 3: [1.2, 63.2, 7.4]
Then, the gene_space
for this problem is as given below. Note that
the order is very important.
gene_space = [[0.4, 12, -5, 21.2],
[-2, 0.3],
[1.2, 63.2, 7.4]]
In case all genes share the same set of values, then simply feed a
single list to the gene_space
parameter as follows. In this case,
all genes can only take values from this list of 6 values.
gene_space = [33, 7, 0.5, 95, 6.3, 0.74]
The previous example restricts the gene values to a fixed set of discrete values. In case you want to use a range of discrete values for the gene, then you can use the range() function. For example, range(1, 7) means the set of allowed values for the gene is 1, 2, 3, 4, 5, and 6. You can also use the numpy.arange() or numpy.linspace() functions for the same purpose.
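For example, this sketch gives each gene of a hypothetical 3-gene problem a different discrete range:
import numpy
gene_space = [range(1, 7),                # integers 1, 2, 3, 4, 5, 6
              numpy.arange(0, 5, 0.5),    # 0, 0.5, 1.0, ..., 4.5
              numpy.linspace(10, 20, 5)]  # 10, 12.5, 15, 17.5, 20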
The previous discussion only works with a range of discrete values not
continuous values. In PyGAD
2.11.0,
the gene_space
parameter can be assigned a dictionary that allows
the gene to have values from a continuous range.
Assume you want to restrict the gene to the half-open range [1, 5) where 1 is included and 5 is not. Then simply create a dictionary with 2 items whose keys are:
'low': The minimum value in the range, which is 1 in the example.
'high': The maximum value in the range, which is 5 in the example.
The dictionary will look like this:
{'low': 1,
'high': 5}
Apart from the optional 'step' key discussed later, it is not acceptable to add more items to the dictionary or to use keys other than 'low' and 'high'.
For a 3-gene problem, the next code creates a dictionary for each gene to restrict its values in a continuous range. For the first gene, it can take any floating-point value from the range that starts from 1 (inclusive) and ends at 5 (exclusive).
gene_space = [{'low': 1, 'high': 5}, {'low': 0.3, 'high': 1.4}, {'low': -0.2, 'high': 4.5}]
More about the gene_space Parameter¶
The gene_space
parameter customizes the space of values of each
gene.
Assuming that all genes have the same global space, which includes the values 0.3, 5.2, -4, and 8, then those values can be assigned to the gene_space parameter as a list, tuple, or range. Here is a list assigned to this parameter. By doing that, the gene values are restricted to those assigned to the gene_space parameter.
gene_space = [0.3, 5.2, -4, 8]
If some genes have different spaces, then gene_space
should accept a
nested list or tuple. In this case, the elements could be:
A number (of int, float, or NumPy data types): A single value to be assigned to the gene. This means this gene will have the same value across all generations.
A list, tuple, numpy.ndarray, or any range like range, numpy.arange(), or numpy.linspace(): It holds the space for each individual gene. This space is usually discrete. That is, there is a finite set of values to select from.
A dict: To sample a value for a gene from a continuous range. The dictionary must have the 2 mandatory keys "low" and "high" in addition to an optional key "step". A random value is returned between the values assigned to the "low" and "high" keys. If the "step" key exists, then this works like the previous options (i.e. a discrete set of values).
None: A gene with its space set to None is initialized randomly from the range specified by the 2 parameters init_range_low and init_range_high. For mutation, its value is mutated based on a random value from the range specified by the 2 parameters random_mutation_min_val and random_mutation_max_val. If all elements in the gene_space parameter are None, the parameter will not have any effect.
These options are illustrated in the sketch below.
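As a quick illustration, this sketch for a hypothetical 4-gene problem mixes the options listed above:
gene_space = [3.5,                    # gene 0: a fixed value used in every generation
              [-2, 0.3, 7],           # gene 1: one of these discrete values
              {'low': 1, 'high': 5},  # gene 2: any float from the half-open range [1, 5)
              None]                   # gene 3: falls back to init_range_low/init_range_high
                                      #         and random_mutation_min_val/random_mutation_max_val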
Assume that a chromosome has 2 genes and each gene has a different value space. Then the gene_space could be assigned a nested list/tuple where each element determines the space of a gene. According to the next code, the space of the first gene is [0.4, -5], which has 2 values, and the space of the second gene is [0.5, -3.2, 8.2, -9], which has 4 values.
gene_space = [[0.4, -5], [0.5, -3.2, 8.2, -9]]
For a 2-gene chromosome, if the first gene space is restricted to the discrete values from 0 to 4 and the second gene is restricted to the values from 10 to 19, then it could be specified according to the next code.
gene_space = [range(5), range(10, 20)]
The gene_space can also be assigned a single range, as given below, where the values of all genes are sampled from the same range.
gene_space = numpy.arange(15)
The gene_space
can be assigned a dictionary to sample a value from a
continuous range.
gene_space = {"low": 4, "high": 30}
A step can also be assigned to the dictionary. This works as if a range is used.
gene_space = {"low": 4, "high": 30, "step": 2.5}
Setting a dict like {"low": 0, "high": 10} in the gene_space means that random values are sampled from the continuous range [0, 10). Note that 0 is included but 10 is not included while sampling. Thus, the maximum value that could be returned is less than 10, like 9.9999. But if the user decided to round the genes using, for example, gene_type=[float, 2], then this value will become 10. So, the user should be careful about the inputs.
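This is a sketch of that combination; the range [0, 10) and the precision of 2 are just example values:
# Values are sampled from the continuous range [0, 10) and then rounded to 2 decimals,
# so a sampled 9.9999 ends up as 10.0 after rounding.
ga_instance = pygad.GA(...,
                       gene_space={'low': 0, 'high': 10},
                       gene_type=[float, 2])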
If a None
is assigned to only a single gene, then its value will be
randomly generated initially using the init_range_low
and
init_range_high
parameters in the pygad.GA
class’s constructor.
During mutation, the value is sampled from the range defined by the 2
parameters random_mutation_min_val
and random_mutation_max_val
.
This is an example where the second gene is given a None
value.
gene_space = [range(5), None, numpy.linspace(10, 20, 300)]
If the user did not assign the initial population to the
initial_population
parameter, the initial population is created
randomly based on the gene_space
parameter. Moreover, the mutation
is applied based on this parameter.
How Mutation Works with the gene_space Parameter?¶
Mutation behaves differently depending on whether the gene_space has a continuous range or a discrete set of values.
If a gene has its static/discrete space defined in the
gene_space
parameter, then mutation works by replacing the gene
value by a value randomly selected from the gene space. This happens for
both int
and float
data types.
For example, the following gene_space
has the static space
[1, 2, 3]
defined for the first gene. So, this gene can only have a
value out of these 3 values.
Gene space: [[1, 2, 3],
None]
Solution: [1, 5]
For a solution like [1, 5]
, then mutation happens for the first gene
by simply replacing its current value by a randomly selected value
(other than its current value if possible). So, the value 1 will be
replaced by either 2 or 3.
For the second gene, its space is set to None
. So, traditional
mutation happens for this gene by:
Generating a random value from the range defined by the random_mutation_min_val and random_mutation_max_val parameters.
Adding this random value to the current gene's value.
If its current value is 5 and the random value is -0.5
, then the new
value is 4.5. If the gene type is integer, then the value will be
rounded.
On the other hand, if a gene has a continuous space defined in the
gene_space
parameter, then mutation occurs by adding a random value
to the current gene value.
For example, the following gene_space
has the continuous space
defined by the dictionary {'low': 1, 'high': 5}
. This applies to all
genes. So, mutation is applied to one or more selected genes by adding a
random value to the current gene value.
Gene space: {'low': 1, 'high': 5}
Solution: [1.5, 3.4]
Assuming random_mutation_min_val=-1
and
random_mutation_max_val=1
, then a random value such as 0.3
can
be added to the gene(s) participating in mutation. If only the first
gene is mutated, then its new value changes from 1.5
to
1.5+0.3=1.8
. Note that PyGAD verifies that the new value is within
the range. In the worst scenarios, the value will be set to either
boundary of the continuous range. For example, if the gene value is 1.5
and the random value is -0.55, then the new value is 0.95, which is smaller than the lower boundary 1. Thus, the gene value will be set to the boundary value 1.
If the dictionary has a step like the example below, then it is considered a discrete range and mutation occurs by randomly selecting a value from the set of values. In other words, no random value is added to the gene value.
Gene space: {'low': 1, 'high': 5, 'step': 0.5}
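As a sketch, such a dictionary can be thought of as the discrete set produced by numpy.arange (the exact internal behavior may differ):
import numpy
gene_space = {'low': 1, 'high': 5, 'step': 0.5}
# The equivalent discrete set of candidate values that mutation selects from.
candidate_values = numpy.arange(gene_space['low'],
                                gene_space['high'],
                                gene_space['step'])
print(candidate_values)  # [1.  1.5 2.  2.5 3.  3.5 4.  4.5]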
Stop at Any Generation¶
In PyGAD
2.4.0,
it is possible to stop the genetic algorithm after any generation. All you need to do is return the string "stop" from the callback function on_generation
. When this callback function is implemented
and assigned to the on_generation
parameter in the constructor of
the pygad.GA
class, then the algorithm immediately stops after
completing its current generation. Let’s discuss an example.
Assume that the user wants to stop the algorithm either after 100
generations or if a condition is met. The user may assign a value of 100
to the num_generations
parameter of the pygad.GA
class
constructor.
The condition that stops the algorithm is written in a callback function
like the one in the next code. If the fitness value of the best solution
exceeds 70, then the string "stop"
is returned.
def func_generation(ga_instance):
if ga_instance.best_solution()[1] >= 70:
return "stop"
Stop Criteria¶
In PyGAD
2.15.0,
a new parameter named stop_criteria
is added to the constructor of
the pygad.GA
class. It helps to stop the evolution based on some criteria. It can be assigned one or more criteria.
Each criterion is passed as a str
that consists of 2 parts:
Stop word.
Number.
It takes this form:
"word_num"
The current 2 supported words are reach
and saturate
.
The reach
word stops the run()
method if the fitness value is
equal to or greater than a given fitness value. An example for reach
is "reach_40"
which stops the evolution if the fitness is >= 40.
saturate
stops the evolution if the fitness saturates for a given
number of consecutive generations. An example for saturate
is
"saturate_7"
which means stop the run()
method if the fitness
does not change for 7 consecutive generations.
Here is an example that stops the evolution if either the fitness value
reached 127.4
or if the fitness saturates for 15
generations.
import pygad
import numpy
equation_inputs = [4, -2, 3.5, 8, 9, 4]
desired_output = 44
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=200,
sol_per_pop=10,
num_parents_mating=4,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
stop_criteria=["reach_127.4", "saturate_15"])
ga_instance.run()
print(f"Number of generations passed is {ga_instance.generations_completed}")
Elitism Selection¶
In PyGAD
2.18.0,
a new parameter called keep_elitism
is supported. It accepts an
integer to define the number of elite solutions (i.e. best solutions) to keep in
the next generation. This parameter defaults to 1
which means only
the best solution is kept in the next generation.
In the next example, the keep_elitism
parameter in the constructor
of the pygad.GA
class is set to 2. Thus, the best 2 solutions in
each generation are kept in the next generation.
import numpy
import pygad
function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution*function_inputs)
fitness = 1.0 / numpy.abs(output - desired_output)
return fitness
ga_instance = pygad.GA(num_generations=2,
num_parents_mating=3,
fitness_func=fitness_func,
num_genes=6,
sol_per_pop=5,
keep_elitism=2)
ga_instance.run()
The value passed to the keep_elitism
parameter must satisfy 2
conditions:
It must be >= 0.
It must be <= sol_per_pop. That is, its value cannot exceed the number of solutions in the current population.
In the previous example, if the keep_elitism
parameter is set equal
to the value passed to the sol_per_pop
parameter, which is 5, then
there will be no evolution at all as in the next figure. This is because
all the 5 solutions are used as elitism in the next generation and no
offspring will be created.
...
ga_instance = pygad.GA(...,
sol_per_pop=5,
keep_elitism=5)
ga_instance.run()
Note that if the keep_elitism
parameter is effective (i.e. is
assigned a positive integer, not zero), then the keep_parents
parameter will have no effect. Because the default value of the
keep_elitism
parameter is 1, then the keep_parents
parameter has
no effect by default. The keep_parents
parameter is only effective
when keep_elitism=0
.
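For example, to make the keep_parents parameter take effect, keep_elitism has to be explicitly set to 0, as in this sketch:
ga_instance = pygad.GA(...,
                       keep_elitism=0,  # disable elitism so that keep_parents takes effect
                       keep_parents=3)  # keep 3 parents in the next generation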
Random Seed¶
In PyGAD
2.18.0,
a new parameter called random_seed
is supported. Its value is used
as a seed for the random function generators.
PyGAD uses random functions in these 2 libraries:
NumPy
random
The random_seed
parameter defaults to None
which means no seed
is used. As a result, different random numbers are generated for each
run of PyGAD.
If this parameter is assigned a proper seed, then the results will be reproducible. In the next example, the integer 2 is used as a random seed.
import numpy
import pygad
function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution*function_inputs)
fitness = 1.0 / numpy.abs(output - desired_output)
return fitness
ga_instance = pygad.GA(num_generations=2,
num_parents_mating=3,
fitness_func=fitness_func,
sol_per_pop=5,
num_genes=6,
random_seed=2)
ga_instance.run()
best_solution, best_solution_fitness, best_match_idx = ga_instance.best_solution()
print(best_solution)
print(best_solution_fitness)
This is the best solution found and its fitness value.
[ 2.77249188 -4.06570662 0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972
After running the code again, it will find the same result.
[ 2.77249188 -4.06570662 0.04196872 -3.47770796 -0.57502138 -3.22775267]
0.04872203136549972
Continue without Losing Progress¶
In PyGAD
2.18.0,
and thanks to Felix Bernhard for
opening this GitHub
issue,
the values of these 4 instance attributes are no longer reset after each
call to the run()
method.
self.best_solutions
self.best_solutions_fitness
self.solutions
self.solutions_fitness
This helps the user to continue where the last run stopped without losing the values of these 4 attributes.
Now, the user can save the model by calling the save()
method.
import pygad
def fitness_func(ga_instance, solution, solution_idx):
...
return fitness
ga_instance = pygad.GA(...)
ga_instance.run()
ga_instance.plot_fitness()
ga_instance.save("pygad_GA")
Then the saved model is loaded by calling the load()
function. After
calling the run()
method on the loaded instance, the data in the previous 4 attributes is not reset but extended with the new data.
import pygad
def fitness_func(ga_instance, solution, solution_idx):
...
return fitness
loaded_ga_instance = pygad.load("pygad_GA")
loaded_ga_instance.run()
loaded_ga_instance.plot_fitness()
The plot created by the plot_fitness()
method will show the data
collected from both the runs.
Note that the 2 attributes (self.best_solutions
and
self.best_solutions_fitness
) only work if the
save_best_solutions
parameter is set to True
. Also, the 2
attributes (self.solutions
and self.solutions_fitness
) only work
if the save_solutions
parameter is True
.
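To have all 4 attributes populated across runs, both parameters can be enabled, as in this sketch:
ga_instance = pygad.GA(...,
                       save_best_solutions=True,
                       save_solutions=True)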
Change Population Size during Runtime¶
Starting from PyGAD 3.3.0, the population size can be changed during runtime. In other words, the number of solutions/chromosomes and the number of genes can be changed.
The user has to carefully arrange the list of parameters and instance attributes that have to be changed to keep the GA consistent before and after changing the population size. Generally, change everything that would be used during the GA evolution.
CAUTION: If the user fails to change a parameter or an instance attribute necessary to keep the GA running after the population size is changed, errors will arise.
These are examples of the parameters that the user should decide whether to change. The user should check the list of parameters and decide what to change.
population: The population. It must be changed.
num_offspring: The number of offspring to produce out of the crossover and mutation operations. Change this parameter if the number of offspring has to be changed to be consistent with the new population size.
num_parents_mating: The number of solutions to select as parents. Change this parameter if the number of parents has to be changed to be consistent with the new population size.
fitness_func: If the way of calculating the fitness changes with the new population size, then the fitness function has to be changed.
sol_per_pop: The number of solutions per population. It is not critical to change it, but it is recommended to keep this number consistent with the number of solutions in the population parameter.
These are examples of the instance attributes that might be changed. The user should check the list of instance attributes and decide what to change.
All the last_generation_* attributes:
last_generation_fitness: A 1D NumPy array of the fitness values of the population.
last_generation_parents and last_generation_parents_indices: Two NumPy arrays: a 2D array representing the parents and a 1D array of the parents' indices.
last_generation_elitism and last_generation_elitism_indices: Must be changed if keep_elitism != 0. The default value of keep_elitism is 1. Two NumPy arrays: a 2D array representing the elitism and a 1D array of the elitism indices.
pop_size: The population size.
A sketch of changing the population size inside the on_generation callback is given below.
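This is a hedged sketch, not an official recipe, of how the population size might be grown inside the on_generation callback. It only updates the parameters and attributes listed above; depending on the configuration (e.g. keep_elitism, keep_parents), more attributes may need to be updated.
import numpy
import pygad
def fitness_func(ga_instance, solution, solution_idx):
    return numpy.sum(solution)
def on_generation(ga_instance):
    # Example: after generation 3, grow the population from 10 to 20 solutions.
    if ga_instance.generations_completed == 3:
        old_population = ga_instance.population
        extra_solutions = numpy.random.uniform(low=-4, high=4,
                                               size=(10, old_population.shape[1]))
        new_population = numpy.concatenate((old_population, extra_solutions), axis=0)
        # Update the parameters/attributes that describe the population size.
        ga_instance.population = new_population
        ga_instance.sol_per_pop = new_population.shape[0]
        ga_instance.pop_size = new_population.shape
        # Assumption: with the default keep_elitism=1, the offspring fill the rest of the population.
        ga_instance.num_offspring = new_population.shape[0] - ga_instance.keep_elitism
        # Extend last_generation_fitness so it matches the new population size.
        extra_fitness = [fitness_func(ga_instance, solution, idx)
                         for idx, solution in enumerate(extra_solutions,
                                                        start=old_population.shape[0])]
        ga_instance.last_generation_fitness = numpy.concatenate(
            (ga_instance.last_generation_fitness, extra_fitness))
ga_instance = pygad.GA(num_generations=10,
                       sol_per_pop=10,
                       num_genes=5,
                       num_parents_mating=4,
                       fitness_func=fitness_func,
                       on_generation=on_generation)
ga_instance.run()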
Prevent Duplicates in Gene Values¶
In PyGAD
2.13.0,
a new bool parameter called allow_duplicate_genes
is supported to
control whether duplicates are supported in the chromosome or not. In
other words, whether 2 or more genes might have the same exact value.
If allow_duplicate_genes=True
(which is the default case), genes may
have the same value. If allow_duplicate_genes=False
, then no 2 genes
will have the same value given that there are enough unique values for
the genes.
The next code gives an example to use the allow_duplicate_genes
parameter. A callback generation function is implemented to print the
population after each generation.
import pygad
def fitness_func(ga_instance, solution, solution_idx):
return 0
def on_generation(ga):
print("Generation", ga.generations_completed)
print(ga.population)
ga_instance = pygad.GA(num_generations=5,
sol_per_pop=5,
num_genes=4,
mutation_num_genes=3,
random_mutation_min_val=-5,
random_mutation_max_val=5,
num_parents_mating=2,
fitness_func=fitness_func,
gene_type=int,
on_generation=on_generation,
allow_duplicate_genes=False)
ga_instance.run()
Here is the population after each of the 5 generations. Note how there are no duplicate values.
Generation 1
[[ 2 -2 -3 3]
[ 0 1 2 3]
[ 5 -3 6 3]
[-3 1 -2 4]
[-1 0 -2 3]]
Generation 2
[[-1 0 -2 3]
[-3 1 -2 4]
[ 0 -3 -2 6]
[-3 0 -2 3]
[ 1 -4 2 4]]
Generation 3
[[ 1 -4 2 4]
[-3 0 -2 3]
[ 4 0 -2 1]
[-4 0 -2 -3]
[-4 2 0 3]]
Generation 4
[[-4 2 0 3]
[-4 0 -2 -3]
[-2 5 4 -3]
[-1 2 -4 4]
[-4 2 0 -3]]
Generation 5
[[-4 2 0 -3]
[-1 2 -4 4]
[ 3 4 -4 0]
[-1 0 2 -2]
[-4 2 -1 1]]
The allow_duplicate_genes parameter can also be used with the gene_space parameter. Here is an example where each of the 4 genes has the same space of values that consists of 4 values (1, 2, 3, and 4).
import pygad
def fitness_func(ga_instance, solution, solution_idx):
return 0
def on_generation(ga):
print("Generation", ga.generations_completed)
print(ga.population)
ga_instance = pygad.GA(num_generations=5,
sol_per_pop=5,
num_genes=4,
num_parents_mating=2,
fitness_func=fitness_func,
gene_type=int,
gene_space=[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]],
on_generation=on_generation,
allow_duplicate_genes=False)
ga_instance.run()
Even though all the genes share the same space of values, no 2 genes have duplicate values, as shown by the next output.
Generation 1
[[2 3 1 4]
[2 3 1 4]
[2 4 1 3]
[2 3 1 4]
[1 3 2 4]]
Generation 2
[[1 3 2 4]
[2 3 1 4]
[1 3 2 4]
[2 3 4 1]
[1 3 4 2]]
Generation 3
[[1 3 4 2]
[2 3 4 1]
[1 3 4 2]
[3 1 4 2]
[3 2 4 1]]
Generation 4
[[3 2 4 1]
[3 1 4 2]
[3 2 4 1]
[1 2 4 3]
[1 3 4 2]]
Generation 5
[[1 3 4 2]
[1 2 4 3]
[2 1 4 3]
[1 2 4 3]
[1 2 4 3]]
You should take care to give enough values for the genes so that PyGAD is able to find an alternative value for a gene in case it duplicates with another gene.
There might be 2 duplicate genes where changing either of the 2
duplicating genes will not solve the problem. For example, if
gene_space=[[3, 0, 1], [4, 1, 2], [0, 2], [3, 2, 0]]
and the
solution is [3 2 0 0]
, then the values of the last 2 genes
duplicate. There are no possible changes in the last 2 genes to solve
the problem.
This problem can be solved by randomly changing one of the non-duplicating genes so that it makes room for a unique value in one of the 2 duplicating genes. For example, by changing the second gene from 2 to 4, any of the last 2 genes can take the value 2 and solve the duplicates. The resultant solution is then [3 4 2 0]. This option was not supported until PyGAD 3.1.0, as described in the next section.
Solve Duplicates using a Third Gene¶
When allow_duplicate_genes=False
and a user-defined gene_space
is used, it sometimes happens that there is no room to solve the
duplicates between the 2 genes by simply replacing the value of one gene
by another gene. In PyGAD
3.1.0,
the duplicates are solved by looking for a third gene that will help in
solving the duplicates. The following examples explain how it works.
Example 1:
Let’s assume that this gene space is used and there is a solution with 2 duplicate genes with the same value 4.
Gene space: [[2, 3],
[3, 4],
[4, 5],
[5, 6]]
Solution: [3, 4, 4, 5]
By checking the gene space, the second gene can have the values
[3, 4]
and the third gene can have the values [4, 5]
. To solve the duplicates, we have to change the value of one of these 2 genes.
If the value of the second gene changes from 4 to 3, then it will duplicate with the first gene. If we change the value of the third gene from 4 to 5, then it will duplicate with the fourth gene. In conclusion, just selecting a different value for either the second or the third gene will introduce new duplicating genes.
When there are 2 duplicate genes but there is no way to solve their duplicates directly, then the solution is to change a third gene that makes room to solve the duplicates between the 2 genes.
In our example, duplicates between the second and third genes can be solved, for example, by:
Changing the first gene from 3 to 2 then changing the second gene from 4 to 3.
Or changing the fourth gene from 5 to 6 then changing the third gene from 4 to 5.
Generally, this is how to solve such duplicates:
For any duplicate gene GENE1, select another value.
Check which other gene GENEX has duplicate with this new value.
Find if GENEX can have another value that will not cause any more duplicates. If so, go to step 7.
If all the other values of GENEX will cause duplicates, then try another gene GENEY.
Repeat steps 3 and 4 until exploring all the genes.
If there is no possibility to solve the duplicates, then there is no way to solve them and we have to keep the duplicate value.
If a value for a gene GENEM is found that will not cause more duplicates, then use this value for the gene GENEM.
Replace the value of the gene GENE1 by the old value of the gene GENEM. This solves the duplicates.
This is an example to solve the duplicate for the solution
[3, 4, 4, 5]
:
Let's use the second gene with value 4. Because the space of this gene is [3, 4], the only other value we can select is 3.
The first gene also has the value 3.
The first gene has another value 2 that will not cause more duplicates in the solution. Then go to step 7.
Skip.
Skip.
Skip.
The value of the first gene 3 will be replaced by the new value 2. The new solution is [2, 4, 4, 5].
Replace the value of the second gene 4 by the old value of the first gene, which is 3. The new solution is [2, 3, 4, 5]. The duplicate is solved.
Example 2:
Gene space: [[0, 1],
[1, 2],
[2, 3],
[3, 4]]
Solution: [1, 2, 2, 3]
The quick summary is:
Change the value of the first gene from 1 to 0. The solution becomes [0, 2, 2, 3].
Change the value of the second gene from 2 to 1. The solution becomes [0, 1, 2, 3]. The duplicate is solved.
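This is a minimal sketch that exercises this behavior, reusing the gene space from Example 1 together with allow_duplicate_genes=False; the fitness function is only a placeholder.
import pygad
def fitness_func(ga_instance, solution, solution_idx):
    return 0
ga_instance = pygad.GA(num_generations=5,
                       sol_per_pop=5,
                       num_genes=4,
                       num_parents_mating=2,
                       fitness_func=fitness_func,
                       gene_type=int,
                       gene_space=[[2, 3], [3, 4], [4, 5], [5, 6]],
                       allow_duplicate_genes=False)
ga_instance.run()
print(ga_instance.population)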
More about the gene_type Parameter¶
The gene_type
parameter allows the user to control the data type for
all genes at once or each individual gene. In PyGAD
2.15.0,
the gene_type
parameter also supports customizing the precision for
float
data types. As a result, the gene_type
parameter helps to:
Select a data type for all genes with or without precision.
Select a data type for each individual gene with or without precision.
Let’s discuss things by examples.
Data Type for All Genes without Precision¶
The data type for all genes can be specified by assigning the numeric
data type directly to the gene_type
parameter. This is an example to
make all genes of int
data types.
gene_type=int
Given that the supported numeric data types of PyGAD include Python’s
int
and float
in addition to all numeric types of NumPy
,
then any of these types can be assigned to the gene_type
parameter.
If no precision is specified for a float
data type, then the
complete floating-point number is kept.
The next code uses an int
data type for all genes where the genes in
the initial and final population are only integers.
import pygad
import numpy
equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=10,
sol_per_pop=5,
num_parents_mating=2,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
gene_type=int)
print("Initial Population")
print(ga_instance.initial_population)
ga_instance.run()
print("Final Population")
print(ga_instance.population)
Initial Population
[[ 1 -1 2 0 -3]
[ 0 -2 0 -3 -1]
[ 0 -1 -1 2 0]
[-2 3 -2 3 3]
[ 0 0 2 -2 -2]]
Final Population
[[ 1 -1 2 2 0]
[ 1 -1 2 2 0]
[ 1 -1 2 2 0]
[ 1 -1 2 2 0]
[ 1 -1 2 2 0]]
Data Type for All Genes with Precision¶
A precision can only be specified for a float
data type and cannot
be specified for integers. Here is an example to use a precision of 3
for the float
data type. In this case, all genes are of type
float
and their maximum precision is 3.
gene_type=[float, 3]
The next code prints the initial and final population where the
genes are of type float
with precision 3.
import pygad
import numpy
equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=10,
sol_per_pop=5,
num_parents_mating=2,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
gene_type=[float, 3])
print("Initial Population")
print(ga_instance.initial_population)
ga_instance.run()
print("Final Population")
print(ga_instance.population)
Initial Population
[[-2.417 -0.487 3.623 2.457 -2.362]
[-1.231 0.079 -1.63 1.629 -2.637]
[ 0.692 -2.098 0.705 0.914 -3.633]
[ 2.637 -1.339 -1.107 -0.781 -3.896]
[-1.495 1.378 -1.026 3.522 2.379]]
Final Population
[[ 1.714 -1.024 3.623 3.185 -2.362]
[ 0.692 -1.024 3.623 3.185 -2.362]
[ 0.692 -1.024 3.623 3.375 -2.362]
[ 0.692 -1.024 4.041 3.185 -2.362]
[ 1.714 -0.644 3.623 3.185 -2.362]]
Data Type for each Individual Gene without Precision¶
In PyGAD
2.14.0,
the gene_type
parameter allows customizing the gene type for each
individual gene. This is by using a list
/tuple
/numpy.ndarray
with number of elements equal to the number of genes. For each element,
a type is specified for the corresponding gene.
This is an example for a 5-gene problem where different types are assigned to the genes.
gene_type=[int, float, numpy.float16, numpy.int8, float]
This is a complete code that prints the initial and final population for a custom-gene data type.
import pygad
import numpy
equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=10,
sol_per_pop=5,
num_parents_mating=2,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
gene_type=[int, float, numpy.float16, numpy.int8, float])
print("Initial Population")
print(ga_instance.initial_population)
ga_instance.run()
print("Final Population")
print(ga_instance.population)
Initial Population
[[0 0.8615522360026828 0.7021484375 -2 3.5301821368185866]
[-3 2.648189378595294 -3.830078125 1 -0.9586271572917742]
[3 3.7729827570110714 1.2529296875 -3 1.395741994211889]
[0 1.0490687178053282 1.51953125 -2 0.7243617940450235]
[0 -0.6550158436937226 -2.861328125 -2 1.8212734549263097]]
Final Population
[[3 3.7729827570110714 2.055 0 0.7243617940450235]
[3 3.7729827570110714 1.458 0 -0.14638754050305036]
[3 3.7729827570110714 1.458 0 0.0869406120516778]
[3 3.7729827570110714 1.458 0 0.7243617940450235]
[3 3.7729827570110714 1.458 0 -0.14638754050305036]]
Data Type for each Individual Gene with Precision¶
The precision can also be specified for the float
data types as in
the next line where the second gene precision is 2 and last gene
precision is 1.
gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]]
This is a complete example where the initial and final populations are printed where the genes comply with the data types and precisions specified.
import pygad
import numpy
equation_inputs = [4, -2, 3.5, 8, -2]
desired_output = 2671.1234
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=10,
sol_per_pop=5,
num_parents_mating=2,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
gene_type=[int, [float, 2], numpy.float16, numpy.int8, [float, 1]])
print("Initial Population")
print(ga_instance.initial_population)
ga_instance.run()
print("Final Population")
print(ga_instance.population)
Initial Population
[[-2 -1.22 1.716796875 -1 0.2]
[-1 -1.58 -3.091796875 0 -1.3]
[3 3.35 -0.107421875 1 -3.3]
[-2 -3.58 -1.779296875 0 0.6]
[2 -3.73 2.65234375 3 -0.5]]
Final Population
[[2 -4.22 3.47 3 -1.3]
[2 -3.73 3.47 3 -1.3]
[2 -4.22 3.47 2 -1.3]
[2 -4.58 3.47 3 -1.3]
[2 -3.73 3.47 3 -1.3]]
Parallel Processing in PyGAD¶
Starting from PyGAD 2.17.0, parallel processing is supported. This section explains how to use parallel processing in PyGAD.
According to the PyGAD lifecycle, only 2 operations can be parallelized:
Population fitness calculation.
Mutation.
The reason is that the calculations in these 2 operations are independent (i.e. each solution/chromosome is handled independently from the others) and can be distributed across different processes or threads.
The mutation operation does not do intensive calculations on the CPU. Its calculations are simple, like flipping the values of some genes from 0 to 1 or adding a random value to some genes. So, it does not take much CPU processing time. Experiments proved that parallelizing the mutation operation across the solutions increases the time instead of reducing it. This is because running multiple processes or threads adds overhead to manage them. Thus, parallel processing is not applied to the mutation operation.
For the population fitness calculation, parallel processing can help make a difference and reduce the processing time. But this is conditional on the type of calculations done in the fitness function. If the fitness function makes intensive calculations and takes much processing time on the CPU, then it is probable that parallel processing will help to cut down the overall time.
The next subsections explain how parallel processing works in PyGAD and how to use it.
How to Use Parallel Processing in PyGAD¶
Starting from PyGAD
2.17.0,
a new parameter called parallel_processing
is added to the constructor
of the pygad.GA
class.
import pygad
...
ga_instance = pygad.GA(...,
parallel_processing=...)
...
This parameter allows the user to do the following:
Enable parallel processing.
Select whether processes or threads are used.
Specify the number of processes or threads to be used.
These are 3 possible values for the parallel_processing
parameter:
None: (Default) It means no parallel processing is used.
A positive integer: Refers to the number of threads to be used (i.e. threads, not processes, are used).
list/tuple: If a list or a tuple of exactly 2 elements is assigned, then:
  The first element can be either 'process' or 'thread' to specify whether processes or threads are used, respectively.
  The second element can be:
    A positive integer to select the maximum number of processes or threads to be used.
    0 to indicate that no processes or threads are used. This is identical to setting parallel_processing=None.
    None to use the default value as calculated by the concurrent.futures module.
These are examples of the values assigned to the parallel_processing
parameter:
parallel_processing=4: Because the parameter is assigned a positive integer, parallel processing is activated and 4 threads are used.
parallel_processing=["thread", 5]: Use parallel processing with 5 threads. This is identical to parallel_processing=5.
parallel_processing=["process", 8]: Use parallel processing with 8 processes.
parallel_processing=["process", 0]: Because the second element is given the value 0, no parallel processing is used. This is identical to parallel_processing=None.
Examples¶
The examples will help you know the difference between using processes and threads. Moreover, they will give an idea of when parallel processing would make a difference and reduce the time. These are dummy examples where the fitness function is made to always return 0.
The first example uses 10 genes, 5 solutions in the population where
only 3 solutions mate, and 9999 generations. The fitness function uses a
for
loop with 100 iterations just to have some calculations. In the
constructor of the pygad.GA
class, parallel_processing=None
means no parallel processing is used.
import pygad
import time
def fitness_func(ga_instance, solution, solution_idx):
for _ in range(99):
pass
return 0
ga_instance = pygad.GA(num_generations=9999,
num_parents_mating=3,
sol_per_pop=5,
num_genes=10,
fitness_func=fitness_func,
suppress_warnings=True,
parallel_processing=None)
if __name__ == '__main__':
t1 = time.time()
ga_instance.run()
t2 = time.time()
print("Time is", t2-t1)
When parallel processing is not used, the time it takes to run the
genetic algorithm is 1.5
seconds.
For comparison, let's do a second experiment where parallel processing is used with 5 threads. In this case, it takes 5 seconds.
...
ga_instance = pygad.GA(...,
parallel_processing=5)
...
For the third experiment, processes instead of threads are used. Also,
only 99 generations are used instead of 9999. The time it takes is
99
seconds.
...
ga_instance = pygad.GA(num_generations=99,
...,
parallel_processing=["process", 5])
...
This is the summary of the 3 experiments:
No parallel processing & 9999 generations: 1.5 seconds.
Parallel processing with 5 threads & 9999 generations: 5 seconds
Parallel processing with 5 processes & 99 generations: 99 seconds
Because the fitness function does not need much CPU time, the normal processing takes the least time. Running processes for this simple problem takes 99 seconds compared to only 5 seconds for threads because managing processes is much heavier than managing threads. Thus, most of the CPU time is spent swapping the processes instead of executing the code.
In the second example, the loop makes 99999999 iterations and only 5 generations are used. With no parallelization, it takes 22 seconds.
import pygad
import time
def fitness_func(ga_instance, solution, solution_idx):
for _ in range(99999999):
pass
return 0
ga_instance = pygad.GA(num_generations=5,
num_parents_mating=3,
sol_per_pop=5,
num_genes=10,
fitness_func=fitness_func,
suppress_warnings=True,
parallel_processing=None)
if __name__ == '__main__':
t1 = time.time()
ga_instance.run()
t2 = time.time()
print("Time is", t2-t1)
It takes 15 seconds when 10 processes are used.
...
ga_instance = pygad.GA(...,
parallel_processing=["process", 10])
...
This is compared to 20 seconds when 10 threads are used.
...
ga_instance = pygad.GA(...,
parallel_processing=["thread", 10])
...
Based on the second example, using parallel processing with 10 processes takes the least time because there is much CPU work done. Generally, processes are preferred over threads when most of the work is on the CPU. Threads are preferred over processes in some situations like doing input/output operations.
Before releasing PyGAD 2.17.0, László Fazekas wrote an article about parallelizing the fitness function with PyGAD. Check it out: How Genetic Algorithms Can Compete with Gradient Descent and Backprop.
Print Lifecycle Summary¶
In PyGAD
2.19.0,
a new method called summary()
is supported. It prints a Keras-like
summary of the PyGAD lifecycle showing the steps, callback functions,
parameters, etc.
This method accepts the following parameters:
line_length=70: An integer representing the length of a single line, in characters.
fill_character=" ": A character to fill the lines.
line_character="-": A character for creating a line separator.
line_character2="=": A secondary character to create a line separator.
columns_equal_len=False: Whether the table rows are split into equal-sized columns or split according to the width needed.
print_step_parameters=True: Whether to print extra parameters about each step inside the step. If print_step_parameters=False and print_parameters_summary=True, then the parameters of each step are printed at the end of the table.
print_parameters_summary=True: Whether to print a parameters summary at the end of the table. If print_step_parameters=False, then the parameters of each step are printed at the end of the table too.
This is a quick example that creates a pygad.GA instance.
import pygad
import numpy
function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44
def genetic_fitness(ga_instance, solution, solution_idx):
output = numpy.sum(solution*function_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
def on_gen(ga):
pass
def on_crossover_callback(a, b):
pass
ga_instance = pygad.GA(num_generations=100,
num_parents_mating=10,
sol_per_pop=20,
num_genes=len(function_inputs),
on_crossover=on_crossover_callback,
on_generation=on_gen,
parallel_processing=2,
stop_criteria="reach_10",
fitness_batch_size=4,
crossover_probability=0.4,
fitness_func=genetic_fitness)
Then call the summary()
method to print the summary with the default
parameters. Note that entries for the crossover and generation callback
function are created because their callback functions are implemented
through the on_crossover_callback()
and on_gen()
, respectively.
ga_instance.summary()
----------------------------------------------------------------------
PyGAD Lifecycle
======================================================================
Step Handler Output Shape
======================================================================
Fitness Function genetic_fitness() (1)
Fitness batch size: 4
----------------------------------------------------------------------
Parent Selection steady_state_selection() (10, 6)
Number of Parents: 10
----------------------------------------------------------------------
Crossover single_point_crossover() (10, 6)
Crossover probability: 0.4
----------------------------------------------------------------------
On Crossover on_crossover_callback() None
----------------------------------------------------------------------
Mutation random_mutation() (10, 6)
Mutation Genes: 1
Random Mutation Range: (-1.0, 1.0)
Mutation by Replacement: False
Allow Duplicated Genes: True
----------------------------------------------------------------------
On Generation on_gen() None
Stop Criteria: [['reach', 10.0]]
----------------------------------------------------------------------
======================================================================
Population Size: (20, 6)
Number of Generations: 100
Initial Population Range: (-4, 4)
Keep Elitism: 1
Gene DType: [<class 'float'>, None]
Parallel Processing: ['thread', 2]
Save Best Solutions: False
Save Solutions: False
======================================================================
We can set the print_step_parameters
and
print_parameters_summary
parameters to False
to not print the
parameters.
ga_instance.summary(print_step_parameters=False,
print_parameters_summary=False)
----------------------------------------------------------------------
PyGAD Lifecycle
======================================================================
Step Handler Output Shape
======================================================================
Fitness Function genetic_fitness() (1)
----------------------------------------------------------------------
Parent Selection steady_state_selection() (10, 6)
----------------------------------------------------------------------
Crossover single_point_crossover() (10, 6)
----------------------------------------------------------------------
On Crossover on_crossover_callback() None
----------------------------------------------------------------------
Mutation random_mutation() (10, 6)
----------------------------------------------------------------------
On Generation on_gen() None
----------------------------------------------------------------------
======================================================================
Logging Outputs¶
In PyGAD
3.0.0,
the print()
statement is no longer used and the outputs are printed
using the logging
module. A new parameter called logger
is supported to accept the
user-defined logger.
import logging
logger = ...
ga_instance = pygad.GA(...,
logger=logger,
...)
The default value for this parameter is None
. If there is no logger
passed (i.e. logger=None
), then a default logger is created to log
the messages to the console exactly like how the print()
statement
works.
Some advantages of using the
logging module
instead of the print()
statement are:
The user has more control over the printed messages, especially if there is a project that uses multiple modules where each module prints its messages. A logger can organize the outputs.
Using the proper Handler, the user can log the output messages to files, not only print them to the console. So, it is much easier to record the outputs.
The format of the printed messages can be changed by customizing the Formatter assigned to the Logger.
This section gives some quick examples to use the logging
module and
then gives an example to use the logger with PyGAD.
Logging to the Console¶
This is an example to create a logger to log the messages to the console.
import logging
# Create a logger
logger = logging.getLogger(__name__)
# Set the logger level to debug so that all the messages are printed.
logger.setLevel(logging.DEBUG)
# Create a stream handler to log the messages to the console.
stream_handler = logging.StreamHandler()
# Set the handler level to debug.
stream_handler.setLevel(logging.DEBUG)
# Create a formatter
formatter = logging.Formatter('%(message)s')
# Add the formatter to handler.
stream_handler.setFormatter(formatter)
# Add the stream handler to the logger
logger.addHandler(stream_handler)
Now, we can log messages to the console with the format specified in the
Formatter
.
logger.debug('Debug message.')
logger.info('Info message.')
logger.warning('Warn message.')
logger.error('Error message.')
logger.critical('Critical message.')
The outputs are identical to those returned using the print()
statement.
Debug message.
Info message.
Warn message.
Error message.
Critical message.
By changing the format of the output messages, we can have more information about each message.
formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
This is a sample output.
2023-04-03 18:46:27 DEBUG: Debug message.
2023-04-03 18:46:27 INFO: Info message.
2023-04-03 18:46:27 WARNING: Warn message.
2023-04-03 18:46:27 ERROR: Error message.
2023-04-03 18:46:27 CRITICAL: Critical message.
Note that you may need to clear the handlers after finishing the execution. This is to make sure no cached handlers are used in the next run. If the cached handlers are not cleared, then a single output message may be repeated.
logger.handlers.clear()
Logging to a File¶
This is another example to log the messages to a file named
logfile.txt
. The formatter prints the following about each message:
The date and time at which the message is logged.
The log level.
The message.
The path of the file.
The line number of the log message.
import logging
level = logging.DEBUG
name = 'logfile.txt'
logger = logging.getLogger(name)
logger.setLevel(level)
file_handler = logging.FileHandler(name, 'a+', 'utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)
This is what the outputs look like.
2023-04-03 18:54:03 DEBUG: Debug message. - c:\users\agad069\desktop\logger\example2.py:46
2023-04-03 18:54:03 INFO: Info message. - c:\users\agad069\desktop\logger\example2.py:47
2023-04-03 18:54:03 WARNING: Warn message. - c:\users\agad069\desktop\logger\example2.py:48
2023-04-03 18:54:03 ERROR: Error message. - c:\users\agad069\desktop\logger\example2.py:49
2023-04-03 18:54:03 CRITICAL: Critical message. - c:\users\agad069\desktop\logger\example2.py:50
Consider clearing the handlers if necessary.
logger.handlers.clear()
Log to Both the Console and a File¶
This is an example to create a single Logger associated with 2 handlers:
A file handler.
A stream handler.
import logging
level = logging.DEBUG
name = 'logfile.txt'
logger = logging.getLogger(name)
logger.setLevel(level)
file_handler = logging.FileHandler(name,'a+','utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s - %(pathname)s:%(lineno)d', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)
When a log message is executed, then it is both printed to the console
and saved in the logfile.txt
.
Consider clearing the handlers if necessary.
logger.handlers.clear()
PyGAD Example¶
To use the logger in PyGAD, just create your custom logger and pass it
to the logger
parameter.
import logging
import pygad
import numpy
level = logging.DEBUG
name = 'logfile.txt'
logger = logging.getLogger(name)
logger.setLevel(level)
file_handler = logging.FileHandler(name,'a+','utf-8')
file_handler.setLevel(logging.DEBUG)
file_format = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
file_handler.setFormatter(file_format)
logger.addHandler(file_handler)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_format = logging.Formatter('%(message)s')
console_handler.setFormatter(console_format)
logger.addHandler(console_handler)
equation_inputs = [4, -2, 8]
desired_output = 2671.1234
def fitness_func(ga_instance, solution, solution_idx):
output = numpy.sum(solution * equation_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
def on_generation(ga_instance):
ga_instance.logger.info(f"Generation = {ga_instance.generations_completed}")
ga_instance.logger.info(f"Fitness = {ga_instance.best_solution(pop_fitness=ga_instance.last_generation_fitness)[1]}")
ga_instance = pygad.GA(num_generations=10,
sol_per_pop=40,
num_parents_mating=2,
keep_parents=2,
num_genes=len(equation_inputs),
fitness_func=fitness_func,
on_generation=on_generation,
logger=logger)
ga_instance.run()
logger.handlers.clear()
By executing this code, the logged messages are printed to the console and also saved in the text file.
2023-04-03 19:04:27 INFO: Generation = 1
2023-04-03 19:04:27 INFO: Fitness = 0.00038086960368076276
2023-04-03 19:04:27 INFO: Generation = 2
2023-04-03 19:04:27 INFO: Fitness = 0.00038214871408010853
2023-04-03 19:04:27 INFO: Generation = 3
2023-04-03 19:04:27 INFO: Fitness = 0.0003832795907974678
2023-04-03 19:04:27 INFO: Generation = 4
2023-04-03 19:04:27 INFO: Fitness = 0.00038398612055017196
2023-04-03 19:04:27 INFO: Generation = 5
2023-04-03 19:04:27 INFO: Fitness = 0.00038442348890867516
2023-04-03 19:04:27 INFO: Generation = 6
2023-04-03 19:04:27 INFO: Fitness = 0.0003854406039137763
2023-04-03 19:04:27 INFO: Generation = 7
2023-04-03 19:04:27 INFO: Fitness = 0.00038646083174063284
2023-04-03 19:04:27 INFO: Generation = 8
2023-04-03 19:04:27 INFO: Fitness = 0.0003875169193024936
2023-04-03 19:04:27 INFO: Generation = 9
2023-04-03 19:04:27 INFO: Fitness = 0.0003888816727311021
2023-04-03 19:04:27 INFO: Generation = 10
2023-04-03 19:04:27 INFO: Fitness = 0.000389832593101348
Solve Non-Deterministic Problems¶
PyGAD can be used to solve both deterministic and non-deterministic problems. Deterministic problems are those that return the same fitness for the same solution. For non-deterministic problems, a different fitness value would be returned for the same solution.
By default, PyGAD settings are set to solve deterministic problems. PyGAD can save the explored solutions and their fitness to reuse in the future. These instance attributes can save the solutions:
solutions: Exists if save_solutions=True.
best_solutions: Exists if save_best_solutions=True.
last_generation_elitism: Exists if keep_elitism > 0.
last_generation_parents: Exists if keep_parents > 0 or keep_parents=-1.
To configure PyGAD for non-deterministic problems, we have to disable saving the previous solutions. This is by setting these parameters:
keep_elitism=0
keep_parents=0
save_solutions=False
save_best_solutions=False
import pygad
...
ga_instance = pygad.GA(...,
keep_elitism=0,
keep_parents=0,
save_solutions=False,
save_best_solutions=False,
...)
This way PyGAD will not save any explored solution and thus the fitness function has to be called for each individual solution.
Reuse the Fitness instead of Calling the Fitness Function¶
It may happen that a previously explored solution in generation X is explored again in another generation Y (where Y > X). For some problems, calling the fitness function takes much time.
For deterministic problems, it is better not to call the fitness function for an already explored solution. Instead, reuse the fitness of the old solution. PyGAD supports some options to help you save time calling the fitness function for a previously explored solution.
The parameters explored in this section can be set in the constructor of
the pygad.GA
class.
The cal_pop_fitness()
method of the pygad.GA
class checks these
parameters to see if there is a possibility of reusing the fitness
instead of calling the fitness function.
1. save_solutions¶
It defaults to False
. If set to True
, then the population of
each generation is saved into the solutions
attribute of the
pygad.GA
instance. In other words, every single solution is saved in
the solutions
attribute.
2. save_best_solutions¶
It defaults to False
. If True
, then it only saves the best
solution in every generation.
3. keep_elitism¶
It accepts an integer and defaults to 1. If set to a positive integer, then it keeps the elitism of one generation available in the next generation.
4. keep_parents¶
It accepts an integer and defaults to -1. If set to -1 or a positive integer, then it keeps the parents of one generation available in the next generation.
Why the Fitness Function is not Called for Solution at Index 0?¶
PyGAD has a parameter called keep_elitism
which defaults to 1. This
parameter defines the number of best solutions in generation X to
keep in the next generation X+1. The best solutions are just copied
from generation X to generation X+1 without making any change.
ga_instance = pygad.GA(...,
keep_elitism=1,
...)
The best solutions are copied to the beginning of the population. If keep_elitism=1, this means the best solution in generation X is kept in the next generation X+1 at index 0 of the population. If keep_elitism=2, this means the 2 best solutions in generation X are kept in generation X+1 at indices 0 and 1 of its population.
Because the fitness of these best solutions is already calculated in generation X, their fitness values are not recalculated in generation X+1 (i.e. the fitness function is not called for these solutions again). Instead, their fitness values are simply reused. This is why you see that no solution with index 0 is passed to the fitness function.
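As a quick check, this sketch (same toy problem, illustrative settings) counts how many times the fitness function is called for each population index; with keep_elitism=1, index 0 is expected to be counted only for the initial population:
import pygad
import numpy
from collections import Counter

function_inputs = [4, -2, 3.5, 5, -11, -4.7]
desired_output = 44
calls_per_index = Counter()

def fitness_func(ga_instance, solution, solution_idx):
    # Count how often each population index reaches the fitness function.
    calls_per_index[solution_idx] += 1
    output = numpy.sum(solution*function_inputs)
    return 1.0 / (numpy.abs(output - desired_output) + 0.000001)

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=5,
                       sol_per_pop=10,
                       num_genes=len(function_inputs),
                       fitness_func=fitness_func,
                       keep_elitism=1)
ga_instance.run()

# Index 0 should be counted only once (for the initial population); afterwards
# it holds the copied best solution whose fitness is reused.
print(calls_per_index)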
To force calling the fitness function for each solution in every generation, consider setting keep_elitism and keep_parents to 0. Moreover, keep the 2 parameters save_solutions and save_best_solutions at their default value False.
ga_instance = pygad.GA(...,
keep_elitism=0,
keep_parents=0,
save_solutions=False,
save_best_solutions=False,
...)
Batch Fitness Calculation¶
In PyGAD 2.19.0, a new optional parameter called fitness_batch_size is supported to calculate the fitness in batches. Thanks to Linan Qiu for opening the GitHub issue #136.
Its values can be:
1 or None: If the fitness_batch_size parameter is assigned the value 1 or None (default), then the normal flow is used where the fitness function is called for each individual solution. That is, if there are 15 solutions, then the fitness function is called 15 times.
1 < fitness_batch_size <= sol_per_pop: If the fitness_batch_size parameter is assigned a value satisfying this condition, then the solutions are grouped into batches of size fitness_batch_size and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed.
Example without fitness_batch_size Parameter¶
This is an example where the fitness_batch_size
parameter is given
the value None
(which is the default value). This is equivalent to
using the value 1
. In this case, the fitness function will be called
for each solution. This means the fitness function fitness_func
will
receive only a single solution. This is an example of the passed
arguments to the fitness function:
solution: [ 2.52860734, -0.94178795, 2.97545704, 0.84131987, -3.78447118, 2.41008358]
solution_idx: 3
The fitness function also must return a single numeric value as the fitness for the passed solution.
As we have a population of 20 solutions, the fitness function is called 20 times per generation. For 5 generations, the fitness function is called 20*5 = 100 times. In PyGAD, the fitness function is also called after the last generation, which adds another 20 calls. So, the total number of calls to the fitness function is 20*5 + 20 = 120.
Note that the keep_elitism
and keep_parents
parameters are set
to 0
to make sure no fitness values are reused and to force calling
the fitness function for each individual solution.
import pygad
import numpy
function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44
number_of_calls = 0
def fitness_func(ga_instance, solution, solution_idx):
global number_of_calls
number_of_calls = number_of_calls + 1
output = numpy.sum(solution*function_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
return fitness
ga_instance = pygad.GA(num_generations=5,
num_parents_mating=10,
sol_per_pop=20,
fitness_func=fitness_func,
fitness_batch_size=None,
# fitness_batch_size=1,
num_genes=len(function_inputs),
keep_elitism=0,
keep_parents=0)
ga_instance.run()
print(number_of_calls)
120
Example with fitness_batch_size Parameter¶
This is an example where the fitness_batch_size parameter is used and assigned the value 4. This means the solutions will be grouped into batches of 4 solutions. The fitness function will be called once for each batch (i.e. once for every 4 solutions).
This is an example of the arguments passed to it:
solutions:
[[ 3.1129432 -0.69123589 1.93792414 2.23772968 -1.54616001 -0.53930799]
[ 3.38508121 0.19890812 1.93792414 2.23095014 -3.08955597 3.10194128]
[ 2.37079504 -0.88819803 2.97545704 1.41742256 -3.95594055 2.45028256]
[ 2.52860734 -0.94178795 2.97545704 0.84131987 -3.78447118 2.41008358]]
solutions_indices:
[16, 17, 18, 19]
As we have 20 solutions, then there are 20/4 = 5 batches. As a result, the fitness function is called only 5 times per generation instead of 20. For each call to the fitness function, it receives a batch of 4 solutions.
As we have 5 generations, the function is called 5*5 = 25 times. Adding the call after the last generation (5 more calls), the total number of calls is 5*5 + 5 = 30.
import pygad
import numpy
function_inputs = [4,-2,3.5,5,-11,-4.7]
desired_output = 44
number_of_calls = 0
def fitness_func_batch(ga_instance, solutions, solutions_indices):
global number_of_calls
number_of_calls = number_of_calls + 1
batch_fitness = []
for solution in solutions:
output = numpy.sum(solution*function_inputs)
fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
batch_fitness.append(fitness)
return batch_fitness
ga_instance = pygad.GA(num_generations=5,
num_parents_mating=10,
sol_per_pop=20,
fitness_func=fitness_func_batch,
fitness_batch_size=4,
num_genes=len(function_inputs),
keep_elitism=0,
keep_parents=0)
ga_instance.run()
print(number_of_calls)
30
When batch fitness calculation is used, we save 120 - 30 = 90 calls to the fitness function.
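One practical benefit of batching is that the whole batch can be processed at once. Below is a hedged sketch of a vectorized batch fitness function for the same toy problem; it assumes the solutions argument can be treated as a 2D array of shape (batch size, number of genes), as in the printed example above. It can be passed to fitness_func together with fitness_batch_size=4 exactly as in the previous example.
import numpy

function_inputs = numpy.array([4, -2, 3.5, 5, -11, -4.7])
desired_output = 44

def fitness_func_batch(ga_instance, solutions, solutions_indices):
    # Compute all outputs of the batch with a single matrix-vector product.
    outputs = numpy.asarray(solutions) @ function_inputs
    fitness = 1.0 / (numpy.abs(outputs - desired_output) + 0.000001)
    # Return one fitness value per solution in the batch.
    return fitness.tolist()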
Use Functions and Methods to Build Fitness and Callbacks¶
In PyGAD 2.19.0, it is possible to pass user-defined functions or methods to the following parameters:
fitness_func
on_start
on_fitness
on_parents
on_crossover
on_mutation
on_generation
on_stop
This section gives 2 examples where these parameters are assigned user-defined:
Functions.
Methods.
Assign Functions¶
This is a dummy example where the fitness function returns a random value. Note that the instance of the pygad.GA class is passed as the first parameter of all functions.
import pygad
import numpy
def fitness_func(ga_instanse, solution, solution_idx):
return numpy.random.rand()
def on_start(ga_instanse):
print("on_start")
def on_fitness(ga_instanse, last_gen_fitness):
print("on_fitness")
def on_parents(ga_instanse, last_gen_parents):
print("on_parents")
def on_crossover(ga_instanse, last_gen_offspring):
print("on_crossover")
def on_mutation(ga_instanse, last_gen_offspring):
print("on_mutation")
def on_generation(ga_instanse):
print("on_generation\n")
def on_stop(ga_instanse, last_gen_fitness):
print("on_stop")
ga_instance = pygad.GA(num_generations=5,
num_parents_mating=4,
sol_per_pop=10,
num_genes=2,
on_start=on_start,
on_fitness=on_fitness,
on_parents=on_parents,
on_crossover=on_crossover,
on_mutation=on_mutation,
on_generation=on_generation,
on_stop=on_stop,
fitness_func=fitness_func)
ga_instance.run()
Assign Methods¶
The next example has all the methods defined inside the class Test. All of the methods accept an additional parameter self representing the instance of the class Test.
All methods accept self as the first parameter and the instance of the pygad.GA class as the second parameter.
import pygad
import numpy
class Test:
def fitness_func(self, ga_instanse, solution, solution_idx):
return numpy.random.rand()
def on_start(self, ga_instanse):
print("on_start")
def on_fitness(self, ga_instanse, last_gen_fitness):
print("on_fitness")
def on_parents(self, ga_instanse, last_gen_parents):
print("on_parents")
def on_crossover(self, ga_instanse, last_gen_offspring):
print("on_crossover")
def on_mutation(self, ga_instanse, last_gen_offspring):
print("on_mutation")
def on_generation(self, ga_instanse):
print("on_generation\n")
def on_stop(self, ga_instanse, last_gen_fitness):
print("on_stop")
ga_instance = pygad.GA(num_generations=5,
num_parents_mating=4,
sol_per_pop=10,
num_genes=2,
on_start=Test().on_start,
on_fitness=Test().on_fitness,
on_parents=Test().on_parents,
on_crossover=Test().on_crossover,
on_mutation=Test().on_mutation,
on_generation=Test().on_generation,
on_stop=Test().on_stop,
fitness_func=Test().fitness_func)
ga_instance.run()
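Note that the example above creates a separate Test() object for each callback, so the callbacks do not share state. If shared state is needed (e.g. a call counter), a single instance can be created once and its bound methods passed instead. A brief sketch reusing the Test class defined above:
test = Test()

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=4,
                       sol_per_pop=10,
                       num_genes=2,
                       on_start=test.on_start,
                       on_fitness=test.on_fitness,
                       on_parents=test.on_parents,
                       on_crossover=test.on_crossover,
                       on_mutation=test.on_mutation,
                       on_generation=test.on_generation,
                       on_stop=test.on_stop,
                       fitness_func=test.fitness_func)
ga_instance.run()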