# `pygad.utils` Module

This section of the documentation discusses the **pygad.utils** module.

PyGAD supports different types of operators for selecting the parents, applying the crossover, and mutation. More features will be added in the future. To ask for a new feature, please check the [Ask for Feature](https://pygad.readthedocs.io/en/latest/help_support.html#ask-for-feature) section.

The submodules in the `pygad.utils` module are:

1. `engine`: The core engine of the library. It has the `GAEngine` class implementing the main loop and related functions.
2. `crossover`: Has the `Crossover` class that implements the crossover operators.
3. `mutation`: Has the `Mutation` class that implements the mutation operators.
4. `parent_selection`: Has the `ParentSelection` class that implements the parent selection operators.
5. `nsga2`: Has the `NSGA2` class that implements the Non-Dominated Sorting Genetic Algorithm II (NSGA-II).
6. `nsga3`: Has the `NSGA3` class that implements the Non-Dominated Sorting Genetic Algorithm III (NSGA-III).
7. `quality_indicators`: Has functions to measure the quality of a Pareto front: `hypervolume`, `inverted_generational_distance`, `generational_distance`, and `spacing`.

Note that the `pygad.GA` class extends all of these classes. So, the user can access any of the methods in such classes directly by the instance/object of the `pygad.GA` class.

The next sections discuss each submodule.

## `pygad.utils.engine` Submodule

The `pygad.utils.engine` module has the `GAEngine` class that implements the engine of the library. The methods in this class are:

1. `initialize_population()`
2. `cal_pop_fitness()`
3. `run()`
   1. `run_loop_head()`
   2. `run_select_parents()`
   3. `run_crossover()`
   4. `run_mutation()`
   5. `run_update_population()`
4. `best_solution()`
5. `round_genes()`

### `initialize_population()`

It creates an initial population randomly as a NumPy array. The array is saved in the instance attribute named `population`.

Accepts the following parameters:

- `low`: The lower value of the random range from which the gene values in the initial population are selected. It defaults to -4. Available in PyGAD 1.0.20 and higher.
- `high`: The upper value of the random range from which the gene values in the initial population are selected. It defaults to +4. Available in PyGAD 1.0.20 and higher.

This method assigns the values of the following 3 instance attributes:

1. `pop_size`: Size of the population.
2. `population`: Initially, it holds the initial population and is later updated after each generation.
3. `initial_population`: Holds the initial population.

### `cal_pop_fitness()`

The `cal_pop_fitness()` method calculates and returns the fitness values of the solutions in the current population. 

This function is optimized to save time by making fewer calls to the fitness function. It follows this process:

1. If the `save_solutions` parameter is set to `True`, then it checks if the solution is already explored and saved in the `solutions` instance attribute. If so, then it just retrieves its fitness from the `solutions_fitness` instance attribute without calling the fitness function.
2. If `save_solutions` is set to `False` or if it is `True` but the solution was not explored yet, then the `cal_pop_fitness()` method checks if the `keep_elitism` parameter is set to a positive integer. If so, then it checks if the solution is saved into the `last_generation_elitism` instance attribute. If so, then it retrieves its fitness from the `previous_generation_fitness` instance attribute.
3. If neither of the above 3 conditions apply (1. `save_solutions` is set to `False` or 2. if it is `True` but the solution was not explored yet or 3. `keep_elitism` is set to zero), then the `cal_pop_fitness()` method checks if the `keep_parents` parameter is set to `-1` or a positive integer. If so, then it checks if the solution is saved into the `last_generation_parents` instance attribute. If so, then it retrieves its fitness from the `previous_generation_fitness` instance attribute.
4. If neither of the above 4 conditions apply, then we have to call the fitness function to calculate the fitness for the solution. This is by calling the function assigned to the `fitness_func` parameter. 

This function takes into consideration:

1. The `parallel_processing` parameter to check whether parallel processing is in effect.
2. The `fitness_batch_size` parameter to check if the fitness should be calculated in batches of solutions.

It returns a vector of the solutions' fitness values.

### `run()`

Runs the genetic algorithm. This is the main method in which the genetic algorithm is evolved through some generations. It accepts no parameters as it uses the instance to access all of its requirements.

For each generation, the fitness values of all solutions within the population are calculated according to the `cal_pop_fitness()` method which internally just calls the function assigned to the `fitness_func` parameter in the `pygad.GA` class constructor for each solution.

According to the fitness values of all solutions, the parents are selected using the `select_parents()` method. This method's behavior is determined by the parent selection type in the `parent_selection_type` parameter in the `pygad.GA` class constructor.

Based on the selected parents, offspring are generated by applying the crossover and mutation operations using the `crossover()` and `mutation()` methods. The behavior of such 2 methods is defined according to the `crossover_type` and `mutation_type` parameters in the `pygad.GA` class constructor.

After the generation completes, the following takes place:

- The `population` attribute is updated by the new population.
- The `generations_completed` attribute is assigned the number of the last completed generation.
- If there is a callback function assigned to the `on_generation` attribute, then it will be called.

After the `run()` method completes, the following takes place:

- The `best_solution_generation` is assigned the generation number at which the best fitness value is reached.
- The `run_completed` attribute is set to `True`.

Note that the `run()` method is calling 5 different methods during the loop:

1. `run_loop_head()`
2. `run_select_parents()`
3. `run_crossover()`
4. `run_mutation()`
5. `run_update_population()`

### `best_solution()`

Returns information about the best solution found by the genetic algorithm. 

It accepts the following parameters:

* `pop_fitness=None`: An optional parameter that accepts a list of the fitness values of the solutions in the population. If `None`, then the `cal_pop_fitness()` method is called to calculate the fitness values of the population.

It returns the following:

* `best_solution`: Best solution in the current population.

* `best_solution_fitness`: Fitness value of the best solution.

* `best_match_idx`: Index of the best solution in the current population.

### `round_genes()`

A method to round the genes in the passed solutions. It loops through each gene across all the passed solutions and rounds their values if applicable. 

## `pygad.utils.validation` Submodule

The `pygad.utils.validation` module has the `Validation` class that validates the arguments passed while instantiating the `pygad.GA` class. The methods in this class are:

1. `validate_parameters()`: A method that accepts the same list of arguments accepted by the constructor of the `pygad.GA` class. It validates all the parameters. If everything is validated, the instance attribute `valid_parameters` will be set to `True`. Otherwise, it will be `False` and an exception is raised indicating the invalid criteria.

An inner method called `validate_multi_stop_criteria()` exists to validate the `stop_criteria` argument.

## `pygad.utils.crossover` Submodule

The `pygad.utils.crossover` module has a class named `Crossover` with the supported crossover operations:

1. Single point: Implemented using the `single_point_crossover()` method.
2. Two points: Implemented using the `two_points_crossover()` method.
3. Uniform: Implemented using the `uniform_crossover()` method.
4. Scattered: Implemented using the `scattered_crossover()` method.

Crossover takes two parents and builds a child by mixing their genes. The next figure shows how single-point, two-point, and uniform crossover do this.

:::{figure} images/crossover_types.*
:alt: Single-point, two-point, and uniform crossover
:width: 560px
:align: center

How single-point, two-point, and uniform crossover build a child from two parents.
:::

All crossover methods accept these parameters:

1. `parents`: The parents to mate for producing the offspring.
2. `offspring_size`: The size of the offspring to produce.

### Crossover Methods

The `Crossover` class in the `pygad.utils.crossover` module supports several methods for applying crossover between the selected parents. All of these methods accept the same parameters which are:

* `parents`: The parents to mate for producing the offspring.
* `offspring_size`: The size of the offspring to produce.

All of such methods return an array of the produced offspring.

The next subsections list the supported methods for crossover.

#### `single_point_crossover()`

Applies the single-point crossover. It selects a point randomly at which crossover takes place between the pairs of parents.

#### `two_points_crossover()`

Applies the 2 points crossover. It selects the 2 points randomly at which crossover takes place between the pairs of parents.

#### `uniform_crossover()`

Applies the uniform crossover. For each gene, a parent out of the 2 mating parents is selected randomly and the gene is copied from it.

#### `scattered_crossover()`

Applies the scattered crossover. It randomly selects the gene from one of the 2 parents. 

## `pygad.utils.mutation` Submodule

The `pygad.utils.mutation` module has a class named `Mutation` with the supported mutation operations:

1. Random: Implemented using the `random_mutation()` method.
2. Swap: Implemented using the `swap_mutation()` method.
3. Inversion: Implemented using the `inversion_mutation()` method.
4. Scramble: Implemented using the `scramble_mutation()` method.
5. Adaptive: Implemented using the `adaptive_mutation()` method.

Mutation makes small random changes to the offspring so the search can explore new values. The next figure shows random mutation, where a few genes are picked at random and their values are changed.

:::{figure} images/mutation.*
:alt: Random mutation changes a few genes
:width: 560px
:align: center

Random mutation changes the values of a few genes that are picked at random.
:::

All mutation methods accept this parameter:

1. `offspring`: The offspring to mutate.

### Mutation Methods

The `Mutation` class in the `pygad.utils.mutation` module supports several methods for applying mutation. All of these methods accept the same parameter which is:

* `offspring`: The offspring to mutate.

All of such methods return an array of the mutated offspring.

The next subsections list the supported methods for mutation.

#### `random_mutation()`

Applies the random mutation which changes the values of some genes randomly. The number of genes is specified according to either the `mutation_num_genes` or the `mutation_percent_genes` attributes.

For each gene, a random value is selected according to the range specified by the 2 attributes `random_mutation_min_val` and `random_mutation_max_val`. The random value is added to the selected gene.

#### `swap_mutation()`

Applies the swap mutation which interchanges the values of 2 randomly selected genes.

#### `inversion_mutation()`

Applies the inversion mutation which selects a subset of genes and inverts them.

#### `scramble_mutation()`

Applies the scramble mutation which selects a subset of genes and shuffles their order randomly.

#### `adaptive_mutation()`

Applies the adaptive mutation, which selects the number/percentage of genes to mutate based on the solution's fitness. If the fitness is high (the solution quality is high), then a smaller number/percentage of genes is mutated compared to a solution with low fitness.

### Mutation Helper Methods

The `pygad.utils.mutation` module has some helper methods to assist applying the mutation operation:

1. `mutation_by_space()`: Applies the mutation using the `gene_space` parameter.
2. `mutation_probs_by_space()`: Uses the mutation probabilities in the `mutation_probabilities` instance attribute to apply the mutation using the `gene_space` parameter. For each gene, if its probability is <= the mutation probability, then it will be mutated based on the gene space.
3. `mutation_process_gene_value()`: Generate/select values for the gene that satisfy the constraint. The values could be generated randomly or from the gene space. 
4. `mutation_randomly()`: Applies the random mutation.
5. `mutation_probs_randomly()`: Uses the mutation probabilities in the `mutation_probabilities` instance attribute to apply the random mutation. For each gene, if its probability is <= the mutation probability, then it will be mutated randomly.
6. `adaptive_mutation_population_fitness()`: A helper method to calculate the average fitness of the solutions before applying the adaptive mutation.
7. `adaptive_mutation_by_space()`: Applies the adaptive mutation based on the `gene_space` parameter. A number of genes are selected randomly for mutation. This number depends on the fitness of the solution. The random values are selected from the `gene_space` parameter.
8. `adaptive_mutation_probs_by_space()`: Uses the mutation probabilities to decide which genes to apply the adaptive mutation by space.
9. `adaptive_mutation_randomly()`: Applies the adaptive mutation randomly. A number of genes are selected randomly for mutation. This number depends on the fitness of the solution. The random values are selected based on the 2 parameters `random_mutation_min_val` and `random_mutation_max_val`.
10. `adaptive_mutation_probs_randomly()`: Uses the mutation probabilities to decide which genes to apply the adaptive mutation randomly.

## `pygad.utils.parent_selection` Submodule

The `pygad.utils.parent_selection` module has a class named `ParentSelection` with the supported parent selection operations:

1. Steady-state: Implemented using the `steady_state_selection()` method.
2. Roulette wheel: Implemented using the `roulette_wheel_selection()` method.
3. Stochastic universal: Implemented using the `stochastic_universal_selection()` method.
4. Rank: Implemented using the `rank_selection()` method.
5. Random: Implemented using the `random_selection()` method.
6. Tournament: Implemented using the `tournament_selection()` method.
7. NSGA-II: Implemented using the `nsga2_selection()` method.
8. NSGA-II Tournament: Implemented using the `tournament_selection_nsga2()` method.
9. NSGA-III: Implemented using the `nsga3_selection()` method.
10. NSGA-III Tournament: Implemented using the `tournament_selection_nsga3()` method.

All parent selection methods accept these parameters:

1. `fitness`: The fitness of the entire population.
2. `num_parents`: The number of parents to select.

It has the following helper methods:

1. `wheel_cumulative_probs()`: A helper function to calculate the wheel probabilities for these 2 methods: 1) `roulette_wheel_selection()` 2) `rank_selection()`

### Parent Selection Methods

The `ParentSelection` class in the `pygad.utils.parent_selection` module has several methods for selecting the parents that will mate to produce the offspring. All of such methods accept the same parameters which are:

* `fitness`: The fitness values of the solutions in the current population.
* `num_parents`: The number of parents to be selected.

All of such methods return an array of the selected parents.

The next subsections list the supported methods for parent selection.

#### `steady_state_selection()`

Selects the parents using the steady-state selection technique.

#### `rank_selection()`

Selects the parents using the rank selection technique.

#### `random_selection()`

Selects the parents randomly.

#### `tournament_selection()`

Selects the parents using the tournament selection technique.

#### `roulette_wheel_selection()`

Selects the parents using the roulette wheel selection technique.

#### `stochastic_universal_selection()`

Selects the parents using the stochastic universal selection technique.

#### `nsga2_selection()`

Selects the parents for the NSGA-II algorithm to solve multi-objective optimization problems. It selects the parents by ranking them based on non-dominated sorting and crowding distance.

#### `tournament_selection_nsga2()`

Selects the parents for the NSGA-II algorithm to solve multi-objective optimization problems. It selects the parents using the tournament selection technique applied based on non-dominated sorting and crowding distance.

#### `nsga3_selection()`

Selects the parents for the NSGA-III algorithm to solve multi-objective optimization problems. It accepts whole Pareto fronts in order until adding the next front would overflow the requested parent count, then picks the remaining survivors from that critical front using niching against the structured reference points stored on the GA instance. Requires the `nsga3_num_divisions` parameter to be set when constructing the `pygad.GA` instance.

#### `tournament_selection_nsga3()`

Selects the parents for the NSGA-III algorithm to solve multi-objective optimization problems. It selects the parents using the tournament selection technique where the within-front comparison is based on the niche count (instead of the crowding distance used by `tournament_selection_nsga2()`). Requires the `nsga3_num_divisions` parameter to be set.

## `pygad.utils.nsga` Submodule

The `pygad.utils.nsga` module has a class named `NSGA` that holds the building blocks shared by NSGA-II and NSGA-III. The methods inside this class are:

1. `non_dominated_sorting()`: Returns all the Pareto fronts by applying non-dominated sorting over the solutions.
2. `get_non_dominated_set()`: Returns the two sets of non-dominated and dominated solutions from the passed solutions. The Pareto front is the non-dominated set.

## `pygad.utils.nsga2` Submodule

The `pygad.utils.nsga2` module has a class named `NSGA2` that implements the NSGA-II-specific primitives. The methods inside this class are:

1. `crowding_distance()`: Calculates the crowding distance for all solutions in the current Pareto front.
2. `sort_solutions_nsga2()`: Sort the solutions. If the problem is single-objective, the solutions are sorted by their fitness values. If it is multi-objective, non-dominated sorting and crowding distance are applied to sort the solutions.

## `pygad.utils.nsga3` Submodule

The `pygad.utils.nsga3` module has a class named `NSGA3` that implements the NSGA-III algorithm primitives. NSGA-III novel names start with `nsga3_` to make the algorithm surface easy to spot.

1. `nsga3_generate_reference_points()`: Build the structured grid of reference points on the unit simplex using the Das-Dennis (stars-and-bars) method.
2. `nsga3_compute_ideal_point()`: Return the ideal point (column maximum under PyGAD's maximization convention).
3. `nsga3_find_extreme_points()`: For each objective axis, return the solution that best represents the corner of that axis based on the Achievement Scalarizing Function (ASF).
4. `nsga3_compute_intercepts()`: Fit a hyperplane through the M extreme points and return the per-axis intercept point used as the normalization denominator. Falls back to the nadir (worst per objective) when the hyperplane cannot be fitted or when the intercept is degenerate.
5. `nsga3_normalize_fitness()`: Scale each fitness row to the `[0, 1]` range using the ideal point and the intercepts.
6. `nsga3_associate_to_reference_points()`: For every normalized solution, return the nearest reference index and the perpendicular distance to that reference line.
7. `nsga3_niching_select()`: Pick `num_to_select` survivors from the critical front using niche counts and per-niche tie-breaking rules.

The selection methods `nsga3_selection()` and `tournament_selection_nsga3()` live in `pygad.utils.parent_selection`. The engine-time helpers (`_bootstrap_nsga3_reference_points()`, `_nsga3_grow_population()`, `_nsga3_generate_extra_random_solutions()`, `_nsga3_generate_single_random_gene()`) live in `pygad.utils.engine`.

Two module-level constants in `pygad.utils.nsga3` control the numerical safeguards: `NSGA3_ASF_EPSILON` (default `1e-6`) and `NSGA3_INTERCEPT_NEAR_ZERO` (default `1e-12`).

## `pygad.utils.report` Submodule

The `pygad.utils.report` module has a class named `Report` that adds the `generate_report()` method to the `pygad.GA` class. It builds a PDF report of the GA run, bundling the configuration table, a run summary, the best solution, and every applicable plot. The title page shows the PyGAD logo, which ships with the package. Requires the optional dependencies `reportlab` and `matplotlib`:

```
pip install pygad[report]
```

See [`generate_report()`](https://pygad.readthedocs.io/en/latest/pygad.html#generate-report) and the runnable example at [`examples/example_generate_report.py`](https://github.com/ahmedfgad/GeneticAlgorithmPython/tree/master/examples/example_generate_report.py).

## `pygad.utils.quality_indicators` Submodule

The `pygad.utils.quality_indicators` module has functions to measure the quality of a Pareto front. All functions take fitness values in PyGAD's maximization format. The functions are:

1. `hypervolume(fitness, reference_point)`: Volume of the objective space dominated by the front. The reference point must be worse than every solution on every objective. A larger value is better.
2. `inverted_generational_distance(fitness, reference_front)`: Mean distance from each reference-front point to its nearest approximation point. Reports both convergence and diversity. A smaller value is better.
3. `generational_distance(fitness, reference_front)`: Mean distance from each approximation point to its nearest reference point. Reports convergence only. A smaller value is better.
4. `spacing(fitness)`: Standard deviation of the distance from each solution to its nearest neighbour. A smaller value means the solutions are spread more evenly.

Example:

```python
from pygad.utils.quality_indicators import hypervolume, inverted_generational_distance

# After ga.run()
fitness = ga.last_generation_fitness
reference_point = [-10.0, -10.0]   # worse than every solution
hv = hypervolume(fitness, reference_point)

# If the true Pareto front is known
from pygad.benchmarks.zdt import ZDT1
problem = ZDT1()
true_front = problem.pareto_front(num_points=100)
igd = inverted_generational_distance(fitness, true_front)
```

A runnable example per indicator lives under `examples/quality_indicators/`.

## More about the Operators

::::{grid} 1 2 2 2
:gutter: 3

:::{grid-item-card} Adaptive Mutation
:link: adaptive_mutation
:link-type: doc

Change the mutation rate per solution based on its fitness.
:::

:::{grid-item-card} User-Defined Operators
:link: user_defined_operators
:link-type: doc

Plug in your own crossover, mutation, and parent selection.
:::

::::

:::{toctree}
:hidden:

adaptive_mutation
user_defined_operators
:::