# Image Classification In the documentation of the `pygad.nn` module, a neural network is created for classifying images from the Fruits360 dataset without being trained using an optimization algorithm. This section discusses how to train such a classifier using the genetic algorithm with the help of the `pygad.gann` module. Please make sure that the training data files [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy) and [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy) are available. For downloading them, use these links: 1. [dataset_features.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy): The features https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy 2. [outputs.npy](https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy): The class labels https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy After the data is available, here is the complete code that builds and trains a neural network using the genetic algorithm for classifying images from 4 classes of the Fruits360 dataset. Because there are 4 classes, the output layer is assigned has 4 neurons according to the `num_neurons_output` parameter of the `pygad.gann.GANN` class constructor. ```python import numpy import pygad import pygad.nn import pygad.gann def fitness_func(ga_instance, solution, sol_idx): global GANN_instance, data_inputs, data_outputs predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx], data_inputs=data_inputs) correct_predictions = numpy.where(predictions == data_outputs)[0].size solution_fitness = (correct_predictions/data_outputs.size)*100 return solution_fitness def callback_generation(ga_instance): global GANN_instance, last_fitness population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, population_vectors=ga_instance.population) GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices) print(f"Generation = {ga_instance.generations_completed}") print(f"Fitness = {ga_instance.best_solution()[1]}") print(f"Change = {ga_instance.best_solution()[1] - last_fitness}") last_fitness = ga_instance.best_solution()[1].copy() # Holds the fitness value of the previous generation. last_fitness = 0 # Reading the input data. data_inputs = numpy.load("dataset_features.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/dataset_features.npy # Optional step of filtering the input data using the standard deviation. features_STDs = numpy.std(a=data_inputs, axis=0) data_inputs = data_inputs[:, features_STDs>50] # Reading the output data. data_outputs = numpy.load("outputs.npy") # Download from https://github.com/ahmedfgad/NumPyANN/blob/master/outputs.npy # The length of the input vector for each sample (i.e. number of neurons in the input layer). num_inputs = data_inputs.shape[1] # The number of neurons in the output layer (i.e. number of classes). num_classes = 4 # Creating an initial population of neural networks. The return of the initial_population() function holds references to the networks, not their weights. Using such references, the weights of all networks can be fetched. num_solutions = 8 # A solution or a network can be used interchangeably. GANN_instance = pygad.gann.GANN(num_solutions=num_solutions, num_neurons_input=num_inputs, num_neurons_hidden_layers=[150, 50], num_neurons_output=num_classes, hidden_activations=["relu", "relu"], output_activation="softmax") # population does not hold the numerical weights of the network instead it holds a list of references to each last layer of each network (i.e. solution) in the population. A solution or a network can be used interchangeably. # If there is a population with 3 solutions (i.e. networks), then the population is a list with 3 elements. Each element is a reference to the last layer of each network. Using such a reference, all details of the network can be accessed. population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks) # To prepare the initial population, there are 2 ways: # 1) Prepare it yourself and pass it to the initial_population parameter. This way is useful when the user wants to start the genetic algorithm with a custom initial population. # 2) Assign valid integer values to the sol_per_pop and num_genes parameters. If the initial_population parameter exists, then the sol_per_pop and num_genes parameters are useless. initial_population = population_vectors.copy() num_parents_mating = 4 # Number of solutions to be selected as parents in the mating pool. num_generations = 500 # Number of generations. mutation_percent_genes = 10 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists. parent_selection_type = "sss" # Type of parent selection. crossover_type = "single_point" # Type of the crossover operator. mutation_type = "random" # Type of the mutation operator. keep_parents = -1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing. ga_instance = pygad.GA(num_generations=num_generations, num_parents_mating=num_parents_mating, initial_population=initial_population, fitness_func=fitness_func, mutation_percent_genes=mutation_percent_genes, parent_selection_type=parent_selection_type, crossover_type=crossover_type, mutation_type=mutation_type, keep_parents=keep_parents, on_generation=callback_generation) ga_instance.run() # After the generations complete, a plot is shown that summarizes how the fitness values evolve over the generations. ga_instance.plot_fitness() # Returning the details of the best solution. solution, solution_fitness, solution_idx = ga_instance.best_solution() print(f"Parameters of the best solution : {solution}") print(f"Fitness value of the best solution = {solution_fitness}") print(f"Index of the best solution : {solution_idx}") if ga_instance.best_solution_generation != -1: print(f"Best fitness value reached after {ga_instance.best_solution_generation} generations.") # Predicting the outputs of the data using the best solution. predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[solution_idx], data_inputs=data_inputs) print(f"Predictions of the trained network : {predictions}") # Calculating some statistics num_wrong = numpy.where(predictions != data_outputs)[0] num_correct = data_outputs.size - num_wrong.size accuracy = 100 * (num_correct/data_outputs.size) print(f"Number of correct classifications : {num_correct}.") print(f"Number of wrong classifications : {num_wrong.size}.") print(f"Classification accuracy : {accuracy}.") ``` After training completes, here are the outputs of the print statements. The number of wrong classifications is only 1 and the accuracy is 99.949%. This accuracy is reached after 482 generations. ``` Fitness value of the best solution = 99.94903160040775 Index of the best solution : 0 Best fitness value reached after 482 generations. Number of correct classifications : 1961. Number of wrong classifications : 1. Classification accuracy : 99.94903160040775. ``` The next figure shows how fitness value evolves by generation. ![Training Neural Networks using Genetic Algorithm](https://user-images.githubusercontent.com/16560492/82152993-21898180-9865-11ea-8387-b995f88b83f7.png)