What if I told you we have a massively complex, multi-threaded optimization function simultaneously running on trillions of distributed biological computers that update their own hardware and software — and it’s all solar powered. Sounds too good to be true.
It is. That function’s called evolution, and it doesn’t seem too compatible with our computers (without human intervention designing better computers).
However, we have some ways of emulating the efficiency of evolution: Genetic Algorithms are a way to emulate natural selection using math.
The evolution I’m fond of is punctuated equilibrium, which is much more rational than gradualism. Species don’t change much unless they need to, because there’s no room or reward for doing so. An environment-transforming event occurs, and in the ensuing vacuum, change happens rapidly until efficiency plateaus are reached.
Now consider a maximization/optimization function that you ran once, and it gives you its best solutions. Run it again, and it won’t change (probably). Change the parameters, and suddenly there’s room for change.
The core idea of genetic algorithms is the comparison, combination and induced variation of multiple individual solutions to a given problem.
The steps: population, fitness function, selection, crossover & mutation, ending in eventual convergence and termination. We begin with a population of individuals.
Modeling Historical Bunny-Related Issues
In 1859, Thomas Austin brought 24 European rabbits from England to Australia, and released them into the bush to create a population for hunting. 70 years later, there were 10 billion rabbits in Australia.
Rabbits living in the outback are environmentally problematic. They’re suited to their environment: the low-lying vegetation is suitable for grazing, they lack indigenous predators, and they reproduce at exponential rates. They’re an invasive species that cause hundreds of millions annually in crop damage and drive indigenous endangered species closer to extinction, the little bastards.
So we’ll model our genetic algorithm as a sample of Australia’s rabbit population, individuals containing ‘chromosomes’ comprised of ‘genes’. We’re representing their DNA as binary strings.
In context: It’s the 1950s, and the Aussies have tried everything to contain the furry menace, including a 2,000 mile long rabbit-proof fence (it wasn’t rabbit-proof). They’re really fed up with rabbits. They’ve been getting into the Vegemite and reducing native biodiversity everywhere they go.
So the Aussies get some fleas and mosquitoes from South America carrying the myxoma virus, a strain native to the continent that only affects rabbits. They release the virus on Australian rabbit populations.
Myxomatosis, in 6~12 days after infection, causes fever, skin lesions and growths, respiratory distress, hypothermia, extreme internal inflammation and death. Virulent strains from South America usually don’t manifest in native rabbits beyond some cosmetic growths, but in the invasive European species initially saw a 99.8% fatality rate.
The equilibrium-punctuating introduction of myxoma in the outback serves as our fitness function, a method that scores each individual as a potential solution for the problem. Real solutions are as complex as you need them to be, but for simplicity, we’ll model the solution as ‘complete resistance to myxomatosis’ = a string of five 1s.
This creates room for comparative, gradual optimization by counting 1s. Our fitness function is how many binary digits out of the six are 1, or what level of resistance a rabbit’s genes grant it. Let’s pretend that partial resistance applies here, and a rabbit may be more likely to survive with a better ‘partial solution’ than one with a worse one. This allows us to give the rabbits their fitness scores, the probability of surviving outback myxomatosis.
Next, we select the fittest individuals using the fitness scores. The chromosomes with the most 1s are chosen to reproduce; the rabbits with the most resistance survive while their multitudinous comrades lay dead in the unforgiving Australian bush.
The fittest individuals mate, and for each pair, a crossover point is chosen at random. In our binary example, we just do a bit swap for each pair. The new offspring are added to the population, and for the offspring = N, the N least fit individuals (mated or not) die out.
In some of the new offspring, some genes are subjected to mutation; some bits are flipped at random. This ensures the population has enough diversity to reach the solution without getting ’stuck’ at a performance plateau.
The surviving rabbits do what they do best, and make more rabbits. They’ve get new mixes of genes, but they don’t know that. They’re just rabbits. Some of the new rabbits have randomly mutated genes. Perhaps this will help them live long enough in the plagued world that they inhabit to destroy more indigenous specious and cause billions more in agricultural damages.
When the population doesn’t produce offspring different enough from the previous generation, it’s essentially reached the best solution it can for the given parameters. This occurs when the population’s change ‘levels out’.
The rabbits adapt to the virus ‘enough’: the slope of their adaptation line on a graph levels out. In Australia, some rabbits developed genetic immunity; today, only ~40% are susceptible. The rabbits’ genetic algorithm gave them a solution.
Vijini has a great article on TDS demonstrating a 5-digit GA implementation in Java, where the goal is to reach 5 1s.
Adam Dick used GAs to encode Battleship grids as 10 individuals with 10-digit chromosomes, and implemented a fitness function as to how well a chromosome matched reality (1 for ship space, 0 for empty space).
Genetic models have a lot of room to tweak parameters. Of a population of 10, elitism is the rate of selection (20% = 2 parents). A crossover rate tells us the number of spliced children (20% = one pair). Mutation rate determines the number of mutant children, and bit flip rates (usually low) determine intensity of mutation.
As a field, genetic algorithms are always evolving, and there’s many ways to run them differently: parallel populations, non-binary encoding, selection and survival mechanisms.
There are difficulties, however, in implementing them. Fitness functions are difficult to program correctly due to the potential variation in parameters. Heavier loads, especially in parallel ‘island’ models, take a long time to run. They’re non-deterministic, and may arrive at different solutions for the same input and parameters. Overall performance is highly sensitive to the initial population and fine-tuning parameters, and it’s difficult to prove or guarantee optimality. Like evolution, genetic algorithms give the best solution for a specific environment — the winning species might not be the true ‘best’, but merely better than the competitors.
There is No End to Rabbits
In Australia, the myxoma-carrying insects couldn’t survive well in the drier climates of the inner continent, so the deeper bush was still plagued by bunnies. In 1995, Australia released flies carrying Rabbit Hemorrhagic Disease that permeate into arid areas, initially reducing populations by 90%. However, the cooler, humid coastal regions are less favorable for the carrier flies, and resistance has since began to develop; fast breeding means fast mutation and adaptation.
Genetic algorithms are a fascinating method for optimization that can provide interesting solutions. They’re comparatively ‘wasteful’ compared to some other algorithms (in the same way evolution is physically wasteful compared to artificial design), but have immense potential to evolve beyond what they currently are.