Genetic optimization with Dakota 3/3: Practice
Part 1: The optimization loop
Part 2: A sandbox for experimentation
Part 3: Practice
In my first three months of using Dakota, I have learned several things worth discussing. The configuration described in part 1 of this series makes for a correctly-working but crude optimization. Each optimization is different, and some experimentation and tweaking is necessary for the optimizer to progress efficiently.
I run Dakota in a context where the evaluations are extremely expensive compared to the operation of the optimizer itself. In this context, here are notes regarding a few topics:
Controlling the size of evaluation group
In the limit, sending a very small number of individuals for evaluation in any given generation maximizes the effectiveness of the genetic algorithm, because the active population is moved towards the Pareto front more often, eliminating the weakest members each time. If only one new individual is created in each generation, then by the 100th generation, that individual is created from much higher-quality base data than if it were the 100th new individual in a large first generation.
On the other hand, computationally-intensive evaluations take time, and it may not be possible to complete the optimization at all unless large numbers of evaluations are run in parallel. This is an incentive to increase the number of new individuals created in each generation. In those situations where (human) time is the limiting factor, the available parallel evaluation bandwidth becomes the desired number of new cases per generation.
Other constraints may exist too, for example when the machine carrying out the evaluations is not correctly administered, and its local waiting-queue dynamics must be taken advantage of to minimize waiting time.
In Dakota, the number of new cases created each generation is very hard to control. In my (short) experience, it has always grown or decayed exponentially.
This could be mitigated with tight control over the population size in the selection process. However, in some optimizations, the performance domain is very wide, and there is a need to weed out catastrophically bad individuals. This ability to remove poorly performing individuals is especially important since the mutation and crossover operators are both blind to performance: they select parent individuals indiscriminately.
Ultimately, I maintained some control over the size of the evaluation group by interrupting optimizations and re-starting them with selected populations. I wish some functionality existed that would set upper and lower limits on the number of offspring in any generation.
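For reference, the settings that most directly influence group size sit in the moga block. The sketch below is illustrative only: the values are arbitrary, and the exact keyword names and nesting (in particular around replacement_type) vary across Dakota versions, so check them against your reference manual.

```
method
  moga
    population_size = 32              # size of the initial, randomly-generated population
    max_function_evaluations = 2000   # overall evaluation budget
    replacement_type
      below_limit = 6                 # discard designs dominated by too many others...
        shrinkage_fraction = 0.9      # ...but never shrink the population below this fraction
```

Even with these, as noted above, the number of offspring per generation remains only loosely controlled.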
Generation-dependent settings
To my knowledge, all settings passed to moga are fixed for the complete optimization. This is regrettable, because it makes sense to progressively decrease the occurrence of mutation, or the magnitude of crossover averaging, as the optimization progresses towards the final Pareto front. A workaround is to interrupt the optimization, save its output, and use it as the initial population of a new optimization with more moderate settings.
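The restart workaround can be sketched as follows. I believe initialization_type flat_file is the JEGA mechanism for seeding a run from a saved population, but verify the keywords against your Dakota version; the file name and values here are hypothetical.

```
method
  moga
    initialization_type
      flat_file = 'previous_final_population.dat'   # seed from the earlier run's output
    mutation_type replace_uniform
      mutation_rate = 0.05                          # milder mutation for the refinement run
```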
Handling distant individuals
In my understanding, the crossover module of moga picks parents randomly (indiscriminately) when building groups. In optimizations where the Pareto front extends over a very wide range of parameters, members that are very distant from one another may not be usefully crossed. This unproductive crossing (producing low-performing members) becomes more frequent as the optimization progresses, since the Pareto front becomes longer and thinner.
I do not see an immediate solution to this issue. Again, a workaround is to interrupt the optimization, “split” the domain of parameters into several subdomains, and assign one new optimization to each. However, this likely results in slower convergence and a more discontinuous Pareto front, since the subdomain optimizers do not communicate with one another.
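Concretely, the split amounts to running several copies of the same study whose variables blocks cover different bounds. A hypothetical two-way split on one parameter:

```
# Optimization A: left half of the x1 range
variables
  continuous_design = 2
    descriptors   'x1'   'x2'
    lower_bounds   0.0    0.0
    upper_bounds   0.5   10.0

# Optimization B: identical, except x1 runs over 0.5 to 1.0
```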
Using remote evaluations
Dakota communicates with external evaluators through the file system, awaiting a response file for each evaluation, in which it reads the output. The same technique can be used to run evaluations on remote machines.
I configured Dakota to run on my local desktop. The evaluator is a shell script which, running from each local evaluation folder, successively:
- assembles and concatenates configuration files locally;
- synchronizes files with a remote high-performance cluster, using rsync;
- submits a slurm job for the computation to be carried out on the cluster (that computation is configured to create a file “finished” in the cluster work folder once it has completed);
- sleeps a number of hours (to avoid congestion);
- checks for the existence of the “finished” file on the remote cluster;
- retrieves key files from the cluster to the local machine;
- performs postprocessing with a local Python script, which itself outputs the precious response file which is awaited by Dakota.
In this way, the operation is carried out with the best of both worlds.
The optimization proper takes place locally, where there are no time constraints, and where installation, configuration and administration are easily carried out. The computations themselves are carried out remotely on a cluster, and submitted in small batches on a rolling basis, which minimizes total waiting time.
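The evaluator described above can be sketched as a shell script along the following lines. The host, remote directory, file names and slurm job script are all placeholders, and the polling interval is arbitrary; a real script must be adapted to the cluster and its queue policy.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a remote-evaluation wrapper, run by Dakota's
# fork interface from each local evaluation folder.
set -euo pipefail

REMOTE="${REMOTE:-user@cluster.example.org}"        # placeholder cluster address
REMOTE_DIR="/scratch/${USER:-me}/$(basename "$PWD")" # placeholder work folder

assemble_inputs() {
  # 1. Assemble and concatenate configuration files locally.
  cat base.cfg case_params.cfg > case_input.cfg
}

submit_remote() {
  # 2.-3. Push files to the cluster and submit the slurm job; the job
  # itself is configured to touch "finished" when it completes.
  rsync -az ./ "$REMOTE:$REMOTE_DIR/"
  ssh "$REMOTE" "cd $REMOTE_DIR && sbatch job.slurm"
}

wait_for_finished() {
  # 4.-5. Sleep between polls, then check for the sentinel file remotely.
  local tries="$1" pause="$2"
  for ((i = 0; i < tries; i++)); do
    if ssh "$REMOTE" "test -e $REMOTE_DIR/finished"; then
      return 0
    fi
    sleep "$pause"
  done
  return 1
}

retrieve_and_postprocess() {
  # 6.-7. Pull back key files and let a local Python script write the
  # response file awaited by Dakota.
  rsync -az "$REMOTE:$REMOTE_DIR/results/" ./results/
  python3 postprocess.py results/ > response
}

# Run the full pipeline only when explicitly enabled.
if [[ "${BASH_SOURCE[0]}" == "${0}" && "${RUN_EVALUATION:-0}" == "1" ]]; then
  assemble_inputs
  submit_remote
  wait_for_finished 48 3600   # poll hourly for up to 48 hours
  retrieve_and_postprocess
fi
```

Keeping each step in its own function makes it easy to re-run a single stage by hand when a job fails mid-way.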
Handling gaps in population
As noted above already, moga selects parent individuals indiscriminately (randomly). It does not attempt to compensate for thinly-populated areas of the parameter domain. This indiscriminate sampling tends to reinforce already-dense areas, and there is a risk that parts of the domain remain unexplored. This is especially true at the edges of the domain, because while mutation can in principle create any kind of offspring, crossover can only produce offspring which stand “in between” their parents (their input parameters are always within the range of those of their parents).
I do not know of a solution to this issue (although a workaround is again to interrupt the optimization and initialize the following optimizations more carefully). This self-reinforcement of well-populated domains is a fundamental feature of genetic algorithms. Dakota supports hybrid approaches, in which one may run a genetic optimizer, followed by another method such as a gradient-based search; I have not yet looked into this functionality.
Specifying constraints
I have not yet learned how to specify constraints between input variables in Dakota.
Moga does not attempt to know why an individual performs well or poorly, and so does not fundamentally distinguish between a poorly-performing and an invalid individual. In this view, one could think (as I did) that the rejection of invalid individuals may just as well be carried out by the evaluator (where programming of constraints may be more convenient) instead of by Dakota itself. This approach works, but has two undesirable consequences:
- This further reduces the control over the number of new individuals to be sent for evaluation (Dakota never knows how many of the offspring it produces are invalid);
- When the domain of inputs is not uniform, this biases Dakota against “narrow” domain areas. For example, if at one end of the input parameter domain many combinations of parameters are invalid (not permitted by constraints), then the probability that Dakota produces invalid individuals is higher in that area. Consequently, “narrow” domain areas receive proportionately fewer children, and become less populated, exacerbating the population gap problem described above.
For that reason, if the constraints on the input parameters do not apply uniformly on the input design space, it is well worth your time to specify them in Dakota itself rather than in the evaluator.
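Dakota does have a specification for linear constraints between input variables, although I have not exercised it myself. As a hedged sketch (the keywords below have moved between the method and variables blocks across Dakota versions, so check your reference manual), a constraint such as x1 + x2 ≤ 1.5 could read:

```
variables
  continuous_design = 2
    descriptors   'x1'  'x2'
    lower_bounds   0.0   0.0
    upper_bounds   1.0   1.0
  linear_inequality_constraint_matrix = 1.0 1.0   # coefficients of x1 and x2
  linear_inequality_upper_bounds      = 1.5       # i.e. x1 + x2 <= 1.5
```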
Preventing repetition
In several of the issues discussed above, one workaround was to restart new optimizations with modified settings or scope. In cases where the evaluations are comparatively computationally expensive, it is often worth restarting an optimization with more adequate settings rather than risk wasting computational resources. When doing so, it is important that no evaluation carried out previously is ever repeated.
Ideally, a database could be fed to Dakota, in which it would search through previously-run cases before launching any evaluations. I do not know of a way to do this, and I implemented it at the evaluator level instead. The first step of my evaluator script is to parse a csv file (the concatenation of all former optimization tabular_data_file summaries) and check whether an entry matches the current evaluation. If so, a response file is immediately created and the evaluation terminates before any computation is carried out.
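That first step can be sketched in Python as follows, assuming a plain csv archive with hypothetical column names (the keys of the params dict are the input columns; every other column is treated as a response):

```python
import csv
from pathlib import Path

def find_previous_response(params, archive_path, tol=1e-9):
    """Look up a parameter set in an archive of past evaluations.

    params: dict mapping input-column names to values (names are hypothetical).
    Returns the response columns of the matching row, or None if unseen.
    """
    archive = Path(archive_path)
    if not archive.exists():
        return None
    with archive.open(newline="") as f:
        for row in csv.DictReader(f):
            # A row matches when every input parameter agrees within tolerance.
            if all(abs(float(row[k]) - v) <= tol for k, v in params.items()):
                # The remaining columns are taken to be the responses.
                return {k: float(v) for k, v in row.items() if k not in params}
    return None
```

If the function returns a dict, the evaluator writes it straight to the response file and exits; otherwise the normal (expensive) computation proceeds.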
Generally, I found that having a clear, well-scripted routine for rapidly deploying new optimizations is a major advantage.
Conclusion
Genetic optimization with Dakota is conceptually not complicated, but a lot of thought must go into the setup and monitoring of the optimizations if meaningful results are to be obtained in reasonable amounts of time. I shared here everything I learned over the last three months in the hope it can help others. I hope others may similarly share their experiences online.
Part 1: The optimization loop
Part 2: A sandbox for experimentation
Part 3: Practice