If you've ever looked at genetic algorithms, you may have been struck by the counter-intuitive nature of the selection step. In each iteration, you don't simply take the best results from the previous iteration and combine them for the next; in fact, if you do this, it works very poorly: you get premature convergence to a local minimum, because the search is all exploitation and no exploration. What you need to do is hold yourself back and select 'goodish' solutions, because goodish can lead to much better in the long run.
One way to do this is with tournament selection. The only parameter is the size of the tournament, T. It works very simply: choose T items at random and keep the best. Remove that item from the pool, and repeat this procedure until you have selected the required number of items.
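As a sketch of the procedure just described (my own code, assuming a minimisation problem where each candidate is a plain number, so smaller is better):

```python
import random

def tournament_select(pool, num_to_select, tour_size):
    """Select num_to_select items from pool by repeated tournaments.

    Each round, tour_size items are drawn at random; the best one
    (here the smallest, assuming minimisation) wins and is removed
    from the pool so it cannot be selected again.
    """
    pool = list(pool)  # copy, so the caller's list is not mutated
    selected = []
    for _ in range(num_to_select):
        contestants = random.sample(pool, tour_size)
        winner = min(contestants)
        selected.append(winner)
        pool.remove(winner)
    return selected

if __name__ == "__main__":
    random.seed(1)
    print(tournament_select(range(20), num_to_select=5, tour_size=3))
```

Note that a winner is removed from the pool, so the same item can't be picked twice; good-but-not-best items still have a decent chance of winning a round, which is what keeps the exploration going.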
I usually use a value of T=3, but I was curious what exactly this distribution looked like, and what effect the tournament size has. Since I was not able to find an analytical solution (feel free to leave a comment with the details if you do), I wrote a simulation that selects 1000 numbers between 0 and 99 inclusive with different tournament sizes (here I did selection with replacement, just to get the distribution of the first selected item). The code also prints out the smallest number at which the cumulative sum of the distribution exceeds 50%:
Tournament size of 1: 50% reached at 49
Tournament size of 2: 50% reached at 29
Tournament size of 3: 50% reached at 20
Tournament size of 4: 50% reached at 15
Tournament size of 5: 50% reached at 12
Tournament size of 6: 50% reached at 10
Tournament size of 7: 50% reached at 9
import random
import numpy as np
import matplotlib.pyplot as plt

if __name__ == "__main__":
    random.seed(1)
    tour_sizes = [1, 2, 3, 4, 5, 6, 7]
    num_repetitions = 1000
    num_selections = 1000
    for tour_size in tour_sizes:
        averages = [[] for x in range(100)]
        for rep in range(num_repetitions):  # repeat to get the mean
            count = [0] * 100
            for i in range(num_selections):  # observe the distribution across selections
                # tournament selection of size tour_size
                select = min(random.sample(range(100), tour_size))
                count[select] += 1
            for i in range(100):
                averages[i].append(count[i])
        for i in range(100):
            averages[i] = np.mean(averages[i])
        plt.plot(range(100), averages, label=f'{tour_size}')
        cumsum = 0
        for i in range(100):
            cumsum += averages[i]
            if cumsum >= num_selections / 2:
                print(f"Tournament size of {tour_size}: 50% reached at {i}")
                break
    plt.legend()
    plt.ylabel("Count")
    plt.title("Effect of tournament size on distribution")
    plt.xlim(0, 100)
    plt.ylim(bottom=0)
    plt.show()
Update (2023-06-22): I've guessed the analytical solution as y = 10·N·f^(N-1), where N is the tournament size and f = 1 - (x/100) in the diagram above.
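One way to sanity-check that guess (my own check, not part of the original post): for a tournament of size N drawn without replacement from 100 values, the probability that the minimum equals x is C(99-x, N-1) / C(100, N) exactly, while the guessed formula is the continuous approximation N·f^(N-1)/100 scaled by the 1000 selections (hence the factor of 10). The two agree closely for small N:

```python
from math import comb

def exact_count(x, n, num_selections=1000, pool_size=100):
    # Expected count for value x: P(min of an n-sample without replacement = x).
    # The n-1 losers must all come from the pool_size-1-x values above x.
    return num_selections * comb(pool_size - 1 - x, n - 1) / comb(pool_size, n)

def approx_count(x, n, num_selections=1000, pool_size=100):
    # The guessed formula: density of the min of n uniform draws, scaled.
    f = 1 - x / pool_size
    return num_selections * n * f ** (n - 1) / pool_size

if __name__ == "__main__":
    for n in (1, 3, 7):
        for x in (0, 20, 50):
            print(f"N={n} x={x}: exact={exact_count(x, n):.2f} "
                  f"approx={approx_count(x, n):.2f}")
```

At x=0 the two coincide (e.g. both give 30 for N=3); for larger N the approximation drifts a little at the tail, which is expected when approximating sampling without replacement by a continuous density.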