Machine learning algorithms and the art of hyperparameter selection


SOURCE: https://thenextweb.com/podium/2019/11/11/machine-learning-algorithms-and-the-art-of-hyperparameter-selection/
Summary

We just need to move to the next iteration and iterate over the next performance evaluation, and so on. The key factor in all different optimization strategies is how to select the next set of hyperparameter values in step 2a, depending on the previous metric outputs in step 2d.

This gradient coloring will be useful later to illustrate the differences across the optimization strategies. The goal of the optimization procedure in this simplified use case is to find the one hyperparameter that maximizes the value of the function.

Let’s begin our review of four common optimization strategies used to identify the new set of hyperparameter values for the next iteration of the optimization loop.

Grid search is a basic brute-force strategy. Whiter points correspond to hyperparameter values generated earlier on in the process; red points correspond to hyperparameter values generated later on. As Figure 1 shows, the range of the hyperparameter is scanned from small to large values. The grid search strategy can work well in the case of a single parameter, but it becomes very inefficient when multiple parameters have to be optimized simultaneously.

For the random search strategy, the values of the hyperparameters are selected randomly, as the name suggests. Whiter points correspond to hyperparameter values generated earlier on in the process; red points correspond to hyperparameter values generated later on. As expected, the hyperparameter values from the generated sequence are used in no particular increasing or decreasing order: white and red dots mix randomly in the plot.

At each iteration, the hill climbing approach selects the best direction in the hyperparameter space in which to choose the next hyperparameter value. Whiter points correspond to hyperparameter values generated earlier on in the process; red points correspond to hyperparameter values generated later on. Figure 3 shows that the hill climbing strategy applied to our function started at a random hyperparameter value, x=8.4, and then moved toward the function maximum y=0.4 at x=6.9. A good rule of thumb for this method is to run it multiple times with different starting values and to check whether the algorithm converges to the same maximum.

The Bayesian optimization strategy selects the next hyperparameter value based on the function outputs in the previous iterations, similar to the hill climbing strategy. The gray points are generated in the first random phase of the strategy. Figure 4 demonstrates that the Bayesian optimization strategy uses the warm-up phase to define the most promising area and then selects the next values for the hyperparameters in that area.
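
To make the difference between the strategies concrete, here is a minimal Python sketch of grid search, random search, and hill climbing on a single hyperparameter. It is an illustration under stated assumptions, not the authors' implementation: the `objective` function is a hypothetical stand-in chosen only so that its maximum is y=0.4 at x=6.9, and the search range of 0 to 10, the step size of 0.1, and all function names are likewise assumptions. Bayesian optimization is left out because it needs a surrogate model that does not fit in a short sketch.

```python
import math
import random


def objective(x):
    """Hypothetical stand-in for the model metric (assumption, not the
    article's function): a single bump with maximum y=0.4 at x=6.9."""
    return 0.4 * math.exp(-((x - 6.9) ** 2) / 2.0)


def grid_search(lo, hi, step):
    """Scan the range from small to large values in fixed steps."""
    n = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n + 1)]
    return max(candidates, key=objective)


def random_search(lo, hi, n_iter, rng):
    """Draw every candidate uniformly at random, ignoring past results."""
    candidates = [rng.uniform(lo, hi) for _ in range(n_iter)]
    return max(candidates, key=objective)


def hill_climbing(start, lo, hi, step, max_iter=1000):
    """Move to the better neighbor until neither neighbor improves the metric."""
    x = start
    for _ in range(max_iter):
        neighbors = [min(hi, x + step), max(lo, x - step)]
        best = max(neighbors, key=objective)
        if objective(best) <= objective(x):
            break  # no neighbor is better: a (possibly local) maximum
        x = best
    return x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("grid search:  ", grid_search(0.0, 10.0, 0.1))
    print("random search:", random_search(0.0, 10.0, 100, rng))
    print("hill climbing:", hill_climbing(8.4, 0.0, 10.0, 0.1))
```

Calling `hill_climbing` from several different `start` values and comparing the results is one way to apply the rule of thumb above. On a function with more than one peak, different starting points may end at different local maxima.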

As said here by Mischa Lisovyi and Rosaria Silipo