Bi-level Optimization for Support Vector Machines

Teresa Klatzer

Research output: ThesisMaster's Thesis


This thesis deals with an efficient approach for learning the optimal hyper-parameters for
Support Vector Machines (SVMs). The common method to determine hyper-parameters
is grid search. Grid search typically involves the definition of a discretized ”grid” of
possible parameter values with a certain resolution and a search for the values that
result in the minimal validation error of the learned model. A major limitation of grid
search is that the search space grows exponentially in the parameters which makes the
approach only practical for determining very few hyper-parameters. Additionally, grid
search operates on discrete parameter values which leads to suboptimal solutions. In
this thesis we develop an approach to use bi-level optimization for learning the optimal
hyper-parameters and solve both major shortcomings of grid search in an efficient and
elegant way. Bi-level learning is an optimization method where one optimization problem
has another optimization problem as its constraint. The goal of the bi-level program is to
find optimal hyper-parameters such that the validation error (the higher level objective)
is minimized, while the optimal training problem is solved for the underlying SVM (the
lower level objective). We use Lagrange multipliers to solve the bi-level problem and
formulate the solution for several variants of the SVM (linear, kernel, multiple kernel).
We can show that, using this method, the model selection problem (i.e. selection of
hyper-parameters) can be solved also for a large number of hyper-parameters. The bi-
level approach exploits the continuity of the hyper-parameters which allows for better
solutions than with grid search. In the experiments, we investigate different properties
of the bi-level approach and try to give insights into the advantages of this method. We
find that highly parametrized kernel SVMs perform best compared to simpler models
which is a clear advantage of bi-level optimization against grid search for model selection.
Original languageEnglish
Publication statusPublished - 2014

Cite this