Jean Gallier,

Computer and Information Science Department,

School of Engineering and Applied Science,

University of Pennsylvania,

Philadelphia, PA, USA.

## Supporting documents

Here are the slides of the presentation.

The videos:

- First part (1h19:50):
- Second part (1h36:01):

## Abstract

Two central problems in machine learning are

- Data fitting (learning a function),
- Data classification.

### Data fitting

Learning a function is the following problem: given a list ((*x*_{1}, *y*_{1}), …, (*x*_{m}, *y*_{m})), *x*_{i} ∈ **R**^{n}, *y*_{i} ∈ **R**, viewed as input/output of some unknown function, find a “nice” function *f*: **R**^{n} → **R** such that *f*(*x*_{i}) = *y*_{i}. If we look for an affine function *f*(*x*) = *w*^{T} *x* + *b*, then we can view the problem as an optimization problem: minimize the error ∑_{i=1}^{m} (*y*_{i} – *x*_{i}^{T} *w* – *b*)^{2}.
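As a minimal sketch of the unregularized problem above, the affine fit can be computed with NumPy by appending a column of ones to the data so that *b* is estimated together with *w*. The function name `fit_affine` and the synthetic data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def fit_affine(X, y):
    """Least-squares affine fit: minimize sum_i (y_i - x_i^T w - b)^2.

    X has shape (m, n) with one data point x_i per row; y has shape (m,).
    """
    m = X.shape[0]
    # Append a column of ones so that b is estimated together with w.
    A = np.hstack([X, np.ones((m, 1))])
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    return sol[:-1], sol[-1]

# Synthetic, noise-free data (hypothetical values chosen for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
b_true = 0.7
y = X @ w_true + b_true

w, b = fit_affine(X, y)
```

With noise-free data and more points than unknowns, the recovered `w` and `b` should agree with `w_true` and `b_true` up to rounding error.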

The difficulty is that, since *w* is unconstrained, this minimization problem “blows up.” Thus it is necessary to add a regularization term. We will discuss three scenarios:

- (i) Add an *l*^{2} term *K* *w*^{T} *w*. This is *ridge regression*.
- (ii) Add an *l*^{1} term τ ∑_{i=1}^{n} |*w*_{i}|. This is *lasso*.
- (iii) Add both an *l*^{1} term and an *l*^{2} term. This is *elastic net*.
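Scenario (i) does admit a closed-form solution, which can be sketched as follows. The sketch assumes that only *w* is penalized (not *b*), so that the optimal *b* is obtained by centering the data; the name `ridge_fit` and the toy data are hypothetical. The toy data has two identical features, a case where the unregularized normal equations are singular but the ridge system is not.

```python
import numpy as np

def ridge_fit(X, y, K):
    """Minimize sum_i (y_i - x_i^T w - b)^2 + K * w^T w, with b unpenalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centered inputs
    yc = y - y_mean  # centered outputs
    n = X.shape[1]
    # Setting the gradient to zero gives (Xc^T Xc + K I) w = Xc^T yc.
    w = np.linalg.solve(Xc.T @ Xc + K * np.eye(n), Xc.T @ yc)
    # The optimal intercept follows from the means.
    b = y_mean - x_mean @ w
    return w, b

# Two identical columns: Xc^T Xc is singular, but adding K * I makes it invertible.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
K = 0.1

w, b = ridge_fit(X, y, K)
```

By symmetry the two weights come out equal, and for this data each equals 10 / (10 + *K*), which shows how the *l*^{2} term shrinks the solution as *K* grows.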

Problems (ii) and (iii) do not have closed-form solutions. They can be solved using a powerful iterative method, *ADMM* (the alternating direction method of multipliers).
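To illustrate how ADMM applies to lasso, here is a minimal sketch of the standard splitting *w* = *z*, in which the quadratic term is handled by a linear solve and the *l*^{1} term by soft thresholding. It is not necessarily the variant presented in the talk. The names `lasso_admm` and `soft_threshold`, the penalty parameter `rho`, the fixed iteration count, and the synthetic data are all assumptions made for the example; *b* is again left unpenalized and recovered by centering.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1, applied componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(X, y, tau, rho=1.0, n_iter=1000):
    """Minimize sum_i (y_i - x_i^T w - b)^2 + tau * sum_j |w_j| by ADMM."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = X.shape[1]
    A = 2.0 * Xc.T @ Xc + rho * np.eye(n)  # matrix of the w-update system
    Xty = 2.0 * Xc.T @ yc
    z = np.zeros(n)  # copy of w that carries the l1 term
    u = np.zeros(n)  # scaled dual variable for the constraint w = z
    for _ in range(n_iter):
        w = np.linalg.solve(A, Xty + rho * (z - u))  # quadratic step
        z = soft_threshold(w + u, tau / rho)         # l1 step
        u = u + w - z                                # dual update
    b = y_mean - x_mean @ z
    return z, b

# Synthetic data with a sparse ground truth (hypothetical values).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 1.0 + 0.01 * rng.normal(size=100)
tau = 20.0

w, b = lasso_admm(X, y, tau, rho=200.0)
```

The value of `rho` affects only the speed of convergence, not the solution; here it is chosen to be of the same order as the eigenvalues of 2 *X*^{T} *X* for this data. A production implementation would stop on primal and dual residuals rather than run a fixed number of iterations.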

### Data classification

The problem of data classification is that we have two disjoint sets of data points, {*u*_{1}, …, *u*_{p}} (the blue points), and {*v*_{1}, …, *v*_{q}} (the red points), and we want to find a hyperplane *H* of equation *w*^{T} *x* – *b* = 0 such that the blue points and the red points belong to the two disjoint open half spaces determined by *H*.

If this is possible, the SVM (support vector machine) method due to Vapnik picks *H* as the hyperplane that maximizes the least distance from the blue and red points to *H*. The problem can be formulated as an optimization problem that can be solved using its Lagrangian dual.
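One common way to write this optimization problem, given here as a sketch that may differ in normalization from the version used in the talk, rescales *w* and *b* so that the closest points satisfy |*w*^{T} *x* – *b*| = 1. The margin is then 1/‖*w*‖, and maximizing it amounts to minimizing *w*^{T} *w*. Placing the blue points on the positive side is an arbitrary choice made for the example.

```latex
\begin{aligned}
\text{minimize}\quad   & \tfrac{1}{2}\, w^{\top} w \\
\text{subject to}\quad & w^{\top} u_i - b \ge 1,  \quad i = 1, \dots, p, \\
                       & w^{\top} v_j - b \le -1, \quad j = 1, \dots, q.
\end{aligned}
```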

When separation is impossible, we relax the separation criterion by allowing a soft margin of error (points can invade the wrong side of the hyperplane). Again, this problem (soft margin SVM) can be formulated as an optimization problem and solved through its dual, using ADMM.
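A standard way to express the relaxation, again as a sketch rather than the exact formulation of the talk, introduces one nonnegative slack variable per point and penalizes the total slack. The symbols ε_{i}, η_{j}, and the penalty weight *C* > 0 are names assumed for the example.

```latex
\begin{aligned}
\text{minimize}\quad   & \tfrac{1}{2}\, w^{\top} w
                         + C \Bigl( \sum_{i=1}^{p} \epsilon_i + \sum_{j=1}^{q} \eta_j \Bigr) \\
\text{subject to}\quad & w^{\top} u_i - b \ge 1 - \epsilon_i,  \quad \epsilon_i \ge 0, \quad i = 1, \dots, p, \\
                       & w^{\top} v_j - b \le -1 + \eta_j,     \quad \eta_j \ge 0,     \quad j = 1, \dots, q.
\end{aligned}
```

A point with slack between 0 and 1 lies inside the margin but on the correct side of *H*; a slack greater than 1 means the point is misclassified.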

We will sketch how the Lagrangian framework dealing with inequality constraints is used to solve the SVM (hard and soft) problems.
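To indicate what the Lagrangian framework looks like in the hard margin case, here is a sketch under the assumption that the primal problem is to minimize ½ *w*^{T} *w* subject to *w*^{T} *u*_{i} – *b* ≥ 1 for the blue points and *w*^{T} *v*_{j} – *b* ≤ –1 for the red points. The multiplier names λ_{i} and μ_{j} are chosen for the example. Setting the derivatives of the Lagrangian with respect to *w* and *b* to zero and substituting back yields a dual problem in the multipliers alone.

```latex
\begin{aligned}
L(w, b, \lambda, \mu)
  &= \tfrac{1}{2}\, w^{\top} w
   + \sum_{i=1}^{p} \lambda_i \bigl( 1 - w^{\top} u_i + b \bigr)
   + \sum_{j=1}^{q} \mu_j \bigl( 1 + w^{\top} v_j - b \bigr),
   \qquad \lambda_i \ge 0,\ \mu_j \ge 0, \\[4pt]
\nabla_w L = 0
  &\;\Longrightarrow\; w = \sum_{i=1}^{p} \lambda_i u_i - \sum_{j=1}^{q} \mu_j v_j, \\
\frac{\partial L}{\partial b} = 0
  &\;\Longrightarrow\; \sum_{i=1}^{p} \lambda_i = \sum_{j=1}^{q} \mu_j, \\[4pt]
\text{dual:}\quad
  &\text{maximize}\quad \sum_{i=1}^{p} \lambda_i + \sum_{j=1}^{q} \mu_j
   - \tfrac{1}{2} \Bigl\| \sum_{i=1}^{p} \lambda_i u_i - \sum_{j=1}^{q} \mu_j v_j \Bigr\|^{2} \\
  &\text{subject to}\quad \sum_{i=1}^{p} \lambda_i = \sum_{j=1}^{q} \mu_j,
   \quad \lambda_i \ge 0,\ \mu_j \ge 0.
\end{aligned}
```

The points whose multipliers are nonzero at the optimum are the support vectors: they lie exactly on the margin and determine *w*.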