Support vector machine classifier with :math:`\ell_1`-regularization
======================================================================

In this example we use CVXPY to train an SVM classifier with
:math:`\ell_1`-regularization. We are given data :math:`(x_i,y_i)`,
:math:`i=1,\ldots, m`. The :math:`x_i \in {\bf R}^n` are feature
vectors, while the :math:`y_i \in \{\pm 1\}` are associated boolean
outcomes. Our goal is to construct a good linear classifier
:math:`\hat y = {\rm sign}(\beta^T x - v)`. We find the parameters
:math:`\beta, v` by minimizing the (convex) function

.. math::

    f(\beta,v) = (1/m) \sum_i \left(1 - y_i ( \beta^T x_i - v) \right)_+ + \lambda \| \beta \|_1

The first term is the average hinge loss. The second term shrinks the
coefficients in :math:`\beta` and encourages sparsity. The scalar
:math:`\lambda \geq 0` is a (regularization) parameter. Minimizing
:math:`f(\beta,v)` simultaneously selects features and fits the
classifier.

Example
~~~~~~~

In the following code we generate data with :math:`n=20` features by
randomly choosing :math:`x_i` and a sparse
:math:`\beta_{\mathrm{true}} \in {\bf R}^n`. We then set
:math:`y_i = {\rm sign}(\beta_{\mathrm{true}}^T x_i - v_{\mathrm{true}} + z_i)`,
where the :math:`z_i` are i.i.d. normal random variables. We divide the
data into training and test sets with :math:`m=1000` examples each.

.. code:: python

    # Generate data for SVM classifier with L1 regularization.
    from __future__ import division
    import numpy as np

    np.random.seed(1)
    n = 20
    m = 1000
    TEST = m
    DENSITY = 0.2

    # Sparse ground-truth coefficient vector: only DENSITY*n entries are non-zero.
    beta_true = np.random.randn(n, 1)
    idxs = np.random.choice(range(n), int((1 - DENSITY) * n), replace=False)
    for idx in idxs:
        beta_true[idx] = 0

    # Generate noisy labels for the training and test sets.
    offset = 0
    sigma = 45
    X = np.random.normal(0, 5, size=(m, n))
    Y = np.sign(X.dot(beta_true) + offset + np.random.normal(0, sigma, size=(m, 1)))
    X_test = np.random.normal(0, 5, size=(TEST, n))
    Y_test = np.sign(X_test.dot(beta_true) + offset + np.random.normal(0, sigma, size=(TEST, 1)))

We next formulate the optimization problem using CVXPY.

.. code:: python

    # Form SVM with L1 regularization problem.
    import cvxpy as cp

    beta = cp.Variable((n, 1))
    v = cp.Variable()
    loss = cp.sum(cp.pos(1 - cp.multiply(Y, X @ beta - v)))
    reg = cp.norm(beta, 1)
    lambd = cp.Parameter(nonneg=True)
    prob = cp.Problem(cp.Minimize(loss / m + lambd * reg))

We solve the optimization problem for a range of :math:`\lambda` to
compute a trade-off curve. We then plot the train and test error over
the trade-off curve. A reasonable choice of :math:`\lambda` is the
value that minimizes the test error.

.. code:: python

    # Compute a trade-off curve and record train and test error.
    TRIALS = 100
    train_error = np.zeros(TRIALS)
    test_error = np.zeros(TRIALS)
    lambda_vals = np.logspace(-2, 0, TRIALS)
    beta_vals = []
    for i in range(TRIALS):
        lambd.value = lambda_vals[i]
        prob.solve()
        # Compare predictions against the noise-free labels sign(X beta_true + offset).
        train_error[i] = (np.sign(X.dot(beta_true) + offset) != np.sign(X.dot(beta.value) - v.value)).sum() / m
        test_error[i] = (np.sign(X_test.dot(beta_true) + offset) != np.sign(X_test.dot(beta.value) - v.value)).sum() / TEST
        beta_vals.append(beta.value)

.. code:: python

    # Plot the train and test error over the trade-off curve.
    import matplotlib.pyplot as plt
    %matplotlib inline
    %config InlineBackend.figure_format = 'svg'

    plt.plot(lambda_vals, train_error, label="Train error")
    plt.plot(lambda_vals, test_error, label="Test error")
    plt.xscale('log')
    plt.legend(loc='upper left')
    plt.xlabel(r"$\lambda$", fontsize=16)
    plt.show()

.. image:: svm_files/svm_8_0.svg
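Given the arrays computed above, the :math:`\lambda` that minimizes the
test error can be read off directly. The snippet below is a small
sketch beyond the original example; the names ``best_idx`` and
``best_lambda`` are illustrative.

.. code:: python

    # Pick the lambda that minimizes the test error on the trade-off curve.
    # (Sketch beyond the original example; best_idx/best_lambda are illustrative names.)
    best_idx = int(np.argmin(test_error))
    best_lambda = lambda_vals[best_idx]
    print("lambda = %.4f, test error = %.3f" % (best_lambda, test_error[best_idx]))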
We also plot the regularization path, i.e., the coefficients
:math:`\beta_i` versus :math:`\lambda`. Notice that the :math:`\beta_i`
do not necessarily decrease monotonically as :math:`\lambda` increases.
Four features remain non-zero for larger values of :math:`\lambda` than
the rest, which suggests that these features are the most important. In
fact :math:`\beta_{\mathrm{true}}` has exactly 4 non-zero entries.

.. code:: python

    # Plot the regularization path for beta.
    for i in range(n):
        plt.plot(lambda_vals, [wi[i, 0] for wi in beta_vals])
    plt.xlabel(r"$\lambda$", fontsize=16)
    plt.xscale("log")
    plt.show()

.. image:: svm_files/svm_10_0.svg
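To make the feature-selection effect concrete, one can re-solve the
problem at a fairly large :math:`\lambda` from the range above and
count the coefficients that are numerically non-zero. The snippet below
is a sketch, not part of the original example; the value ``0.5`` and
the ``1e-4`` threshold are illustrative choices.

.. code:: python

    # Count non-zero coefficients recovered at a fairly large lambda.
    # (Sketch; lambda = 0.5 and the 1e-4 threshold are illustrative choices.)
    lambd.value = 0.5
    prob.solve()
    num_recovered = int((np.abs(beta.value) > 1e-4).sum())
    print("Non-zero coefficients in the fit:", num_recovered)
    print("Non-zero coefficients in beta_true:", int((beta_true != 0).sum()))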