Derivatives fundamentals
========================

This notebook will introduce you to the fundamentals of computing the
derivative of the solution map of an optimization problem. The
derivative can be used for **sensitivity analysis**, to see how a
solution would change given small changes to the parameters, and to
compute **gradients** of scalar-valued functions of the solution.

In this notebook, we will consider a simple disciplined geometric
program. The geometric program under consideration is

.. math::

   \begin{equation}
   \begin{array}{ll}
   \mbox{minimize} & 1/(xyz) \\
   \mbox{subject to} & a(xy + xz + yz) \leq b \\
   & x \geq y^c,
   \end{array}
   \end{equation}

where :math:`x \in \mathbf{R}_{++}`, :math:`y \in \mathbf{R}_{++}`, and
:math:`z \in \mathbf{R}_{++}` are the variables, and
:math:`a \in \mathbf{R}_{++}`, :math:`b \in \mathbf{R}_{++}`, and
:math:`c \in \mathbf{R}` are the parameters. The vector

.. math::

   \alpha = \begin{bmatrix} a \\ b \\ c \end{bmatrix}

is the vector of parameters.

.. code:: ipython3

    import cvxpy as cp

    x = cp.Variable(pos=True)
    y = cp.Variable(pos=True)
    z = cp.Variable(pos=True)

    a = cp.Parameter(pos=True)
    b = cp.Parameter(pos=True)
    c = cp.Parameter()

    objective_fn = 1/(x*y*z)
    objective = cp.Minimize(objective_fn)
    constraints = [a*(x*y + x*z + y*z) <= b, x >= y**c]
    problem = cp.Problem(objective, constraints)
    problem.is_dgp(dpp=True)

.. parsed-literal::

    True

Notice the keyword argument ``dpp=True``. The parameters must enter the
DGP problem according to special rules, which we refer to as DPP. The
DPP rules are described in an `online tutorial `__.

Next, we solve the problem, setting the parameters :math:`a`, :math:`b`,
and :math:`c` to :math:`2`, :math:`1`, and :math:`0.5`.

.. code:: ipython3

    a.value = 2.0
    b.value = 1.0
    c.value = 0.5
    problem.solve(gp=True, requires_grad=True)

    print(x.value)
    print(y.value)
    print(z.value)

.. parsed-literal::

    0.5612147353889386
    0.31496200373359456
    0.36892055859991446

Notice the keyword argument ``requires_grad=True``; this is necessary to
subsequently compute derivatives.

Solution map
------------

The **solution map** of the above problem is a function

.. math::

   \mathcal{S} : \mathbf{R}^2_{++} \times \mathbf{R} \to \mathbf{R}^3_{++}

which maps the parameter vector to the vector of optimal solutions

.. math::

   \mathcal S(\alpha) = \begin{bmatrix} x(\alpha) \\ y(\alpha) \\ z(\alpha) \end{bmatrix}.

Here, :math:`x(\alpha)`, :math:`y(\alpha)`, and :math:`z(\alpha)` are
the optimal values of the variables corresponding to the parameter
vector. As an example, we just saw that

.. math::

   \mathcal S((2.0, 1.0, 0.5)) = \begin{bmatrix} 0.5612 \\ 0.3150 \\ 0.3690 \end{bmatrix}.

Sensitivity analysis
--------------------

When the solution map is differentiable, we can use its derivative

.. math::

   \mathsf{D}\mathcal{S}(\alpha) \in \mathbf{R}^{3 \times 3}

to perform a **sensitivity analysis**, which studies how the solution
would change given small changes to the parameters.

Suppose we perturb the parameters by a vector of small magnitude
:math:`\mathsf{d}\alpha \in \mathbf{R}^3`. We can approximate the change
:math:`\Delta` in the solution due to the perturbation using the
derivative, as

.. math::

   \Delta = \mathcal{S}(\alpha + \mathsf{d}\alpha) - \mathcal{S}(\alpha)
   \approx \mathsf{D}\mathcal{S}(\alpha) \mathsf{d}\alpha.

We can compute this in CVXPY, as follows. Partition the perturbation as

.. math::

   \mathsf{d}\alpha = \begin{bmatrix} \mathsf{d}a \\ \mathsf{d}b \\ \mathsf{d}c \end{bmatrix}.
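The ``derivative`` method, used below, evaluates a single
Jacobian-vector product
:math:`\mathsf{D}\mathcal{S}(\alpha)\mathsf{d}\alpha`. If the full
:math:`3 \times 3` Jacobian itself is wanted, one way to recover it is
to differentiate along each unit perturbation in turn. A minimal
sketch, assuming the problem has already been solved with
``requires_grad=True`` (the names ``params``, ``variables``, and
``jacobian`` are our own):

.. code:: ipython3

    import numpy as np

    # Assemble D S(alpha) column by column: the j-th column is the
    # directional derivative of the solution along the j-th unit
    # perturbation of the parameter vector.
    params = [a, b, c]
    variables = [x, y, z]
    jacobian = np.zeros((3, 3))
    for j, param in enumerate(params):
        for p in params:
            p.delta = 0.0
        param.delta = 1.0
        problem.derivative()  # populates x.delta, y.delta, z.delta
        jacobian[:, j] = [float(v.delta) for v in variables]

Below, we instead perturb all three parameters at once.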
We set the ``delta`` attributes of the parameters to their
perturbations, and then call the ``derivative`` method.

.. code:: ipython3

    da, db, dc = 1e-2, 1e-2, 1e-2

    a.delta = da
    b.delta = db
    c.delta = dc
    problem.derivative()

The ``derivative`` method populates the ``delta`` attributes of the
variables as a side effect, with the predicted change in each variable.
We can compare the predictions to the actual solution of the perturbed
problem.

.. code:: ipython3

    x_hat = x.value + x.delta
    y_hat = y.value + y.delta
    z_hat = z.value + z.delta

    a.value += da
    b.value += db
    c.value += dc
    problem.solve(gp=True)

    print('x: predicted {0:.5f} actual {1:.5f}'.format(x_hat, x.value))
    print('y: predicted {0:.5f} actual {1:.5f}'.format(y_hat, y.value))
    print('z: predicted {0:.5f} actual {1:.5f}'.format(z_hat, z.value))

    a.value -= da
    b.value -= db
    c.value -= dc

.. parsed-literal::

    x: predicted 0.55729 actual 0.55734
    y: predicted 0.31783 actual 0.31783
    z: predicted 0.37179 actual 0.37175

In this case, the predictions and the actual solutions are fairly
close.

Gradient
--------

We can compute the gradient of a scalar-valued function of the solution
with respect to the parameters. Let
:math:`f : \mathbf{R}^{3} \to \mathbf{R}`, and suppose we want to
compute the gradient of the composition :math:`f \circ \mathcal S`. By
the chain rule,

.. math::

   \nabla (f \circ \mathcal S)(\alpha) = \mathsf{D}^T\mathcal{S}(\alpha)
   \begin{bmatrix} \mathsf{d}x \\ \mathsf{d}y \\ \mathsf{d}z \end{bmatrix},

where :math:`\mathsf{D}^T\mathcal{S}` is the adjoint (or transpose) of
the derivative operator, and :math:`\mathsf{d}x`, :math:`\mathsf{d}y`,
and :math:`\mathsf{d}z` are the partial derivatives of :math:`f` with
respect to its arguments, evaluated at :math:`\mathcal S(\alpha)`. We
can compute the gradient in CVXPY, using the ``backward`` method.

As an example, suppose

.. math::

   f(x, y, z) = \frac{1}{2}(x^2 + y^2 + z^2),

so that :math:`\mathsf{d}x = x`, :math:`\mathsf{d}y = y`, and
:math:`\mathsf{d}z = z`. Let
:math:`\mathsf{d}\alpha = \nabla (f \circ \mathcal S)(\alpha)`, and
suppose we subtract :math:`\eta \mathsf{d}\alpha` from the parameter
vector, where :math:`\eta` is a positive constant. Using the following
code, we can compare :math:`f(\mathcal S(\alpha - \eta \mathsf{d}\alpha))`
with the value predicted by the gradient,

.. math::

   f(\mathcal S(\alpha - \eta \mathsf{d}\alpha)) \approx
   f(\mathcal S(\alpha)) - \eta \mathsf{d}\alpha^T\mathsf{d}\alpha.

.. code:: ipython3

    problem.solve(gp=True, requires_grad=True)

    def f(x, y, z):
        return 1/2*(x**2 + y**2 + z**2)

    original = f(x, y, z).value

    x.gradient = x.value
    y.gradient = y.value
    z.gradient = z.value
    problem.backward()

    eta = 0.5
    dalpha = cp.vstack([a.gradient, b.gradient, c.gradient])
    predicted = float((original - eta*dalpha.T @ dalpha).value)

    a.value -= eta*a.gradient
    b.value -= eta*b.gradient
    c.value -= eta*c.gradient
    problem.solve(gp=True)
    actual = f(x, y, z).value

    print('original {0:.5f} predicted {1:.5f} actual {2:.5f}'.format(
        original, predicted, actual))

.. parsed-literal::

    original 0.27513 predicted 0.22709 actual 0.22942
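Because ``backward`` populates the ``gradient`` attributes of the
parameters, the same pattern can be repeated to iteratively decrease
:math:`f(\mathcal S(\alpha))`, using the function ``f`` defined above.
A minimal sketch, continuing from the state above; the step size and
iteration count are arbitrary choices, and nothing here guards against
stepping :math:`a` or :math:`b` out of the positive orthant:

.. code:: ipython3

    # Repeat the gradient step: re-solve, differentiate, update alpha.
    eta = 0.05
    for _ in range(5):
        problem.solve(gp=True, requires_grad=True)
        print('f(S(alpha)) = {0:.5f}'.format(f(x, y, z).value))
        x.gradient = x.value
        y.gradient = y.value
        z.gradient = z.value
        problem.backward()  # populates a.gradient, b.gradient, c.gradient
        a.value -= eta * a.gradient
        b.value -= eta * b.gradient
        c.value -= eta * c.gradient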