Derivatives fundamentals ======================== This notebook will introduce you to the fundamentals of computing the derivative of the solution map to optimization problems. The derivative can be used for **sensitvity analysis**, to see how a solution would change given small changes to the parameters, and to compute **gradients** of scalar-valued functions of the solution. In this notebook, we will consider a simple disciplined geometric program. The geometric program under consideration is .. math:: $$\begin{array}{ll} \mbox{minimize} & 1/(xyz) \\ \mbox{subject to} & a(xy + xz + yz) \leq b\\ & x \geq y^c, \end{array}$$ where :math:x \in \mathbf{R}_{++}, :math:y \in \mathbf{R}_{++}, and :math:z \in \mathbf{R}_{++} are the variables, and :math:a \in \mathbf{R}_{++}, :math:b \in \mathbf{R}_{++} and :math:c \in \mathbf{R} are the parameters. The vector .. math:: \alpha = \begin{bmatrix} a \\ b \\ c \end{bmatrix} is the vector of parameters. .. code:: ipython3 import cvxpy as cp x = cp.Variable(pos=True) y = cp.Variable(pos=True) z = cp.Variable(pos=True) a = cp.Parameter(pos=True) b = cp.Parameter(pos=True) c = cp.Parameter() objective_fn = 1/(x*y*z) objective = cp.Minimize(objective_fn) constraints = [a*(x*y + x*z + y*z) <= b, x >= y**c] problem = cp.Problem(objective, constraints) problem.is_dgp(dpp=True) .. parsed-literal:: True Notice the keyword argument dpp=True. The parameters must enter in the DGP problem acording to special rules, which we refer to as dpp. The DPP rules are described in an online tutorial __. Next, we solve the problem, setting the parameters :math:a, :math:b and :math:c to :math:2, :math:1, and :math:0.5. .. code:: ipython3 a.value = 2.0 b.value = 1.0 c.value = 0.5 problem.solve(gp=True, requires_grad=True) print(x.value) print(y.value) print(z.value) .. parsed-literal:: 0.5612147353889386 0.31496200373359456 0.36892055859991446 Notice the keyword argument requires_grad=True; this is necessary to subsequently compute derivatives. Solution map ------------ The **solution map** of the above problem is a function .. math:: \mathcal{S} : \mathbf{R}^2_{++} \times \mathbf{R} \to \mathbf{R}^3_{++} which maps the parameter vector to the vector of optimal solutions .. math:: \mathcal S(\alpha) = \begin{bmatrix} x(\alpha) \\ y(\alpha) \\ z(\alpha)\end{bmatrix}. Here, :math:x(\alpha), :math:y(\alpha), and :math:z(\alpha) are the optimal values of the variables corresponding to the parameter vector. As an example, we just saw that .. math:: \mathcal S((2.0, 1.0, 0.5)) = \begin{bmatrix} 0.5612 \\ 0.3150 \\ 0.3690 \end{bmatrix}. Sensitivity analysis -------------------- When the solution map is differentiable, we can use its derivative .. math:: \mathsf{D}\mathcal{S}(\alpha) \in \mathbf{R}^{3 \times 3} to perform a **sensitivity analysis**, which studies how the solution would change given small changes to the parameters. Suppose we perturb the parameters by a vector of small magnitude :math:\mathsf{d}\alpha \in \mathbf{R}^3. We can approximate the change :math:\Delta in the solution due to the perturbation using the derivative, as .. math:: \Delta = \mathcal{S}(\alpha + \mathsf{d}\alpha) - \mathcal{S}(\alpha) \approx \mathsf{D}\mathcal{S}(\alpha) \mathsf{d}\alpha. We can compute this in CVXPY, as follows. Partition the perturbation as .. math:: \mathsf{d}\alpha = \begin{bmatrix} \mathsf{d}a \\ \mathsf{d}b \\ \mathsf{d}c\end{bmatrix}. We set the delta attributes of the parameters to their perturbations, and then call the derivative method. .. code:: ipython3 da, db, dc = 1e-2, 1e-2, 1e-2 a.delta = da b.delta = db c.delta = dc problem.derivative() The derivative method populates the delta attributes of the variables as a side-effect, with the predicted change in the variable. We can compare the predictions to the actual solution of the perturbed problem. .. code:: ipython3 x_hat = x.value + x.delta y_hat = y.value + y.delta z_hat = z.value + z.delta a.value += da b.value += db c.value += dc problem.solve(gp=True) print('x: predicted {0:.5f} actual {1:.5f}'.format(x_hat, x.value)) print('y: predicted {0:.5f} actual {1:.5f}'.format(y_hat, y.value)) print('z: predicted {0:.5f} actual {1:.5f}'.format(z_hat, z.value)) a.value -= da b.value -= db c.value -= dc .. parsed-literal:: x: predicted 0.55729 actual 0.55734 y: predicted 0.31783 actual 0.31783 z: predicted 0.37179 actual 0.37175 In this case, the predictions and the actual solutions are fairly close. Gradient -------- We can compute gradient of a scalar-valued function of the solution with respect to the parameters. Let :math:f : \mathbf{R}^{3} \to \mathbf{R}, and suppose we want to compute the gradient of the composition :math:f \circ \mathcal S. By the chain rule, .. math:: \nabla f(S(\alpha)) = \mathsf{D}^T\mathcal{S}(\alpha) \begin{bmatrix}\mathsf{d}x \\ \mathsf{d}y \\ \mathsf{d}z\end{bmatrix}, where :math:\mathsf{D}^T\mathcal{S} is the adjoint (or transpose) of the derivative operator, and :math:\mathsf{d}x, :math:\mathsf{d}y, and :math:\mathsf{d}z are the partial derivatives of :math:f with respect to its arguments. We can compute the gradient in CVXPY, using the backward method. As an example, suppose .. math:: f(x, y, z) = \frac{1}{2}(x^2 + y^2 + z^2), so that :math:\mathsf{d}x = x, :math:\mathsf{d}y = y, and :math:\mathsf{d}z = z. Let :math:\mathsf{d}\alpha = \nabla f(S(\alpha)), and suppose we subtract :math:\eta \mathsf{d}\alpha from the parameter, where :math:\eta is a positive constant. Using the following code, we can compare :math:f(\mathcal S(\alpha - \eta \mathsf{d}\alpha)) with the value predicted by the gradient, .. math:: f(\mathcal S(\alpha - \eta \mathsf{d}\alpha)) \approx f(\mathcal S(\alpha)) - \eta \mathsf{d}\alpha^T\mathsf{d}\alpha. .. code:: ipython3 problem.solve(gp=True, requires_grad=True) def f(x, y, z): return 1/2*(x**2 + y**2 + z**2) original = f(x, y, z).value x.gradient = x.value y.gradient = y.value z.gradient = z.value problem.backward() eta = 0.5 dalpha = cp.vstack([a.gradient, b.gradient, c.gradient]) predicted = float((original - eta*dalpha.T @ dalpha).value) a.value -= eta*a.gradient b.value -= eta*b.gradient c.value -= eta*c.gradient problem.solve(gp=True) actual = f(x, y, z).value print('original {0:.5f} predicted {1:.5f} actual {2:.5f}'.format( original, predicted, actual)) .. parsed-literal:: original 0.27513 predicted 0.22709 actual 0.22942