{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# 19. Automatic Differentiation\nThis tutorial focuses on the use of :class:`pylops.TorchOperator` to allow performing\nAutomatic Differentiation (AD) on chains of operators which can be:\n\n- native PyTorch mathematical operations (e.g., :func:`torch.log`,\n  :func:`torch.sin`, :func:`torch.tan`, :func:`torch.pow`, ...)\n- neural network operators in :mod:`torch.nn`\n- PyLops linear operators\n\nThis opens up many opportunities, such as easily including linear regularization\nterms to nonlinear cost functions or using linear preconditioners with nonlinear\nmodelling operators.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import gradcheck\n\nimport pylops\n\nplt.close(\"all\")\nnp.random.seed(10)\ntorch.manual_seed(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In this example we consider a simple multidimensional functional:\n\n\\begin{align}\\mathbf{y} = \\mathbf{A} sin(\\mathbf{x})\\end{align}\n\nand we use AD to compute the gradient with respect to the input vector\nevaluated at $\\mathbf{x}=\\mathbf{x}_0$ :\n$\\mathbf{g} = d\\mathbf{y} / d\\mathbf{x} |_{\\mathbf{x}=\\mathbf{x}_0}$.\n\nLet's start by defining the Jacobian:\n\n  .. math::\n       \\textbf{J} = \\begin{bmatrix}\n       dy_1 / dx_1 & ... & dy_1 / dx_M \\\\\n       ... & ... & ... \\\\\n       dy_N / dx_1 & ... & dy_N / dx_M\n       \\end{bmatrix} = \\begin{bmatrix}\n       a_{11} cos(x_1) & ... & a_{1M} cos(x_M) \\\\\n       ... & ... & ... \\\\\n       a_{N1} cos(x_1) & ... & a_{NM} cos(x_M)\n       \\end{bmatrix} = \\textbf{A} cos(\\mathbf{x})\n\nSince both input and output are multidimensional,\nPyTorch ``backward`` actually computes the product between the transposed\nJacobian and a vector $\\mathbf{v}$:\n$\\mathbf{g}=\\mathbf{J^T} \\mathbf{v}$.\n\nTo validate the correctness of the AD result, we can in this simple case\nalso compute the Jacobian analytically and apply it to the same vector\n$\\mathbf{v}$ that we have provided to PyTorch ``backward``.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "nx, ny = 10, 6\nx0 = torch.arange(nx, dtype=torch.double, requires_grad=True)\n\n# Forward\nA = np.random.normal(0.0, 1.0, (ny, nx))\nAt = torch.from_numpy(A)\nAop = pylops.TorchOperator(pylops.MatrixMult(A))\ny = Aop.apply(torch.sin(x0))\n\n# AD\nv = torch.ones(ny, dtype=torch.double)\ny.backward(v, retain_graph=True)\nadgrad = x0.grad\n\n# Analytical\nJ = At * torch.cos(x0)\nanagrad = torch.matmul(J.T, v)\n\nprint(\"Input: \", x0)\nprint(\"AD gradient: \", adgrad)\nprint(\"Analytical gradient: \", anagrad)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Similarly we can use the :func:`torch.autograd.gradcheck` directly from\nPyTorch. Note that doubles must be used for this to succeed with very small\n`eps` and `atol`\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "input = (\n    torch.arange(nx, dtype=torch.double, requires_grad=True),\n    Aop.matvec,\n    Aop.rmatvec,\n    Aop.device,\n    \"cpu\",\n)\ntest = gradcheck(Aop.Top, input, eps=1e-6, atol=1e-4)\nprint(test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that while matrix-vector multiplication could have been performed using\nthe native PyTorch operator :func:`torch.matmul`, in this case we have shown\nthat we are also able to use a PyLops operator wrapped in\n:class:`pylops.TorchOperator`. As already mentioned, this gives us the\nability to use much more complex linear operators provided by PyLops within\na chain of mixed linear and nonlinear AD-enabled operators.\nTo conclude, let's see how we can chain a torch convolutional network\nwith PyLops :class:`pylops.Smoothing2D` operator. First of all, we consider\na single training sample.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Network(nn.Module):\n    def __init__(self, input_channels):\n        super(Network, self).__init__()\n        self.conv1 = nn.Conv2d(\n            input_channels, input_channels // 2, kernel_size=3, padding=1\n        )\n        self.conv2 = nn.Conv2d(\n            input_channels // 2, input_channels // 4, kernel_size=3, padding=1\n        )\n        self.activation = nn.LeakyReLU(0.2)\n        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.activation(x)\n        x = self.conv2(x)\n        x = self.activation(x)\n        return x\n\n\nnet = Network(4)\nCop = pylops.TorchOperator(pylops.Smoothing2D((5, 5), dims=(32, 32)))\n\n# Forward\nx = torch.randn(1, 4, 32, 32).requires_grad_()\ny = Cop.apply(net(x).view(-1)).reshape(32, 32)\n\n# Backward\nloss = y.sum()\nloss.backward()\n\nfig, axs = plt.subplots(1, 2, figsize=(12, 3))\naxs[0].imshow(y.detach().numpy())\naxs[0].set_title(\"Forward\")\naxs[0].axis(\"tight\")\naxs[1].imshow(x.grad.reshape(4 * 32, 32).T)\naxs[1].set_title(\"Gradient\")\naxs[1].axis(\"tight\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "And finally we do the same with a batch of 3 training samples.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "net = Network(4)\nCop = pylops.TorchOperator(pylops.Smoothing2D((5, 5), dims=(32, 32)), batch=True)\n\n# Forward\nx = torch.randn(3, 4, 32, 32).requires_grad_()\ny = Cop.apply(net(x).reshape(3, 32 * 32)).reshape(3, 32, 32)\n\n# Backward\nloss = y.sum()\nloss.backward()\n\nfig, axs = plt.subplots(1, 2, figsize=(12, 3))\naxs[0].imshow(y[0].detach().numpy())\naxs[0].set_title(\"Forward\")\naxs[0].axis(\"tight\")\naxs[1].imshow(x.grad[0].reshape(4 * 32, 32).T)\naxs[1].set_title(\"Gradient\")\naxs[1].axis(\"tight\")\nplt.tight_layout()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}