{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Polynomial Regression\n\nThis example shows how to use the :py:class:`pylops.Regression` operator\nto perform *Polynomial regression analysis*.\n\nIn short, polynomial regression is the problem of finding the best fitting\ncoefficients for the following equation:\n\n    .. math::\n        y_i = \\sum_{n=0}^\\text{order} x_n t_i^n  \\qquad \\forall i=0,1,\\ldots,N-1\n\nAs we can express this problem in a matrix form:\n\n    .. math::\n        \\mathbf{y}=  \\mathbf{A} \\mathbf{x}\n\nour solution can be obtained by solving the following optimization problem:\n\n    .. math::\n        J= ||\\mathbf{y} - \\mathbf{A} \\mathbf{x}||_2\n\nSee documentation of :py:class:`pylops.Regression` for more detailed\ndefinition of the forward problem.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\nimport numpy as np\n\nimport pylops\n\nplt.close(\"all\")\nnp.random.seed(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define the input parameters: number of samples along the t-axis (``N``),\norder (``order``), regression coefficients (``x``), and standard deviation\nof noise to be added to data (``sigma``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "N = 30\norder = 3\nx = np.array([1.0, 0.05, 0.0, -0.01])\nsigma = 1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's create the time axis and initialize the\n:py:class:`pylops.Regression` operator\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "t = np.arange(N, dtype=\"float64\") - N // 2\nPRop = pylops.Regression(t, order=order, dtype=\"float64\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can then apply the operator in forward mode to compute our data points\nalong the x-axis (``y``). We will also generate some random gaussian noise\nand create a noisy version of the data (``yn``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "y = PRop * x\nyn = y + np.random.normal(0, sigma, N)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We are now ready to solve our problem. As we are using an operator from the\n:py:class:`pylops.LinearOperator` family, we can simply use ``/``,\nwhich in this case will solve the system by means of an iterative solver\n(i.e., :py:func:`scipy.sparse.linalg.lsqr`).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "xest = PRop / y\nxnest = PRop / yn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's plot the best fitting curve for the case of noise free and noisy data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(5, 7))\nplt.plot(\n    t,\n    PRop * x,\n    \"k\",\n    lw=4,\n    label=r\"true: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (x[0], x[1], x[2], x[3]),\n)\nplt.plot(\n    t,\n    PRop * xest,\n    \"--r\",\n    lw=4,\n    label=\"est noise-free: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (xest[0], xest[1], xest[2], xest[3]),\n)\nplt.plot(\n    t,\n    PRop * xnest,\n    \"--g\",\n    lw=4,\n    label=\"est noisy: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (xnest[0], xnest[1], xnest[2], xnest[3]),\n)\nplt.scatter(t, y, c=\"r\", s=70)\nplt.scatter(t, yn, c=\"g\", s=70)\nplt.legend(fontsize=\"x-small\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We consider now the case where some of the observations have large errors.\nSuch elements are generally referred to as *outliers* and can affect the\nquality of the least-squares solution if not treated with care. In this\nexample we will see how using a L1 solver such as\n:py:func:`pylops.optimization.sparsity.IRLS` can drammatically improve the\nquality of the estimation of intercept and gradient.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Add outliers\nyn[1] += 40\nyn[N - 2] -= 20\n\n# IRLS\nnouter = 20\nepsR = 1e-2\nepsI = 0\ntolIRLS = 1e-2\n\nxnest = PRop / yn\nxirls, nouter = pylops.optimization.sparsity.irls(\n    PRop,\n    yn,\n    nouter=nouter,\n    threshR=False,\n    epsR=epsR,\n    epsI=epsI,\n    tolIRLS=tolIRLS,\n)\nprint(f\"IRLS converged at {nouter} iterations...\")\n\nplt.figure(figsize=(5, 7))\nplt.plot(\n    t,\n    PRop * x,\n    \"k\",\n    lw=4,\n    label=r\"true: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (x[0], x[1], x[2], x[3]),\n)\nplt.plot(\n    t,\n    PRop * xnest,\n    \"--r\",\n    lw=4,\n    label=r\"L2: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (xnest[0], xnest[1], xnest[2], xnest[3]),\n)\nplt.plot(\n    t,\n    PRop * xirls,\n    \"--g\",\n    lw=4,\n    label=r\"IRLS: $x_0$ = %.2f, $x_1$ = %.2f, \"\n    r\"$x_2$ = %.2f, $x_3$ = %.2f\" % (xirls[0], xirls[1], xirls[2], xirls[3]),\n)\nplt.scatter(t, y, c=\"r\", s=70)\nplt.scatter(t, yn, c=\"g\", s=70)\nplt.legend(fontsize=\"x-small\")\nplt.tight_layout()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}