fit_line


EXAMPLES ON MY SAGE PAGE: Scatterplots


Goal: Using the method of least squares, fit a line y = a*x + b to a list of x values and a list of y values.


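def fit_line(x_data,y_data) :
   # Returns (a, b) for the least-squares line y = a*x + b.
   import numpy.linalg
   n=len(x_data); a=0.; b=0.
   # If the lists differ in length, no fit is done and (0., 0.) is returned.
   if n==len(y_data) :
      # Each row of A is [x_j, 1], so that A*[a, b] approximates y_data.
      A=numpy.array([[x_data[j],1] for j in range(n)])
      B=numpy.array(y_data)
      # lstsq returns a tuple; element [0] is the solution [a, b].
      X=numpy.linalg.lstsq(A,B)[0]
      a=X[0]; b=X[1]
   return a, b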
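
For reference, the problem that the lstsq call in fit_line solves can be written out as follows. This is only a sketch in notation that is not on the original page: x_i and y_i stand for the entries of x_data and y_data, and n is their common length.

```latex
\[
A = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
(a, b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \bigl(a x_i + b - y_i\bigr)^2
       = \arg\min_{a,\,b} \left\| A \begin{pmatrix} a \\ b \end{pmatrix} - B \right\|_2^2 .
\]
```

The column of ones in A is what produces the intercept b.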


  • Example 1
    x_d = [18,23,25,35,65,54]                  # ages of individuals
    y_d = [202,186,187,180,156,169]      # maximum heart rate of each one
    a,b=fit_line(x_d,y_d)
    print "The regression line is: y=",a,"x+",b
    Result:
    The regression line is: y= -0.813015753938 x+ 209.810577644
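
As an independent check of Example 1, the slope and intercept can also be computed from the standard closed-form formulas for a least-squares line, without numpy. The sketch below assumes plain Python; the name fit_line_closed_form is hypothetical and is not part of this page's code.

```python
# Closed-form least-squares line in plain Python (no numpy),
# used here only as a cross-check of Example 1.
def fit_line_closed_form(x_data, y_data):  # hypothetical helper name
    n = len(x_data)
    mean_x = sum(x_data) / float(n)
    mean_y = sum(y_data) / float(n)
    # Sums of centered products and centered squares
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_data, y_data))
    s_xx = sum((x - mean_x) ** 2 for x in x_data)
    a = s_xy / s_xx          # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

x_d = [18, 23, 25, 35, 65, 54]        # ages of individuals
y_d = [202, 186, 187, 180, 156, 169]  # maximum heart rate of each one
a, b = fit_line_closed_form(x_d, y_d)
print(a, b)
```

The two printed values should agree with the Result of Example 1 up to rounding in the last digits.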


Example 1 - To get the plot shown below, add these lines of code:
var('x')
LineF=plot(a*x+b,(x,min(x_d),max(x_d)))
SP=scatter_plot(list(zip(x_d,y_d)), figsize=4, facecolor="lightgreen", edgecolor="green", markersize=30, marker='s')
show(LineF+SP)

[Figure: fitline.png, scatter plot of the Example 1 data with the fitted line]

Your responsibility: x_data and y_data must be lists of numbers of the same length. If the lengths differ, the function does no fitting and returns a=0, b=0.

Sage commands used: len, numpy.array, numpy.linalg.lstsq

Extra comments:

  • If your data is given as 2d-points, you can unzip it into x and y data lists using example 2 of zip.
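  • If you need the sum of the squared errors (SSE), it is the second element, index [1], of the tuple returned by lstsq, that is: SSE=numpy.linalg.lstsq(A,B)[1]. It is returned as a one-element array, so SSE[0] is the number itself; the array is empty when there are fewer than three data points or all x values are equal.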
  • If you call this function many times in your program, I would move import numpy.linalg to the top of your program so the import statement is not executed again on every call.
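
The comments above on unzipping points and on the SSE can be tried together in a short sketch. It assumes plain Python with NumPy 1.14 or newer (for rcond=None); the variable names points, sse and sse_direct are made up for illustration, and the points are just the data pairs of Example 1.

```python
import numpy  # imported once, at the top of the program

# Illustrative 2D points: the (age, maximum heart rate) pairs of Example 1.
points = [(18, 202), (23, 186), (25, 187), (35, 180), (65, 156), (54, 169)]

# Unzip the points into separate x and y sequences.
x_d, y_d = zip(*points)

A = numpy.array([[x, 1.0] for x in x_d])
B = numpy.array(y_d, dtype=float)

# lstsq returns a tuple: (solution, residuals, rank, singular values).
# rcond=None is an assumption here; it needs NumPy 1.14 or newer.
solution, residuals, rank, sing_vals = numpy.linalg.lstsq(A, B, rcond=None)
a, b = solution

# With 1-D B and a full-rank A that has more rows than columns,
# residuals holds exactly one entry: the sum of squared errors.
sse = float(residuals[0])

# Cross-check by summing the squared errors directly.
sse_direct = sum((y - (a * x + b)) ** 2 for x, y in zip(x_d, y_d))
print(a, b, sse, sse_direct)
```

The last two printed numbers should match up to floating-point rounding.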

Related SageMath Pages: scatter_plot, Least Squares Approximation


Keywords: fitline, linear approximation, least squares, statistics, regression