A Step-by-Step Guide to Linear Regression for Beginners

A beginner's guide

Photo by Scott Graham on Unsplash

Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning. Don't worry, you don't have to study statistics or machine learning in depth to understand it. In this blog, I explain its meaning in simple words.

Machine learning, especially in predictive modeling, is primarily concerned with minimizing model error, that is, making the most accurate predictions possible. In applied machine learning, we borrow, reuse, and adapt algorithms from many different fields, including statistics, and use them toward this end.

Thus, linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between numerical input and output variables.

Linear regression has a simple representation.

Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x). The technique therefore finds a linear relationship between x (input) and y (output), hence the name Linear Regression.

Hypothesis function for Linear Regression:

y = θ1 + θ2·x

where,
x: input training data (univariate)
y: labels to the data
θ1: intercept
θ2: coefficient of x

When training the model, it fits the best line to predict the value of y for a given value of x. The model finds the best regression fit line by finding the best θ1 and θ2 values.

Once we find the best θ1 and θ2 values, we get the best fit line. When we are using our model for prediction, it will predict the value of y for the input value of x.
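As a rough sketch of this idea (not from the original post), here is how the best θ1 and θ2 can be found with NumPy's least-squares solver; the data points are made up purely for illustration:

```python
import numpy as np

# Made-up training data, roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 6.2, 7.9, 10.1])

# Design matrix with a column of ones so θ1 (the intercept) is learned too
X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
theta1, theta2 = theta  # intercept and coefficient of x

# Use the fitted line to predict y for a new input value of x
x_new = 6.0
y_pred = theta1 + theta2 * x_new
print(theta1, theta2, y_pred)  # intercept ≈ 0.14, slope ≈ 1.98
```

Once `theta1` and `theta2` are found, prediction is just evaluating the line, exactly as described above.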

In the above figure, you can see that the X-axis represents the independent variable, whereas the Y-axis represents the dependent variable.

The dots represent the data points of the given data, and the vertical distance between a point and the plotted line is the error. As machine learning engineers, it is our job to fit this line to the data points in such a way that the errors are as small as possible.

Mathematically, we can represent linear regression as:

Y = a0 + a1·X + ε

where,
Y = dependent variable (target variable)
X = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
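To make the coefficients concrete, here is a minimal sketch (my own illustration, with invented data) that computes a0 and a1 using the classic least-squares formulas:

```python
import numpy as np

# Invented data that follows Y = 1 + 2X exactly, so the random error ε is 0
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.0, 5.0, 7.0, 9.0])

# Least-squares estimates: a1 = Σ(X−X̄)(Y−Ȳ) / Σ(X−X̄)², a0 = Ȳ − a1·X̄
a1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a0 = Y.mean() - a1 * X.mean()
print(a0, a1)  # 1.0 2.0
```

Because the data lies exactly on a line here, the formulas recover the intercept and coefficient perfectly; with real, noisy data they give the line with the smallest total squared error.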

Types of linear regression

Linear regression is divided into two types:

  1. Simple linear regression — If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.

  2. Multiple linear regression — If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
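The two types differ only in the number of input columns. As a hedged sketch (again with fabricated data), multiple linear regression with two independent variables can be solved the same way as the simple case:

```python
import numpy as np

# Fabricated data following y = 1 + 2*x1 + 3*x2 exactly
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# One column of ones for the intercept, then one column per independent variable
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # ≈ [1. 2. 3.]
```

Dropping the `x2` column turns this back into simple linear regression; nothing else in the code changes.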

How can we find the best-fit line?

When working with linear regression, our main goal is to find the best-fit line, which means the error between the predicted values and the actual values should be minimized. The best-fit line will have the smallest error.

Different weight values or coefficients of the line (a0, a1) produce different regression lines, so we need to calculate the best values of a0 and a1 to find the best-fit line. To calculate them, we use a cost function and gradient descent.
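Here is a toy gradient-descent sketch for this idea, using the mean-squared-error cost; the learning rate, iteration count, and data are my own assumptions, not values from the post:

```python
import numpy as np

# Assumed toy data following y = 1 + 2x exactly
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

a0, a1 = 0.0, 0.0  # start from an arbitrary line
lr = 0.05          # assumed learning rate
for _ in range(5000):
    y_hat = a0 + a1 * x
    error = y_hat - y
    # Move a0 and a1 a small step against the gradient of the MSE cost
    a0 -= lr * error.mean()
    a1 -= lr * (error * x).mean()
print(a0, a1)  # converges toward a0 ≈ 1, a1 ≈ 2
```

Each iteration nudges the line in the direction that reduces the cost, so the coefficients gradually settle at the best-fit values that the closed-form formulas would give directly.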

Here is a simple project on linear regression: Link

Thank you for reading.

Have a nice day! 😁 For more such content make sure to subscribe to my Newsletter here
Follow me on

Twitter

Github

Linkedin
