Polynomial regression is a technique for fitting a nonlinear relationship by using polynomial functions of the independent variable.
In the figure below, the red curve fits the data better than the green curve. Hence, when the relationship between the dependent and independent variables appears to be nonlinear, we can use polynomial regression models.
Thus a polynomial regression model of degree k in one variable is written as:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_k x^k + \varepsilon$$
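Here we can create new features such as
$$x_1 = x,\quad x_2 = x^2,\quad \dots,\quad x_k = x^k$$
and fit a linear regression on them in the usual way. The model is nonlinear in x but still linear in the coefficients \(\beta_0, \dots, \beta_k\), so ordinary least squares applies unchanged.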
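The idea that polynomial regression is "just" linear regression on engineered features can be checked directly. Below is a minimal sketch on simulated data. The data and the choice k = 3 are illustrative assumptions, not from the original article. It builds the columns x, x^2, ..., x^k by hand and compares the result with R's `poly(..., raw = TRUE)` helper.

```r
# Simulated data (an assumption for illustration): a cubic trend plus noise
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x - 0.3 * x^2 + 0.02 * x^3 + rnorm(50)

k <- 3
# New features x, x^2, ..., x^k as the columns of a 50 x k matrix
features <- sapply(1:k, function(j) x^j)

# Ordinary linear regression on the engineered features
fit_manual <- lm(y ~ features)

# Same model via poly(); raw = TRUE keeps plain powers of x
# (the default would use orthogonal polynomials instead)
fit_poly <- lm(y ~ poly(x, k, raw = TRUE))

# Both fits estimate the same coefficients beta_0, ..., beta_k
print(coef(fit_manual))
print(coef(fit_poly))
```

The two design matrices contain the same numbers, so the estimated coefficients agree up to floating-point rounding.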
In the case of multiple variables, say X1 and X2, we can create a third feature X3 that is the product of X1 and X2:
$$X_3 = X_1 X_2$$
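As a sketch of the product feature, assuming simulated data with a genuine interaction, the term can be built explicitly or requested with R's `:` formula operator. Both specifications fit the same model.

```r
# Hypothetical simulated data with two predictors and an interaction effect
set.seed(2)
X1 <- rnorm(100)
X2 <- rnorm(100)
y <- 1 + 2 * X1 - X2 + 1.5 * X1 * X2 + rnorm(100, sd = 0.5)

# Third feature created explicitly as the product of X1 and X2
X3 <- X1 * X2
fit_explicit <- lm(y ~ X1 + X2 + X3)

# Formula shorthand: X1:X2 adds the same product term
fit_formula <- lm(y ~ X1 + X2 + X1:X2)

print(coef(fit_explicit))
print(coef(fit_formula))
```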
Keep in mind that creating unnecessary extra features or fitting higher-degree polynomials can lead to overfitting: the model starts to fit the noise in the training data and then generalizes poorly to new data.
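One way to see overfitting is to compare training error with error on held-out data as the degree grows. Below is a minimal sketch on simulated data whose true relationship is quadratic. The data, the 30/10 split and the degree range 1 to 8 are all assumptions. Training RSS can only decrease as the degree increases, because each model contains the previous one. The held-out RSS typically stops improving once the degree exceeds what the data support, although the exact pattern depends on the random sample.

```r
# Simulated data (an assumption): true relationship is quadratic
set.seed(3)
n <- 40
d <- data.frame(x = runif(n, 0, 5))
d$y <- 1 + d$x - 0.5 * d$x^2 + rnorm(n, sd = 0.5)

# Random split into 30 training and 10 test observations
train <- sample(n, 30)
test <- setdiff(seq_len(n), train)

degrees <- 1:8
train_rss <- numeric(length(degrees))
test_rss <- numeric(length(degrees))
for (i in seq_along(degrees)) {
  fit <- lm(y ~ poly(x, degrees[i]), data = d[train, ])
  # Residual sum of squares on the data used for fitting
  train_rss[i] <- sum(residuals(fit)^2)
  # Residual sum of squares on data the model has not seen
  pred <- predict(fit, newdata = d[test, ])
  test_rss[i] <- sum((d$y[test] - pred)^2)
}
print(data.frame(degree = degrees, train_rss, test_rss))
```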
Polynomial regression in R
We use the poly.csv dataset to fit a polynomial regression that estimates house prices from their area.
First, we read the data using read.csv() and split it into the dependent and independent variables:
data = read.csv("poly.csv")
x = data$Area    # independent variable: house area
y = data$Price   # dependent variable: house price
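If poly.csv is not at hand, a hypothetical stand-in with the same column names lets the rest of the walkthrough run. The values below are simulated and do not reproduce the article's data. The sketch writes them to a temporary CSV file and then reads them back with the same steps.

```r
# Hypothetical stand-in for poly.csv: simulated areas and prices
set.seed(4)
Area <- round(runif(20, 500, 3000))
Price <- round(50000 + 20 * Area + 0.01 * Area^2 + rnorm(20, sd = 5000))

# Write to a temporary file so read.csv() can be exercised as in the text
path <- tempfile(fileext = ".csv")
write.csv(data.frame(Area, Price), path, row.names = FALSE)

data <- read.csv(path)
x <- data$Area
y <- data$Price
str(data)
```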
To compare the results of linear and polynomial regression, we first fit a simple linear regression:
model1 = lm(y ~ x)
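The coefficients and predicted values of an `lm` fit can be extracted with `coef()`, `fitted()` and `predict()`. Below is a minimal sketch on simulated stand-in data. The data and the new areas 1000 and 2000 are hypothetical.

```r
# Simulated stand-in data (hypothetical), so the sketch runs on its own
set.seed(5)
x <- runif(30, 500, 3000)
y <- 50000 + 20 * x + 0.01 * x^2 + rnorm(30, sd = 5000)

model1 <- lm(y ~ x)

print(coef(model1))          # intercept and slope
print(head(fitted(model1)))  # predicted prices for the observed areas

# Predictions for new areas: the column must be named x to match the formula
new_pred <- predict(model1, newdata = data.frame(x = c(1000, 2000)))
print(new_pred)
```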
The coefficients and predicted values obtained are:
Next, we create a matrix whose columns are x and x squared.
new_x = cbind(x,x^2)
Now we fit ordinary least squares (OLS) to the new features:
model2 = lm(y ~ new_x)
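Passing a matrix such as new_x to lm() works for fitting. However, predict() then looks for a variable called new_x rather than x, which makes predicting for new areas awkward. A common alternative is to write the squared term in the formula with `I()`. The sketch below uses simulated stand-in data, which is an assumption. It checks that both specifications give the same fitted values.

```r
# Simulated stand-in data (hypothetical)
set.seed(6)
x <- runif(30, 500, 3000)
y <- 50000 + 20 * x + 0.01 * x^2 + rnorm(30, sd = 5000)

# Approach from the text: a matrix with columns x and x^2
new_x <- cbind(x, x^2)
model2 <- lm(y ~ new_x)

# Equivalent formula: I() makes ^ mean arithmetic squaring
model2b <- lm(y ~ x + I(x^2))

# Both specifications produce the same fitted values
print(all.equal(fitted(model2), fitted(model2b)))

# The formula version predicts for new areas directly
pred_new <- predict(model2b, newdata = data.frame(x = c(1000, 2000)))
print(pred_new)
```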
The fitted values and regression coefficients of the polynomial regression are:
Using the ggplot2 package, we plot the data together with the fitted curves from both the linear and the polynomial regression.
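library(ggplot2)
# Scatter plot of the data with both fitted curves overlaid
ggplot(data = data) + geom_point(aes(x = Area, y = Price)) +
  geom_line(aes(x = Area, y = fitted(model1)), color = "red") +   # linear fit
  geom_line(aes(x = Area, y = fitted(model2)), color = "blue") +  # quadratic fit
  theme(panel.background = element_blank())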
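The plot compares the fits visually. The two models can also be compared numerically. Below is a sketch on simulated stand-in data, which is a hypothetical assumption. It reports R-squared for each model and runs a nested-model F-test with `anova()`. The quadratic model's R-squared can never be lower, because it contains the linear model as a special case. The F-test asks whether the reduction in residual sum of squares is larger than chance would suggest.

```r
# Simulated stand-in data (hypothetical)
set.seed(7)
x <- runif(30, 500, 3000)
y <- 50000 + 20 * x + 0.01 * x^2 + rnorm(30, sd = 5000)

model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2))

# Proportion of variance explained by each model
r2 <- c(linear = summary(model1)$r.squared,
        quadratic = summary(model2)$r.squared)
print(r2)

# F-test comparing the nested models (linear vs. quadratic)
cmp <- anova(model1, model2)
print(cmp)
```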