While linear regression can address the question “is prenatal care important?”, it cannot answer another important question: “does prenatal care influence birth weight differently for infants with low birth weight than for those with average birth weight?”
Quantile regression models the relationship between a set of predictor variables and specific percentiles (or quantiles) of the response variable; that is, it specifies how the quantiles of the response change with the predictors. For example, a median regression (the median is the 50th percentile) of infant birth weight on mothers’ characteristics specifies the change in the median birth weight as a function of the predictors. The effect of prenatal care on median infant birth weight can then be compared with its effect on other quantiles of infant birth weight.
In linear regression, a regression coefficient represents the change in the mean of the response variable associated with a one-unit increase in the corresponding predictor. A quantile regression coefficient instead estimates the change in a specified quantile of the response associated with a one-unit change in the predictor. This makes it possible to see whether some percentiles of birth weight are more affected by certain characteristics of the mother than others, which shows up as a change in the size of the regression coefficient across quantiles.
Quantile regression is an extension of linear regression, and we generally use it when outliers, high skewness or heteroscedasticity are present in the data.
In linear regression, we predict the mean of the dependent variable for given values of the independent variables. Since the mean does not describe the whole distribution, modeling the mean alone is not a full description of the relationship between the dependent and independent variables. Quantile regression instead predicts a chosen quantile (or percentile) of the dependent variable for given values of the independent variables.
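The terms “quantile” and “percentile” are used interchangeably here: the quantile at level q is the (100 × q)th percentile, so the 0.25 quantile is the 25th percentile.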
Basic idea of quantile regression:
In quantile regression we try to estimate a quantile of the dependent variable given the values of the X’s. Note that the dependent variable should be continuous.
The quantile regression model:
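For the qth quantile we have the following regression model:
$$y_i = x_i^\top \beta_q + e_i$$
This looks similar to the linear regression model, but here the coefficients $\beta_q$ are obtained by minimizing the objective function:
$$Q(\beta_q) = \sum_{i:\, y_i \ge x_i^\top \beta_q} q\,\lvert y_i - x_i^\top \beta_q \rvert \;+ \sum_{i:\, y_i < x_i^\top \beta_q} (1 - q)\,\lvert y_i - x_i^\top \beta_q \rvert$$
where q is the quantile level (0 < q < 1), $y_i$ is the response and $x_i$ is the vector of predictors for observation i.
If q = 0.5, i.e. if we are interested in the median, this becomes median regression (or least absolute deviation regression). Substituting q = 0.5 in the above equation, we get the objective function:
$$Q(\beta_{0.5}) = \sum_{i} 0.5\,\lvert y_i - x_i^\top \beta_{0.5} \rvert$$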
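As a minimal sketch of what this kind of objective does, the base-R snippet below minimizes the quantile ("check") loss for an intercept-only model, where each positive residual is weighted by q and each negative residual by 1 − q. The helper names `check_loss` and `fit_constant` and the toy vector `y` are illustrative assumptions, not part of any package. Because the loss is piecewise linear, a minimizer can always be found among the observed values, so the sketch simply searches over them.

```r
# Check loss: q * u for u >= 0 and (q - 1) * u for u < 0,
# i.e. q * |u| above the fit and (1 - q) * |u| below it.
check_loss <- function(u, q) u * (q - (u < 0))

# Intercept-only quantile "regression": pick the constant b that minimizes
# the summed check loss. Candidates are the observed values themselves.
fit_constant <- function(y, q) {
  total_loss <- sapply(y, function(b) sum(check_loss(y - b, q)))
  y[which.min(total_loss)]
}

y <- c(1, 2, 3, 4, 100)   # toy data with one large outlier

fit_constant(y, 0.5)      # 3, the sample median; the outlier has no pull
mean(y)                   # 22, the least-squares answer is dragged upward
```

With predictors in the model, the same loss is minimized over the whole coefficient vector rather than a single constant, which is the job of a linear-programming routine such as the one behind `rq`.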
Interpreting the coefficients in quantile regression:
Suppose the fitted regression equation for the 25th percentile (q = 0.25) is:
y = 5.2333 + 700.823 x
This means that a one-unit increase in x is associated with an estimated increase of 700.823 units in the 25th percentile of y.
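The interpretation can be checked numerically with a small sketch. The function name `q25_hat` is a hypothetical helper that simply encodes the fitted line quoted above.

```r
# Fitted 25th-percentile line from the text: y = 5.2333 + 700.823 * x
q25_hat <- function(x) 5.2333 + 700.823 * x

# Raising x by one unit (here from 2 to 3) shifts the estimated
# 25th percentile of y by the slope coefficient.
q25_hat(3) - q25_hat(2)
```

The difference equals the slope whatever starting value of x is chosen, because the fitted quantile is linear in x.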
Advantages of Quantile Regression over Linear Regression
– Quite beneficial when heteroscedasticity is present in the data.
– Robust to outliers in the dependent variable.
– The distribution of the dependent variable can be described via various quantiles.
– More useful than linear regression when the data are skewed.
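The heteroscedasticity point can be illustrated with a small simulation. This is only a sketch on made-up data: the seed, sample size and data-generating equation are assumptions chosen for illustration, and it assumes the quantreg package is already installed.

```r
library(quantreg)  # assumes the quantreg package is installed

set.seed(1)                      # arbitrary seed, for reproducibility
n <- 500
x <- runif(n, 1, 10)
y <- 1 + 2 * x + x * rnorm(n)    # noise spread grows with x (heteroscedastic)

ols   <- lm(y ~ x)               # a single slope for the conditional mean
fit10 <- rq(y ~ x, tau = 0.10)   # lower tail of y given x
fit90 <- rq(y ~ x, tau = 0.90)   # upper tail of y given x

# Under this simulation the true quantile slopes are 2 + qnorm(tau), so the
# upper-tail slope should come out clearly larger than the lower-tail slope.
slopes <- c(ols = unname(coef(ols)["x"]),
            q10 = unname(coef(fit10)["x"]),
            q90 = unname(coef(fit90)["x"]))
slopes
```

The OLS fit reports one slope for the mean, while the two quantile fits show that the effect of x differs between the lower and upper parts of the distribution.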
Disclaimer on using quantile regression!
Keep in mind that the coefficients obtained from quantile regression for a particular quantile should differ significantly from those obtained from linear regression. If they do not, using quantile regression at that quantile is hard to justify. This can be checked by comparing the confidence intervals of the coefficient estimates from the two regressions.
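One informal way to make this comparison is sketched below on R's built-in swiss data. It assumes the quantreg package is installed; the object names (`ols`, `qr25`, `comparison`) are illustrative. For small samples, `summary()` on an `rq` fit with `se = "rank"` reports rank-inversion bounds, which by default are 90% intervals, so the OLS intervals are requested at the same level.

```r
library(quantreg)  # assumes the quantreg package is installed

ols  <- lm(Fertility ~ ., data = swiss)
qr25 <- rq(Fertility ~ ., data = swiss, tau = 0.25)

# 90% confidence intervals for the OLS coefficients
ols_ci <- confint(ols, level = 0.90)

# Rank-inversion intervals for the quantile regression coefficients
# (columns "lower bd" and "upper bd" of the summary coefficient table)
qr_ci <- summary(qr25, se = "rank")$coefficients[, c("lower bd", "upper bd")]

comparison <- data.frame(
  ols_lower = ols_ci[, 1], ols_upper = ols_ci[, 2],
  qr_lower  = qr_ci[, 1],  qr_upper  = qr_ci[, 2]
)

# TRUE where the two intervals overlap for that coefficient
comparison$overlap <- comparison$ols_lower <= comparison$qr_upper &
                      comparison$qr_lower <= comparison$ols_upper
comparison
```

Overlapping intervals are only a rough screen rather than a formal test, but coefficients whose intervals do not overlap are the ones where the quantile fit tells a different story from OLS.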
Quantile Regression in R
We need to install and load the quantreg package in order to carry out quantile regression.
install.packages("quantreg")
library(quantreg)
Using the rq function, we estimate the 25th percentile of Fertility in the swiss data set. For this we set tau = 0.25.
model1 <- rq(Fertility ~ ., data = swiss, tau = 0.25)
summary(model1)
Setting tau = 0.5, we run the median regression.
model2 <- rq(Fertility ~ ., data = swiss, tau = 0.5)
summary(model2)
We can fit quantile regressions for multiple quantiles in a single call by passing a vector of values to tau.
model3 <- rq(Fertility ~ ., data = swiss, tau = seq(0.05, 0.95, by = 0.05))
quantplot <- summary(model3)
We can check whether our quantile regression results differ from the OLS results by plotting the summary object.
plot(quantplot)
The X axis shows the quantiles. The solid red line is the OLS coefficient estimate and the dashed red lines are the confidence interval around it; both are constant across quantiles. The black dotted line shows the quantile regression estimates and the gray area is their confidence interval at each quantile. We can see that for all the variables the two sets of estimates coincide for most of the quantiles, so the use of quantile regression is not justifiable at those quantiles. In other words, to justify using quantile regression we want the red band and the gray area to overlap as little as possible.