Quantile Regression
Quantile regression is a type of regression analysis used in statistics and machine learning that allows you to estimate conditional quantiles
of the response variable, such as the median or the 90th percentile, instead of just the mean (as in ordinary least squares regression).
In ordinary least squares (OLS) regression, we estimate the mean of the dependent variable \(Y\) given independent variables \(X\). That is,
\[ \mathbb{E}(Y|X) = X\beta \]
In quantile regression, by contrast, we estimate the \(\tau\)-th quantile of \(Y\) given \(X\), such as the median (\(\tau = 0.5\)) or the 90th percentile (\(\tau = 0.9\)):
\[ Q_\tau(Y|X) = X\beta_\tau \]
So, we’re modeling:
- The median of \(Y\) (when \(\tau = 0.5\))
- The 25th percentile of \(Y\) (when \(\tau = 0.25\))
- The 90th percentile of \(Y\) (when \(\tau = 0.9\))
… and so on.
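For reference, the coefficient vector \(\beta_\tau\) is usually defined as the minimizer of the check (or pinball) loss of Koenker & Bassett (1978). The symbols \(\rho_\tau\), \(y_i\), \(x_i\) and \(n\) are introduced here only for illustration and are not used elsewhere in this guide:

\[
\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^\top \beta\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbb{1}\{u < 0\}\right)
\]

For \(\tau = 0.5\) the loss reduces to \(|u|/2\), so median regression is the same as least absolute deviations regression.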
Why Use Quantile Regression?
Quantile regression is useful when:
- The relationship between variables changes at different points in the distribution.
- We care about the extremes (e.g., predicting high incomes or high risks).
- The data has heteroscedasticity (non-constant variance) or outliers, which can distort OLS estimates.
Key Features:
- Robust to outliers: Median regression (\(\tau = 0.5\)) in particular is more robust to outliers in the response than mean regression.
- Provides a fuller picture: It tells you how predictors affect not just the center but also the tails of the response variable distribution.
- Linear form, different fit: It still assumes a linear model form but optimizes a different loss function.
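The "different loss function" is the check (or pinball) loss, which weights positive residuals by \(\tau\) and negative residuals by \(1 - \tau\). As a minimal base-R sketch (the names `pinball_loss` and `best_constant` are illustrative and not part of any package), minimizing the average loss over a single constant approximately recovers the sample quantile:

```r
# Check (pinball) loss for a residual u = y - prediction
pinball_loss <- function(u, tau) {
  u * (tau - (u < 0))
}

set.seed(42)
y <- rexp(500)  # right-skewed sample

# Find the constant q that minimizes the average pinball loss
best_constant <- function(y, tau) {
  optimize(function(q) mean(pinball_loss(y - q, tau)),
           interval = range(y))$minimum
}

q50 <- best_constant(y, 0.5)
q90 <- best_constant(y, 0.9)

# Compare the minimizers with the empirical quantiles
c(q50 = q50, median = median(y))
c(q90 = q90, quantile90 = unname(quantile(y, 0.9)))
```

Quantile regression applies the same idea, but minimizes the loss over the coefficients of a linear predictor instead of a single constant.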
Types of Quantile Regression
Here’s a practical guide to performing different types of quantile regression in R, using popular packages like {quantreg}, {brms}, and others. I’ll use built-in datasets (e.g., mtcars) for illustration.
Linear Quantile Regression
Package: quantreg (Koenker & Bassett, 1978)
Use Case: Modeling linear relationships at specific quantiles (e.g., median).
# Install and load the package
install.packages("quantreg")
library(quantreg)
# Fit median regression (tau = 0.5)
data(mtcars)
model <- rq(mpg ~ wt + hp, data = mtcars, tau = 0.5)
summary(model)
# Plot coefficients across quantiles
model_multi <- rq(mpg ~ wt + hp, data = mtcars, tau = c(0.1, 0.5, 0.9))
plot(summary(model_multi))
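To see what the extra quantiles add, it can help to put the quantile-regression coefficients next to the OLS fit. The sketch below assumes the quantreg package is installed; the object names (`ols_coefs`, `qr_fit`, `qr_coefs`, `new_car`) and the values chosen for the hypothetical car are illustrative.

```r
library(quantreg)

data(mtcars)

# OLS estimates the conditional mean
ols_coefs <- coef(lm(mpg ~ wt + hp, data = mtcars))

# Quantile regression at three quantiles; coef() returns one column per tau
qr_fit <- rq(mpg ~ wt + hp, data = mtcars, tau = c(0.1, 0.5, 0.9))
qr_coefs <- coef(qr_fit)

# Side-by-side comparison of the coefficients
round(cbind(OLS = ols_coefs, qr_coefs), 3)

# Predicted 10th, 50th and 90th percentiles of mpg for a hypothetical car
new_car <- data.frame(wt = 3.0, hp = 120)
pred <- predict(qr_fit, newdata = new_car)
pred
```

If the slopes differ noticeably across the columns, the predictors affect the lower and upper parts of the mpg distribution differently, which a single OLS fit cannot show.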
Non-Linear Quantile Regression
Package: quantreg + splines
Use Case: Capturing non-linear relationships using splines or polynomials.
library(quantreg)
library(splines)
# Fit a spline-based quantile regression (tau = 0.7)
model <- rq(mpg ~ ns(wt, df = 3) + ns(hp, df = 3), data = mtcars, tau = 0.7)
summary(model)
Bayesian Quantile Regression
Package: brms (Bayesian framework)
Use Case: Incorporating prior distributions for uncertainty quantification.
# Install and load
install.packages("brms")
library(brms)
# Bayesian quantile regression (tau = 0.5)
model_bayes <- brm(
bf(mpg ~ wt + hp, quantile = 0.5),
data = mtcars,
family = asym_laplace()
)
summary(model_bayes)
Censored Quantile Regression
Package: quantreg (for censored data)
Use Case: Handling censored outcomes (e.g., survival data).
library(survival)
library(quantreg)
# Simulate censored data (mpg left-censored at 20)
data(mtcars)
y_censored <- pmax(mtcars$mpg, 20)
status <- ifelse(mtcars$mpg > 20, 1, 0)
# Fit censored quantile regression (Portnoy's method estimates the quantile process, not only the median)
model <- crq(
Surv(y_censored, status, type = "left") ~ wt + hp,
data = mtcars,
method = "Portnoy"
)
summary(model)
Composite Quantile Regression
Package: cqrReg
Use Case: Combining several quantile levels to estimate a common set of slopes more efficiently.
# Install and load
install.packages("cqrReg")
library(cqrReg)
# Fit composite quantile regression (tau = 0.25, 0.5, 0.75)
model <- cqr.fit(cbind(1, mtcars$wt, mtcars$hp), mtcars$mpg, tau = c(0.25, 0.5, 0.75))
print(model)
High-Dimensional Quantile Regression
Package: hdm or quantreg with penalization
Use Case: Regularized regression (Lasso/Ridge) for high-dimensional data.
library(quantreg)
# Simulate high-dimensional data
set.seed(123)
X <- matrix(rnorm(1000), ncol = 100) # 10 observations, 100 predictors
y <- X[, 1] + X[, 2] + rnorm(10)
# Lasso-penalized quantile regression (tau = 0.5)
model <- rq.fit.lasso(X, y, tau = 0.5, lambda = 0.1)
coef(model)
Non-Parametric Quantile Regression
Package: quantregForest
Use Case: Flexible modeling without assuming functional forms.
library(quantregForest)
# Train a quantile regression forest
model <- quantregForest(x = mtcars[, c("wt", "hp")], y = mtcars$mpg)
# Recent versions of quantregForest take the requested quantiles via `what`
predict(model, newdata = data.frame(wt = 3.5, hp = 150), what = c(0.1, 0.5, 0.9))
Time Series Quantile Regression
Package: quantreg
Use Case: Modeling temporal dependencies (e.g., lagged effects).
library(quantreg)
data("AirPassengers") # Built-in time series
# Create lagged variables (convert to numeric first; dplyr::lag() does not accept ts objects)
y <- as.numeric(AirPassengers)
df <- data.frame(
  y = y,
  lag1 = dplyr::lag(y, 1),
  lag2 = dplyr::lag(y, 2)
)
df <- na.omit(df)
# Fit quantile autoregressive model (tau = 0.5)
model <- rq(y ~ lag1 + lag2, data = df, tau = 0.5)
summary(model)
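If you would rather not depend on dplyr for the lags, base R's `embed()` builds the same lagged design in one call. This is a sketch with illustrative object names (`lagged`, `df_lags`); the resulting data frame can be passed to `rq()` in the same way as `df` above.

```r
data("AirPassengers")
y <- as.numeric(AirPassengers)

# embed(y, 3) returns rows (y_t, y_{t-1}, y_{t-2}) for t = 3, ..., n
lagged <- embed(y, 3)
df_lags <- data.frame(
  y    = lagged[, 1],
  lag1 = lagged[, 2],
  lag2 = lagged[, 3]
)

head(df_lags, 3)
nrow(df_lags)  # two observations are lost to lagging
```

Because `embed()` only returns complete rows, no `na.omit()` step is needed.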
Panel Data Quantile Regression
Package: lqmm (Linear Quantile Mixed Models)
Use Case: Accounting for group-specific effects (e.g., longitudinal data).
install.packages("lqmm")
library(lqmm)
# Simulate panel data
set.seed(123)
df <- data.frame(
y = rnorm(100),
x = rnorm(100),
group = rep(1:10, each = 10)
)
# Fit panel quantile regression with random intercepts
model <- lqmm(y ~ x, random = ~1, group = group, data = df, tau = 0.5)
summary(model)
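The simulated data above has no real group structure, since y is pure noise. A slightly richer sketch, using only base R for the simulation and illustrative names (`group_effect`, `df_panel`, `fit_panel`), adds a random intercept per group so that the mixed model has something to recover. The coefficients 1 and 0.5 and the standard deviation of 2 are arbitrary choices, and the `lqmm()` call is only run if the package is installed.

```r
set.seed(123)
n_groups <- 10
n_per_group <- 10

group <- rep(seq_len(n_groups), each = n_per_group)
group_effect <- rnorm(n_groups, sd = 2)  # one random intercept per group
x <- rnorm(n_groups * n_per_group)

df_panel <- data.frame(
  y = 1 + 0.5 * x + group_effect[group] + rnorm(length(x)),
  x = x,
  group = group
)

# Group means of y differ because of the random intercepts
group_means <- tapply(df_panel$y, df_panel$group, mean)
group_means

# Fit the median mixed model only if lqmm is available
if (requireNamespace("lqmm", quietly = TRUE)) {
  fit_panel <- lqmm::lqmm(y ~ x, random = ~ 1, group = group,
                          data = df_panel, tau = 0.5)
  print(coef(fit_panel))
}
```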
Summary and Conclusion
- Core Package: quantreg covers most needs (linear, non-linear, censored, time series).
- Advanced Use Cases: Use brms (Bayesian), lqmm (panel data), or quantregForest (non-parametric).
- Key Functions:
  - rq(): Linear quantile regression.
  - crq(): Censored quantile regression.
  - quantregForest(): Non-parametric forests.