Linear Quantile Regression (LQR) is a statistical method that models the relationship between predictor variables and specific quantiles (e.g., the median or the 90th percentile) of the response variable. Unlike Ordinary Least Squares (OLS) regression, which estimates only the conditional mean of the response variable, quantile regression provides a more complete picture by estimating conditional quantiles. This makes it robust to outliers in the response and useful for analyzing non-normal or heteroscedastic data.
Key Concepts
(1) Quantiles vs. Mean
Mean (OLS Regression): Minimizes the sum of squared residuals → sensitive to outliers.
Quantiles (Quantile Regression): Minimizes the sum of asymmetrically weighted absolute residuals → robust to outliers.
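The contrast above can be seen in a minimal base-R sketch. The two vectors are made-up toy values (not from any dataset), chosen so that a single extreme observation is the only difference between them.

```r
# Hypothetical toy data: identical except for one extreme value
y_clean   <- c(1, 2, 3, 4, 5)
y_outlier <- c(1, 2, 3, 4, 100)

# The mean minimizes the sum of squared residuals, so it chases the outlier
mean_clean   <- mean(y_clean)      # 3
mean_outlier <- mean(y_outlier)    # 22

# The median minimizes the sum of absolute residuals and does not move here
median_clean   <- median(y_clean)    # 3
median_outlier <- median(y_outlier)  # 3
```

The same mechanism carries over to regression: OLS coefficients can shift noticeably because of a few extreme responses, while median regression coefficients tend to change much less.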
(2) Conditional Quantile Function
For a given quantile τ ∈ (0,1) (e.g., τ = 0.5 for the median), the model estimates:
\[ Q_{Y|X}(\tau) = X \beta(\tau) \]
where:
\(Q_{Y|X}(\tau)\) = τ-th quantile of \(Y\) given predictors \(X\).
\(\beta(\tau)\) = regression coefficients for quantile τ.
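(3) Loss Function (Check Function)
The coefficients are estimated by minimizing the sum of asymmetrically weighted absolute residuals:
\[ \hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}\left(y_i - x_i^{\top} \beta\right) \]
where:
\[
\rho_{\tau}(u) =
\begin{cases}
\tau u & \text{if } u \geq 0 \\
(\tau - 1)u & \text{if } u < 0
\end{cases}
\]
- This check (pinball) loss weights residuals asymmetrically: positive residuals (observations above the fitted quantile) receive weight τ, and negative residuals (observations below it) receive weight 1 − τ.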
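As a rough illustration of the loss \(\rho_{\tau}\) defined above, the following base-R sketch implements it and checks numerically that minimizing the total loss over a constant recovers a sample quantile. The names `check_loss`, `total_loss`, and `q_hat` are hypothetical helpers introduced here, and the data are simulated.

```r
# Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0
check_loss <- function(u, tau) {
  ifelse(u >= 0, tau * u, (tau - 1) * u)
}

# At tau = 0.9, a positive residual costs far more than a negative one
loss_pos <- check_loss(2, tau = 0.9)    # 0.9 * 2
loss_neg <- check_loss(-2, tau = 0.9)   # (0.9 - 1) * (-2)

# Simulated data (an assumption for illustration only)
set.seed(1)
y <- rnorm(1001)

# Total check loss when every observation is "predicted" by the constant q
total_loss <- function(q, tau) sum(check_loss(y - q, tau))

# One-dimensional minimization over q; the result should sit very close
# to the empirical 90th percentile of y
q_hat <- optimize(total_loss, interval = range(y), tau = 0.9)$minimum
q_ref <- unname(quantile(y, 0.9))
```

Quantile regression applies the same idea with \(q\) replaced by the linear predictor \(X\beta\); in practice the minimization is solved by linear programming rather than a generic optimizer.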
When to Use Quantile Regression
✅ Skewed Data (e.g., income, medical costs)
✅ Heteroscedasticity (variance changes with predictors)
✅ Outliers Present (OLS can be misleading)
✅ Interest in Extremes (e.g., 90th percentile risk analysis)
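To make the heteroscedasticity case concrete, here is a small simulation sketch. It assumes the quantreg package is installed; the data-generating process (noise standard deviation proportional to `x`) and all variable names are invented for illustration.

```r
library(quantreg)

set.seed(42)
n <- 500
x <- runif(n, 0, 10)

# Simulated response: the noise standard deviation grows with x
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)

# Fit the 10th, 50th, and 90th conditional percentiles in one call
fit <- rq(y ~ x, tau = c(0.1, 0.5, 0.9))

# One slope per quantile; columns of coef(fit) correspond to the taus
slopes <- coef(fit)["x", ]
slopes
```

Under this setup the slope at τ = 0.9 should come out larger than the slope at τ = 0.1, because the spread of the response widens as `x` increases. A single OLS slope would not reveal that pattern.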
Quantile Regression in R
Using the quantreg Package
# install.packages("quantreg")
library(quantreg)
Loading required package: SparseM
# Fit median regression (τ = 0.5)
model <- rq(mpg ~ wt + hp, data = mtcars, tau = 0.5)
summary(model)
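The median fit above can be extended to several quantiles at once and set beside an OLS fit. This is a sketch using the same `mtcars` formula; the choice of quartiles and the object names `taus`, `fits`, and `ols` are assumptions made for illustration.

```r
library(quantreg)

# Quantiles to estimate (an illustrative choice)
taus <- c(0.25, 0.5, 0.75)

# Passing a vector of taus to rq() fits one model per quantile
fits <- rq(mpg ~ wt + hp, data = mtcars, tau = taus)

# Coefficient matrix: rows are terms, columns are quantiles
qr_coefs <- coef(fits)
qr_coefs

# OLS fit of the conditional mean, for comparison
ols <- lm(mpg ~ wt + hp, data = mtcars)
ols_coefs <- coef(ols)
ols_coefs
```

If the coefficients for `wt` or `hp` differ noticeably across the columns of `qr_coefs`, the predictors affect the lower and upper parts of the `mpg` distribution differently, which the single set of OLS coefficients cannot show. With a dataset as small as `mtcars`, `rq()` may warn that a solution is nonunique.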
Linear Quantile Regression is a powerful alternative to OLS when:
- You care about different parts of the distribution (not just the mean).
- Your data has outliers, skewness, or heteroscedasticity.
- You need robust estimates for decision-making (e.g., risk analysis).
By estimating conditional quantiles, it provides deeper insights into how predictors influence the entire distribution of the response variable.