3.5. Generalized Linear Mixed Models using Adaptive Gaussian Quadrature (GLMMadaptive)
This tutorial provides an overview of the mathematical framework behind GLMMs, the likelihood function, numerical integration using AGQ, parameter estimation, and prediction. We demonstrate how to fit GLMMs using the {GLMMadaptive} package in R with examples of binary, Poisson, and negative binomial response data.
Overview
Generalized Linear Mixed Models (GLMMs) are a powerful statistical framework for analyzing complex data structures that involve hierarchical or clustered data with non-normal outcomes. GLMMs extend the concept of Generalized Linear Models (GLMs) by incorporating random effects to account for the correlation among observations within the same group. The Generalized Linear Mixed Models using Adaptive Gaussian Quadrature with {GLMMadaptive} package in R provides a flexible and efficient tool for fitting GLMMs using Adaptive Gaussian Quadrature (AGQ) for numerical integration using the adaptive Gauss-Hermite quadrature rule. This involves modeling fixed effects, random effects, and the response distribution using likelihood-based methods. The package supports various response distributions, complex random effects structures, and custom link functions. Multiple random effects terms can be included for the grouping factor (e.g., random intercepts, random linear slopes, random quadratic slopes), but currently only a single grouping factor is allowed.
Key Features of GLMMadaptive
The key features of the {GLMMadaptive} package include:
Exponential Family Distributions:
Supports various response distributions, such as:
Normal
Binomial
Poisson
Negative Binomial
Beta (and other non-exponential family distributions via quasi-likelihood methods)
Random Effects:
complex random effects structures.
Random effects can be nested or crossed, with multivariate normal distribution assumptions.
Custom Link Functions:
Supports standard link functions (e.g., logit, log, identity).
Allows user-defined link functions for custom models.
Robust Estimation:
Utilizes AGQ for accurate and efficient approximation of high-dimensional integrals required in GLMM estimation.
Supports both marginal and conditional likelihood estimation.
Predictive Modeling:
tools for making predictions with confidence intervals.
Supports marginal predictions (accounting for random effects).
General Formulation of GLMMs
A GLMM can be expressed as:
\[ g(\mu_{ij}) = X{ij}\beta + Z_{ij}u_i \]
\(g(\cdot)\): Link function.
\(\mu_{ij}\): Mean of the response for observation \(j\) in group \(i\).
\(X{ij}\): Fixed-effects design matrix.
\(\beta\): Fixed-effects coefficients.
\(Z\_{ij}\): Random-effects design matrix.
\(u_i\): Random effects (assumed to follow a multivariate normal distribution, \(u_i \sim N(0, G)\).
The conditional distribution of the response \(y_{ij}\), given the random effects \(u_i\), is part of the exponential family:
\(f(y_{ij} \| u_i)\): Conditional density of the response.
\(f(u_i)\): Random effects density ( \(N(0, G)\) ).
Numerical Integration via Adaptive Gaussian Quadrature (AGQ)
AGQ is a numerical integration technique used to approximate these integrals efficiently and accurately. It is particularly suitable for GLMMs because it:
Exploits the assumption that random effects are normally distributed.
the quadrature points to the mode of the integrand for better accuracy.
To approximate the likelihood, AGQ is used. The key steps are:
1. Transformation of Variables
The integral is centered around the mode of the conditional random-effects density:
\(y_i\) = \(y_{i1}, y_{i2}, \dots, y_{in_i})\): Observations for group \(i\).
The mode \(\hat{u}_i\) is found by maximizing \(\log f(u_i \| y_i)\).
2. Quadrature Approximation
The integral is approximated using Gaussian quadrature. Let \(\tilde{u}_i = \hat{u}_i + s z_k\), where \(z_k\) are quadrature points and \(s\) is a scaling factor based on the curvature (Hessian) at \(\hat{u}_i\). The integral becomes:
Optimization methods like Newton-Raphson or quasi-Newton are used.
Random Effects Posterior Distribution
After fitting the model, the posterior distribution of the random effects \(u_i\) can be estimated:
\[ f(u_i \| y_i) \propto f(y_i \| u_i) f(u_i). \] Predictions for random effects are often based on:
Maximum a posteriori (MAP) estimates (the mode of the posterior).
Empirical Bayes estimates (conditional means).
Prediction
Predictions can be made for:
Marginal Means (without random effects): \(\hat{y}_{ij} = g^{-1}(X_{ij}\hat{\beta})\).
Subject-Specific Means (with random effects): \(\hat{y}_{ij} = g^{-1}(X_{ij}\hat{\beta} + Z_{ij}\hat{u}_i)\)
Fit Generalized Linear Mixed Models using Adaptive Gaussian Quadrature in R
In this tutorial, we will demonstrate how to fit Generalized Linear Mixed Models (GLMMs) using Adaptive Gaussian Quadrature in R with the {GLMMadaptive} package. Basic features of the package include:
The package contains a single model-fitting function named mixed_model() with four required arguments, fixed a formula for the fixed effects, random a formula for the random effects, family a family object specifying the type of response variable, and data a data frame containing the variables in the previously mentioned formulas.
Gamma mixed effects models using the Gamma() or Gamma.fam() family object.
Linear mixed effects models with right and left censored data using the censored.normal() family object.
Users may also specify their own log-density function for the repeated measurements response variable, and the internal algorithms will take care of the optimization.
Calculates the marginalized coefficients using the idea of Hedeker et al. (2017) using function marginal_coefs().
Predictions with confidence interval for constructing effects plots are provided by function effectPlotData().
The basic model-fitting function in {GLMMadaptive} is called mixed_model(), and has four required arguments, namely fixed a formula for the fixed effects, random a formula for the random effects, family a family object specifying the type of response variable, and data a data frame containing the variables in the previously mentioned formulas. nAGQ is an optional argument that specifies the number of quadrature points to use in the adaptive Gaussian quadrature approximation. The default is 11 when the number of random effects is one or two, and 7 otherwise. We will walk through the process of specifying the model, fitting the model, and interpreting the results.
Install Required R Packages
Following R packages are required to run this notebook. If any of these packages are not installed, you can install them using the code below:
We will use the Contraception dataset from the mlmRev package, which come from the 1988 Bangladesh Fertility Survey and available for download from here.
Code
# load the data as tibblemf.binomial<-as_tibble(Contraception, package ="mlmRev")str(mf.binomial)
We fit a mixed effects logistic regression for use and age, urban, livch are fixed effect and district is random effect (random intercepts) random-effects part.
Code
# covert the livch variable to factormf.binomial$livch<-as.factor(mf.binomial$livch)# fit the mixed effects logistic regression modelmodel.logit_q11 <-mixed_model(fixed = use ~ age+urban+livch, random =~1|district, family =binomial(), control =glmerControl(optimizer ="nAGQ = 11"),data = mf.binomial)
Warning in mixed_model(fixed = use ~ age + urban + livch, random = ~1 | :
unknown names in control: restart_edge, boundary.tol, calc.derivs,
use.last.params, checkControl, checkConv, optCtrl, tolPwrss, compDev,
nAGQ0initStep
Code
summary(model.logit_q11)
Call:
mixed_model(fixed = use ~ age + urban + livch, random = ~1 |
district, data = mf.binomial, family = binomial(), control = glmerControl(optimizer = "nAGQ = 11"))
Data Descriptives:
Number of Observations: 1934
Number of Groups: 60
Model:
family: binomial
link: logit
Fit statistics:
log.Lik AIC BIC
-1206.674 2427.348 2442.009
Random effects covariance matrix:
StdDev
(Intercept) 0.4645476
Fixed effects:
Estimate Std.Err z-value p-value
(Intercept) -1.6904 0.1478 -11.4406 < 1e-04
age -0.0266 0.0079 -3.3727 0.00074429
urbanY 0.7324 0.1195 6.1290 < 1e-04
livch1 1.1094 0.1580 7.0206 < 1e-04
livch2 1.3766 0.1748 7.8750 < 1e-04
livch3+ 1.3457 0.1796 7.4923 < 1e-04
Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11
Optimization:
method: EM
converged: TRUE
We continue by checking the impact of the chosen number of quadrature points to the parameters estimates and the log-likelihood value at convergence. First, we refit the model with an increasing number of quadrature points (nAGQ). We fit then with 15, and 21 points:
We now extract from the model the estimated parameter for the fixed effects (using function fixef()), for the random effects, and the log-likelihood (using function logLik()):
We observe a rather stable model with virtually no differences between the different choices of quadrature points.
Mixed Effects Poisson Regression
In this section we use the Salamanders data set from the {glmmTMB} package. The data set containing counts of salamanders with site covariates and sampling covariates. Each of 23 sites was sampled 4 times. (Price et al. (2016); Price et al. 2015).
Code
# load the data as tibblemf.poisson<-as_tibble(Salamanders, package ="glmmTMB")str(mf.poisson)
We fit a mixed effects Poisson regression for count and mined, spp, cover are fixed effect and site is random effect (random intercepts) random-effects part.
Code
# fit the mixed effects Poisson regression modelmodel.poisson_01 <-mixed_model(fixed = count ~ mined + spp + cover, random =~1|site, family =poisson(), data = mf.poisson)summary(model.poisson_01)
We extend model by also including a random slopes term; however, we assume that the covariance between the random intercepts and random slopes is zero. This is achieved by using the ||symbol in the specification of the random argument, i.e.,
Code
# fit the mixed effects Poisson regression modelmodel.poisson_02 <-mixed_model(fixed = count ~ mined + spp + cover, random =~ mined||site, family =poisson(), data = mf.poisson)summary(model.poisson_02)
The results now suggest that indeed the covariance between the two random effects terms is not statistically different from zero.
{GLMMapative} package also allows to place a penalty in the fixed effects regression coefficients \(\beta\). The penalty/prior is in the form of a Student’s t distribution with mean 0, scale parameter 1, and 3 degrees of freedom, and it is placed in all \(\beta\) coefficients except from the intercept. The penalized model can be fitted by setting argument penalized to TRUE, i.e.,
Code
# fit the mixed effects Poisson regression model with penalized fixed effectsmodel.poisson_03 <-mixed_model(fixed = count ~ mined + spp + cover, random =~1|site, family =poisson(), penalized =TRUE, data = mf.poisson)summary(model.poisson_03)
In this example we observe small differences between the penalized and unpenalized models. The users have the option to alter the specification of the Student’s t penalty by directly specifying the mean, scale and degrees of freedom arguments of the distribution. For example, a ridge penalty could be placed by setting the degrees of freedom to a high value. The call in this case should be:
Code
# fit the mixed effects Poisson regression model with penalized fixed effectsmodel.poisson_04 <-mixed_model(fixed = count ~ mined + spp + cover, random =~1|site, family =poisson(), penalized =list(pen_mu =0, pen_sigma =1, pen_df =200), data = mf.poisson)summary(model.poisson_04)
Negative binomial mixed models can be fitted using the negative.binomial() family object.
Code
# fit the mixed effects Poisson regression modelmodel.nb <-mixed_model(fixed = count ~ mined + spp + cover, random =~1|site, family =negative.binomial(), data = mf.poisson)summary(model.nb)
This tutorial provided an overview of Generalized Linear Mixed Models (GLMMs) using Adaptive Gaussian Quadrature (AGQ) with the {GLMMadaptive} package in R. We discussed the mathematical framework behind GLMMs, the likelihood function, numerical integration using AGQ, parameter estimation, and prediction. We demonstrated how to fit GLMMs using the {GLMMadaptive} package with examples of binary, Poisson, and negative binomial response data. The package provides a flexible and efficient tool for fitting GLMMs with various response distributions, complex random effects structures, and custom link functions. The package supports multiple random effects terms, custom link functions, and robust estimation methods. The {GLMMadaptive} package is a valuable resource for researchers analyzing complex data structures with non-normal outcomes.