1. Random Intercept Model

Mixed-effects models, particularly Random Intercept Models, are vital for analyzing hierarchical data, like students in classes. This tutorial covers 1. Understanding Random Intercept Models: Definitions, uses, and contexts. 2. Model Construction: Building from basic principles. 3. Fitting in R: Using {lme4} and {nlme} packages. 4. Interpreting Results: Analyzing outputs and fit metrics. By the end, you’ll confidently apply and interpret Random Intercept Models.

Overview

random intercept model is a type of mixed-effects model where the intercept (baseline level) of the outcome variable is allowed to vary across groups, while the slope (relationship between the predictors and outcome) remains constant across groups.

For an observation $y_{ij}$, where $i$ indexes the group (e.g., schools) and $j$ indexes the individual (e.g., students within a school):

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \epsilon_{ij} \]

Where:

$y_{ij}$: Outcome for the $j$-th individual in the $i$-th group.
$\beta_0$: Fixed intercept (average baseline outcome across all groups).
$\beta_1$: Fixed slope (effect of predictor $x$ on $y$).
$x_{ij}$: Predictor variable for the $j$-th individual in the $i$-th group.
$u_i$: Random intercept, representing the deviation of the $i$-th group’s intercept from the overall average $\beta_0$. Assumed to follow $u_i \sim N(0, \sigma_u^2)$.
$\epsilon_{ij}$: Residual error for the $j$-th individual in the $i$-th group. Assumed to follow $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$.

Key Features

Intercept Varies by Group:
- Each group $i$ has its own intercept:
  
  \[ \beta_0^i = \beta_0 + u_i \]
$u_i$ is the random effect for group $i$, capturing group-specific deviations.
Slope Remains Constant:
- The relationship between $x_{ij}$ and $y_{ij}$(slope $\beta_1$) is the same across all groups.
Variance Decomposition:
- Total variance is partitioned into:
  - Between-group variance: Variance of the random intercepts ($ _u^2$).
  - Within-group variance: Variance of residuals ($\sigma_\epsilon^2$).

Example

We want to model students’ test scores $y$ as a function of their study hours $x$, accounting for differences between schools.

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \epsilon_{ij} \]

$\beta_0$: Average baseline test score across all schools.
$u_i$: Random intercept capturing school-specific effects.
$epsilon_{ij}$: Student-level random noise.

Estimation

Fixed Effects:
- $\beta_0$: Average baseline score for all schools.
- $\beta_1$: Average effect of study hours on test scores (slope).
Random Effects:
- $u_i$: Captures the deviation of each school’s baseline score from the overall average $\beta_0$.
- For example:
  - School A might have a $u_i > 0$ (higher baseline scores).
  - School B might have a $u_i < 0$ (lower baseline scores).

Intraclass Correlation Coefficient (ICC)

The ICC measures the proportion of variance explained by the grouping structure (e.g., schools):

\[ \text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\epsilon^2} \]

High ICC ($\text{ICC}\approx 1$): Large between-group variability (schools strongly affect scores).
Low ICC ($\text{ICC}\approx 0$): Most variability is at the individual level.

Random Intercept Model from Scratch

To fit a Random Intercept Model in R from scratch without relying on any external R packages, you need to manually calculate the model parameters using Maximum Likelihood Estimation (MLE) or some iterative process like Expectation-Maximization (EM). Here’s how you could do it:

Simulate the data:

First we create dataset with variables score, hours, and school. score is the outcome variable, hours is the predictor, and school is the grouping variable. We also generate random effects for each school and fixed effects for the intercept and hours.

Code

# Step 1: Simulate Data
set.seed(42)
n_schools <- 10    # Number of schools
n_students <- 50   # Students per school

# Random effects
schools <- rep(1:n_schools, each = n_students)
u <- rnorm(n_schools, mean = 0, sd = 5)  # Random intercept for schools

# Fixed effects
beta_0 <- 50     # Intercept
beta_1 <- 2      # Coefficient for hours
hours <- rnorm(n_schools * n_students, mean = 5, sd = 2)

# Residual error
epsilon <- rnorm(n_schools * n_students, mean = 0, sd = 3)

# Outcome variable
score <- beta_0 + beta_1 * hours + u[schools] + epsilon

Define the REML Log-Likelihood Function

The log-likelihood function calculates the negative log-likelihood of the model parameters given the data. Here, we use a block-diagonal approach to handle the school-specific random effects efficiently. Cholesky Decomposition is used to compute the log-determinant of the block covariance matrices to avoids directly computing $det(V)$, which can be numerically unstable.

The total variance-covariance matrix $V$ is block-diagonal because observations are grouped by schools. Each block $V_k$ for school $k$ has the form:

\[ V_k = \sigma_u^2 J_k + \sigma_e^2 I_k \]

where:

$J_k$: matrix of ones ($n_k \times n_k$),
$I_k$: identity matrix ($n_k \times n_k$),
$\sigma_u^2$: variance of random intercepts,
$\sigma_e^2$: residual variance.

For $n_k$ students in school $k$, $V_k$ is a $n_k \times n_k$ matrix that captures the correlation structure due to the random intercept.

The block-diagonal structure of $V$ ensures that:
- Matrix inversion and determinant computation are done for smaller blocks ($V_k$) instead of the full $V$, reducing computational complexity.
- For $n_schools$ blocks and average block size $n_k$, complexity scales as $O(n_schools \cdot n_k^3)$ instead of $O(n^3)$ for the full matrix.

The log-likelihood for the multivariate normal distribution of residuals is:

\[ \ell = -\frac{1}{2} \left[ \log(\det(V)) + \mathbf{r}^\top V^{-1} \mathbf{r} \right] \]

where:

$\mathbf{r} = \mathbf{y} - \mathbf{X} \boldsymbol{\beta}$ is the vector of residuals,
$\boldsymbol{\beta} = [\beta_0, \beta_1]^\top$,
$\mathbf{X}$: design matrix ($n \times 2$) with columns of ones and $x_{ij}$,
( V ): block-diagonal variance-covariance matrix.

Since $V$ is block-diagonal, we can compute $\log(\det(V))$ and $\mathbf{r}^\top V^{-1} \mathbf{r}$ efficiently for each block $V_k$ independently.

Following are the steps to implement the log-likelihood function:

Code

# Step 2: Define the REML log-likelihood function (block-diagonal version)
reml_loglik <- function(params) {
  # Ensure positivity with exponential transform
  sigma_u <- exp(params[1])  # Random intercept variance
  sigma_e <- exp(params[2])  # Residual variance
  beta_0 <- params[3]        # Fixed intercept
  beta_1 <- params[4]        # Fixed slope

  # Design matrix and fixed effects
  X <- cbind(1, hours)
  beta <- c(beta_0, beta_1)
  residuals <- score - X %*% beta

  # Initialize log-likelihood components
  log_det_sum <- 0
  quad_form_sum <- 0

  # Loop through schools to process blocks
  for (k in 1:n_schools) {
    # Get data for school k
    idx <- (1:n_students) + (k - 1) * n_students
    n_k <- length(idx)
    residuals_k <- residuals[idx]
    
    # Block covariance matrix for school k
    J_k <- matrix(1, n_k, n_k)  # All-ones matrix
    I_k <- diag(n_k)            # Identity matrix
    V_k <- sigma_u^2 * J_k + sigma_e^2 * I_k

    # Compute log-determinant using Cholesky
    chol_decomp <- tryCatch(chol(V_k), error = function(e) NULL)
    if (is.null(chol_decomp)) {
      cat("Cholesky decomposition failed for school", k, "sigma_u =", sigma_u, "sigma_e =", sigma_e, "\n")
      return(Inf)
    }
    log_det_V_k <- 2 * sum(log(diag(chol_decomp)))

    # Compute quadratic form
    quad_form_k <- t(residuals_k) %*% solve(V_k, residuals_k)

    # Accumulate results
    log_det_sum <- log_det_sum + log_det_V_k
    quad_form_sum <- quad_form_sum + quad_form_k
  }

  # Log-likelihood
  ll <- -0.5 * (log_det_sum + quad_form_sum)

  # Penalize non-finite likelihood
  if (!is.finite(ll)) {
    cat("Non-finite likelihood: sigma_u =", sigma_u, "sigma_e =", sigma_e, "\n")
    return(Inf)
  }
  
  return(as.numeric(-ll)) # Minimization
}

Optimization with Bounds

Optimization is performed using the optim() function with bounds on the parameters to ensure positivity. We log-transform the variance components to ensure they remain positive during optimization. The method L-BFGS-B is used for optimization. The initial guesses and bounds are set for the parameters.

Code

# Step 3: Optimization with bounds
initial_params <- c(log(1), log(1), 50, 2) # Log-transformed initial guesses
lower_bounds <- c(log(1e-6), log(1e-6), -Inf, -Inf) # Avoid zero variance
upper_bounds <- c(log(1e3), log(1e3), Inf, Inf)

result <- optim(initial_params, reml_loglik, method = "L-BFGS-B", 
                lower = lower_bounds, upper = upper_bounds, control = list(maxit = 1000))

# Transform back to original scale
sigma_u_est <- exp(result$par[1])
sigma_e_est <- exp(result$par[2])
beta_0_est <- result$par[3]
beta_1_est <- result$par[4]

Summary of Model Parameters

Finally, we summarize the fixed effects and variance components estimated by the model.

Code

# Step 4: Model Summary
cat("Random Intercept Model Summary:\n")

Random Intercept Model Summary:

Code

cat("Fixed Effects:\n")

Fixed Effects:

Code

cat(sprintf("Intercept: %.3f\n", beta_0_est))

Intercept: 52.551

Code

cat(sprintf("Slope (hours): %.3f\n", beta_1_est))

Slope (hours): 2.026

Code

cat("\nVariance Components:\n")


Variance Components:

Code

cat(sprintf("Random intercept variance (sigma_u^2): %.3f\n",  sigma_u_est^2))

Random intercept variance (sigma_u^2): 15.521

Code

cat(sprintf("Residual variance (sigma_e^2): %.3f\n", sigma_e_est^2))

Residual variance (sigma_e^2): 9.746

Random Intercept Model in R

We will fit the Random Intercept Model using the {lme4} and {nlme} packages in R. These packages provide convenient functions to fit mixed-effects models with random intercepts and slopes.

Install Required R Packages

Following R packages are required to run this notebook. If any of these packages are not installed, you can install them using the code below:

Code

packages <- c('tidyverse',
              'ggeffects',
              'lme4',
              'nlme',
              'lmerTest',
              'sjPlot',
              'margins',
              'report',
              'performance'
         )

#| warning: false
#| error: false

# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))

Load Packages

Code

# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))

Code

# Check loaded packages
cat("Successfully loaded packages:\n")

Successfully loaded packages:

Code

print(search()[grepl("package:", search())])

 [1] "package:performance" "package:report"      "package:margins"    
 [4] "package:sjPlot"      "package:lmerTest"    "package:nlme"       
 [7] "package:lme4"        "package:Matrix"      "package:ggeffects"  
[10] "package:lubridate"   "package:forcats"     "package:stringr"    
[13] "package:dplyr"       "package:purrr"       "package:readr"      
[16] "package:tidyr"       "package:tibble"      "package:ggplot2"    
[19] "package:tidyverse"   "package:stats"       "package:graphics"   
[22] "package:grDevices"   "package:utils"       "package:datasets"   
[25] "package:methods"     "package:base"

Data

In this exercise, we will use the Wheat yield dataset that has been used in our publication entitled Untangling crop management and environmental influences on wheat yield variability in Bangladesh: An application of non-parametric approaches. This paper assessed three non-parametric methods, including linear mixed effect models and two recursive partitioning approaches, for their effectiveness in explaining wheat yield variability in 422 wheat ﬁelds in six distinct production environments in southern Bangladesh. Full data set is available for download can download from my Dropbox or from my Github accounts.

The dataset contains several variables, but we will following variables for analysis:

ENV: Growing environment represents six Upazila in the Southern Bangladesh (1=Barisal, 2=Bhola, 3=Jhalokathi, 4=Patuakhali, 5=Pirojpur, 6=Barguna)

SOWING: Sowing date of the wheat crop (1=Early, 2=Late)

DAYS: Wheat sowing Before/after December 15

N_RATE: Nitrogen rate (kg/ha)

GEN: Three wheat varieties (BG 25 =BARI Gom 25, BG 26 =BARI Gom 26, BG 27 =BARI Gom 27)

Maturity_days: Maturity days of the wheat crop

Weeding: Weeding frequency (1=1 time, 2=2 times, 3=3 times)

Irrigation: Irrigation frequency (1=1 time, 2=2 times, 3=3 times)

GY: Grain yield (t/ha) of Wheat

We will use read_csv() function of {readr} package to import data as a tidy data.

Code

# Load the data
mf<-readr::read_csv("https://github.com/zia207/r-colab/raw/main/Data/Regression_analysis/wheat_data_bd.csv")|> 
 glimpse()

Rows: 422 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): SEASON, ENV, SOWING, GEN, DATE_SOW, Optimum_date, STY, Weeding_No, ...
dbl (7): N_RATE, DAYS, GY, Maturity_days, Weeding, Irrigation, FARMER

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Rows: 422
Columns: 16
$ SEASON        <chr> "Rabi 12-13", "Rabi 12-13", "Rabi 12-13", "Rabi 12-13", …
$ ENV           <chr> "KALAPARA", "KALAPARA", "KALAPARA", "KALAPARA", "KALAPAR…
$ SOWING        <chr> "LATE", "LATE", "LATE", "LATE", "LATE", "LATE", "LATE", …
$ GEN           <chr> "BG 27", "BG 25", "BG 21", "BG 27", "BG 25", "BG 21", "B…
$ N_RATE        <dbl> 0, 0, 0, 67, 67, 67, 100, 100, 100, 0, 0, 67, 67, 67, 10…
$ DATE_SOW      <chr> "1/5/2013", "1/5/2013", "1/5/2013", "1/5/2013", "1/5/201…
$ Optimum_date  <chr> "12/15/2012", "12/15/2012", "12/15/2012", "12/15/2012", …
$ DAYS          <dbl> 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, …
$ GY            <dbl> 0.319, 0.622, 0.737, 0.732, 1.095, 1.348, 1.365, 0.407, …
$ STY           <chr> "0.5699", "1.3067", "0.987", "1.2628", "0.251883333", "2…
$ Maturity_days <dbl> 103, 101, 102, 104, 105, 105, 104, 104, 105, 92, 92, 92,…
$ Weeding       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Weeding_No    <chr> "1 weeding", "1 weeding", "1 weeding", "1 weeding", "1 w…
$ Irrigation    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,…
$ Irrigation_No <chr> "2 irrigation", "2 irrigation", "2 irrigation", "2 irrig…
$ FARMER        <dbl> 95, 95, 95, 95, 95, 95, 95, 95, 95, 94, 94, 94, 94, 94, …

Data Processing

Code

df <- mf |> 
dplyr::select(FARMER, ENV, SOWING, GEN, N_RATE, DAYS, Maturity_days, Weeding, Irrigation, GY )
head(df)

# A tibble: 6 × 10
  FARMER ENV    SOWING GEN   N_RATE  DAYS Maturity_days Weeding Irrigation    GY
   <dbl> <chr>  <chr>  <chr>  <dbl> <dbl>         <dbl>   <dbl>      <dbl> <dbl>
1     95 KALAP… LATE   BG 27      0    21           103       1          2 0.319
2     95 KALAP… LATE   BG 25      0    21           101       1          2 0.622
3     95 KALAP… LATE   BG 21      0    21           102       1          2 0.737
4     95 KALAP… LATE   BG 27     67    21           104       1          2 0.732
5     95 KALAP… LATE   BG 25     67    21           105       1          2 1.10 
6     95 KALAP… LATE   BG 21     67    21           105       1          2 1.35

Convert to Factor

Code

# Convert to factor
df$FARMER<-as.factor(df$FARMER)
df$N_RATE=as.factor(df$N_RATE)
df$ENV=as.factor(df$ENV)
df$SOWING=as.factor(df$SOWING)
df$GEN=as.factor(df$GEN)
df$Weeding=as.factor(df$Weeding)
df$Irrigation=as.factor(df$Irrigation)

Data Exploration

Code

cols <- c("0" = "red","67" = "yellow","100" = "green", "133" = "blue")
shap = c(15,17)

# ENV Mean 

p1= ggplot(df, aes(y=GY, x=ENV)) +
         geom_point(aes(colour=N_RATE, shape=SOWING),size = I(3.5),
         position=position_jitter(width=0.05, height=0.05)) +
         geom_boxplot(fill=NA, outlier.colour=NA) +
         labs(title="Growing Environment")+
         xlab("") + ylab("Wheat yield (t/ha)") +
         theme_bw() +
            theme(axis.line = element_line(colour = "black"),
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            axis.text.y=element_text(size=20),
            axis.text.x = element_text(size=10)) +
     scale_colour_manual(values = cols) +
     scale_shape_manual(values = shap) 

p1

Code

# GEN Mean

p2= ggplot(df, aes(y=GY, x=GEN)) +
         geom_point(aes(colour=N_RATE, shape=SOWING),size = I(3.5),
         position=position_jitter(width=0.05, height=0.05)) +
         geom_boxplot(fill=NA, outlier.colour=NA) +
        labs(title="Genotype")+
         xlab("") + ylab("Wheat yield (t/ha)") +
         theme_bw() +
            theme(axis.line = element_line(colour = "black"),
            panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            panel.background = element_blank(),
            axis.text.y=element_text(size=20),
            axis.text.x = element_text(size=12))+
     scale_colour_manual(values = cols) +
     scale_shape_manual(values = shap) 

p2

Model Fitting

To follow the hierarchical modeling process, we’ll fit the models step-by-step using the dataset. The steps will include:

Intercept-only model: this includes only the random intercept for the grouping variable (ENV).
Mixed effect model: adds all the fixed effects: GEN, DAYS, N_RATE , Maturity, Weeingand Irrigation.

Using `lme4` Package

Intercept-Only Model

This model includes only the random intercept for the grouping variable FARMER. The syntax (1 | FARMER) specifies that we are modeling the random intercept for the grouping variable FARMER. We use the lmer() function from the {lme4} package to fit the model via REML or maximum likelihood estimation.

Code

# Intercept-only model
m_01<- lme4::lmer(GY ~ 1 + (1 | FARMER), data = df)
# Summary of the model
summary(m_01)

Linear mixed model fit by REML ['lmerMod']
Formula: GY ~ 1 + (1 | FARMER)
   Data: df

REML criterion at convergence: 1402.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.1431 -0.8755  0.2872  0.7592  1.6306 

Random effects:
 Groups   Name        Variance Std.Dev.
 FARMER   (Intercept) 0.658    0.8112  
 Residual             1.339    1.1570  
Number of obs: 422, groups:  FARMER, 48

Fixed effects:
            Estimate Std. Error t value
(Intercept)    2.632      0.130   20.25

In lmer models (from lme4), the t-statistics and p-values are not directly provided. Instead, you can compute p-values using either Satterthwaite’s approximation, Kenward-Roger approximation, or parametric bootstrapping.

lmerTest::lmer() provides p-values for fixed effects using Satterthwaite’s approximation for degrees of freedom.

Code

# Intercept-only model
m_01<- lmerTest::lmer(GY ~ 1 + (1|FARMER), data = df)
# Summary of the model
jtools::summ(m_01)

Observations	422
Dependent variable	GY
Type	Mixed effects linear regression

AIC	1408.10
BIC	1420.23
Pseudo-R² (fixed effects)	0.00
Pseudo-R² (total)	0.33

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	2.63	0.13	20.25	47.03	0.00
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
FARMER	(Intercept)	0.81
Residual		1.16

Grouping Variables
Group	# groups	ICC
FARMER	48	0.33

Mixed-Effects Model

Mixed-effects model includes all the fixed effects: GEN*ENV, DAYS, N_RATE , Maturity_days + Weeding, Irrigation and the random intercept for the grouping variable FARMER. The syntax (1 | FARMER) specifies that we are modeling the random intercept for the grouping variable ENV.

Code

# Mixed effects model
m_02<- lmerTest::lmer(GY ~ GEN + ENV + DAYS+ N_RATE + Maturity_days + Weeding + Irrigation + (1 | FARMER), data = df)
# Summary of the model
jtools::summ(m_02)

Observations	422
Dependent variable	GY
Type	Mixed effects linear regression

AIC	731.57
BIC	808.43
Pseudo-R² (fixed effects)	0.82
Pseudo-R² (total)	0.89

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	3.22	1.98	1.63	206.56	0.11
GENBG 25	0.03	0.06	0.55	402.97	0.58
GENBG 27	0.01	0.06	0.10	372.85	0.92
ENVFULTALA	0.87	0.22	4.01	41.08	0.00
ENVKALAPARA	-0.18	0.31	-0.60	56.13	0.55
ENVKALIGANJ	0.56	0.22	2.52	41.74	0.02
ENVSADAR	0.75	0.22	3.42	43.25	0.00
ENVUJIRPUR	0.51	0.23	2.20	50.25	0.03
DAYS	-0.05	0.01	-4.16	138.86	0.00
N_RATE67	1.66	0.10	16.59	403.97	0.00
N_RATE100	1.94	0.09	22.75	404.18	0.00
N_RATE133	2.42	0.10	25.05	400.41	0.00
Maturity_days	-0.02	0.02	-1.21	209.21	0.23
Weeding1	-0.07	0.11	-0.64	377.36	0.52
Weeding2	-0.70	0.40	-1.77	58.40	0.08
Irrigation2	0.38	0.11	3.60	305.77	0.00
Irrigation3	-0.11	0.16	-0.70	398.51	0.49
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
FARMER	(Intercept)	0.39
Residual		0.47

Grouping Variables
Group	# groups	ICC
FARMER	48	0.41

Mixed-Effects Model with Cross Level Interaction

We can add a cross-level interaction between GEN and ENV to the model to account for the interaction effect between the genotype and the growing environment.

Code

# Mixed effects model
m_03<- lmerTest::lmer(GY ~ GEN + ENV + DAYS+ N_RATE + Maturity_days + Weeding + Irrigation + GEN:ENV+(1 | FARMER), data = df)
# Summary of the model
jtools::summ(m_03)

Observations	422
Dependent variable	GY
Type	Mixed effects linear regression

AIC	767.72
BIC	885.02
Pseudo-R² (fixed effects)	0.81
Pseudo-R² (total)	0.89

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	4.39	2.17	2.03	161.54	0.04
GENBG 25	-0.07	0.14	-0.45	381.68	0.65
GENBG 27	0.01	0.14	0.08	361.06	0.94
ENVFULTALA	0.80	0.24	3.26	65.18	0.00
ENVKALAPARA	-0.13	0.33	-0.38	72.30	0.70
ENVKALIGANJ	0.56	0.25	2.24	65.46	0.03
ENVSADAR	0.74	0.25	3.00	68.03	0.00
ENVUJIRPUR	0.49	0.26	1.90	74.96	0.06
DAYS	-0.05	0.01	-4.34	120.26	0.00
N_RATE67	1.69	0.10	16.39	394.77	0.00
N_RATE100	1.96	0.09	22.35	394.39	0.00
N_RATE133	2.45	0.10	24.60	394.38	0.00
Maturity_days	-0.04	0.02	-1.63	162.78	0.10
Weeding1	-0.08	0.11	-0.72	367.15	0.47
Weeding2	-0.81	0.41	-2.00	58.52	0.05
Irrigation2	0.38	0.11	3.55	298.80	0.00
Irrigation3	-0.11	0.16	-0.69	388.91	0.49
GENBG 25:ENVFULTALA	0.19	0.20	0.96	357.63	0.34
GENBG 27:ENVFULTALA	0.05	0.19	0.25	355.32	0.80
GENBG 25:ENVKALAPARA	0.13	0.21	0.63	363.57	0.53
GENBG 27:ENVKALAPARA	-0.05	0.20	-0.23	359.06	0.82
GENBG 25:ENVKALIGANJ	0.01	0.19	0.04	356.50	0.97
GENBG 27:ENVKALIGANJ	0.01	0.19	0.06	355.17	0.95
GENBG 25:ENVSADAR	0.04	0.19	0.20	355.25	0.84
GENBG 27:ENVSADAR	0.03	0.19	0.17	355.15	0.87
GENBG 25:ENVUJIRPUR	0.15	0.20	0.77	362.59	0.44
GENBG 27:ENVUJIRPUR	-0.05	0.20	-0.27	360.79	0.79
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
FARMER	(Intercept)	0.39
Residual		0.48

Grouping Variables
Group	# groups	ICC
FARMER	48	0.40

Crossed Random Effects Model

A cross-random effects model is a type of mixed-effects model designed to handle situations where observations are influenced by two or more unrelated (crossed) grouping factors. Unlike hierarchical or nested data (e.g., students nested within schools), crossed effects occur when groupings overlap or intersect without a clear nesting structure. Component groupings are independent and not hierarchically related.

Random Effects: Represents variations at different levels of the data hierarchy. In a crossed random effects model, there are multiple random effects that interact with each other.

Crossed Structure: Unlike a hierarchical or nested structure where one random effect is nested within another, in a crossed structure, the random effects are not nested and can appear with multiple levels of other random effects.

Here below is the example for crossed random effects model where we comsdered FARMER and N_RATE as crossed random effects.

Code

# Crossed random effects model
m_04<- lmerTest::lmer(GY ~ GEN+ENV+DAYS+  Maturity_days + Weeding + Irrigation + GEN:ENV+(1|FARMER) + (1|N_RATE), data = df)
# Summary of the model
jtools::summ(m_04)

Observations	422
Dependent variable	GY
Type	Mixed effects linear regression

AIC	773.98
BIC	883.20
Pseudo-R² (fixed effects)	0.26
Pseudo-R² (total)	0.89

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	5.79	2.26	2.56	152.81	0.01
GENBG 25	-0.06	0.14	-0.44	381.77	0.66
GENBG 27	0.01	0.14	0.07	361.15	0.94
ENVFULTALA	0.80	0.24	3.26	65.14	0.00
ENVKALAPARA	-0.14	0.33	-0.42	72.26	0.68
ENVKALIGANJ	0.55	0.25	2.23	65.41	0.03
ENVSADAR	0.74	0.25	2.98	67.98	0.00
ENVUJIRPUR	0.49	0.26	1.89	74.91	0.06
DAYS	-0.05	0.01	-4.30	120.49	0.00
Maturity_days	-0.03	0.02	-1.58	163.31	0.12
Weeding1	-0.07	0.11	-0.68	367.63	0.50
Weeding2	-0.80	0.41	-1.96	58.51	0.05
Irrigation2	0.38	0.11	3.58	299.09	0.00
Irrigation3	-0.10	0.16	-0.66	389.25	0.51
GENBG 25:ENVFULTALA	0.19	0.20	0.95	357.72	0.34
GENBG 27:ENVFULTALA	0.05	0.19	0.25	355.41	0.80
GENBG 25:ENVKALAPARA	0.13	0.21	0.62	363.67	0.54
GENBG 27:ENVKALAPARA	-0.04	0.20	-0.22	359.15	0.83
GENBG 25:ENVKALIGANJ	0.01	0.19	0.04	356.60	0.97
GENBG 27:ENVKALIGANJ	0.01	0.19	0.06	355.26	0.95
GENBG 25:ENVSADAR	0.04	0.19	0.21	355.35	0.84
GENBG 27:ENVSADAR	0.03	0.19	0.17	355.24	0.87
GENBG 25:ENVUJIRPUR	0.15	0.20	0.76	362.68	0.45
GENBG 27:ENVUJIRPUR	-0.05	0.20	-0.26	360.88	0.79
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
FARMER	(Intercept)	0.39
N_RATE	(Intercept)	1.06
Residual		0.48

Grouping Variables
Group	# groups	ICC
FARMER	48	0.10
N_RATE	4	0.75

Nested Random Effects Model

A nested random effects model is a type of mixed-effects model designed to handle situations where observations are influenced by two or more nested grouping factors. In nested structures, one grouping factor (e.g., schools) contains the other (e.g., students), creating a hierarchical relationship.

Code

# Nested  random effects model
m_05<- lmerTest::lmer(GY ~ GEN+ENV+DAYS+  Maturity_days + N_RATE +Weeding + Irrigation + GEN:ENV+ (1|ENV/FARMER), data = df)

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 1.49338 (tol = 0.002, component 1)

Code

# Summary of the model
jtools::summ(m_05)

Observations	422
Dependent variable	GY
Type	Mixed effects linear regression

AIC	770.65
BIC	892.00
Pseudo-R² (fixed effects)	0.77
Pseudo-R² (total)	0.90

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	4.45	2.27	1.96	0.00	1.00
GENBG 25	-0.07	0.14	-0.46	386.18	0.64
GENBG 27	0.01	0.14	0.08	365.50	0.93
ENVFULTALA	0.81	0.47	1.73	0.00	1.00
ENVKALAPARA	-0.13	0.53	-0.25	0.00	1.00
ENVKALIGANJ	0.55	0.47	1.18	0.00	1.00
ENVSADAR	0.74	0.47	1.59	0.00	1.00
ENVUJIRPUR	0.49	0.47	1.04	0.00	1.00
DAYS	-0.05	0.01	-4.13	102.59	0.00
Maturity_days	-0.04	0.02	-1.59	156.02	0.11
N_RATE67	1.69	0.10	16.38	393.27	0.00
N_RATE100	1.96	0.09	22.29	392.29	0.00
N_RATE133	2.44	0.10	24.57	394.99	0.00
Weeding1	-0.07	0.11	-0.68	363.79	0.50
Weeding2	-0.80	0.44	-1.82	42.15	0.08
Irrigation2	0.39	0.11	3.56	297.72	0.00
Irrigation3	-0.12	0.16	-0.79	389.07	0.43
GENBG 25:ENVFULTALA	0.19	0.19	0.97	361.83	0.33
GENBG 27:ENVFULTALA	0.05	0.19	0.25	359.34	0.80
GENBG 25:ENVKALAPARA	0.13	0.20	0.64	368.12	0.52
GENBG 27:ENVKALAPARA	-0.05	0.20	-0.23	363.33	0.82
GENBG 25:ENVKALIGANJ	0.01	0.19	0.05	360.62	0.96
GENBG 27:ENVKALIGANJ	0.01	0.19	0.06	359.18	0.95
GENBG 25:ENVSADAR	0.04	0.19	0.21	359.27	0.84
GENBG 27:ENVSADAR	0.03	0.19	0.17	359.16	0.86
GENBG 25:ENVUJIRPUR	0.15	0.20	0.78	367.08	0.44
GENBG 27:ENVUJIRPUR	-0.05	0.19	-0.27	365.22	0.78
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
FARMER:ENV	(Intercept)	0.44
ENV	(Intercept)	0.27
Residual		0.47

Grouping Variables
Group	# groups	ICC
FARMER:ENV	48	0.40
ENV	6	0.15

Model Comparison

To assess model improvement, compare models using information criteria such as AIC or likelihood ratio tests.

To compare the two models, we can use the anova() function to perform Likelihood ratio tests

Code

anova(m_01, m_02, m_03, m_04,m_05)

Data: df
Models:
m_01: GY ~ 1 + (1 | FARMER)
m_02: GY ~ GEN + ENV + DAYS + N_RATE + Maturity_days + Weeding + Irrigation + (1 | FARMER)
m_04: GY ~ GEN + ENV + DAYS + Maturity_days + Weeding + Irrigation + GEN:ENV + (1 | FARMER) + (1 | N_RATE)
m_03: GY ~ GEN + ENV + DAYS + N_RATE + Maturity_days + Weeding + Irrigation + GEN:ENV + (1 | FARMER)
m_05: GY ~ GEN + ENV + DAYS + Maturity_days + N_RATE + Weeding + Irrigation + GEN:ENV + (1 | ENV/FARMER)
     npar     AIC     BIC  logLik deviance   Chisq Df Pr(>Chisq)    
m_01    3 1405.84 1417.98 -699.92  1399.84                          
m_02   19  676.79  753.64 -319.39   638.79 761.055 16  < 2.2e-16 ***
m_04   27  715.77  824.99 -330.89   661.77   0.000  8          1    
m_03   29  694.02  811.32 -318.01   636.02  25.755  2  2.555e-06 ***
m_05   30  696.02  817.37 -318.01   636.02   0.000  1          1    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

AIC(m_01, m_02, m_03, m_04, m_05)

     df       AIC
m_01  3 1408.0976
m_02 19  731.5747
m_03 29  767.7177
m_04 27  773.9837
m_05 30  770.6451

Model Interpretaion

We can use {report} package to generate a comprehensive report of the model output.

Code

report::report(m_02)

We fitted a linear mixed model (estimated using REML and nloptwrap optimizer)
to predict GY with GEN, ENV, DAYS, N_RATE, Maturity_days, Weeding and
Irrigation (formula: GY ~ GEN + ENV + DAYS + N_RATE + Maturity_days + Weeding +
Irrigation). The model included FARMER as random effect (formula: ~1 | FARMER).
The model's total explanatory power is substantial (conditional R2 = 0.89) and
the part related to the fixed effects alone (marginal R2) is of 0.82. The
model's intercept, corresponding to GEN = BG 21, ENV = DUMURIA, DAYS = 0,
N_RATE = 0, Maturity_days = 0, Weeding = 0 and Irrigation = 1, is at 3.22 (95%
CI [-0.67, 7.10], t(403) = 1.63, p = 0.104). Within this model:

  - The effect of GEN [BG 25] is statistically non-significant and positive (beta
= 0.03, 95% CI [-0.09, 0.16], t(403) = 0.55, p = 0.583; Std. beta = 0.02, 95%
CI [-0.06, 0.11])
  - The effect of GEN [BG 27] is statistically non-significant and positive (beta
= 5.41e-03, 95% CI [-0.11, 0.12], t(403) = 0.10, p = 0.924; Std. beta =
3.84e-03, 95% CI [-0.08, 0.08])
  - The effect of ENV [FULTALA] is statistically significant and positive (beta =
0.87, 95% CI [0.44, 1.30], t(403) = 4.01, p < .001; Std. beta = 0.62, 95% CI
[0.32, 0.92])
  - The effect of ENV [KALAPARA] is statistically non-significant and negative
(beta = -0.18, 95% CI [-0.79, 0.42], t(403) = -0.60, p = 0.552; Std. beta =
-0.13, 95% CI [-0.56, 0.30])
  - The effect of ENV [KALIGANJ] is statistically significant and positive (beta
= 0.56, 95% CI [0.12, 0.99], t(403) = 2.52, p = 0.012; Std. beta = 0.39, 95% CI
[0.09, 0.70])
  - The effect of ENV [SADAR] is statistically significant and positive (beta =
0.75, 95% CI [0.32, 1.18], t(403) = 3.42, p < .001; Std. beta = 0.53, 95% CI
[0.23, 0.84])
  - The effect of ENV [UJIRPUR] is statistically significant and positive (beta =
0.51, 95% CI [0.05, 0.97], t(403) = 2.20, p = 0.028; Std. beta = 0.36, 95% CI
[0.04, 0.69])
  - The effect of DAYS is statistically significant and negative (beta = -0.05,
95% CI [-0.07, -0.02], t(403) = -4.16, p < .001; Std. beta = -0.47, 95% CI
[-0.70, -0.25])
  - The effect of N RATE [67] is statistically significant and positive (beta =
1.66, 95% CI [1.47, 1.86], t(403) = 16.59, p < .001; Std. beta = 1.18, 95% CI
[1.04, 1.32])
  - The effect of N RATE [100] is statistically significant and positive (beta =
1.94, 95% CI [1.77, 2.10], t(403) = 22.75, p < .001; Std. beta = 1.37, 95% CI
[1.26, 1.49])
  - The effect of N RATE [133] is statistically significant and positive (beta =
2.42, 95% CI [2.23, 2.61], t(403) = 25.05, p < .001; Std. beta = 1.72, 95% CI
[1.58, 1.85])
  - The effect of Maturity days is statistically non-significant and negative
(beta = -0.02, 95% CI [-0.06, 0.01], t(403) = -1.21, p = 0.228; Std. beta =
-0.12, 95% CI [-0.31, 0.07])
  - The effect of Weeding [1] is statistically non-significant and negative (beta
= -0.07, 95% CI [-0.27, 0.14], t(403) = -0.64, p = 0.524; Std. beta = -0.05,
95% CI [-0.19, 0.10])
  - The effect of Weeding [2] is statistically non-significant and negative (beta
= -0.70, 95% CI [-1.48, 0.08], t(403) = -1.77, p = 0.077; Std. beta = -0.50,
95% CI [-1.05, 0.05])
  - The effect of Irrigation [2] is statistically significant and positive (beta
= 0.38, 95% CI [0.17, 0.59], t(403) = 3.60, p < .001; Std. beta = 0.27, 95% CI
[0.12, 0.42])
  - The effect of Irrigation [3] is statistically non-significant and negative
(beta = -0.11, 95% CI [-0.41, 0.20], t(403) = -0.70, p = 0.486; Std. beta =
-0.08, 95% CI [-0.29, 0.14])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.

Model Performance

Model performance can be assessed using metrics such as Marginal $R^2$ and Conditional $R^2$.

Code

performance::model_performance(m_02)

# Indices of model performance

AIC     |    AICc |     BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------
731.575 | 733.465 | 808.430 |      0.890 |      0.815 | 0.408 | 0.442 | 0.472

Note

Marginal $R^2$ and Conditional $R^2$ are two measures used to assess the goodness-of-fit of multilevel (mixed-effects) models. These measures help determine how much of the variance in the response variable is explained by the fixed effects and the combination of fixed and random effects, respectively.

Marginal $R^2$ represents the proportion of variance in the response variable that is explained only by the fixed effects in the model. Shows the explanatory power of the fixed predictors (e.g., variables like GEN, DAYS, N_RATE , Maturity_days + Weeding, Irrigation in your model).

Conditional $R^2$ represents the proportion of variance in the response variable that is explained by both fixed and random effects in the models. It reflects the combined explanatory power of the fixed predictors and the hierarchical structure (random effects).

Total Variance ($\sigma^2_{\text{total}}$) = Variance explained by fixed effects ($\sigma^2_{\text{fixed}}$) + Variance explained by random effects ($\sigma^2_{\text{random}}$) + Residual variance ($\sigma^2_{\text{residual}}$).

\[ R^2_{\text{marginal}} = \frac{\sigma^2_{\text{fixed}}}{\sigma^2_{\text{total}}} \]

\[ R^2_{\text{conditional}} = \frac{\sigma^2_{\text{fixed}} + \sigma^2_{\text{random}}}{\sigma^2_{\text{total}}} \]

Visualize Random Effects

sjPlot::plot_model() function can be used to visualize random effects.

Code

# Visualize random effects 
sjPlot::plot_model(m_02, type = "re", show.values = TRUE)

The values you see are NOT actual values, but rather the difference between the general intercept or slope value found in your model summary and the estimate for this specific level of random effect.

Marginal Effects and Adjusted Predictions

If we want the marginal effects for “N_RATE”, you may use margins() function of {margins} package:

Code

margins::margins(m_02, variables = "N_RATE")

 N_RATE67 N_RATE100 N_RATE133
    1.663     1.937     2.422

{ggeffects} package supports labelled data and the plot() method automatically sets titles, axis - and legend-labels depending on the value and variable labels of the data.

Code

plot(ggeffects::predict_response(m_02, terms = c("DAYS", "GEN")), facets = TRUE)

Cross-validation

To perform cross-validation for the full model and evaluate its performance, we can use techniques such as k-fold cross-validation. Here, we split the data into k folds, train the model on k-1 folds, and evaluate it on the remaining fold. We repeat this process k times, each time with a different fold held out for testing.

Code

n<-length(df$GY)
k <- 5  # Number of folds
folds <- cut(seq(1, n), breaks = k, labels = FALSE)

# Initialize lists to store results
results <- list()
rmse_values <- c()
# Cross-validation loop
for (i in seq_along(folds)) {
  # Split data into training and testing sets
  train_data <- df[-folds[[i]], ]
  test_data <- df[folds[[i]], ]
  
  # Fit the full model on training data
  model <- lmerTest::lmer(GY ~ GEN+ENV+DAYS+Maturity_days + N_RATE +Weeding + Irrigation + (1|FARMER), data = train_data)
  
  # Predict on test data
  predictions <- predict(model, newdata = test_data, allow.new.levels = TRUE)
  
  # Calculate evaluation metrics
  rmse <- MLmetrics::RMSE(predictions, test_data$GY)
  # Store results
  rmse_values <- c(rmse_values, rmse)
}

Code

# Summarize RMSE and R-squared
mean_rmse <- mean(rmse_values)
sd_rmse <- sd(rmse_values)

# Print performance
cat("Cross-Validation Results:\n")

Cross-Validation Results:

Code

cat("Mean RMSE:", mean_rmse, "±", sd_rmse, "\n")

Mean RMSE: 0.772124 ± 0.3546553

Code

# Plot RMSE values
boxplot(rmse_values, main = "RMSE across folds", ylab = "RMSE")

Using `nlme` Package

lme() function of {nlme} package is used to fit linear mixed-effects models. The syntax (1 | ENV) specifies that we are modeling the random intercept for the grouping variable ENV.

Code

# Full model
model_full_nlme <- nlme::lme(fixed = GY ~ GEN+ENV + DAYS+ N_RATE + Maturity_days + Weeding + Irrigation,
                        random = ~ 1 | FARMER, data = df)

# Summary of the model
summary(model_full_nlme)

Linear mixed-effects model fit by REML
  Data: df 
       AIC      BIC    logLik
  731.5747 807.6486 -346.7874

Random effects:
 Formula: ~1 | FARMER
        (Intercept)  Residual
StdDev:   0.3914348 0.4715591

Fixed effects:  GY ~ GEN + ENV + DAYS + N_RATE + Maturity_days + Weeding + Irrigation 
                  Value Std.Error  DF   t-value p-value
(Intercept)    3.215360 1.9754016 365  1.627700  0.1045
GENBG 25       0.034703 0.0630947 365  0.550019  0.5826
GENBG 27       0.005414 0.0567474 365  0.095407  0.9240
ENVFULTALA     0.871707 0.2172221  40  4.012976  0.0003
ENVKALAPARA   -0.182924 0.3070499  40 -0.595747  0.5547
ENVKALIGANJ    0.555312 0.2204178  40  2.519360  0.0159
ENVSADAR       0.751582 0.2199770  40  3.416638  0.0015
ENVUJIRPUR     0.511433 0.2322791  40  2.201805  0.0335
DAYS          -0.047093 0.0113234  40 -4.158851  0.0002
N_RATE67       1.662902 0.1002650 365 16.585078  0.0000
N_RATE100      1.937429 0.0851462 365 22.754149  0.0000
N_RATE133      2.421983 0.0966878 365 25.049514  0.0000
Maturity_days -0.023807 0.0197033 365 -1.208287  0.2277
Weeding1      -0.067232 0.1053938 365 -0.637913  0.5239
Weeding2      -0.702821 0.3963778  40 -1.773110  0.0838
Irrigation2    0.379142 0.1053512 365  3.598840  0.0004
Irrigation3   -0.108248 0.1552854 365 -0.697089  0.4862
 Correlation: 
              (Intr) GENBG25 GENBG27 ENVFUL ENVKALA ENVKALI ENVSAD ENVUJI
GENBG 25      -0.456                                                     
GENBG 27       0.148  0.368                                              
ENVFULTALA    -0.011 -0.019   0.007                                      
ENVKALAPARA    0.397 -0.191   0.065   0.313                              
ENVKALIGANJ   -0.010 -0.017   0.007   0.492  0.348                       
ENVSADAR       0.053 -0.045   0.018   0.505  0.389   0.530               
ENVUJIRPUR     0.031 -0.031   0.014   0.454  0.407   0.479   0.495       
DAYS          -0.891  0.396  -0.145   0.035 -0.476   0.051  -0.033 -0.035
N_RATE67       0.388 -0.166   0.070   0.036  0.254   0.110   0.159  0.109
N_RATE100      0.419 -0.180   0.071   0.037  0.293   0.147   0.183  0.120
N_RATE133      0.373 -0.155   0.064   0.029  0.264   0.145   0.159  0.095
Maturity_days -0.997  0.445  -0.163  -0.043 -0.434  -0.043  -0.107 -0.081
Weeding1      -0.126  0.050  -0.022   0.009 -0.267  -0.119  -0.153 -0.396
Weeding2      -0.445  0.198  -0.068   0.051 -0.702   0.029  -0.028 -0.104
Irrigation2   -0.012  0.000  -0.003   0.021  0.008  -0.107  -0.062  0.260
Irrigation3    0.009 -0.006  -0.001  -0.184  0.041  -0.078  -0.158  0.036
              DAYS   N_RATE6 N_RATE10 N_RATE13 Mtrty_ Wedng1 Wedng2 Irrgt2
GENBG 25                                                                  
GENBG 27                                                                  
ENVFULTALA                                                                
ENVKALAPARA                                                               
ENVKALIGANJ                                                               
ENVSADAR                                                                  
ENVUJIRPUR                                                                
DAYS                                                                      
N_RATE67      -0.380                                                      
N_RATE100     -0.372  0.687                                               
N_RATE133     -0.295  0.505   0.667                                       
Maturity_days  0.890 -0.400  -0.433   -0.385                              
Weeding1       0.107 -0.355  -0.383   -0.306    0.128                     
Weeding2       0.573 -0.232  -0.259   -0.230    0.444  0.305              
Irrigation2    0.028 -0.220  -0.295   -0.287    0.012 -0.428 -0.093       
Irrigation3   -0.090 -0.304  -0.284   -0.179   -0.002 -0.064 -0.110  0.263

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.87400688 -0.48372421  0.05306451  0.50309706  3.00270066 

Number of Observations: 422
Number of Groups: 48

Code

performance(model_full_nlme)

# Indices of model performance

AIC     |    AICc |     BIC | R2 (cond.) | R2 (marg.) |   ICC |  RMSE | Sigma
-----------------------------------------------------------------------------
731.575 | 733.465 | 808.430 |      0.890 |      0.815 | 0.408 | 0.442 | 0.472

Summary and Concusion

In this tutorial, we explored the construction, implementation, and evaluation of Random Intercept Models using R. These are essential tools in mixed-effects modeling, enabling us to analyze hierarchical or grouped data by accounting for fixed and random effects. They effectively capture group-level variability, making them powerful for analyzing hierarchical data.

Acquiring these skills is crucial for accurately analyzing complex data structures and making robust inferences about predictors and their effects. With this foundation, you can apply Random Intercept Models to your datasets and delve into more advanced multilevel modeling techniques, such as random slopes and cross-level interactions.

References

Books:

“Multilevel and Longitudinal Modeling Using R” by Douglas A. Luke
- This book provides a practical guide to the use of multilevel modeling in applied research, covering both the theory and application of multilevel modeling techniques using R.
“Multilevel Analysis: Techniques and Applications” by Joop Hox
- This book is an introduction to multilevel analysis, focusing on the practical application of multilevel modeling techniques using R.
“Hierarchical Linear Models: Applications and Data Analysis Methods” by Stephen W. Raudenbush and Anthony S. Bryk
- This book provides a comprehensive introduction to hierarchical linear modeling (HLM) and its applications, including examples and applications in R.
“Applied Multilevel Analysis” by Jos Twisk
- This book is a practical guide to the use of multilevel analysis in applied research, covering both the theory and application of multilevel modeling techniques using R.

Online Tutorials:

“Multilevel Modeling in R” by UCLA Institute for Digital Research and Education
- URL: UCLA IDRE Guide
- This tutorial provides an introduction to multilevel modeling in R, including examples and code.
“Random Intercept and Random Slope Models in R” by DataCamp
- URL: DataCamp Guide
- This tutorial covers the basics of random intercept and random slope models in R, including practical examples using the lme4 package.
“Linear Mixed-Effects Modeling in R: A Step-by-Step Tutorial” by Towards Data Science
- URL: Towards Data Science Tutorial
- This tutorial provides a step-by-step guide to linear mixed-effects modeling in R, with practical examples and code.
“An Introduction to Mixed Models in R” by the University of Virginia Library Research Data Services + Sciences
- URL: UVA Library Guide
- This tutorial provides an introductory guide to mixed models in R, covering both theory and practical examples.
“Mixed Models in R Using the lme4 Package: Part 1” by R-bloggers
- URL: R-bloggers Tutorial
- This tutorial provides an introduction to mixed models in R using the lme4 package, with a focus on random intercept models.

1. Random Intercept Model

Overview

Key Features

Example

Estimation

Intraclass Correlation Coefficient (ICC)

Random Intercept Model from Scratch

Simulate the data:

Define the REML Log-Likelihood Function

Optimization with Bounds

Summary of Model Parameters

Random Intercept Model in R

Install Required R Packages

Load Packages

Data

Data Processing

Convert to Factor

Data Exploration

Model Fitting

Using lme4 Package

Intercept-Only Model

Mixed-Effects Model

Mixed-Effects Model with Cross Level Interaction

Crossed Random Effects Model

Nested Random Effects Model

Model Comparison

Model Interpretaion

Model Performance

Visualize Random Effects

Marginal Effects and Adjusted Predictions

Cross-validation

Using nlme Package

Summary and Concusion

References

Books:

Online Tutorials:

Using `lme4` Package

Using `nlme` Package