3. Correspondence Analysis (CA)

In this tutorial, we discuss the key concepts of Simple Correspondence Analysis (SCA) and its variants, namely Multiple Correspondence Analysis (MCA), Detrended Correspondence Analysis (DCA), and Canonical Correspondence Analysis (CCA), together with the related Canonical Correlation Analysis, which shares the abbreviation CCA. We cover the steps involved in performing these analyses, interpreting the results, and visualizing the relationships between categories. We also demonstrate how to perform these analyses in R, using both a manual approach and several R packages.

Overview

Simple Correspondence Analysis (SCA)

Simple Correspondence Analysis (SCA) is a multivariate dimension reduction technique used to summarize large contingency tables into a smaller number of dimensions, allowing you to visualize the relationships between categories. CA is similar to Principal Component Analysis (PCA) but is specifically designed for categorical data. It is often used in fields such as marketing, market research, and social sciences to analyze survey data, customer preferences, and other categorical data.

Input: A contingency table \(\mathbf{N}\) (rows = categories of one variable, columns = categories of the other).
Step 1: Compute the relative frequencies:

\[\mathbf{P} = \frac{\mathbf{N}}{\text{grand total}}\]
Step 2: Compute row and column margins:

\[ \mathbf{r} = \text{row sums of } \mathbf{P}, \quad \mathbf{c} = \text{column sums of } \mathbf{P}\]
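Step 3: Compute the deviations from independence:

\[\mathbf{S} = \mathbf{P} - \mathbf{r} \mathbf{c}^\top\]
In standard CA these deviations are additionally standardized, \(s_{ij} = (p_{ij} - r_i c_j) / \sqrt{r_i c_j}\), so that the sum of the squared entries equals the chi-square statistic divided by the grand total. The unstandardized form shown here is the simplified version used in the from-scratch code later in this tutorial.
Step 4: Perform Singular Value Decomposition (SVD):

\[\mathbf{S} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top\]
Step 5: Obtain row and column scores:

\[ \mathbf{F} = \mathbf{U} \boldsymbol{\Sigma}, \quad \mathbf{G} = \mathbf{V} \boldsymbol{\Sigma} \]
When the standardized residuals are used, the principal coordinates are obtained by also dividing row \(i\) of \(\mathbf{U}\) by \(\sqrt{r_i}\) and row \(j\) of \(\mathbf{V}\) by \(\sqrt{c_j}\).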
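
The steps above can be sketched in base R using the standardized form of the residuals. This is a minimal sketch rather than a reference implementation: the toy table reuses the values from the from-scratch example in this tutorial, and the names `r_mass`, `c_mass`, `S_std`, `row_coord`, and `col_coord` are chosen here for illustration only.

```r
# Toy 2 x 3 contingency table
N <- matrix(c(50, 30, 20, 40, 60, 50), nrow = 2,
            dimnames = list(c("Row1", "Row2"), c("Col1", "Col2", "Col3")))

P      <- N / sum(N)   # relative frequencies
r_mass <- rowSums(P)   # row margins (masses)
c_mass <- colSums(P)   # column margins (masses)

# Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
expected <- outer(r_mass, c_mass)
S_std    <- (P - expected) / sqrt(expected)

dec <- svd(S_std)

# Principal coordinates: divide singular vectors by sqrt(mass), scale by singular values
row_coord <- sweep(dec$u, 1, sqrt(r_mass), "/") %*% diag(dec$d)
col_coord <- sweep(dec$v, 1, sqrt(c_mass), "/") %*% diag(dec$d)

# Total inertia is the sum of squared singular values
total_inertia <- sum(dec$d^2)
chi2 <- unname(chisq.test(N, correct = FALSE)$statistic)
```

With this standardization, `total_inertia` should equal `chi2 / sum(N)`, which is the link between CA and the chi-square test of independence.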

Multiple Correspondence Analysis (MCA)

Multiple Correspondence Analysis (MCA) is an extension of Correspondence Analysis (CA) used to analyze datasets involving more than two categorical variables. It explores and visualizes relationships between individuals (rows) and categorical variables (columns) in a low-dimensional space. MCA is particularly useful for uncovering patterns in survey data, market research, or any dataset with multiple categorical variables. The mathematical formulation of MCA is the same as that of CA, but it is applied to an indicator matrix built from more than two categorical variables.

Input: A dataset with \(m\) categorical variables observed on \(n\) individuals, converted to a complete disjunctive (indicator) table \(\mathbf{Z}\).
Step 1: Compute \(\mathbf{P} = \mathbf{Z} / (nm)\) and its margins \(\mathbf{r}\), \(\mathbf{c}\).
Step 2: Compute the deviations from independence, as in SCA:

\[ \mathbf{S} = \mathbf{P} - \mathbf{r} \mathbf{c}^\top \]
Step 3: Perform SVD: Same as in SCA.
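
As a rough illustration of these steps, the base-R sketch below builds the complete disjunctive table for a small hypothetical dataset and runs the same decomposition as in SCA, using standardized residuals. The data frame `toy` and all variable names are invented for this example.

```r
# Hypothetical toy data: 6 individuals, 3 categorical variables
toy <- data.frame(
  drink = factor(c("tea", "coffee", "tea", "tea", "coffee", "water")),
  sugar = factor(c("yes", "no", "no", "yes", "yes", "no")),
  place = factor(c("home", "work", "home", "work", "home", "work"))
)

# Complete disjunctive (indicator) table Z: one 0/1 column per category
Z <- do.call(cbind, lapply(names(toy), function(v) {
  ind <- sapply(levels(toy[[v]]), function(l) as.numeric(toy[[v]] == l))
  colnames(ind) <- paste(v, levels(toy[[v]]), sep = "=")
  ind
}))

# Same computation as in SCA, applied to Z
P      <- Z / sum(Z)
r_mass <- rowSums(P)
c_mass <- colSums(P)
expected <- outer(r_mass, c_mass)
S   <- (P - expected) / sqrt(expected)
eig <- svd(S)$d^2          # eigenvalues (principal inertias)

J <- ncol(toy)             # number of variables
K <- ncol(Z)               # total number of categories
```

For an indicator matrix in which every category occurs at least once, the eigenvalues should sum to `K / J - 1`, and at most `K - J` of them are non-zero.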

Detrended Correspondence Analysis (DCA)

Detrended Correspondence Analysis (DCA) is an ordination technique used primarily in ecological studies to handle species composition data. It is an extension of Correspondence Analysis (CA) that corrects for two main problems that arise in CA: arch effects (curvature in ordination space) and compression of gradients (uneven scaling along axes).

Follow SCA to compute initial scores.
Step 1: Divide the ordination axes into segments.
Step 2: Compute segment means and subtract these from the scores to remove the arch effect.
Step 3: Rescale the axes to represent species turnover.
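
Steps 1 and 2 can be illustrated with a deliberately simplified base-R sketch. It assumes hypothetical site scores in which the second axis is an arched (quadratic) function of the first, and it uses fixed, non-overlapping segments. Full implementations such as decorana() in {vegan} are more elaborate, and the rescaling in Step 3 is not shown.

```r
# Hypothetical site scores with an artificial arch:
# axis 2 is roughly a quadratic function of axis 1
set.seed(1)
axis1 <- seq(-2, 2, length.out = 40)
axis2 <- axis1^2 - mean(axis1^2) + rnorm(40, sd = 0.05)

# Step 1: divide axis 1 into equal-width segments
n_segments <- 8
segment <- cut(axis1, breaks = n_segments)

# Step 2: subtract the segment mean of axis 2 within each segment
axis2_detrended <- axis2 - ave(axis2, segment)
```

After detrending, the mean of `axis2_detrended` is zero within every segment, so the systematic curvature along the first axis is largely removed.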

Canonical Correspondence Analysis (CCA)

Input:
- \(\mathbf{Y}\): Species abundance data (rows = sites, columns = species).
- \(\mathbf{X}\): Environmental variables (rows = sites, columns = variables).
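Step 1: Compute \(\mathbf{X}'\mathbf{X}\) and project \(\mathbf{Y}\) onto the space spanned by \(\mathbf{X}\):

\[ \hat{\mathbf{Y}} = \mathbf{X} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}' \mathbf{Y} \]
The residual part, \(\mathbf{Y} - \hat{\mathbf{Y}}\), is the variation that the environmental variables do not explain. In practice, \(\mathbf{Y}\) is first converted to chi-square standardized residuals and the regression is weighted by the row totals.
Step 2: Perform SVD on \(\hat{\mathbf{Y}}\) to obtain the constrained axes.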
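
The two steps above can be sketched in base R. The sketch assumes the commonly used weighted formulation: the species table is converted to chi-square standardized residuals, and the environmental variables are centred and weighted by the row masses before projection. The data are simulated and every name (`Q`, `X_w`, `Q_hat`) is illustrative.

```r
set.seed(42)
n_sites <- 12

# Hypothetical species abundances (sites x species) and environment (sites x variables)
Y <- matrix(rpois(n_sites * 5, lambda = 6) + 1, nrow = n_sites)
X <- cbind(ph = rnorm(n_sites), moisture = rnorm(n_sites))

P      <- Y / sum(Y)
r_mass <- rowSums(P)
c_mass <- colSums(P)

# Chi-square standardized residuals, as in unconstrained CA
expected <- outer(r_mass, c_mass)
Q <- (P - expected) / sqrt(expected)

# Centre X using the row masses as weights, then apply square-root weights
X_centered <- sweep(X, 2, colSums(X * r_mass), "-")
X_w <- X_centered * sqrt(r_mass)

# Step 1: project Q onto the column space of the weighted X
Q_hat <- X_w %*% solve(crossprod(X_w), crossprod(X_w, Q))

# Step 2: SVD of the fitted values gives the constrained axes
constrained_eig <- svd(Q_hat)$d^2
total_inertia   <- sum(Q^2)
```

Because `Q_hat` is a projection of `Q`, the constrained eigenvalues cannot sum to more than `total_inertia`, and there are at most as many non-zero constrained axes as environmental variables.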

Canonical Correlation Analysis (CCA)

Canonical Correlation Analysis (CCA) is a statistical method used to analyze the relationship between two sets of variables. It is a multivariate technique that seeks to find the linear combination of variables in each set that has the highest correlation with the linear combination of variables in the other set.

In other words, Canonical Correlation Analysis finds pairs of linear combinations, one from each set, that are maximally correlated with each other. Each subsequent pair is chosen to be uncorrelated with the pairs already extracted. It is a useful tool when two sets of variables measured on the same observations are thought to be related.

Input:

Two datasets, \(\mathbf{X}\) (e.g., environmental data) and \(\mathbf{Y}\) (e.g., species data).

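Step 1: Compute covariance matrices:

\[ \mathbf{C}_{XX} = \operatorname{cov}(\mathbf{X}), \quad \mathbf{C}_{YY} = \operatorname{cov}(\mathbf{Y}), \quad \mathbf{C}_{XY} = \operatorname{cov}(\mathbf{X}, \mathbf{Y}) = \mathbf{C}_{YX}^\top \]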
Step 2: Solve the generalized eigenvalue problem:

\[ \mathbf{C}_{XY} \mathbf{C}_{YY}^{-1} \mathbf{C}_{YX} \mathbf{a} = \lambda \mathbf{C}_{XX} \mathbf{a} \]

Step 3: Use eigenvectors to compute canonical scores.
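
A minimal base-R sketch of these three steps follows, using simulated data. The eigenvalues \(\lambda\) are the squared canonical correlations, so their square roots can be compared with the result of cancor() from the {stats} package. The names `M`, `rho`, `u`, and `v` are illustrative.

```r
set.seed(123)
n <- 50
# Simulated data: the first column of Y depends on the first column of X
X <- matrix(rnorm(n * 3), ncol = 3)
Y <- cbind(X[, 1] + rnorm(n, sd = 0.5), rnorm(n))

# Step 1: covariance matrices
Cxx <- cov(X)
Cyy <- cov(Y)
Cxy <- cov(X, Y)
Cyx <- t(Cxy)

# Step 2: the generalized problem Cxy Cyy^-1 Cyx a = lambda Cxx a,
# rewritten as an ordinary eigenproblem of Cxx^-1 Cxy Cyy^-1 Cyx
M   <- solve(Cxx, Cxy %*% solve(Cyy, Cyx))
eig <- eigen(M)
rho <- sqrt(Re(eig$values)[1:2])   # canonical correlations

# Step 3: canonical coefficients and scores for the first pair
a <- Re(eig$vectors[, 1])
b <- solve(Cyy, Cyx %*% a)
u <- X %*% a
v <- Y %*% b
```

The correlation between the first pair of canonical scores, `cor(u, v)`, should equal `rho[1]`, and `rho` should match `cancor(X, Y)$cor`.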

Summary Table

Method	Input	Key Computation	R Output Example
Simple CA	Contingency table \(N\)	SVD on standardized residuals \(S\)	\(F\) and \(G\) (row, column scores)
Multiple CA	Indicator matrix \(Z\)	SVD on \(Z\)	\(F\) and \(G\)
Detrended CA	Contingency table \(N\)	Detrend and rescale \(F\), \(G\)	Adjusted \(F\), \(G\)
Canonical CA	Species \(Y\), Environment \(X\)	Constrain scores using \(\beta\)	\(Z\) (constrained ordination)
Canonical Correlation	Matrices \(X\), \(Y\)	Solve eigenvalue problem	Canonical coefficients \(a, b\)

Performing Correspondence Analysis from Scratch

Simple Correspondence Analysis (SCA)

The following is a step-by-step guide to performing Simple Correspondence Analysis (SCA) in R without using any add-on packages:

Code

# Example contingency table
N <- matrix(c(50, 30, 20, 40, 60, 50), nrow = 2)
rownames(N) <- c("Row1", "Row2")
colnames(N) <- c("Col1", "Col2", "Col3")

# Step 1: Compute relative frequencies
P <- N / sum(N)

# Step 2: Compute row and column margins
row_margins <- rowSums(P)
col_margins <- colSums(P)

# Step 3: Deviations from independence (observed minus expected proportions)
S <- P - row_margins %*% t(col_margins)

# Step 4: Perform SVD
SVD <- svd(S)
U <- SVD$u
V <- SVD$v
Sigma <- diag(SVD$d)

# Step 5: Compute principal coordinates
F <- U %*% Sigma
G <- V %*% Sigma

# Print results
cat("Row Scores (F):\n", F, "\n")

Row Scores (F):
 -0.05710902 0.05710902 0 0

Code

cat("Column Scores (G):\n", G, "\n")

Column Scores (G):
 -0.04751758 0.06335677 -0.01583919 0 0 0

Code

# Visualization
plot(F, type = "p", xlab = "Dim1", ylab = "Dim2", col = "blue", main = "SCA Plot")
points(G, col = "red", pch = 2)
legend("topright", legend = c("Rows", "Columns"), col = c("blue", "red"), pch = c(1, 2))

Modifications for MCA, DCA, CCA, and Canonical Correlation

MCA

Replace \(N\) with a complete disjunctive table for multiple categorical variables.

DCA

Detrending would involve manually adjusting axis scores, as described in the explanation.

CCA and Canonical Correlation

For CCA and Canonical Correlation Analysis, compute projections or eigenvalues as outlined in the math sections above using matrix operations.

Example Summary and Visualization

Results:

SCA: Visualizes relationships between rows and columns of the contingency table.
MCA: Shows relationships among multiple variables and individuals.
CCA: Highlights species-environment interactions.

Plot Example:

Plots will use scatterplots with different colors for rows, columns, or variable types to highlight associations.

By coding this way, you maintain full transparency of the mathematical process, but it requires careful handling of data to avoid errors!

Performing Correspondence Analysis in R

Performing the different types of Correspondence Analysis (CA) in R is straightforward with the {FactoMineR}, {vegan}, and {ade4} packages. These packages provide functions to perform CA, MCA, DCA, CCA, and Canonical Correlation Analysis with minimal coding effort.

Install Required R Packages

The following R packages are required to run this notebook. If any of them are not installed, you can install them using the code below:

Code

packages <- c('tidyverse', 
              'plyr',
              'corrr',
              'ggcorrplot',
              'factoextra',
              'ade4',
              'psych',
              'FactoMineR',
              'CCA',
              'vegan'
         )

# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))

Load Packages

Code

# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))

Code

# Check loaded packages
cat("Successfully loaded packages:\n")

Successfully loaded packages:

Code

print(search()[grepl("package:", search())])

 [1] "package:vegan"       "package:lattice"     "package:permute"    
 [4] "package:CCA"         "package:fields"      "package:viridisLite"
 [7] "package:spam"        "package:fda"         "package:deSolve"    
[10] "package:fds"         "package:RCurl"       "package:rainbow"    
[13] "package:pcaPP"       "package:MASS"        "package:splines"    
[16] "package:FactoMineR"  "package:psych"       "package:ade4"       
[19] "package:factoextra"  "package:ggcorrplot"  "package:corrr"      
[22] "package:plyr"        "package:lubridate"   "package:forcats"    
[25] "package:stringr"     "package:dplyr"       "package:purrr"      
[28] "package:readr"       "package:tidyr"       "package:tibble"     
[31] "package:ggplot2"     "package:tidyverse"   "package:stats"      
[34] "package:graphics"    "package:grDevices"   "package:utils"      
[37] "package:datasets"    "package:methods"     "package:base"

Simple Correspondence Analysis (SCA)

We will perform Simple Correspondence Analysis (SCA) using the following R packages:

{FactoMineR}: ideal for a beginner-friendly and detailed analysis, with good default visualizations.
{vegan}: useful for ecological and environmental data.
{ade4}: flexible and supports advanced multivariate analyses.

Data

In this exercise, we use the children dataset from the {FactoMineR} package to perform Simple Correspondence Analysis (SCA). The data form a contingency table that summarizes the answers given by different categories of people to the following question: in your opinion, what are the reasons that can make a woman or a couple hesitate to have children? Rows are the reasons given, and columns are the respondent categories (education level and age group). Rows containing missing values are dropped with na.omit(), which leaves 14 rows and 8 columns.

Code

data(children)
mf.children<-na.omit(children)
glimpse(mf.children)

Rows: 14
Columns: 8
$ unqualified         <int> 51, 53, 71, 1, 7, 7, 21, 12, 10, 4, 8, 25, 18, 35
$ cep                 <int> 64, 90, 111, 7, 11, 13, 37, 35, 7, 7, 22, 45, 27, …
$ bepc                <int> 32, 78, 50, 5, 4, 12, 14, 19, 7, 7, 7, 38, 20, 29
$ high_school_diploma <int> 29, 75, 40, 5, 3, 11, 26, 6, 3, 6, 10, 38, 19, 14
$ university          <int> 17, 22, 11, 4, 2, 11, 9, 7, 1, 2, 5, 13, 9, 12
$ thirty              <int> 59, 115, 79, 9, 2, 18, 14, 21, 8, 7, 10, 48, 13, 30
$ fifty               <int> 66, 117, 88, 8, 17, 19, 34, 30, 12, 6, 27, 59, 29,…
$ more_fifty          <int> 70, 86, 177, 5, 18, 17, 61, 28, 8, 13, 17, 52, 53,…

SCA using {FactoMineR} Package

The CA() function from the {FactoMineR} package performs Correspondence Analysis (CA) and can include supplementary row and/or column points. The row.sup and col.sup arguments specify the supplementary rows and columns, respectively. They are commented out below, so all rows and columns are treated as active.

Code

children.CA <- CA (mf.children, 
              #row.sup = 15:18, 
              #col.sup = 6:8, 
              graph = FALSE)
children.CA

**Results of the Correspondence Analysis (CA)**
The row variable has  14  categories; the column variable has 8 categories
The chi square of independence between the two variables is equal to 213.7825 (p-value =  6.994847e-12 ).
*The results are available in the following objects:

   name              description                   
1  "$eig"            "eigenvalues"                 
2  "$col"            "results for the columns"     
3  "$col$coord"      "coord. for the columns"      
4  "$col$cos2"       "cos2 for the columns"        
5  "$col$contrib"    "contributions of the columns"
6  "$row"            "results for the rows"        
7  "$row$coord"      "coord. for the rows"         
8  "$row$cos2"       "cos2 for the rows"           
9  "$row$contrib"    "contributions of the rows"   
10 "$call"           "summary called parameters"   
11 "$call$marge.col" "weights of the columns"      
12 "$call$marge.row" "weights of the rows"

Extract eigenvalues

Code

# extract eigenvalues 
eig.CA <- get_eigenvalue(children.CA) 
eig.CA

        eigenvalue variance.percent cumulative.variance.percent
Dim.1 0.0340503686        51.971687                    51.97169
Dim.2 0.0114874410        17.533487                    69.50517
Dim.3 0.0094083469        14.360128                    83.86530
Dim.4 0.0043373912         6.620238                    90.48554
Dim.5 0.0033585191         5.126168                    95.61171
Dim.6 0.0021470049         3.277012                    98.88872
Dim.7 0.0007280789         1.111280                   100.00000

Get Row and Column Contributions

Code

get_ca_row(children.CA)$contrib

                     Dim 1      Dim 2      Dim 3       Dim 4       Dim 5
money          0.049764545  6.0443835  2.9561580  3.13233964 36.29180231
future        32.368832785  0.9752521  7.2273787  9.23415168  2.52885186
unemployment  26.690166995  0.9206156 25.2245613  4.25460697  4.58775240
circumstances  5.869044546  0.3012778  3.1991888 11.31445487  1.33592601
hard           5.778767898  1.8754374  8.5170381  7.81762876  0.09137883
economic       6.276290495  1.7535909 10.2827548 30.58601311  9.23937770
egoism         8.196641491 17.2938279  8.9732441  3.77403344  1.47207121
employment     0.006068780 12.8338408  1.5492596 10.38619294 26.26586060
finances       0.082798962  6.7546353  4.0805557  5.83474366 13.53974359
war            0.002773799  6.3296027  0.5532762  1.10403085  1.32605158
housing        0.040268520  3.2895376 16.7428859  8.17315117  1.46964639
fear           7.927149801  3.5198263  0.1255356  4.30430115  1.09134794
health         4.435170218 13.3106654  5.5262596  0.02179244  0.69981799
work           2.276261165 24.7975067  5.0419036  0.06255933  0.06037157

Code

get_ca_col(children.CA)$contrib

                        Dim 1       Dim 2      Dim 3       Dim 4       Dim 5
unqualified          5.071074  7.76394678 10.7592712  2.72428795 53.75168412
cep                  3.559470 12.52169557  1.7948192  1.07797842 19.50953210
bepc                 8.088715  0.08731571  1.0579701  0.07954365  9.03546564
high_school_diploma  9.824496 44.94476146  0.8936160 20.08590600  0.94805453
university           4.687857  0.55980474 36.5611703 38.40238427 15.57846290
thirty              22.643901  0.36450413 32.2348939 11.83206221  0.11209628
fifty                1.470514 15.14051045 16.3161964 22.25512288  0.01224293
more_fifty          44.653972 18.61746115  0.3820629  3.54271461  1.05246149
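
The contributions above are percentages, and each dimension's column sums to 100. Under the standard definition, the contribution of row \(i\) to dimension \(k\) is \(100 \times r_i f_{ik}^2 / \lambda_k\), where \(r_i\) is the row mass, \(f_{ik}\) the principal coordinate, and \(\lambda_k\) the eigenvalue. The base-R sketch below applies this definition to a small hypothetical table. It is meant to illustrate the formula and is not a reproduction of the {FactoMineR} internals.

```r
# Hypothetical 3 x 3 contingency table
N <- matrix(c(20, 10,  5,
              10, 25, 10,
               5, 10, 30), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("x", "y", "z")))

P      <- N / sum(N)
r_mass <- rowSums(P)
c_mass <- colSums(P)
expected <- outer(r_mass, c_mass)
dec <- svd((P - expected) / sqrt(expected))

k   <- 1:2               # a 3 x 3 table has at most 2 non-trivial dimensions
eig <- dec$d[k]^2        # eigenvalues

# Row principal coordinates
row_coord <- sweep(dec$u[, k], 1, sqrt(r_mass), "/") %*% diag(dec$d[k])

# Contribution (%) = 100 * mass * coordinate^2 / eigenvalue
row_contrib <- 100 * sweep(sweep(row_coord^2, 1, r_mass, "*"), 2, eig, "/")
```

Calling `colSums(row_contrib)` should return 100 for each dimension, which is a quick sanity check for any table of contributions.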

Visualize Correspondence Analysis

Correspondence analysis (CA) is an extension of Principal Component Analysis (PCA) suited to analyzing frequencies formed by two categorical variables. fviz_ca() from the {factoextra} package provides elegant ggplot2-based visualization of CA outputs from the R functions CA in {FactoMineR}, ca in {ca}, coa in {ade4}, correspondence in {MASS}, and expOutput/epCA in {ExPosition}.

fviz_ca_row(): Graph of row variables

fviz_ca_col(): Graph of column variables

fviz_ca_biplot(): Biplot of row and column variables

fviz_ca(): An alias of fviz_ca_biplot()

Code

# Symmetric biplot of rows and columns
fviz_ca_biplot(children.CA)

The biplot above shows the row and column categories of the children dataset on the first two dimensions. By default, {factoextra} draws the row points in blue and the column points in red. In this symmetric plot both sets of points are in principal coordinates, so distances among row points, or among column points, reflect how similar their profiles are. The distance between a row point and a column point should be interpreted with caution.

Code

# Asymmetric biplot, use arrows for columns
fviz_ca_biplot(children.CA, map ="rowprincipal",
 arrow = c(FALSE, TRUE))

In the plot above, the arrows represent the column categories and the points represent the row categories. With map = "rowprincipal", rows are displayed in principal coordinates and columns in standard coordinates. A row point lying far along the direction of a column arrow is positively associated with that column, and a row point lying in the opposite direction is negatively associated with it.

Code

# Keep only the labels for row points
fviz_ca_biplot(children.CA, label ="row")

Code

# Keep only labels for column points
fviz_ca_biplot(children.CA, label ="col")

Code

# Select the top 7 contributing rows
# And the top 3 columns
fviz_ca_biplot(children.CA,  
               select.row = list(contrib = 7),
               select.col = list(contrib = 3))

SCA using the {ade4} package

The dudi.coa() function from the {ade4} package performs Correspondence Analysis (CA). It takes a contingency table as input and returns an object of class dudi that contains the results of the analysis. scannf is a logical value indicating whether the eigenvalue bar plot should be displayed, and nf is an integer giving the number of axes to keep.

Code

chlidren.coa<-dudi.coa(mf.children, scannf = FALSE, nf = 2)
chlidren.coa

Duality diagramm
class: coa dudi
$call: dudi.coa(df = mf.children, scannf = FALSE, nf = 2)

$nf: 2 axis-components saved
$rank: 7
eigen values: 0.03405 0.01149 0.009408 0.004337 0.003359 ...
  vector length mode    content       
1 $cw    8      numeric column weights
2 $lw    14     numeric row weights   
3 $eig   7      numeric eigen values  

  data.frame nrow ncol content             
1 $tab       14   8    modified array      
2 $li        14   2    row coordinates     
3 $l1        14   2    row normed scores   
4 $co        8    2    column coordinates  
5 $c1        8    2    column normed scores
other elements: N

Extract and Visualize the Scores of dimensions

Code

# Extract the scores of the dimensions
scores <- as.data.frame(chlidren.coa$li) 
colnames(scores) <- c("Dim.1", "Dim.2")
# plot the scores
ggplot(scores, aes(x = Dim.1, y = Dim.2, 
                   label = rownames(scores))) + 
 geom_text() + 
 xlab("Dimension 1") + 
 ylab("Dimension 2") + 
 ggtitle("Correspondence Analysis Scatterplot")

The plot above shows the row coordinates on the first two dimensions. Each label is a row category (a reason) of the children dataset, and categories plotted close together have similar profiles across the column categories.

Biplot of Row and Column Categories

Code

fviz_ca_biplot(chlidren.coa)

SCA using the {vegan} package

The cca() function from the {vegan} package is used to perform SCA in R. It performs correspondence analysis, or optionally constrained correspondence analysis (a.k.a. canonical correspondence analysis), or optionally partial constrained correspondence analysis. When it is called with a single data table and no constraints, as below, the result is an unconstrained CA. The rda() function performs redundancy analysis, or optionally principal components analysis. These are all very popular ordination techniques in community ecology.

Code

chlidren.cca<-cca(X=mf.children)
summary(chlidren.cca)


Call:
cca(X = mf.children) 

Partitioning of scaled Chi-square:
              Inertia Proportion
Total         0.06552          1
Unconstrained 0.06552          1

Eigenvalues, and their contribution to the scaled Chi-square 

Importance of components:
                          CA1     CA2      CA3      CA4      CA5      CA6
Eigenvalue            0.03405 0.01149 0.009408 0.004337 0.003359 0.002147
Proportion Explained  0.51972 0.17533 0.143601 0.066202 0.051262 0.032770
Cumulative Proportion 0.51972 0.69505 0.838653 0.904855 0.956117 0.988887
                            CA7
Eigenvalue            0.0007281
Proportion Explained  0.0111128
Cumulative Proportion 1.0000000

Extract Eigenvalues

Code

summary(eigenvals(chlidren.cca))[,1:6] %>% round(3)

                        CA1   CA2   CA3   CA4   CA5   CA6
Eigenvalue            0.034 0.011 0.009 0.004 0.003 0.002
Proportion Explained  0.520 0.175 0.144 0.066 0.051 0.033
Cumulative Proportion 0.520 0.695 0.839 0.905 0.956 0.989

Extract and Visualize the Scores of dimensions

Code

scores(chlidren.cca, display = "sites")

                      CA1        CA2
money          0.06469231  0.7129658
future         1.28867528 -0.2236858
unemployment  -1.17855651 -0.2188840
circumstances  2.08624785 -0.4726784
hard          -1.71646984  0.9778446
economic       1.37704514 -0.7278817
egoism        -1.11275520 -1.6163196
employment     0.03540224  1.6280147
finances       0.21964786  1.9838803
war            0.04172000 -1.9929440
housing        0.11133672  1.0062897
fear           0.90188987 -0.6009739
health        -0.87737400 -1.5199507
work          -0.49592512  1.6368507

Code

scores(chlidren.cca, display = "species")

                            CA1         CA2
unqualified         -0.13207425  0.09492054
cep                 -0.08581732  0.09348988
bepc                 0.16706339 -0.01008181
high_school_diploma  0.19570511 -0.24312930
university           0.20412755 -0.04097165
thirty               0.24104666  0.01776346
fifty                0.05330527  0.09934740
more_fifty          -0.27355379 -0.10259440

The plot() method for objects returned by cca() draws an ordination diagram of the site (row) and species (column) scores. Below, type = "n" sets up an empty plot, and points() and text() then add the row points and the column labels using site scaling.

Code

plot(chlidren.cca, type="n", scaling="sites")
#text(chlidren.cca, dis="cn", scaling="sites")
points(chlidren.cca, pch=21, col="red", bg="yellow", cex=1.2, scaling="sites")
text(chlidren.cca, "species", col="blue", cex=0.8, scaling="sites")

Multiple Correspondence Analysis (MCA)

Data

In this exercise, we use the tea dataset from the {FactoMineR} package to perform Multiple Correspondence Analysis (MCA). The dataset records how 300 individuals drink tea (18 questions), how they perceive the product (12 questions), and some personal details (4 questions). It is a data frame with 300 rows and 36 columns: rows represent the individuals and columns represent the questions. The first 18 questions are active, the 19th is a supplementary quantitative variable (age), and the remaining variables are supplementary categorical variables.

Code

# Example dataset (from FactoMineR)
data(tea)
glimpse(tea)

Rows: 300
Columns: 36
$ breakfast        <fct> breakfast, breakfast, Not.breakfast, Not.breakfast, b…
$ tea.time         <fct> Not.tea time, Not.tea time, tea time, Not.tea time, N…
$ evening          <fct> Not.evening, Not.evening, evening, Not.evening, eveni…
$ lunch            <fct> Not.lunch, Not.lunch, Not.lunch, Not.lunch, Not.lunch…
$ dinner           <fct> Not.dinner, Not.dinner, dinner, dinner, Not.dinner, d…
$ always           <fct> Not.always, Not.always, Not.always, Not.always, alway…
$ home             <fct> home, home, home, home, home, home, home, home, home,…
$ work             <fct> Not.work, Not.work, work, Not.work, Not.work, Not.wor…
$ tearoom          <fct> Not.tearoom, Not.tearoom, Not.tearoom, Not.tearoom, N…
$ friends          <fct> Not.friends, Not.friends, friends, Not.friends, Not.f…
$ resto            <fct> Not.resto, Not.resto, resto, Not.resto, Not.resto, No…
$ pub              <fct> Not.pub, Not.pub, Not.pub, Not.pub, Not.pub, Not.pub,…
$ Tea              <fct> black, black, Earl Grey, Earl Grey, Earl Grey, Earl G…
$ How              <fct> alone, milk, alone, alone, alone, alone, alone, milk,…
$ sugar            <fct> sugar, No.sugar, No.sugar, sugar, No.sugar, No.sugar,…
$ how              <fct> tea bag, tea bag, tea bag, tea bag, tea bag, tea bag,…
$ where            <fct> chain store, chain store, chain store, chain store, c…
$ price            <fct> p_unknown, p_variable, p_variable, p_variable, p_vari…
$ age              <int> 39, 45, 47, 23, 48, 21, 37, 36, 40, 37, 32, 31, 56, 6…
$ sex              <fct> M, F, F, M, M, M, M, F, M, M, M, M, M, M, M, M, M, F,…
$ SPC              <fct> middle, middle, other worker, student, employee, stud…
$ Sport            <fct> sportsman, sportsman, sportsman, Not.sportsman, sport…
$ age_Q            <fct> 35-44, 45-59, 45-59, 15-24, 45-59, 15-24, 35-44, 35-4…
$ frequency        <fct> 1/day, 1/day, +2/day, 1/day, +2/day, 1/day, 3 to 6/we…
$ escape.exoticism <fct> Not.escape-exoticism, escape-exoticism, Not.escape-ex…
$ spirituality     <fct> Not.spirituality, Not.spirituality, Not.spirituality,…
$ healthy          <fct> healthy, healthy, healthy, healthy, Not.healthy, heal…
$ diuretic         <fct> Not.diuretic, diuretic, diuretic, Not.diuretic, diure…
$ friendliness     <fct> Not.friendliness, Not.friendliness, friendliness, Not…
$ iron.absorption  <fct> Not.iron absorption, Not.iron absorption, Not.iron ab…
$ feminine         <fct> Not.feminine, Not.feminine, Not.feminine, Not.feminin…
$ sophisticated    <fct> Not.sophisticated, Not.sophisticated, Not.sophisticat…
$ slimming         <fct> No.slimming, No.slimming, No.slimming, No.slimming, N…
$ exciting         <fct> No.exciting, exciting, No.exciting, No.exciting, No.e…
$ relaxing         <fct> No.relaxing, No.relaxing, relaxing, relaxing, relaxin…
$ effect.on.health <fct> No.effect on health, No.effect on health, No.effect o…

Fit Multiple Correspondence Analysis (MCA)

The MCA() function from the {FactoMineR} package is used to perform Multiple Correspondence Analysis (MCA) in R. The MCA() function takes a data frame with categorical variables as input and returns an object of class MCA that contains the results of the analysis. quanti.sup and quali.sup are used to specify the quantitative and categorical supplementary variables, respectively. The graph argument is used to specify whether to display the graph of the results.

Code

res.mca <- MCA(tea,
               quanti.sup=19,
               quali.sup=20:36, 
               graph = FALSE)
summary(res.mca, plot=FALSE)


Call:
MCA(X = tea, quanti.sup = 19, quali.sup = 20:36, graph = FALSE) 


Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
Variance               0.148   0.122   0.090   0.078   0.074   0.071   0.068
% of var.              9.885   8.103   6.001   5.204   4.917   4.759   4.522
Cumulative % of var.   9.885  17.988  23.989  29.192  34.109  38.868  43.390
                       Dim.8   Dim.9  Dim.10  Dim.11  Dim.12  Dim.13  Dim.14
Variance               0.065   0.062   0.059   0.057   0.054   0.052   0.049
% of var.              4.355   4.123   3.902   3.805   3.628   3.462   3.250
Cumulative % of var.  47.745  51.867  55.769  59.574  63.202  66.664  69.914
                      Dim.15  Dim.16  Dim.17  Dim.18  Dim.19  Dim.20  Dim.21
Variance               0.048   0.047   0.046   0.040   0.038   0.037   0.036
% of var.              3.221   3.127   3.037   2.683   2.541   2.438   2.378
Cumulative % of var.  73.135  76.262  79.298  81.982  84.523  86.961  89.339
                      Dim.22  Dim.23  Dim.24  Dim.25  Dim.26  Dim.27
Variance               0.035   0.031   0.029   0.027   0.021   0.017
% of var.              2.323   2.055   1.915   1.821   1.407   1.139
Cumulative % of var.  91.662  93.717  95.633  97.454  98.861 100.000

Individuals (the 10 first)
                 Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr
1             | -0.541  0.658  0.143 | -0.149  0.061  0.011 | -0.306  0.347
2             | -0.361  0.293  0.133 | -0.078  0.017  0.006 | -0.633  1.483
3             |  0.073  0.012  0.003 | -0.169  0.079  0.018 |  0.246  0.224
4             | -0.572  0.735  0.235 |  0.018  0.001  0.000 |  0.203  0.153
5             | -0.253  0.144  0.079 | -0.118  0.038  0.017 |  0.006  0.000
6             | -0.684  1.053  0.231 |  0.032  0.003  0.001 | -0.018  0.001
7             | -0.111  0.027  0.022 | -0.182  0.090  0.059 | -0.207  0.159
8             | -0.210  0.099  0.043 | -0.068  0.013  0.004 | -0.421  0.655
9             |  0.118  0.031  0.012 |  0.229  0.144  0.044 | -0.538  1.070
10            |  0.258  0.150  0.045 |  0.478  0.627  0.156 | -0.482  0.861
                cos2  
1              0.046 |
2              0.409 |
3              0.038 |
4              0.030 |
5              0.000 |
6              0.000 |
7              0.077 |
8              0.174 |
9              0.244 |
10             0.158 |

Categories (the 10 first)
                 Dim.1    ctr   cos2 v.test    Dim.2    ctr   cos2 v.test  
breakfast     |  0.166  0.495  0.025  2.756 | -0.166  0.607  0.026 -2.764 |
Not.breakfast | -0.153  0.457  0.025 -2.756 |  0.154  0.560  0.026  2.764 |
Not.tea time  | -0.498  4.053  0.192 -7.578 |  0.093  0.174  0.007  1.423 |
tea time      |  0.386  3.142  0.192  7.578 | -0.072  0.135  0.007 -1.423 |
evening       |  0.319  1.307  0.053  3.985 | -0.058  0.053  0.002 -0.728 |
Not.evening   | -0.167  0.683  0.053 -3.985 |  0.030  0.028  0.002  0.728 |
lunch         |  0.659  2.385  0.075  4.722 | -0.390  1.018  0.026 -2.793 |
Not.lunch     | -0.113  0.410  0.075 -4.722 |  0.067  0.175  0.026  2.793 |
dinner        | -0.661  1.146  0.033 -3.136 |  0.796  2.025  0.048  3.774 |
Not.dinner    |  0.050  0.086  0.033  3.136 | -0.060  0.152  0.048 -3.774 |
               Dim.3    ctr   cos2 v.test  
breakfast     -0.483  6.900  0.215 -8.017 |
Not.breakfast  0.445  6.369  0.215  8.017 |
Not.tea time   0.265  1.886  0.054  4.027 |
tea time      -0.205  1.462  0.054 -4.027 |
evening        0.451  4.312  0.106  5.640 |
Not.evening   -0.236  2.254  0.106 -5.640 |
lunch          0.301  0.822  0.016  2.160 |
Not.lunch     -0.052  0.141  0.016 -2.160 |
dinner         0.535  1.235  0.022  2.537 |
Not.dinner    -0.040  0.093  0.022 -2.537 |

Categorical variables (eta2)
                Dim.1 Dim.2 Dim.3  
breakfast     | 0.025 0.026 0.215 |
tea.time      | 0.192 0.007 0.054 |
evening       | 0.053 0.002 0.106 |
lunch         | 0.075 0.026 0.016 |
dinner        | 0.033 0.048 0.022 |
always        | 0.045 0.001 0.101 |
home          | 0.005 0.000 0.134 |
work          | 0.112 0.043 0.005 |
tearoom       | 0.372 0.022 0.008 |
friends       | 0.243 0.015 0.103 |

Supplementary categories (the 10 first)
                 Dim.1   cos2 v.test    Dim.2   cos2 v.test    Dim.3   cos2
F             |  0.151  0.033  3.158 | -0.109  0.017 -2.278 | -0.048  0.003
M             | -0.221  0.033 -3.158 |  0.159  0.017  2.278 |  0.070  0.003
employee      | -0.153  0.006 -1.313 | -0.151  0.006 -1.289 |  0.103  0.003
middle        | -0.030  0.000 -0.205 |  0.336  0.017  2.281 | -0.284  0.012
non-worker    | -0.036  0.000 -0.324 |  0.185  0.009  1.666 | -0.291  0.023
other worker  |  0.040  0.000  0.187 |  0.013  0.000  0.061 | -0.063  0.000
senior        |  0.415  0.023  2.608 |  0.072  0.001  0.452 | -0.187  0.005
student       |  0.032  0.000  0.305 | -0.317  0.031 -3.022 |  0.394  0.047
workman       | -0.417  0.007 -1.473 |  0.249  0.003  0.878 |  0.343  0.005
Not.sportsman | -0.030  0.001 -0.426 |  0.018  0.000  0.260 | -0.051  0.002
              v.test  
F             -0.998 |
M              0.998 |
employee       0.884 |
middle        -1.928 |
non-worker    -2.620 |
other worker  -0.289 |
senior        -1.177 |
student        3.760 |
workman        1.209 |
Not.sportsman -0.721 |

Supplementary categorical variables (eta2)
                   Dim.1 Dim.2 Dim.3  
sex              | 0.033 0.017 0.003 |
SPC              | 0.032 0.053 0.076 |
Sport            | 0.001 0.000 0.002 |
age_Q            | 0.008 0.077 0.146 |
frequency        | 0.094 0.006 0.064 |
escape.exoticism | 0.000 0.007 0.000 |
spirituality     | 0.005 0.000 0.016 |
healthy          | 0.000 0.000 0.008 |
diuretic         | 0.004 0.000 0.013 |
friendliness     | 0.071 0.001 0.013 |

Supplementary continuous variable
                 Dim.1    Dim.2    Dim.3  
age           |  0.042 |  0.204 | -0.340 |

The dimdesc() function describes the dimensions of the MCA results. For each dimension obtained by a factor analysis, it points out the variables and the categories that are the most characteristic.

Code

dimdesc(res.mca)

$`Dim 1`

Link between the variable and the categorical variable (1-way anova)
=============================================
                     R2      p.value
where        0.41793014 1.255462e-35
tearoom      0.37189109 6.082138e-32
how          0.29882863 1.273180e-23
friends      0.24319952 8.616289e-20
resto        0.22646759 2.319804e-18
tea.time     0.19203800 1.652462e-15
price        0.21609382 4.050469e-14
pub          0.14722360 5.846592e-12
work         0.11153590 3.000872e-09
How          0.10285191 4.796010e-07
Tea          0.08950330 8.970954e-07
lunch        0.07458227 1.570629e-06
frequency    0.09438792 1.849071e-06
friendliness 0.07132511 2.706357e-06
evening      0.05311759 5.586801e-05
always       0.04479873 2.219503e-04
sex          0.03335969 1.487620e-03
dinner       0.03289362 1.608077e-03
breakfast    0.02539639 5.667604e-03
sugar        0.01527654 3.234986e-02

Link between variable and the categories of the categorical variables
================================================================
                                 Estimate      p.value
where=chain store+tea shop     0.33853776 1.344557e-35
tearoom=tearoom                0.29731072 6.082138e-32
how=tea bag+unpackaged         0.23457030 1.361423e-21
friends=friends                0.19950832 8.616289e-20
resto=resto                    0.20802605 2.319804e-18
tea.time=tea time              0.17011357 1.652462e-15
pub=pub                        0.18137133 5.846592e-12
price=p_variable               0.27595067 5.956230e-12
work=work                      0.14170406 3.000872e-09
frequency=+2/day               0.14855615 7.380937e-07
lunch=lunch                    0.14862636 1.570629e-06
friendliness=friendliness      0.13020388 2.706357e-06
How=other                      0.38192443 9.244911e-06
evening=evening                0.09345270 5.586801e-05
always=always                  0.08582336 2.219503e-04
sex=F                          0.07158886 1.487620e-03
dinner=Not.dinner              0.13685745 1.608077e-03
How=lemon                      0.01223478 3.515252e-03
breakfast=breakfast            0.06141392 5.667604e-03
SPC=senior                     0.16802844 8.886876e-03
Tea=Earl Grey                  0.12203299 1.547110e-02
sugar=No.sugar                 0.04761975 3.234986e-02
sugar=sugar                   -0.04761975 3.234986e-02
frequency=1 to 2/week         -0.10430402 1.829818e-02
price=p_private label         -0.11979263 1.156245e-02
breakfast=Not.breakfast       -0.06141392 5.667604e-03
dinner=dinner                 -0.13685745 1.608077e-03
sex=M                         -0.07158886 1.487620e-03
How=alone                     -0.23140428 2.326233e-04
always=Not.always             -0.08582336 2.219503e-04
frequency=1/day               -0.10038745 1.556219e-04
evening=Not.evening           -0.09345270 5.586801e-05
friendliness=Not.friendliness -0.13020388 2.706357e-06
lunch=Not.lunch               -0.14862636 1.570629e-06
Tea=green                     -0.24569103 1.281162e-07
work=Not.work                 -0.14170406 3.000872e-09
price=p_branded               -0.10910793 1.116908e-09
pub=Not.pub                   -0.18137133 5.846592e-12
tea.time=Not.tea time         -0.17011357 1.652462e-15
resto=Not.resto               -0.20802605 2.319804e-18
friends=Not.friends           -0.19950832 8.616289e-20
how=tea bag                   -0.23182447 8.877561e-22
where=chain store             -0.24012436 3.008256e-27
tearoom=Not.tearoom           -0.29731072 6.082138e-32

$`Dim 2`

Link between the variable and the continuous variables (R-square)
=================================================================================
    correlation      p.value
age   0.2035108 0.0003890693

Link between the variable and the categorical variable (1-way anova)
=============================================
                      R2      p.value
where         0.62550194 4.542155e-64
price         0.56056797 1.837909e-50
how           0.51288621 4.103156e-47
Tea           0.16034278 5.359827e-12
resto         0.05883014 2.165287e-05
age_Q         0.07663110 9.613084e-05
dinner        0.04764166 1.385133e-04
work          0.04334283 2.825934e-04
sugar         0.03078909 2.286813e-03
How           0.04300447 4.565763e-03
lunch         0.02609615 5.035226e-03
breakfast     0.02554407 5.527765e-03
sophisticated 0.02298649 8.531637e-03
tearoom       0.02159669 1.081515e-02
SPC           0.05335498 1.284774e-02
sex           0.01734823 2.250375e-02
friends       0.01527530 3.235693e-02

Link between variable and the categories of the categorical variables
================================================================
                                   Estimate      p.value
where=tea shop                   0.56623933 3.435386e-58
price=p_upscale                  0.58675674 6.819842e-53
how=unpackaged                   0.47523577 4.876111e-43
Tea=green                        0.17636551 5.660702e-07
resto=Not.resto                  0.09599600 2.165287e-05
Tea=black                        0.02833892 1.280976e-04
dinner=dinner                    0.14912299 1.385133e-04
work=Not.work                    0.07997829 2.825934e-04
sugar=No.sugar                   0.06120846 2.286813e-03
lunch=Not.lunch                  0.07959849 5.035226e-03
breakfast=Not.breakfast          0.05576534 5.527765e-03
age_Q=+60                        0.11445238 7.567898e-03
sophisticated=sophisticated      0.05865034 8.531637e-03
tearoom=tearoom                  0.06486865 1.081515e-02
SPC=middle                       0.09793868 2.228820e-02
sex=M                            0.04674127 2.250375e-02
friends=Not.friends              0.04527027 3.235693e-02
friends=friends                 -0.04527027 3.235693e-02
price=p_private label           -0.13563786 2.301393e-02
sex=F                           -0.04674127 2.250375e-02
tearoom=Not.tearoom             -0.06486865 1.081515e-02
sophisticated=Not.sophisticated -0.05865034 8.531637e-03
price=p_unknown                 -0.23171262 7.581675e-03
breakfast=breakfast             -0.05576534 5.527765e-03
lunch=lunch                     -0.07959849 5.035226e-03
price=p_variable                -0.04476924 3.517223e-03
SPC=student                     -0.12977090 2.381679e-03
sugar=sugar                     -0.06120846 2.286813e-03
How=milk                        -0.13222339 7.496148e-04
work=work                       -0.07997829 2.825934e-04
dinner=Not.dinner               -0.14912299 1.385133e-04
resto=resto                     -0.09599600 2.165287e-05
age_Q=15-24                     -0.16342547 4.501578e-06
price=p_branded                 -0.11247568 8.001515e-07
Tea=Earl Grey                   -0.20470443 6.499738e-12
how=tea bag                     -0.31755648 2.123707e-18
where=chain store               -0.36891861 5.804094e-23

$`Dim 3`

Link between the variable and the continuous variables (R-square)
=================================================================================
    correlation      p.value
age  -0.3397736 1.530157e-09

Link between the variable and the categorical variable (1-way anova)
=============================================
                     R2      p.value
Tea          0.33178708 9.996725e-27
breakfast    0.21498315 2.138445e-17
sugar        0.21026537 5.278399e-17
How          0.19600156 5.867378e-14
home         0.13440219 5.642852e-11
age_Q        0.14565571 1.852397e-09
evening      0.10637896 7.269113e-09
friends      0.10294629 1.307065e-08
always       0.10133291 1.721112e-08
tea.time     0.05424196 4.636352e-05
frequency    0.06361133 2.167789e-04
pub          0.04311129 2.936636e-04
SPC          0.07625129 6.739742e-04
where        0.04166084 1.801385e-03
exciting     0.02425859 6.872690e-03
dinner       0.02152288 1.095256e-02
spirituality 0.01606168 2.817777e-02
lunch        0.01561039 3.050262e-02
diuretic     0.01302499 4.827448e-02

Link between variable and the categories of the categorical variables
================================================================
                                 Estimate      p.value
Tea=Earl Grey                  0.18122366 5.474630e-22
breakfast=Not.breakfast        0.13921762 2.138445e-17
sugar=sugar                    0.13764789 5.278399e-17
home=Not.home                  0.32238212 5.642852e-11
evening=evening                0.10304138 7.269113e-09
friends=friends                0.10113377 1.307065e-08
always=always                  0.10056783 1.721112e-08
How=lemon                      0.29113799 3.532927e-05
tea.time=Not.tea time          0.07044079 4.636352e-05
frequency=1 to 2/week          0.13367067 4.787775e-05
age_Q=25-34                    0.15458550 9.102105e-05
SPC=student                    0.11762964 1.467086e-04
pub=pub                        0.07646920 2.936636e-04
age_Q=15-24                    0.12278166 4.286711e-04
How=alone                      0.13200504 7.535155e-04
where=tea shop                 0.12875140 1.905017e-03
exciting=exciting              0.04797670 6.872690e-03
dinner=dinner                  0.08625296 1.095256e-02
spirituality=spirituality      0.04098581 2.817777e-02
lunch=lunch                    0.05297807 3.050262e-02
how=unpackaged                 0.07670099 3.765383e-02
diuretic=Not.diuretic          0.03468681 4.827448e-02
diuretic=diuretic             -0.03468681 4.827448e-02
frequency=+2/day              -0.07763057 3.323605e-02
lunch=Not.lunch               -0.05297807 3.050262e-02
spirituality=Not.spirituality -0.04098581 2.817777e-02
where=chain store+tea shop    -0.09887763 2.197901e-02
age_Q=35-44                   -0.07139235 1.961314e-02
dinner=Not.dinner             -0.08625296 1.095256e-02
SPC=non-worker                -0.08793735 8.568111e-03
exciting=No.exciting          -0.04797670 6.872690e-03
age_Q=45-59                   -0.07791412 1.331486e-03
age_Q=+60                     -0.12806070 4.108178e-04
pub=Not.pub                   -0.07646920 2.936636e-04
tea.time=tea time             -0.07044079 4.636352e-05
How=other                     -0.33576960 1.254567e-05
How=milk                      -0.08737342 7.989362e-08
always=Not.always             -0.10056783 1.721112e-08
friends=Not.friends           -0.10113377 1.307065e-08
evening=Not.evening           -0.10304138 7.269113e-09
home=home                     -0.32238212 5.642852e-11
sugar=No.sugar                -0.13764789 5.278399e-17
breakfast=breakfast           -0.13921762 2.138445e-17
Tea=black                     -0.22772627 4.313358e-26
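The R2 values in the "1-way anova" tables above match the eta2 values in the MCA summary (for example, tearoom on Dim 1 is 0.372 in both). Each is a squared correlation ratio: the share of the variance of a dimension's coordinates that is explained by one categorical variable. The following is a minimal base-R sketch of that quantity, assuming simulated data; `coord` and `group` are hypothetical stand-ins for the coordinates on one dimension and one categorical variable, not the tea data.

```r
set.seed(1)
# Hypothetical data: coordinates of 90 individuals on one dimension
# and a three-level categorical variable
group <- factor(rep(c("a", "b", "c"), each = 30))
coord <- rnorm(90, mean = c(-0.5, 0, 0.5)[as.integer(group)])

# Correlation ratio: between-group sum of squares / total sum of squares
group_means <- as.numeric(tapply(coord, group, mean))
group_sizes <- as.numeric(table(group))
ss_between  <- sum(group_sizes * (group_means - mean(coord))^2)
ss_total    <- sum((coord - mean(coord))^2)
eta2 <- ss_between / ss_total

# The same quantity is the R-squared of a one-way ANOVA
r2 <- summary(lm(coord ~ group))$r.squared
all.equal(eta2, r2)
```

A value close to 1 means the categories separate the individuals almost completely along that dimension. A value close to 0 means the variable is unrelated to it.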

Plot the results of MCA

The plot() function can be used to visualize the results of an MCA as factor maps of the individuals and the categories. The invisible argument hides selected elements, such as individuals ("ind"), active categories ("var"), supplementary categories ("quali.sup"), and supplementary continuous variables ("quanti.sup"), which makes crowded maps easier to read.

Code

plot(res.mca,invisible=c("var","quali.sup","quanti.sup"),cex=0.7)

Code

plot(res.mca,invisible=c("ind","quali.sup","quanti.sup"),cex=0.8)

Code

plot(res.mca,invisible=c("quali.sup","quanti.sup"),cex=0.8)

The plotellipses() function draws confidence ellipses around the categories.

Code

plotellipses(res.mca,keepvar=1:4)

Code

plotellipses(res.mca,keepvar="Tea")

Detrended Correspondence Analysis (DCA)

Data

In this exercise we will use the varespec data set from the {vegan} package. This data frame has 24 rows and 44 columns. The columns are estimated cover values of 44 species.

Code

data(varespec)
glimpse(varespec)

Rows: 24
Columns: 44
$ Callvulg <dbl> 0.55, 0.67, 0.10, 0.00, 0.00, 0.00, 4.73, 4.47, 0.00, 24.13, …
$ Empenigr <dbl> 11.13, 0.17, 1.55, 15.13, 12.68, 8.92, 5.12, 7.33, 1.63, 1.90…
$ Rhodtome <dbl> 0.00, 0.00, 0.00, 2.42, 0.00, 0.00, 1.55, 0.00, 0.35, 0.07, 0…
$ Vaccmyrt <dbl> 0.00, 0.35, 0.00, 5.92, 0.00, 2.42, 6.05, 2.15, 18.27, 0.22, …
$ Vaccviti <dbl> 17.80, 12.13, 13.47, 15.97, 23.73, 10.28, 12.40, 4.33, 7.13, …
$ Pinusylv <dbl> 0.07, 0.12, 0.25, 0.00, 0.03, 0.12, 0.10, 0.10, 0.05, 0.12, 0…
$ Descflex <dbl> 0.00, 0.00, 0.00, 3.70, 0.00, 0.02, 0.78, 0.00, 0.40, 0.00, 0…
$ Betupube <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.02, 0.00, 0.00, 0.00, 0…
$ Vacculig <dbl> 1.60, 0.00, 0.00, 1.12, 0.00, 0.00, 2.00, 0.00, 0.20, 0.00, 0…
$ Diphcomp <dbl> 2.07, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.07, 0…
$ Dicrsp   <dbl> 0.00, 0.33, 23.43, 0.00, 0.00, 0.00, 0.03, 1.02, 0.30, 0.02, …
$ Dicrfusc <dbl> 1.62, 10.92, 0.00, 3.63, 3.42, 0.32, 37.07, 25.80, 0.52, 2.50…
$ Dicrpoly <dbl> 0.00, 0.02, 1.68, 0.00, 0.02, 0.02, 0.00, 0.23, 0.20, 0.00, 0…
$ Hylosple <dbl> 0.00, 0.00, 0.00, 6.70, 0.00, 0.00, 0.00, 0.00, 9.97, 0.00, 0…
$ Pleuschr <dbl> 4.67, 37.75, 32.92, 58.07, 19.42, 21.03, 26.38, 18.98, 70.03,…
$ Polypili <dbl> 0.02, 0.02, 0.00, 0.00, 0.02, 0.02, 0.00, 0.00, 0.00, 0.00, 0…
$ Polyjuni <dbl> 0.13, 0.23, 0.23, 0.00, 2.12, 1.58, 0.00, 0.02, 0.08, 0.02, 0…
$ Polycomm <dbl> 0.00, 0.00, 0.00, 0.13, 0.00, 0.18, 0.00, 0.00, 0.00, 0.00, 0…
$ Pohlnuta <dbl> 0.13, 0.03, 0.32, 0.02, 0.17, 0.07, 0.10, 0.13, 0.07, 0.03, 0…
$ Ptilcili <dbl> 0.12, 0.02, 0.03, 0.08, 1.80, 0.27, 0.03, 0.10, 0.03, 0.25, 0…
$ Barbhatc <dbl> 0.00, 0.00, 0.00, 0.08, 0.02, 0.02, 0.00, 0.00, 0.00, 0.07, 0…
$ Cladarbu <dbl> 21.73, 12.05, 3.58, 1.42, 9.08, 7.23, 6.10, 7.13, 0.17, 23.07…
$ Cladrang <dbl> 21.47, 8.13, 5.52, 7.63, 9.22, 4.95, 3.60, 14.03, 0.87, 23.67…
$ Cladstel <dbl> 3.50, 0.18, 0.07, 2.55, 0.05, 22.08, 0.23, 0.02, 0.00, 11.90,…
$ Cladunci <dbl> 0.30, 2.65, 8.93, 0.15, 0.73, 0.25, 2.38, 0.82, 0.05, 0.95, 2…
$ Cladcocc <dbl> 0.18, 0.13, 0.00, 0.00, 0.08, 0.10, 0.17, 0.15, 0.02, 0.17, 0…
$ Cladcorn <dbl> 0.23, 0.18, 0.20, 0.38, 1.42, 0.25, 0.13, 0.05, 0.03, 0.05, 0…
$ Cladgrac <dbl> 0.25, 0.23, 0.48, 0.12, 0.50, 0.18, 0.18, 0.22, 0.07, 0.23, 0…
$ Cladfimb <dbl> 0.25, 0.25, 0.00, 0.10, 0.17, 0.10, 0.20, 0.22, 0.10, 0.18, 0…
$ Cladcris <dbl> 0.23, 1.23, 0.07, 0.03, 1.78, 0.12, 0.20, 0.17, 0.02, 0.57, 0…
$ Cladchlo <dbl> 0.00, 0.00, 0.10, 0.00, 0.05, 0.05, 0.02, 0.00, 0.00, 0.02, 0…
$ Cladbotr <dbl> 0.00, 0.00, 0.02, 0.02, 0.05, 0.02, 0.00, 0.00, 0.02, 0.07, 0…
$ Cladamau <dbl> 0.08, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
$ Cladsp   <dbl> 0.02, 0.00, 0.00, 0.02, 0.00, 0.00, 0.02, 0.02, 0.00, 0.07, 0…
$ Cetreric <dbl> 0.02, 0.15, 0.78, 0.00, 0.00, 0.00, 0.02, 0.18, 0.00, 0.18, 0…
$ Cetrisla <dbl> 0.00, 0.03, 0.12, 0.00, 0.00, 0.00, 0.00, 0.08, 0.02, 0.02, 0…
$ Flavniva <dbl> 0.12, 0.00, 0.00, 0.00, 0.02, 0.02, 0.00, 0.00, 0.00, 0.00, 0…
$ Nepharct <dbl> 0.02, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
$ Stersp   <dbl> 0.62, 0.85, 0.03, 0.00, 1.58, 0.28, 0.00, 0.03, 0.02, 0.03, 0…
$ Peltapht <dbl> 0.02, 0.00, 0.00, 0.07, 0.33, 0.00, 0.00, 0.00, 0.00, 0.02, 0…
$ Icmaeric <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.07, 0.00, 0.00, 0…
$ Cladcerv <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
$ Claddefo <dbl> 0.25, 1.00, 0.33, 0.15, 1.97, 0.37, 0.15, 0.67, 0.08, 0.47, 1…
$ Cladphyl <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…

Correspondence analysis/reciprocal averaging on a data set

The decorana() function from the {vegan} package is used to perform Detrended Correspondence Analysis (DCA) in R. It takes a data frame with species abundance data as input and returns an object of class decorana that contains the results of the analysis. The argument ira selects the type of analysis. The default, ira = 0, performs detrended correspondence analysis, while ira = 1 performs basic (orthogonal) correspondence analysis, also known as reciprocal averaging.

Code

df.ra <- decorana(varespec,  ira=1 )
df.ra


Call:
decorana(veg = varespec, ira = 1) 

Orthogonal correspondence analysis.
Total inertia (scaled Chi-square): 2.0832 

                RA1    RA2    RA3    RA4
Eigenvalues  0.5249 0.3568 0.2344 0.1955
Axis lengths 3.3100 2.5227 2.8261 2.0362
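The line "Total inertia (scaled Chi-square)" can be made concrete with a small example. The sketch below runs a basic correspondence analysis by SVD on a hypothetical 3 x 3 table of counts (not the varespec data). It assumes the usual chi-square standardization of the residuals, dividing by the square root of the product of the row and column masses. It is an illustration only, not a reproduction of decorana()'s internals.

```r
# Hypothetical 3 x 3 table of counts
N <- matrix(c(20,  5,  2,
               6, 15,  4,
               3,  7, 18), nrow = 3, byrow = TRUE)

P      <- N / sum(N)      # relative frequencies
r_mass <- rowSums(P)      # row masses
c_mass <- colSums(P)      # column masses

# Standardized residuals: (P - r c') scaled by 1 / sqrt(r_i * c_j)
expected <- outer(r_mass, c_mass)
S <- (P - expected) / sqrt(expected)

# Squared singular values are the eigenvalues (principal inertias)
eig <- svd(S)$d^2
total_inertia <- sum(eig)

# Total inertia equals the Pearson chi-square statistic divided by n
chi2 <- unname(chisq.test(N, correct = FALSE)$statistic)
all.equal(total_inertia, chi2 / sum(N))

# Share of total inertia carried by each axis
round(eig / total_inertia, 3)
```

Read the same way, the eigenvalue of RA1 in the output above (0.5249) is about 25% of the total inertia (2.0832).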

Detrended Correspondence analysis of transformed data

Here the detrended correspondence analysis is preceded by transformations that correct for differences in sample size and for differences in abundance among taxa. The decostand() function from the {vegan} package standardizes a data frame of species abundances, and the decorana() function then performs Detrended Correspondence Analysis (DCA) on the transformed data.

Code

# "total" divides each value by its row (sample) total, giving proportions per sample
df.01 <- decostand(varespec, "total")
# "max" divides each value by its column (species) maximum
df.02 <- decostand(varespec, "max")
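The two standardizations can be written in base R, which shows exactly what is divided by what. This sketch assumes decostand()'s default margins (rows for "total", columns for "max") and uses a small hypothetical abundance matrix `x`, not the varespec data.

```r
# Hypothetical abundance matrix: 3 sites (rows) x 4 species (columns)
x <- matrix(c(10, 0, 5,  5,
               2, 8, 0, 10,
               4, 4, 4,  8), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# "total": divide each row by its row sum, so every site profile sums to 1
x_total <- sweep(x, 1, rowSums(x), "/")

# "max": divide each column by its column maximum, so every species peaks at 1
x_max <- sweep(x, 2, apply(x, 2, max), "/")

rowSums(x_total)       # all equal to 1
apply(x_max, 2, max)   # all equal to 1
```

The "total" transformation removes differences in sample size between sites, while "max" puts abundant and scarce species on a comparable scale.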

The default ira = 0 specifies detrending, and iweigh = 1 specifies down-weighting of rare species.

Code

dca.01<- decorana(df.02, ira = 0)
dca.02<- decorana(df.02, iweigh=1)

Code

names(dca.01)

 [1] "evals"          "rproj"          "cproj"          "adotj"         
 [5] "aidot"          "ira"            "iresc"          "short"         
 [9] "mk"             "totchi"         "evals.ortho"    "evals.decorana"
[13] "origin"         "v"              "fraction"       "iweigh"        
[17] "before"         "after"          "call"

Scores of the ordination axes

Code

scores(dca.01)

          DCA1        DCA2        DCA3        DCA4
18 -0.64961840 -0.35674869  0.26578630  0.30831095
15 -0.11107423 -0.52475183  0.12054413  0.36957038
24  0.35121637  0.30182538 -1.14587054 -0.30428368
27  1.26661878 -0.36661825  0.31300757 -0.25658940
23  0.13252131 -0.23903603  0.41785586 -0.05485219
19  0.40282848  0.17774464  0.34444513 -0.09783870
22  0.38687451 -0.43317541 -0.01889923  0.39147237
16 -0.19513963 -0.55059491 -0.08182289  0.39313298
28  1.55181422 -0.76664865 -0.03070901 -0.28557629
13 -0.43471986 -0.07336440 -0.11879538  0.93468092
14 -0.37843446 -0.43654956 -0.23112771  0.28657468
20  0.03800399 -0.28164315 -0.12848913  0.04470122
25  0.32548334  0.42495924  0.90899670 -0.33553365
7  -0.61009749 -0.49333283  0.09923166  0.14791463
5  -1.39297993 -1.25738006  0.35705383 -0.15286422
6  -0.55116794 -0.18796847 -0.02603715  0.38847845
3  -0.68162655 -0.07330327  0.06635126  0.13690565
4  -1.33732659  0.88591230  0.09991692  0.24952792
2  -0.57778664  1.05127966  0.23381163 -0.31577517
9  -0.18510720  0.68380249 -0.14619030 -0.21542622
12  0.10884491  0.50951772 -0.50889876 -0.29672863
10 -0.10586258  0.66427464 -0.19314012 -0.44191795
11 -0.38895670 -0.02026841  0.01391985 -0.94450797
21  1.34045737  0.63025073 -0.25848655  0.25834458

Plot the results of DCA

Code

plot(dca.01, display= c("sites"), cols=c(1,2), 
     pch=3, col="green")
text(dca.01, display=c("species"), choices=1:2,
 cex=0.7)

Canonical Correspondence Analysis (CCA)

Data

In this exercise we will use the varespec and varechem data sets from the {vegan} package to perform Canonical Correspondence Analysis (CCA). The varespec data frame has 24 rows and 44 columns, which are estimated cover values of 44 species. The varechem data frame has 24 rows and 14 columns, giving the soil characteristics of the very same sites as in the varespec data frame.

Code

library(vegan)
data(varespec) 
glimpse(varespec)

Code

data(varechem)
glimpse(varechem)

Rows: 24
Columns: 14
$ N        <dbl> 19.8, 13.4, 20.2, 20.6, 23.8, 22.8, 26.6, 24.2, 29.8, 28.1, 2…
$ P        <dbl> 42.1, 39.1, 67.7, 60.8, 54.5, 40.9, 36.7, 31.0, 73.5, 40.5, 3…
$ K        <dbl> 139.9, 167.3, 207.1, 233.7, 180.6, 171.4, 171.4, 138.2, 260.0…
$ Ca       <dbl> 519.4, 356.7, 973.3, 834.0, 777.0, 691.8, 738.6, 394.6, 748.6…
$ Mg       <dbl> 90.0, 70.7, 209.1, 127.2, 125.8, 151.4, 94.9, 45.3, 105.3, 11…
$ S        <dbl> 32.3, 35.2, 58.1, 40.7, 39.5, 40.8, 33.8, 27.1, 42.5, 60.2, 3…
$ Al       <dbl> 39.0, 88.1, 138.0, 15.4, 24.2, 104.8, 20.7, 74.2, 17.9, 329.7…
$ Fe       <dbl> 40.9, 39.0, 35.4, 4.4, 3.0, 17.6, 2.5, 9.8, 2.4, 109.9, 4.6, …
$ Mn       <dbl> 58.1, 52.4, 32.1, 132.0, 50.1, 43.6, 77.6, 24.4, 106.6, 61.7,…
$ Zn       <dbl> 4.5, 5.4, 16.8, 10.7, 6.6, 9.1, 7.4, 5.2, 9.3, 9.1, 8.1, 10.2…
$ Mo       <dbl> 0.30, 0.30, 0.80, 0.20, 0.30, 0.40, 0.30, 0.30, 0.30, 0.50, 0…
$ Baresoil <dbl> 43.90, 23.60, 21.20, 18.70, 46.00, 40.50, 23.00, 29.80, 17.60…
$ Humdepth <dbl> 2.2, 2.2, 2.0, 2.9, 3.0, 3.8, 2.8, 2.0, 3.0, 2.2, 2.7, 2.5, 2…
$ pH       <dbl> 2.7, 2.8, 3.0, 2.8, 2.7, 2.7, 2.8, 2.8, 2.8, 2.8, 2.7, 2.9, 2…

Fit CCA Model

The cca() function of {vegan} performs correspondence analysis, or optionally constrained correspondence analysis (also known as canonical correspondence analysis), or optionally partial constrained correspondence analysis. The rda() function performs redundancy analysis, or optionally principal components analysis. These are all very popular ordination techniques in community ecology. In the matrix interface, X is the species data (varespec) and Y is the environmental data (varechem). In the formula interface, the data argument is the data frame containing the variables on the right-hand side of the formula.

We can conduct the CCA in two ways. First, we can identify the response and explanatory objects separately:

Code

## Species matrix (X) constrained by the environmental data matrix (Y)
vare.cca.01 <- cca(X=varespec, Y=varechem)
vare.cca.01

Call: cca(X = varespec, Y = varechem)

-- Model Summary --

              Inertia Proportion Rank
Total          2.0832     1.0000     
Constrained    1.4415     0.6920   14
Unconstrained  0.6417     0.3080    9

Inertia is scaled Chi-square

-- Eigenvalues --

Eigenvalues for constrained axes:
  CCA1   CCA2   CCA3   CCA4   CCA5   CCA6   CCA7   CCA8   CCA9  CCA10  CCA11 
0.4389 0.2918 0.1628 0.1421 0.1180 0.0890 0.0703 0.0584 0.0311 0.0133 0.0084 
 CCA12  CCA13  CCA14 
0.0065 0.0062 0.0047 

Eigenvalues for unconstrained axes:
    CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8     CA9 
0.19776 0.14193 0.10117 0.07079 0.05330 0.03330 0.01887 0.01510 0.00949
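The Proportion column of the model summary is each inertia component divided by the total inertia. The short sketch below recomputes it from the printed values (copied by hand from the output above, so they carry its rounding) and checks the ranks.

```r
# Inertia components copied from the printed summary of vare.cca.01
inertia <- c(Constrained = 1.4415, Unconstrained = 0.6417)
proportion <- inertia / sum(inertia)
round(proportion, 3)

# Ranks: 14 constrained + 9 unconstrained axes
total_rank <- sum(c(14, 9))
total_rank   # the number of sites (24) minus one
```

Because 14 constraints are fitted to only 24 sites, the constrained axes use 14 of the 23 available dimensions. A large constrained share is therefore likely to reflect the number of constraints as well as any ecological signal.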

Code

plot(vare.cca.01)

Second, we can specify a formula relating the response to the explanatory variables:

Code

## Formula interface and a better model
vare.cca.02 <- cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
vare.cca.02

Call: cca(formula = varespec ~ Al + P * (K + Baresoil), data = varechem)

-- Model Summary --

              Inertia Proportion Rank
Total           2.083      1.000     
Constrained     1.046      0.502    6
Unconstrained   1.038      0.498   17

Inertia is scaled Chi-square

-- Eigenvalues --

Eigenvalues for constrained axes:
  CCA1   CCA2   CCA3   CCA4   CCA5   CCA6 
0.3756 0.2342 0.1407 0.1323 0.1068 0.0561 

Eigenvalues for unconstrained axes:
    CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8 
0.27577 0.15411 0.13536 0.11803 0.08887 0.05511 0.04919 0.03781 
(Showing 8 of 17 unconstrained eigenvalues)
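The constrained rank of 6 follows from how R expands the model formula: `P * (K + Baresoil)` stands for the main effects plus the two interactions with P. A minimal base-R check uses terms() on the right-hand side only, so no data are needed.

```r
# Expand the right-hand side of the model formula into its terms
f <- ~ Al + P * (K + Baresoil)
term_labels <- attr(terms(f), "term.labels")
term_labels
length(term_labels)
```

All variables involved are numeric, so each of the six terms contributes one constraint, which matches the six constrained axes (CCA1 to CCA6) in the output above.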

Code

plot(vare.cca.02)

Code

## Partialling out and negative components of variance
vare.cca.03<-cca(varespec ~ Ca + Condition(pH), varechem)
vare.cca.03

Call: cca(formula = varespec ~ Ca + Condition(pH), data = varechem)

-- Model Summary --

              Inertia Proportion Rank
Total          2.0832     1.0000     
Conditional    0.1458     0.0700    1
Constrained    0.1827     0.0877    1
Unconstrained  1.7547     0.8423   21

Inertia is scaled Chi-square

-- Eigenvalues --

Eigenvalues for constrained axes:
   CCA1 
0.18269 

Eigenvalues for unconstrained axes:
   CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8 
0.3834 0.2749 0.2123 0.1760 0.1701 0.1161 0.1089 0.0880 
(Showing 8 of 21 unconstrained eigenvalues)

Code

plot(vare.cca.03)

Canonical Correlation Analysis (CCA)

Data

In this exercise we will use the following data set.

gp_soil_data.csv

We will use the read_csv() function of the {readr} package to import the data as a tibble from my GitHub repository. The glimpse() function from the {dplyr} package gives a quick overview of the data.

Code

mf<-readr::read_csv("https://github.com/zia207/r-colab/raw/main/Data/Regression_analysis/gp_soil_data.csv")
glimpse(mf)

Rows: 467
Columns: 19
$ ID        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ FIPS      <dbl> 56041, 56023, 56039, 56039, 56029, 56039, 56039, 56039, 5603…
$ STATE_ID  <dbl> 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, …
$ STATE     <chr> "Wyoming", "Wyoming", "Wyoming", "Wyoming", "Wyoming", "Wyom…
$ COUNTY    <chr> "Uinta County", "Lincoln County", "Teton County", "Teton Cou…
$ Longitude <dbl> -111.0119, -110.9830, -110.8065, -110.7344, -110.7308, -110.…
$ Latitude  <dbl> 41.05630, 42.88350, 44.53497, 44.43289, 44.80635, 44.09124, …
$ SOC       <dbl> 15.763, 15.883, 18.142, 10.745, 10.479, 16.987, 24.954, 6.28…
$ DEM       <dbl> 2229.079, 1889.400, 2423.048, 2484.283, 2396.195, 2360.573, …
$ Aspect    <dbl> 159.1877, 156.8786, 168.6124, 198.3536, 201.3215, 208.9732, …
$ Slope     <dbl> 5.6716146, 8.9138117, 4.7748051, 7.1218114, 7.9498644, 9.663…
$ TPI       <dbl> -0.08572358, 4.55913162, 2.60588670, 5.14693117, 3.75570583,…
$ KFactor   <dbl> 0.31999999, 0.26121211, 0.21619999, 0.18166667, 0.12551020, …
$ MAP       <dbl> 468.3245, 536.3522, 859.5509, 869.4724, 802.9743, 1121.2744,…
$ MAT       <dbl> 4.5951686, 3.8599243, 0.8855000, 0.4707811, 0.7588266, 1.358…
$ NDVI      <dbl> 0.4139390, 0.6939532, 0.5466033, 0.6191013, 0.5844722, 0.602…
$ SiltClay  <dbl> 64.84270, 72.00455, 57.18700, 54.99166, 51.22857, 45.02000, …
$ NLCD      <chr> "Shrubland", "Shrubland", "Forest", "Forest", "Forest", "For…
$ FRG       <chr> "Fire Regime Group IV", "Fire Regime Group IV", "Fire Regime…

Preparing Two Datasets for Canonical Correlation Analysis (CCA)

We will split the data into two sets of variables and scale the variables to put them on the same scale.

X: terrain variables: DEM, Aspect, Slope, and TPI
Y: MAT, MAP, and NDVI

In R, scale() is a generic function whose default method centers and/or scales the columns of a numeric matrix.

Code

# Build the two variable sets and standardize each column
X<-mf %>% dplyr::select( DEM, Aspect, Slope,TPI) %>%
  scale()
Y<-mf %>% dplyr::select(MAT, MAP, NDVI) %>%
  scale()
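To see what scale() does with its default arguments, the sketch below standardizes two simulated variables measured on very different scales. The matrix `raw` and its column names are hypothetical and are not taken from the soil data.

```r
set.seed(42)
# Hypothetical variables on very different scales
raw <- cbind(elevation = rnorm(50, mean = 2000, sd = 300),
             slope     = rnorm(50, mean = 6,    sd = 2))

z <- scale(raw)   # center = TRUE and scale = TRUE by default

round(colMeans(z), 10)   # column means are zero (up to rounding error)
apply(z, 2, sd)          # column standard deviations are one

# The original centers and scales are kept as attributes
attr(z, "scaled:center")
attr(z, "scaled:scale")
```

Note that scale() returns a numeric matrix rather than a data frame, so X and Y above are matrices with one column per variable.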

Compute canonical correlations

The {CCA} package provides a set of functions that extend the cancor() function with new numerical and graphical outputs. It also includes a regularized extension of canonical correlation analysis to deal with data sets with more variables than observations. Here we use the cancor() function from the {stats} package, which is part of base R.

Code

cc <- cancor(X,Y)
str(cc)

List of 5
 $ cor    : num [1:3] 0.857 0.553 0.183
 $ xcoef  : num [1:4, 1:4] 0.05263 -0.00251 -0.00916 -0.00308 0.03601 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:4] "DEM" "Aspect" "Slope" "TPI"
  .. ..$ : NULL
 $ ycoef  : num [1:3, 1:3] -0.04243 -0.01461 -0.00525 0.01922 -0.03444 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "MAT" "MAP" "NDVI"
  .. ..$ : NULL
 $ xcenter: Named num [1:4] 5.06e-17 2.66e-16 1.00e-17 -6.50e-18
  ..- attr(*, "names")= chr [1:4] "DEM" "Aspect" "Slope" "TPI"
 $ ycenter: Named num [1:3] -2.10e-16 5.66e-17 1.46e-16
  ..- attr(*, "names")= chr [1:3] "MAT" "MAP" "NDVI"

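The cancor() function returns a list containing the canonical correlations (cor), the estimated coefficients for the X and Y variables (xcoef and ycoef), and the values used to center each set (xcenter and ycenter).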

Code

cc$xcoef

               [,1]         [,2]        [,3]         [,4]
DEM     0.052632132  0.036014771  0.01251337 -0.005520457
Aspect -0.002505462 -0.008496253  0.03772102  0.028805905
Slope  -0.009159251 -0.059569664 -0.02733187 -0.002101725
TPI    -0.003082211 -0.010559004  0.02497863 -0.037444837

Code

cc$ycoef

             [,1]         [,2]        [,3]
MAT  -0.042429859  0.019223196 -0.02211589
MAP  -0.014608094 -0.034441590  0.07607734
NDVI -0.005251573 -0.009221375 -0.08590261

Interpreting the results of a CCA

To interpret the results of a CCA, look at both the canonical correlations and the canonical variates. The canonical correlations indicate how strongly the two sets of variables are related, while the coefficients that define the canonical variates show which variables in each set contribute most to that relationship.
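One common way to see which original variables drive each canonical variate is to correlate the variables with the canonical scores (often called canonical loadings or structure correlations). The sketch below is only an illustration: it uses simulated data with hypothetical names (`z`, `x1` to `x3`, `y1`, `y2`) rather than the tutorial's `X` and `Y`, and relies only on base R's `cancor()`.

```r
# Simulated stand-in data (hypothetical; not the tutorial's X and Y)
set.seed(123)
n <- 200
z <- rnorm(n)                                   # shared signal linking both sets
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n), x3 = 0.5 * z + rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))

cc <- cancor(X, Y)

# Canonical scores: center with the values cancor() used, then apply coefficients
x_scores <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef
y_scores <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef

# Loadings: correlation of each original variable with the first two variates
x_loadings <- cor(X, x_scores[, 1:2])
y_loadings <- cor(Y, y_scores[, 1:2])

round(x_loadings, 2)
round(y_loadings, 2)
```

Variables with loadings far from zero on a given variate are the ones most closely tied to it. The sign of a loading depends on the arbitrary sign of the variate, so compare magnitudes and relative signs rather than absolute direction.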

Correlations between the canonical variates

Code

cc$cor

[1] 0.8573999 0.5530778 0.1828072

The correlation between the first pair of canonical variates is high (about 0.86), suggesting that the two sets of variables covary strongly. The second correlation (about 0.55) is moderate, and the third (about 0.18) is weak.
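A high first canonical correlation is suggestive rather than conclusive. Squaring the printed values gives the share of variance common to each pair of variates (0.857² ≈ 0.735 for the first pair). One way to check how many pairs are distinguishable from zero is Bartlett's chi-square approximation to Wilks' lambda. The sketch below applies it to simulated data with hypothetical variable names, and it assumes roughly multivariate normal data; it is not part of the original analysis.

```r
# Simulated stand-in data (hypothetical; not the tutorial's X and Y)
set.seed(123)
n <- 200
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n), x3 = 0.5 * z + rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))

cc <- cancor(X, Y)
r <- cc$cor
p <- ncol(X)
q <- ncol(Y)
s <- length(r)                      # number of canonical correlations
k <- seq_len(s)

# Wilks' lambda for H0: canonical correlations k, ..., s are all zero
wilks <- sapply(k, function(i) prod(1 - r[i:s]^2))

# Bartlett's chi-square approximation and its degrees of freedom
chi_sq  <- -(n - 1 - (p + q + 1) / 2) * log(wilks)
dof     <- (p - k + 1) * (q - k + 1)
p_value <- pchisq(chi_sq, dof, lower.tail = FALSE)

data.frame(k, canonical_r = r, shared_var = r^2, wilks, chi_sq, dof, p_value)
```

Row k of the result tests whether the k-th and all later canonical correlations are jointly zero, so the rows are read from top to bottom until the first non-significant one.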

Get the Canonical Covariate Pairs

Code

CC1_X <- as.matrix(X) %*%  cc$xcoef[, 1]
CC1_Y <- as.matrix(Y) %*%  cc$ycoef[, 1]
cor(CC1_X,CC1_Y)

          [,1]
[1,] 0.8573999

Code

CC2_X <- as.matrix(X) %*%  cc$xcoef[, 2]
CC2_Y <- as.matrix(Y) %*%  cc$ycoef[, 2]
cor(CC2_X,CC2_Y)

          [,1]
[1,] 0.5530778
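The products above multiply the raw `X` and `Y` by the coefficients. That gives the right correlations because correlation ignores shifts, and in this tutorial `xcenter` and `ycenter` are essentially zero anyway. When the inputs are not already centered, the scores themselves are obtained by subtracting `xcenter` and `ycenter` first. The sketch below, on simulated data with hypothetical names and deliberately non-zero means, checks three properties: matching pairs reproduce `cc$cor`, non-matching pairs are uncorrelated, and the scores have unit sum of squares (which is why the coefficients look small).

```r
# Simulated stand-in data with non-zero means (hypothetical)
set.seed(123)
n <- 200
z <- rnorm(n)
X <- cbind(x1 = 10 + z + rnorm(n), x2 = 5 + rnorm(n), x3 = 0.5 * z + rnorm(n))
Y <- cbind(y1 = 3 + z + rnorm(n), y2 = rnorm(n))

cc <- cancor(X, Y)

# Center each set with the values stored by cancor()
Xc <- scale(X, center = cc$xcenter, scale = FALSE)
Yc <- scale(Y, center = cc$ycenter, scale = FALSE)

# Scores for the first two canonical pairs
U <- Xc %*% cc$xcoef[, 1:2]
V <- Yc %*% cc$ycoef[, 1:2]

# Diagonal: canonical correlations; off-diagonal: approximately zero
cross_cor <- cor(U, V)
round(cross_cor, 3)

# Scores are scaled to unit sum of squares (identity matrix expected)
round(crossprod(U), 3)
```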

Create a data frame of canonical covariates

Code

cca_df <- mf  |> 
  mutate(CC1_X=CC1_X,
         CC1_Y=CC1_Y,
         CC2_X=CC2_X,
         CC2_Y=CC2_Y)  |> 
  glimpse()

Rows: 467
Columns: 23
$ ID        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ FIPS      <dbl> 56041, 56023, 56039, 56039, 56029, 56039, 56039, 56039, 5603…
$ STATE_ID  <dbl> 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, …
$ STATE     <chr> "Wyoming", "Wyoming", "Wyoming", "Wyoming", "Wyoming", "Wyom…
$ COUNTY    <chr> "Uinta County", "Lincoln County", "Teton County", "Teton Cou…
$ Longitude <dbl> -111.0119, -110.9830, -110.8065, -110.7344, -110.7308, -110.…
$ Latitude  <dbl> 41.05630, 42.88350, 44.53497, 44.43289, 44.80635, 44.09124, …
$ SOC       <dbl> 15.763, 15.883, 18.142, 10.745, 10.479, 16.987, 24.954, 6.28…
$ DEM       <dbl> 2229.079, 1889.400, 2423.048, 2484.283, 2396.195, 2360.573, …
$ Aspect    <dbl> 159.1877, 156.8786, 168.6124, 198.3536, 201.3215, 208.9732, …
$ Slope     <dbl> 5.6716146, 8.9138117, 4.7748051, 7.1218114, 7.9498644, 9.663…
$ TPI       <dbl> -0.08572358, 4.55913162, 2.60588670, 5.14693117, 3.75570583,…
$ KFactor   <dbl> 0.31999999, 0.26121211, 0.21619999, 0.18166667, 0.12551020, …
$ MAP       <dbl> 468.3245, 536.3522, 859.5509, 869.4724, 802.9743, 1121.2744,…
$ MAT       <dbl> 4.5951686, 3.8599243, 0.8855000, 0.4707811, 0.7588266, 1.358…
$ NDVI      <dbl> 0.4139390, 0.6939532, 0.5466033, 0.6191013, 0.5844722, 0.602…
$ SiltClay  <dbl> 64.84270, 72.00455, 57.18700, 54.99166, 51.22857, 45.02000, …
$ NLCD      <chr> "Shrubland", "Shrubland", "Forest", "Forest", "Forest", "For…
$ FRG       <chr> "Fire Regime Group IV", "Fire Regime Group IV", "Fire Regime…
$ CC1_X     <dbl[,1]> <matrix[26 x 1]>
$ CC1_Y     <dbl[,1]> <matrix[26 x 1]>
$ CC2_X     <dbl[,1]> <matrix[26 x 1]>
$ CC2_Y     <dbl[,1]> <matrix[26 x 1]>
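In the glimpse() output above, the four new columns are stored as one-column matrices (`<dbl[,1]>`) because `%*%` always returns a matrix. This usually works for plotting, but plain numeric vectors are often easier to handle in later steps. A minimal sketch with a small hypothetical data frame (`df`, `a`, `b`, and the weights `w` are made up for illustration):

```r
# Small hypothetical data frame and weights (illustration only)
set.seed(1)
df <- data.frame(id = 1:5, a = rnorm(5), b = rnorm(5))
w  <- c(0.5, -0.2)

# Matrix product returns a 5 x 1 matrix, not a vector
score_mat <- as.matrix(df[, c("a", "b")]) %*% w
dim(score_mat)

# as.numeric() drops the dim attribute, giving an ordinary numeric column
df$score <- as.numeric(score_mat)
str(df)
```

The same idea would apply inside mutate(), for example wrapping each score in as.numeric() before adding it as a column.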

Scatter plot of the first pair of canonical covariates

Code

cca_df %>% 
  ggplot(aes(x=CC1_X,y=CC1_Y, color=NLCD))+
  geom_point()

To see whether each canonical variate is associated with land cover (NLCD), create boxplots of the first canonical variates of X and Y grouped by NLCD.

Code

# First canonical variate of X, grouped by NLCD land-cover class
p1<-cca_df %>% 
  ggplot(aes(x=NLCD,y=CC1_X, color=NLCD))+
  geom_boxplot(width=0.5)+
  geom_jitter(width=0.15)+
  theme(legend.position="none")+
  ggtitle("First Canonical Variate of X vs NLCD") 

# First canonical variate of Y, grouped by NLCD land-cover class
p2<-cca_df %>% 
  ggplot(aes(x=NLCD,y=CC1_Y, color=NLCD))+
  geom_boxplot(width=0.5)+
  geom_jitter(width=0.15)+
  theme(legend.position="none")+
  ggtitle("First Canonical Variate of Y vs NLCD")

Code

library(patchwork)
p1+p2
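The boxplots give a visual impression of group differences. A numeric complement, sketched here under assumptions, is to summarize a variate by group and run a rank-based test. The data are simulated, and the names `group` and `variate` as well as the three class labels are hypothetical stand-ins for NLCD and CC1_X.

```r
# Simulated stand-in for a canonical variate and a land-cover factor (hypothetical)
set.seed(42)
group <- factor(rep(c("Forest", "Shrubland", "Herbaceous"), each = 30))
shift <- c(Forest = 1, Shrubland = 0, Herbaceous = -1)[as.character(group)]
variate <- shift + rnorm(length(group))

# Median of the variate within each group
group_medians <- aggregate(variate ~ group, FUN = median)
group_medians

# Kruskal-Wallis test: do the group distributions differ in location?
kt <- kruskal.test(variate ~ group)
kt$p.value
```

A small p-value indicates that at least one group differs from the others; it does not say which one, so the boxplots remain useful for reading the direction of the differences.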

Summary and Conclusion

This tutorial provided an overview of Correspondence Analysis (CA) and its variants, Multiple Correspondence Analysis (MCA), Detrended Correspondence Analysis (DCA), and Canonical Correspondence Analysis (CCA), together with Canonical Correlation Analysis (CCA). These techniques are widely used in statistics, ecology, and machine learning for multivariate data analysis. We also demonstrated how to perform these analyses in R using the {FactoMineR}, {vegan}, {ca}, and {CCA} packages. They are useful for exploring relationships between categorical variables, identifying patterns in survey data, and analyzing the relationship between species and environmental variables in ecological studies.
