Package 'easyCODA'

Title:	Compositional Data Analysis in Practice
Description:	Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package.
Authors:	Michael Greenacre [aut, cre]
Maintainer:	Michael Greenacre <[email protected]>
License:	GPL
Version:	0.40.2
Built:	2025-03-26 04:17:16 UTC
Source:	https://github.com/cran/easyCODA

Help Index

Compositional Data Analysis in Practice
Amalgamation clustering of the parts of a compositional data matrix
Additive logratios
Compositional bar plot
Correspondence analysis
Bivariate confidence and data ellipses
Closure of rows of compositional data matrix
Centred logratios
Dataset: RomanCups
Dot plot
Dummy variable (indicator) coding
Find the best ALR transformation
Dataset: FishMorphology
Isometric logratio
Inverse of additive logratios
Inverse of centred logratios
Inverse of full set of amalgamation balances
All pairwise logratios
Total logratio variance
Logratio analysis
Principal component analysis
Plot the results of a correspondence analysis
Plot the results of a logratio analysis
Plot the results of a principal component analysis
Plot the results of a redundancy analysis
Pivot logratios
Redundancy analysis
Amalgamation (summed) logratio
Stepwise selection of logratios
Stepwise selection of pairwise logratios for generalized linear modelling
Dataset: TimeBudget
Variance of a vector of observations, dividing by n rather than n-1
Dataset: Vegetables
Ward clustering of a compositional data matrix

Compositional Data Analysis in Practice

Description

Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package.

Details

The DESCRIPTION file:

Package:	easyCODA
Type:	Package
Version:	0.40.2
Date:	2024-08-25
Title:	Compositional Data Analysis in Practice
Authors@R:	person(given = "Michael", family = "Greenacre", role = c("aut", "cre"), email = "[email protected]")
Author:	Michael Greenacre [aut, cre]
Imports:	ca (>= 0.7), vegan (>= 2.3), ellipse (>= 0.4.1)
Maintainer:	Michael Greenacre <[email protected]>
Description:	Univariate and multivariate methods for compositional data analysis, based on logratios. The package implements the approach in the book Compositional Data Analysis in Practice by Michael Greenacre (2018), where accent is given to simple pairwise logratios. Selection can be made of logratios that account for a maximum percentage of logratio variance. Various multivariate analyses of logratios are included in the package.
License:	GPL
URL:	https://github.com/michaelgreenacre/CODAinPractice/
NeedsCompilation:	no
Packaged:	2024-08-27 14:25:12 UTC; rforge
Repository/R-Forge/Project:	easycoda
Repository/R-Forge/Revision:	53
Repository/R-Forge/DateTimeStamp:	2024-08-27 14:20:20
Date/Publication:	2024-08-27 15:00:05 UTC
Depends:	R (>= 2.10)
Repository:	https://michaelgreenacre.r-universe.dev
RemoteUrl:	https://github.com/cran/easyCODA
RemoteRef:	HEAD
RemoteSha:	3a9ea3953d8a6a7753c2c54298d93ef54558b3d2

Index of help topics:

ACLUST                  Amalgamation clustering of the parts of a
                        compositional data matrix
ALR                     Additive logratios
BAR                     Compositional bar plot
CA                      Correspondence analysis
CIplot_biv              Bivariate confidence and data ellipses
CLOSE                   Closure of rows of compositional data matrix
CLR                     Centred logratios
DOT                     Dot plot
DUMMY                   Dummy variable (indicator) coding
FINDALR                 Find the best ALR transformation
ILR                     Isometric logratio
LR                      All pairwise logratios
LR.VAR                  Total logratio variance
LRA                     Logratio analysis
PCA                     Principal component analysis
PLOT.CA                 Plot the results of a correspondence analysis
PLOT.LRA                Plot the results of a logratio analysis
PLOT.PCA                Plot the results of a principal component
                        analysis
PLOT.RDA                Plot the results of a redundancy analysis
PLR                     Pivot logratios
RDA                     Redundancy analysis
SLR                     Amalgamation (summed) logratio
STEP                    Stepwise selection of logratios
STEPR                   Stepwise selection of pairwise logratios for
                        generalized linear modelling
VAR                     Variance of a vector of observations, dividing
                        by n rather than n-1
WARD                    Ward clustering of a compositional data matrix
cups                    Dataset: RomanCups
easyCODA-package        Compositional Data Analysis in Practice
fish                    Dataset: FishMorphology
invALR                  Inverse of additive logratios
invCLR                  Inverse of centred logratios
invSLR                  Inverse of full set of amalgamation balances
time                    Dataset: TimeBudget
veg                     Dataset: Vegetables

Author(s)

Michael Greenacre [aut, cre]

Maintainer: Michael Greenacre <[email protected]>

References

Greenacre, Michael (2018) Compositional Data Analysis in Practice. Chapman & Hall / CRC Press

Examples

# Roman cups glass compositions
data("cups")
# unweighted logratio analysis
cups.ulra <- LRA(cups, weight=FALSE)
PLOT.LRA(cups.ulra)
# weighted logratio analysis
cups.wlra <- LRA(cups)
PLOT.LRA(cups.wlra)
# Roman cups glass compositions
data("cups")
# unweighted logratio analysis
cups.ulra <- LRA(cups, weight=FALSE)
PLOT.LRA(cups.ulra)
# weighted logratio analysis
cups.wlra <- LRA(cups)
PLOT.LRA(cups.wlra)

Amalgamation clustering of the parts of a compositional data matrix

Description

This function clusters the parts of a compositional data matrix, using amalgamation of the parts at each step.

Usage

ACLUST(data, weight = TRUE, close = TRUE)
ACLUST(data, weight = TRUE, close = TRUE)

Arguments

`data`	Compositional data matrix, with the parts as columns
`weight`	`TRUE` (default) for weighting using part averages of closed compositions, `FALSE` for unweighted analysis, or a vector of user-defined column weights
`close`	`TRUE` (default) will close the rows of `data` prior to clustering, `FALSE` leaves `data` as it is

Details

The function ACLUST performs amalgamation hierarchical clustering on the parts (columns) of a given compositional data matrix, as proposed by Greenacre (2019). At each step of the clustering two clusters are amalgamated that give the least loss of explained logratio variance.

Value

An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:

`merge`	an n-1 by 2 matrix. Row i of `merge` describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in `merge` indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
`height`	a set of n-1 real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.
`order`	a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches
`labels`	a vector of column labels, the column names of `data`

Author(s)

Michael Greenacre

References

Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.
Greenacre, M. (2019), Amalgamations are valid in compositional data analysis, can be used in agglomerative clustering, and their logratios have an inverse transformation. Applied Computing and Geosciences, open access.

Examples

data(cups)

# amalgamation clustering    (weighted parts)
cups.aclust <- ACLUST(cups)
plot(cups.aclust)

# reproducing Figure 2(b) of Greenacre (2019) (unweighted parts))
# dataset Aar is in the compositions package
# aar is a subset of Aar
# code given here within the '\dontrun' environment since external package 'compositions' required
## Not run: 
  library(compositions)
  data(Aar)
  aar <- Aar[,c(3:12)]
  aar.aclust <- ACLUST(aar, weight=FALSE)
# the maximum height is the total variance
# convert to percents of variance NOT explained
  aar.aclust$height <- 100 * aar.aclust$height / max(aar.aclust$height)
  plot(aar.aclust, main="Parts of Unexplained Variance", ylab="Variance (percent)")

## End(Not run)
data(cups)

# amalgamation clustering    (weighted parts)
cups.aclust <- ACLUST(cups)
plot(cups.aclust)

# reproducing Figure 2(b) of Greenacre (2019) (unweighted parts))
# dataset Aar is in the compositions package
# aar is a subset of Aar
# code given here within the '\dontrun' environment since external package 'compositions' required
## Not run: 
  library(compositions)
  data(Aar)
  aar <- Aar[,c(3:12)]
  aar.aclust <- ACLUST(aar, weight=FALSE)
# the maximum height is the total variance
# convert to percents of variance NOT explained
  aar.aclust$height <- 100 * aar.aclust$height / max(aar.aclust$height)
  plot(aar.aclust, main="Parts of Unexplained Variance", ylab="Variance (percent)")

## End(Not run)

Additive logratios

Description

Computation of additive logratios (ALRs) with respect to a specified part.

Usage

ALR(data, denom=ncol(data), weight=TRUE, stats=FALSE)
ALR(data, denom=ncol(data), weight=TRUE, stats=FALSE)

Arguments

`data`	A compositional data frame or matrix
`denom`	Number of part used in the denominator
`weight`	Logical indicating if varying weights are returned(default:`TRUE`). If `FALSE`, unweighted (equal) weights are returned. Alternatively a set of positive weights can be specified.
`stats`	Logical indicating if means, variances and total variance of the ALRs are returned (default:`FALSE`)

Details

The function ALR computes a set of additive logratios (ALRs) with respect to a specified part (by default, the last part).

Value

`LR`	The additive logratios (ALRs)
`LR.wt`	The weights assigned to the ALRs
`denom`	The index of the denominator used in the computation of the ALRs
`part.names`	The part names in the data, i.e. column names
`part.wt`	The part weights
`means`	The means of the ALRs (only returned if `stats = TRUE`)
`vars`	The variances of the ALRs (only returned if `stats = TRUE`)
`totvar`	The total variance of the ALRs (only returned if `stats = TRUE`)

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
ALR(veg, denom=2)
data(veg)
ALR(veg, denom=2)

Compositional bar plot

Description

Horizontal bar plot of compositional data

Usage

BAR(data, cols=rainbow(ncol(data)), col.names=colnames(data),
    row.names=rownames(data), order.column=NA, eps=0.5, main="", ylab="",
    ylim=c(0,nrow(data)), xlim=c(0,100), cex=1, truncate=NA)
BAR(data, cols=rainbow(ncol(data)), col.names=colnames(data),
    row.names=rownames(data), order.column=NA, eps=0.5, main="", ylab="",
    ylim=c(0,nrow(data)), xlim=c(0,100), cex=1, truncate=NA)

Arguments

`data`	Compositional data matrix or data frame with compositions in rows, parts in columns
`cols`	Colours of points for each part, default rainbow
`col.names`	Part names, if modified
`row.names`	Sample names, if modified
`order.column`	By default parts are taken in order of columns, but can be re-ordered using this option
`eps`	Small space between bars, can be modified
`main`	Heading
`ylab`	Vertical axis label
`ylim`	Vertical axis limits (default is the number of rows in data)
`xlim`	Horizontal axis limits (default c(0,100))
`cex`	Character size scaling factor for labels
`truncate`	Truncate part (column) names to this number of characters for legend

Details

The function BAR makes a BAR plot for specified groups of points, which can be in columns of a matrix or data frame.

Author(s)

Michael Greenacre

References

Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology: 39, 2189-2205.

Examples

# Vegetables data set: order samples by carbohydrates 
data(veg)
BAR(veg, order.column=2)
data(time)
# TimeBudget data set: put domestic work in first column and order by it
BAR(time[,c(2,1,3,4,5,6)], order.column=1, main="Time Budget")
# Vegetables data set: order samples by carbohydrates 
data(veg)
BAR(veg, order.column=2)
data(time)
# TimeBudget data set: put domestic work in first column and order by it
BAR(time[,c(2,1,3,4,5,6)], order.column=1, main="Time Budget")

Correspondence analysis

Description

Computation of correspondence analysis on a table of nonnegative data.

Usage

CA(data, nd = 2, suprow = NA, supcol = NA)
CA(data, nd = 2, suprow = NA, supcol = NA)

Arguments

`data`	A data frame or matrix of nonnegative data (no negative values)
`nd`	Number of dimensions for summary solution if not 2 (default)
`suprow`	Indices of rows that are supplementary points
`supcol`	Indices of columns that are supplementary points

Details

The function CA is a simple wrapper for the ca function in the ca package (Nenadic and Greenacre, 2007), for compatibility within the easyCODA package.

Supplementary rows and columns can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes.

The function borrows the structure and functions of the ca package, which is required, and produces a ca object, and the same print, summary and plot methods can be used, as for a ca object. It additionally exports the principal coordinates of both the rows and columns, not presently found in the ca package.

Value

`sv`	Singular values
`nd`	Number of dimensions in solution results
`rownames`	Row names
`rowmass`	Row weights
`rowdist`	Row logratio distances to centroid
`rowinertia`	Row inertias
`rowcoord`	Row standard coordinates
`rowpcoord`	Row principal coordinates
`rowsup`	Indices of row supplementary points
`colnames`	Column names
`colmass`	Column weights
`coldist`	Column logratio distances to centroid
`colinertia`	Column inertias
`colcoord`	Column standard coordinates
`colpcoord`	Column principal coordinates
`N`	The compositional data table

Author(s)

Michael Greenacre

References

Nenadic, O. and Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca package. Journal of Statistical Software, 20 (3), https://www.jstatsoft.org/v20/i03/

Examples

# CA of the Roman cups data (symmetric map)
data("cups")
PLOT.CA(CA(cups))
# CA of the Roman cups data (symmetric map)
data("cups")
PLOT.CA(CA(cups))

Bivariate confidence and data ellipses

Description

Draws confidence and data ellipses in bivariate scatterplots

Usage

CIplot_biv(x, y, group, wt=rep(1/length(x),length(x)), 
           varnames=c("x","y"), groupnames=sort(unique(group)), 
           groupcols=rainbow(length(unique(group))), 
           shownames=TRUE, xlim=c(NA,NA), ylim=c(NA,NA), 
           lty=1, lwd=1, add=FALSE, alpha=0.95, ellipse=0, 
           shade=FALSE, alpha.f=0.2, frac=0.01, cex=1) 
CIplot_biv(x, y, group, wt=rep(1/length(x),length(x)), 
           varnames=c("x","y"), groupnames=sort(unique(group)), 
           groupcols=rainbow(length(unique(group))), 
           shownames=TRUE, xlim=c(NA,NA), ylim=c(NA,NA), 
           lty=1, lwd=1, add=FALSE, alpha=0.95, ellipse=0, 
           shade=FALSE, alpha.f=0.2, frac=0.01, cex=1)

Arguments

`x`	x-variable (horizontal) of scatterplot
`y`	y-variable (vertical) of scatterplot
`group`	Grouping variable
`wt`	Set of weights on the cases (operates when ellipse=1)
`varnames`	Vector of two labels for the axes (default is x and y)
`groupnames`	Vector of labels for the groups (default is 1, 2, etc...)
`groupcols`	Vector of colours for the groups
`shownames`	Whether to show group names at group centroids or not (default is `TRUE`)
`xlim`	Possible new x-limits for plot
`ylim`	Possible new y-limits for plot
`lty`	Line type for the ellipses (default is 1)
`lwd`	Line width for the ellipses (default is 1)
`add`	=`TRUE` if ellipses/intervals are added to existing plot (default is `FALSE`)
`alpha`	Confidence level of ellipses (default is 0.95)
`ellipse`	Type of ellipse (see Details below; default is 0 for normal-based ellipses)
`shade`	=`TRUE` for ellipse shading (default=`FALSE`)
`alpha.f`	Shading fraction (default is 0.2)
`frac`	Proportional part defining the width of the bars at the edges of confidence intervals (for `ellipse=3` and `4`)
`cex`	Character expansion factor for group names

Details

The function CIplot_biv makes various types of confidence and data ellipses, according to option ellipse. Set ellipse<0 for regular data-covering ellipses. Set ellipse=0 (default) for normal-theory confidence ellipses. Setellipse=1 for bootstrap confidence ellipses. The option ellipse=2 for the delta method is not implemented yet. Set ellipse=3 for normal-theory confidence error bars lined up with axes. Set ellipse=4 for bootstrap confidence error bars along axes. The package ellipse is required.

Author(s)

Michael Greenacre

References

Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology, 39:2189-2205.

Examples

# Generate some bivariate normal data in three groups with different means
# Means (1,0), (0,1) and (0,0)
means  <- matrix(c(1,0,0,1,0,0), ncol=3)
data   <- matrix(nrow=300, ncol=2)
groups <- sample(rep(c(1,2,3), 100))
for(i in 1:300) data[i,] <- rnorm(c(1,1), mean=means[,groups[i]])
# Plot confidence ellipses with shading
CIplot_biv(data[,1], data[,2], group=groups, shade=TRUE)
# Generate some bivariate normal data in three groups with different means
# Means (1,0), (0,1) and (0,0)
means  <- matrix(c(1,0,0,1,0,0), ncol=3)
data   <- matrix(nrow=300, ncol=2)
groups <- sample(rep(c(1,2,3), 100))
for(i in 1:300) data[i,] <- rnorm(c(1,1), mean=means[,groups[i]])
# Plot confidence ellipses with shading
CIplot_biv(data[,1], data[,2], group=groups, shade=TRUE)

Closure of rows of compositional data matrix

Description

This function closes (or normalizes) the rows of a compositional data matrix, resulting in rows summing to 1.

Usage

CLOSE(x)
CLOSE(x)

Arguments

`x`	Compositional data matrix.

Details

Compositional data carry relative information. It is sometimes required to close the data so that each row of observations sums to 1. The function CLOSE performs the closure.

Value

The closed compositional data matrix.

Author(s)

Michael Greenacre

References

Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

Examples

data(cups)
apply(cups, 2, sum)
cups <- CLOSE(cups)
apply(cups, 2, sum)
data(cups)
apply(cups, 2, sum)
cups <- CLOSE(cups)
apply(cups, 2, sum)

Centred logratios

Description

Computation of centred logratios (CLRs).

Usage

CLR(data, weight=TRUE)
CLR(data, weight=TRUE)

Arguments

`data`	A compositional data frame or matrix
`weight`	Logical indicating if varying weights are returned(default:`TRUE`). If `FALSE`, unweighted (equal) weights are returned. Alternatively a set of positive weights can be specified.

Details

The function CLR computes the set of centred logratios (CLRs).

Value

`LR`	The centred logratios (CLRs)
`LR.wt`	The weights assigned to the CLRs

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
CLR(veg)
data(veg)
CLR(veg)

Dataset: RomanCups

Description

This data set consists of the compositions of 11 oxides in 47 Roman cups found at an archaeological site in eastern England. Compositions are expressed as percentages.

Usage

data(cups)data(cups)

Format

Data frame containing the 47 x 11 matrix.

Source

Baxter MJ, Beardah CC, Cool HEM and Jackson CM (2005) Compositional data analysis of some alkaline glasses. Mathematical Geology 37: 183-196.

Dot plot

Description

Simple dot plot of original data

Usage

DOT(data, cols=NA, names=NA, groups=NA, pch=NA, horizon=FALSE, jitter=1, 
     xscale=NA, xscalefac=1, yaxis=TRUE, shownames=TRUE, main="", ylab="", 
     xlim=c(NA,NA), ylim=c(NA, NA), cex=1)
DOT(data, cols=NA, names=NA, groups=NA, pch=NA, horizon=FALSE, jitter=1, 
     xscale=NA, xscalefac=1, yaxis=TRUE, shownames=TRUE, main="", ylab="", 
     xlim=c(NA,NA), ylim=c(NA, NA), cex=1)

Arguments

`data`	Matrix or data frame with data groups in columns; alternatively, a single vector but then groups (if any) have to specified with the `groups` option
`cols`	Colours of points for each sample, default rainbow
`names`	Labels for variables, by default the column names of data, or group names
`groups`	Group codes to split the data vector into separate plots
`pch`	Point character
`horizon`	`TRUE` if horizontal gray dashed lines required at "nice" y-values (default FALSE, not implemented yet)
`jitter`	1 by default, increase or decrease slightly for more jitter
`xscale`	User-supplied positions of points on horizontal axis
`xscalefac`	1 by default, rescale the positions on horizontal axis
`yaxis`	TRUE by default, FALSE to suppress and optionally add afterwards
`shownames`	`TRUE` by default; `FALSE` to not show group names and add them externally
`main`	Heading
`ylab`	Vertical axis label
`xlim`	Horizontal axis limits
`ylim`	Vertical axis limits
`cex`	Character size adjustment for labels

Details

The function DOT makes a dot plot for specified groups of points, which can be in columns of a matrix or data frame, or in a single vector with group codes specified separately.

Author(s)

Michael Greenacre

References

Greenacre, M. (2016), Data reporting and visualization in ecology, Polar Biology, 39:2189-2205.

Examples

# Dot plot of columns of Vegetables data set
data(veg)
DOT(veg)
# Dot plot of domestic work column of TimeBudget data set, split by sex
data(time)
DOT(time[,2], groups=substr(rownames(time),3,3), cols=c("blue","red"), ylim=c(0,20), 
    jitter=2, main="Percentage of Domestic Work")
# Dot plot of columns of Vegetables data set
data(veg)
DOT(veg)
# Dot plot of domestic work column of TimeBudget data set, split by sex
data(time)
DOT(time[,2], groups=substr(rownames(time),3,3), cols=c("blue","red"), ylim=c(0,20), 
    jitter=2, main="Percentage of Domestic Work")

Dummy variable (indicator) coding

Description

Convert categorical variable to dummy (0/1) coding

Usage

DUMMY(x, catnames=NA)
DUMMY(x, catnames=NA)

Arguments

`x`	Variable (vector) of categorical data to be coded
`catnames`	Category names

Details

The function DUMMY takes a categorical variable and converts it to a set of dummy variables (zeros and ones), where the ones indicate the corresponding category. There are as many columns in the result as there are unique categories in the input vector.

Author(s)

Michael Greenacre

Examples

# Indicator (dummy) coding of sex in FishMorphology data set
data(fish)
sex   <- fish[,1]
sex.Z <- DUMMY(sex, catnames=c("F","M")) 
# Indicator (dummy) coding of sex in FishMorphology data set
data(fish)
sex   <- fish[,1]
sex.Z <- DUMMY(sex, catnames=c("F","M"))

Find the best ALR transformation

Description

Searching over every possible reference part for choosing an optimal ALR transformation.

Usage

FINDALR(data, weight=FALSE)
FINDALR(data, weight=FALSE)

Arguments

`data`	Compositional data matrix, with the parts as columns
`weight`	FALSE (default) for equally weighted parts, TRUE when weights are in data list object, or a vector of user-defined part weights

Details

The function FINDALR considers every possible set of additive logratio (ALR) transformations, by trying each of the references. For each set the closeness to isometry is measured by the Procrustes correlation. In addition, the variance of the log-transformed reference is also computed. The reference with highest Procrustes correlation and the reference with the lowest variance of its log-transform are identified. The number of ALRs computed is equal to 1 less than the number of rows or columns, whichever is the smallest.

Value

An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:

`totvar`	Total logratio variance
`procrust.cor`	The Procrustes correlations of the ALRs using each reference
`procrust.max`	The value of the highest Procrustes correlation
`procrust.ref`	The reference corresponding to the highest correlation
`var.log`	Variances of the log-transformed references
`var.min`	The value of the lowest variance
`var.ref`	The reference corresponding to the lowest variance

Author(s)

Michael Greenacre

References

Greenacre, M., Martinez-Alvaro, M. and Blasco, A. (2021), Compositional data analysis of microbiome and any-omics datasets: a validation of the additive logratio transformation, Frontiers in Microbiology 12: 2625
Gower, J. and Dijksterhuis, G.B. (2004), Procrustes Problems. Oxford University Press
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC

Examples

# For the fish morphometric data, first close (normalize) 
# then loop over the 26 possible references
data(fish)
FINDALR(CLOSE(fish[,4:29]))
# Note that for the default option weight=FALSE closing the data is not necessary
# For the fish morphometric data, first close (normalize) 
# then loop over the 26 possible references
data(fish)
FINDALR(CLOSE(fish[,4:29]))
# Note that for the default option weight=FALSE closing the data is not necessary

Dataset: FishMorphology

Description

This data set consists of the sex, habitat, mass and then 26 morphometric measurements on 75 fish (Arctic charr)

Usage

data(fish)data(fish)

Format

Data frame containing the 75 x 29 matrix. Column 1 contains sex (1=female, 2=male). Column 2 contains habitat (1=litoral, 2=pelagic). Column 3 contains the mass in grams. Columns 4 to 29 contain the 26 morphometric measurements.

Source

Greenacre, M and Primicerio, R (2010) Multivariate Analysis of Ecological Data. BBVA Foundation, Bilbao. Free download at www.multivariatestatistics.org

Isometric logratio

Description

Computation of a single isometric logratio (ILR)

Usage

ILR(data, numer=NA, denom=NA, weight=TRUE)
ILR(data, numer=NA, denom=NA, weight=TRUE)

Arguments

`data`	A compositional data frame or matrix
`numer`	Vector of parts in the numerator
`denom`	Vector of parts in the denominator
`weight`	Logical indicating if a varying weight is returned (default:`TRUE`). If `FALSE`, a weight based on equally-weighted parts is returned. Alternatively a positive weight can be specified.

Details

The function ILR computes a single isometric logratio based on the specified numerator and denominator parts that define the two geometric means in the ratio.

Value

`LR`	The isometric logratio (ILR)
`LR.wt`	The weight assigned to the ILR

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
ILR(veg, numer=1, denom=2:3)
data(veg)
ILR(veg, numer=1, denom=2:3)

Inverse of additive logratios

Description

Given additive logratios (ALRs) with respect to a specified part, compute the inverse (i.e. original parts)

Usage

invALR(ALRmatrix, part.names=paste("part",1:(ncol(ALRmatrix)+1),sep=""), denom=NA)
invALR(ALRmatrix, part.names=paste("part",1:(ncol(ALRmatrix)+1),sep=""), denom=NA)

Arguments

`ALRmatrix`	A matrix of additive logratios (ALRs) with respect to a specified part)
`part.names`	Part names in the reconstructed compositional data matrix
`denom`	The index of the denominator used in the computation of the ALRs (default: last part))

Details

The function invALR computes the original parts, given the additive logratios (ALRs)

Value

parts

The reconstructed parts (they add up to 1)

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
# compute additive logratios with respect to second part
veg.ALR <- ALR(veg, denom=2)$LR
# recover original parts (to get same order, specify the denominator used originally)
invALR(veg.ALR, denom=2)
data(veg)
# compute additive logratios with respect to second part
veg.ALR <- ALR(veg, denom=2)$LR
# recover original parts (to get same order, specify the denominator used originally)
invALR(veg.ALR, denom=2)

Inverse of centred logratios

Description

Given centred logratios (CLRs), compute the inverse (i.e. recover the original parts)

Usage

invCLR(CLRmatrix, part.names=colnames(CLRmatrix))
invCLR(CLRmatrix, part.names=colnames(CLRmatrix))

Arguments

`CLRmatrix`	A matrix of centred logratios
`part.names`	Part names in the reconstructed compositional data matrix

Details

The function invCLR computes the original parts, given the centred logratios (CLRs)

Value

parts

The reconstructed parts (they add up to 1)

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)
data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)

Inverse of full set of amalgamation balances

Description

Given a full set of amalgamation (or summation) balances (SLRs), compute the inverse (i.e. recover the original parts)

Usage

invSLR(SLRmatrix, part.names=NA, ratio.names=colnames(SLRmatrix))
invSLR(SLRmatrix, part.names=NA, ratio.names=colnames(SLRmatrix))

Arguments

`SLRmatrix`	A matrix of amalgamation logratios, one less column than the number of parts
`part.names`	Part names in the reconstructed compositional data matrix
`ratio.names`	Definition of the amalgamation logratios

Details

The function invSLR computes the original parts, given the amalgamation logratios (CLRs). The amalgamation logratios are specified in ratio.names in the format num/den where num and den are the numerator and denominator amalgamations respectively. An amalgamation is specified as "p1&p2&...", where p1, p2, etc. are the parts summed in the amalgamation. For example, an SLR of the ratio MnO/(CaO+P2O5) would be names as "MnO/CaO&P2O5".

Value

parts

The reconstructed parts (they add up to 1)

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)
data(veg)
# compute centred logratios
veg.CLR <- CLR(veg)$LR
# invert back to original parts (parts closed to sum to 1)
invALR(veg.CLR)

All pairwise logratios

Description

Computation of all pairwise logratios.

Usage

LR(data, ordering=1:ncol(data), weight=TRUE)
LR(data, ordering=1:ncol(data), weight=TRUE)

Arguments

`data`	A compositional data frame or matrix
`ordering`	A permutation of the columns (default: the original ordering)
`weight`	Logical indicating if varying weights are returned (default:`TRUE`). If `FALSE`, unweighted (equal) weights are returned. Alternatively a set of positive weights can be specified.

Details

The function LR computes the complete set of pairwise logratios, in the order [1,2], [1,3], [2,3], [1,4], [2,4], [3,4], etc.

Value

`LR`	The pairwise logratios as columns of a data matrix
`LR.wt`	The weights assigned to the respective logratios

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
LR(veg)
data(veg)
LR(veg)

Total logratio variance

Description

Computation of total (weighted)logratio variance.

Usage

LR.VAR(LRdata, row.wt = NA, weight=TRUE, vars=FALSE)
LR.VAR(LRdata, row.wt = NA, weight=TRUE, vars=FALSE)

Arguments

`LRdata`	Matrix of logratios, either a vector or preferably the logratio object resulting from one of the functions ALR, CLR, PLR or LR
`row.wt`	Optional set of row weights (default: equal weights)
`weight`	Logical indicating if varying weights are returned(default:`TRUE`). If `FALSE`, unweighted (equal) weights are returned. Alternatively a set of positive weights can be specified.
`vars`	If `TRUE`, output individual variances as well (default `FALSE`)

Details

The function LR.VAR computes the sum of the logratio variances provided as input, using the weights in the logratio data object.

Value

`LRtotvar`	The total logratio variance
`LRvars`	(optional, if vars=TRUE, the individual logratio variances composing the total)

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(cups)
# These give identical total logratio variances (weighted, by default)
LR.VAR(CLR(cups))
LR.VAR(LR(cups))
# Summing over all sets of ALRs gives twice the variance
totvar <- 0
for(j in 1:ncol(cups)) totvar <- totvar + LR.VAR(ALR(cups, denom=j))
totvar/2
data(cups)
# These give identical total logratio variances (weighted, by default)
LR.VAR(CLR(cups))
LR.VAR(LR(cups))
# Summing over all sets of ALRs gives twice the variance
totvar <- 0
for(j in 1:ncol(cups)) totvar <- totvar + LR.VAR(ALR(cups, denom=j))
totvar/2

Logratio analysis

Description

Computation of weighted or unweighted logratio analysis of a samples-by-parts compositional data table.

Usage

LRA(data, nd = 2, weight = TRUE, suprow = NA, row.wt = NA, amalg = NA, supamalg = FALSE)
LRA(data, nd = 2, weight = TRUE, suprow = NA, row.wt = NA, amalg = NA, supamalg = FALSE)

Arguments

`data`	A data frame or matrix of compositional data, with no zero values
`nd`	Number of dimensions for summary solution if not 2 (default)
`weight`	TRUE (default) for part weighting, FALSE for unweighted analysis, or a vector of user-defined part weights
`suprow`	Indices of rows that are supplementary points
`row.wt`	Optional user-defined set of positive weights for the rows (samples) (default: equal weights)
`amalg`	Optional list of amalgamated parts
`supamalg`	FALSE (default) when amalgamations are active and their subparts supplementary, TRUE when amalgamations are supplementary and their parts active

Details

The function LRA computes a log-ratio analysis of a table of compositional data based on the singular value decomposition. By default the weighted log-ratio analysis is computed (Greenacre & Lewi 2009). For the unweighted logratio analysis (Aitchison & Greenacre 2002), specify the option weight=FALSE.

User-specified weights can be supplied, for the rows and/or the columns. Usually row weights are not specified, and are equal unless intentional weighting of the samples is desired. Default column weights (if weight = TRUE) are the part means of the true compositional table, thus summing to 1. User-specified part weights can be provided using the same weight option.

Supplementary rows can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes.

Amalgamations can be defined and can either replace their constituent parts (default) or be declared supplementary using the supamalg option: supamalg = FALSE (default), = TRUE if all declared amalgamations are supplementary.

Value

`sv`	Singular values
`nd`	Number of dimensions in solution results
`rownames`	Row names
`rowmass`	Row weights
`rowdist`	Row logratio distances to centroid
`rowinertia`	Row inertias
`rowcoord`	Row standard coordinates
`rowpcoord`	Row principal coordinates
`rowsup`	Indices of row supplementary points
`colnames`	Column names
`colmass`	Column weights
`coldist`	Column logratio distances to centroid
`colinertia`	Column inertias
`colcoord`	Column standard coordinates
`colpcoord`	Column principal coordinates
`N`	The compositional data table

Author(s)

Michael Greenacre

References

Aitchison, J. and Greenacre, M. (2002), Biplots of compositional data, Applied Statistics 51, 375-392.
Greenacre, M. and Lewi, P.J. (2009), Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio scale measurements. Journal of Classification 26, 29-54. Greenacre, M. (2020), Amalgamations are valid in compositional data analysis, can be used in agglomerative clustering, and their logratios have an inverse transformation. Applied Computing and Geosciences 5, 100017.

Examples

# (weighted) LRA of the RomanCups data set, showing default symmetric map
data("cups")
PLOT.LRA(LRA(cups))
# (unweighted) LRA of the RomanCups data set, showing default symmetric map
# the solution is completely different
PLOT.LRA(LRA(cups, weight=FALSE))
# (weighted) LRA of the RomanCups data set, showing default symmetric map
data("cups")
PLOT.LRA(LRA(cups))
# (unweighted) LRA of the RomanCups data set, showing default symmetric map
# the solution is completely different
PLOT.LRA(LRA(cups, weight=FALSE))

Principal component analysis

Description

Computation of weighted or unweighted principal component analysis of a matrix of interval-scale data (e.g. a matrix of logratios).

Usage

PCA(data, nd = 2, weight = TRUE, row.wt = NA, suprow = NA)
PCA(data, nd = 2, weight = TRUE, row.wt = NA, suprow = NA)

Arguments

`data`	A data frame or matrix of interval-scale data, or logratio object from functions `ALR`, `CLR` or `LR`
`nd`	Number of dimensions for summary solution if not 2 (default)
`weight`	TRUE (default) for column weighting, FALSE for unweighted analysis, or a vector of user-defined column weights
`row.wt`	Optional user-defined set of positive weights for the rows (samples) (default: equal weights)
`suprow`	Indices of rows that are supplementary points (NOTE: this option is not implemented in this version)

Details

The function PCA computes an unstandardized principal component analysis, based on the singular value decomposition, of a matrix of interval-scale data, usually a matrix of logratios in the context of this package (but it can be used for general data as well). For general usage the unweighted option weight = FALSE might be preferred, but the default is weighted in the present context of compositional data.

Supplementary rows and columns can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes. Notice that this optyion is not implemented in the present version, but will appear in the next one.

Value

`sv`	Singular values
`nd`	Dimenson of the solution
`rownames`	Row names
`rowmass`	Row weights
`rowdist`	Row logratio distances to centroid
`rowinertia`	Row variances
`rowcoord`	Row standard coordinates
`rowpcoord`	Row principal coordinates
`rowsup`	Indices of row supplementary points
`colnames`	Column names
`colmass`	Column weights
`coldist`	Column logratio distances to centroid
`colinertia`	Column variances
`colcoord`	Column standard coordinates
`colpcoord`	Column principal coordinates
`N`	The data table

Author(s)

Michael Greenacre

References

Aitchison, J. and Greenacre, M. (2002), Biplots of compositional data, Applied Statistics 51, 375-392.
Greenacre, M. (2010), Biplots in Practice, BBVA Foundation, Bilbao. Free download from www.multivariatestatistics.org

Examples

# compute logratios of Vegetables data set
data("veg")
veg.LR <- LR(veg)
# unweighted PCA biplot of the results
veg.pca <- PCA(veg.LR$LR, weight=FALSE)
PLOT.PCA(veg.pca, map="asymmetric")
# compute logratios of Vegetables data set
data("veg")
veg.LR <- LR(veg)
# unweighted PCA biplot of the results
veg.pca <- PCA(veg.LR$LR, weight=FALSE)
PLOT.PCA(veg.pca, map="asymmetric")

Plot the results of a correspondence analysis

Description

Various maps and biplots of the results of a correspondence analysis using function CA.

Usage

PLOT.CA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
        cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))
PLOT.CA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
        cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))

Arguments

`obj`	A CA object created using function `CA`
`map`	Choice of scaling of rows and columns: `"symmetric"` (default), `"asymmetric"` or `"contribution"`
`rescale`	A rescaling factor applied to column coordinates (default is 1 for no rescaling)
`dim`	Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2))
`main`	Title for plot
`axes.inv`	Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing)
`cols`	Colours for row and column labels (default is c("blue","red"))
`colarrows`	Colour for arrows in asymmetric and contribution biplots (default is "pink")
`cexs`	Character expansion factors for row and column labels (default is c(0.8,0.8))
`fonts`	Fonts for row and column labels (default is c(2,4))

Details

The function PLOT.CA makes a scatterplot of the results of a correspondence analysis (computed using function CA), with various options for scaling the results and changing the direction of the axes. By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to variables.

By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance (i.e. inertia) along the two dimensions. The other options are biplots: the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates. In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option.

Author(s)

Michael Greenacre

References

Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.

Examples

data("cups")
cups.ca <- CA(cups)
PLOT.CA(cups.ca, map="contribution", rescale=0.2)	
# Compare the above plot with that of a weighted LRA -- practically the same
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)
data("cups")
cups.ca <- CA(cups)
PLOT.CA(cups.ca, map="contribution", rescale=0.2)	
# Compare the above plot with that of a weighted LRA -- practically the same
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)

Plot the results of a logratio analysis

Description

Various maps and biplots of the results of a logratio analysis using function LRA.

Usage

PLOT.LRA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
         cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))
PLOT.LRA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1), main="",
         cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8), fonts=c(2,4))

Arguments

`obj`	An LRA object created using function `LRA`
`map`	Choice of scaling of rows and columns: `"symmetric"` (default), `"asymmetric"` or `"contribution"`
`rescale`	A rescaling factor applied to column coordinates (default is 1 for no rescaling)
`dim`	Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2))
`axes.inv`	Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing)
`main`	Title for plot
`cols`	Colours for row and column labels (default is c("blue","red"))
`colarrows`	Colour for arrows in asymmetric and contribution biplots (default is "pink")
`cexs`	Character expansion factors for row and column labels (default is c(0.8,0.8))
`fonts`	Fonts for row and column labels (default is c(2,4))

Details

The function PLOT.LRA makes a scatterplot of the results of a logratio analysis (computed using function LRA), with various options for scaling the results and changing the direction of the axes. By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to compositional parts.

By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates. In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option.

Author(s)

Michael Greenacre

References

Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.

Examples

# perform LRA on the vegetable compositions (unweighted form)
data("veg")
veg.lra <- LRA(veg, weight=FALSE)
PLOT.LRA(veg.lra)
# perform LRA on the Roman cups data set and plot the results (weighted form)
data("cups")
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)	
# perform LRA on the vegetable compositions (unweighted form)
data("veg")
veg.lra <- LRA(veg, weight=FALSE)
PLOT.LRA(veg.lra)
# perform LRA on the Roman cups data set and plot the results (weighted form)
data("cups")
cups.lra <- LRA(cups)
PLOT.LRA(cups.lra, map="contribution", rescale=0.2)

Plot the results of a principal component analysis

Description

Various maps and biplots of the results of a principal component analysis using function PCA.

Usage

PLOT.PCA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1),
         main="", cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8),
         fonts=c(2,4))
PLOT.PCA(obj, map="symmetric", rescale=1, dim=c(1,2), axes.inv = c(1,1),
         main="", cols=c("blue","red"), colarrows = "pink", cexs=c(0.8,0.8),
         fonts=c(2,4))

Arguments

`obj`	An LRA object created using function `LRA`
`map`	Choice of scaling of rows and columns: `"symmetric"` (default), `"asymmetric"` or `"contribution"`
`rescale`	A rescaling factor applied to column coordinates (default is 1 for no rescaling)
`dim`	Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2))
`axes.inv`	Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing)
`main`	Title for plot
`cols`	Colours for row and column labels (default is c("blue","red"))
`colarrows`	Colour for arrows in asymmetric and contribution biplots (default is "pink")
`cexs`	Character expansion factors for row and column labels (default is c(0.8,0.8))
`fonts`	Fonts for row and column labels (default is c(2,4))

Details

The function PLOT.PCA makes a scatterplot of the results of a logratio analysis (computed using function PCA), with various options for scaling the results and changing the direction of the axes. By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to variables.

By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are biplots: the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates. In cases where the row and column displays occupy widely different extents, the column coordinates can be rescaled using the rescale option.

Author(s)

Michael Greenacre

References

Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics, 22, 107-122.

Examples

# perform weighted PCA on the ALRs of the Roman cups data set
# where the first oxide silica is chosen as the denominator
data("cups")
cups.alr <- ALR(cups, denom=1)
cups.pca <- PCA(cups.alr)
PLOT.PCA(cups.pca, map="contribution", rescale=0.2, axes.inv=c(1,-1))	
# perform weighted PCA on the ALRs of the Roman cups data set
# where the first oxide silica is chosen as the denominator
data("cups")
cups.alr <- ALR(cups, denom=1)
cups.pca <- PCA(cups.alr)
PLOT.PCA(cups.pca, map="contribution", rescale=0.2, axes.inv=c(1,-1))

Plot the results of a redundancy analysis

Description

Various maps and biplots/triplots of the results of a redundancy analysis using function RDA.

Usage

PLOT.RDA(obj, map="symmetric", indcat=NA, rescale=1, dim=c(1,2), axes.inv=c(1,1),
         main="", rowstyle=1, cols=c("blue","red","forestgreen"),
         colarrows=c("pink","lightgreen"), colrows=NA, pchrows=NA, colcats=NA,
         cexs=c(0.8,0.8,0.8), fonts=c(2,4,4))
PLOT.RDA(obj, map="symmetric", indcat=NA, rescale=1, dim=c(1,2), axes.inv=c(1,1),
         main="", rowstyle=1, cols=c("blue","red","forestgreen"),
         colarrows=c("pink","lightgreen"), colrows=NA, pchrows=NA, colcats=NA,
         cexs=c(0.8,0.8,0.8), fonts=c(2,4,4))

Arguments

`obj`	An RDA object created using function `RDA`
`map`	Choice of scaling of rows and columns: `"symmetric"` (default), `"asymmetric"` or `"contribution"`
`indcat`	A vector indicating which of the covariates are dummy (or fuzzy) variables
`rescale`	A rescaling factor applied to column coordinates(default is 1 for no rescaling). If rescale is a vector with two values, the first applies to the column coordinates and the second to the covariate coordinates.
`dim`	Dimensions selected for horizontal and vertical axes of the plot (default is c(1,2))
`axes.inv`	Option for reversing directions of horizontal and vertical axes (default is c(1,1) for no reversing, change one or both to -1 for reversing)
`main`	Title for plot
`rowstyle`	Scaling option for row coordinates, either 1 (SVD coordinates, default) or 2 (as supplementary points)
`cols`	Colours for row and column and covariate labels (default is c("blue","red","forestgreen"))
`colarrows`	Colour for arrows in asymmetric or contribution biplots, for columns and covariates (default is c("pink","lightgreen"))
`colrows`	Optional vector of colours for rows
`pchrows`	Optional vector of point symbols for rows
`colcats`	Optional vector of colours for covariate categories (dummy variables)
`cexs`	Vector of character expansion factors for row and column and covariate labels (default is c(0.8,0.8,0.8))
`fonts`	Vector of font styles for row and column and covariate labels (default is c(2,4,4))

Details

The function PLOT.RDA makes a scatterplot of the results of a redundancy analysis (computed using function RDA), with various options for scaling the results and changing the direction of the axes. By default, dimensions 1 and 2 are plotted on the horizontal and vertical axes, and it is assumed that row points refer to samples and columns to compositional parts. Covariates are plotted according to their regression coefficients with the RDA dimensions, and if they contain dummy (or fuzzy) variables these are indicated by the option indcat, and hence plotted as centroids not arrows.

By default, the symmetric scaling is used, where both rows and columns are in principal coordinates and have the same amount of weighted variance along the two dimensions. The other options are the asymmetric option, when columns are in standard coordinates, and the contribution option, when columns are in contribution coordinates. In cases where the row and column displays as well as the covariate positions occupy widely different extents, the column and covariate coordinates can be rescaled using the rescale option.

Author(s)

Michael Greenacre

Examples

# see the use of PLOT.RDA in the example of the RDA function
# see the use of PLOT.RDA in the example of the RDA function

Pivot logratios

Description

Computation of the set of pivot logratios(PLRs) based on the specified ordering of parts

Usage

PLR(data, ordering=1:ncol(data), weight=TRUE)
PLR(data, ordering=1:ncol(data), weight=TRUE)

Arguments

`data`	A compositional data frame or matrix
`ordering`	The ordering of the parts to be used in the PLRs (by default, the original ordering of the columns)
`weight`	Logical indicating if varying weights are returned (default:`TRUE`). If `FALSE`, weights based on equally-weighted parts are returned. Alternatively a set of positive weights can be specified.

Details

The function PLR computes the set of pivot logratios according to the ordering of the parts.

Value

`LR`	The pivot logratios (PLRs)
`LR.wt`	The weights assigned to the PLRs

Author(s)

Michael Greenacre

References

Hron K., Filzmoser P., de Caritat P., Fiserova E., Gardlo A. (2017). Weighted pivot coordinates for copositional data and their application to geochemical mapping. Mathematical Geosciences 49, 777-796.

Examples

data(veg)
PLR(veg, ordering=c(1,3,2))
data(veg)
PLR(veg, ordering=c(1,3,2))

Redundancy analysis

Description

Computation of weighted or unweighted redundancy analysis of a samples-by-parts compositional data table, given a set of covariates.

Usage

RDA(data, cov=NA, nd = NA, weight = TRUE, suprow = NA, row.wt = NA) 
RDA(data, cov=NA, nd = NA, weight = TRUE, suprow = NA, row.wt = NA)

Arguments

`data`	A data frame or matrix of interval-scale data, e.g. logratios (which are preferably in a list object with weights)
`cov`	List of covariates for constraining solution
`nd`	Number of dimensions for summary output, by default the number of constraining dimensions
`weight`	TRUE (default) when weights are in data list object, FALSE for unweighted analysis, or a vector of user-defined part weights
`suprow`	Indices of rows that are supplementary (passive) points (NOTE: this option is not implemented in this version)
`row.wt`	Optional user-defined set of positive weights for the rows (samples) (default: equal weights)

Details

The function RDA computes a redundancy analysis of a matrix of interval-scaled data, constrained by a matrix of covariates, using the singular value decomposition. By default weights are assumed in the data list object. For the unweighted logratio analysis, specify the option weight=FALSE. If weight = TRUE (the default) it is assumed that the weights are included in the data object, which comes from one of the logratio functions. User-specified part weights can be provided using the same weight option.

Usually row weights are not specified, they are equal unless intentional weighting of the samples is desired. Supplementary rows can be declared (also known as passive points) – these do not contribute to the solution but are positioned on the solution axes. This option will be available in the next release of the package.

Value

`sv`	Singular values
`nd`	Number of dimensions in the solution output
`rownames`	Row names
`rowmass`	Row weights
`rowdist`	Row distances to centroid
`rowinertia`	Row variances
`rowcoord`	Row standard coordinates
`rowpcoord`	Row principal coordinates
`rowsup`	Indices of row supplementary points
`colnames`	Column names
`colmass`	Column weights
`coldist`	Column logratio distances to centroid
`colinertia`	Column variances
`colcoord`	Column standard coordinates
`colpcoord`	Column principal coordinates
`covcoord`	Regression coordinates of constraining variables
`covnames`	Names of constraining variables
`N`	The data table (usually logratios in this package)
`cov`	The table of covariates

Author(s)

Michael Greenacre

References

Van den Wollenbergh, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2013), Contribution biplots, Journal of Computational and Graphical Statistics 22, 107-122.

Examples

# Data frame fish has sex, habitat and mass in first columns, 
# then morphometric data in remaining columns
data("fish")
sex     <- fish[,1]
habitat <- fish[,2]
mass    <- fish[,3]
fishm   <- as.matrix(fish[,4:29])
# Convert to compositional data matrix
fishm   <- CLOSE(fishm)
# Compute logarithm of mass and interaction of sex (F/M) and habitat (L/P) categories
logmass <- log(mass)
sexhab  <- 2*(sex-1)+habitat
sexhab.names <- c("FL","FP","ML","MP")
rownames(fishm) <- sexhab.names[sexhab]
# Create dummy variables for sexhab and create matrix of covariates
sexhab.Z <- DUMMY(sexhab, catnames=sexhab.names)
vars     <- cbind(logmass, sexhab.Z)
# Perform RDA on centred logratios
require(ca)
fish.rda <- RDA(CLR(fishm), cov=vars)
# Plot results 
# (for more options see Appendix of Compositional Data Analysis in Practice)
PLOT.RDA(fish.rda, map="contribution", rescale=0.05, indcat=2:5, 
         colrows=rainbow(4, start=0.1, end=0.8)[sexhab], cexs=c(0.8,0.8,1))
# Data frame fish has sex, habitat and mass in first columns, 
# then morphometric data in remaining columns
data("fish")
sex     <- fish[,1]
habitat <- fish[,2]
mass    <- fish[,3]
fishm   <- as.matrix(fish[,4:29])
# Convert to compositional data matrix
fishm   <- CLOSE(fishm)
# Compute logarithm of mass and interaction of sex (F/M) and habitat (L/P) categories
logmass <- log(mass)
sexhab  <- 2*(sex-1)+habitat
sexhab.names <- c("FL","FP","ML","MP")
rownames(fishm) <- sexhab.names[sexhab]
# Create dummy variables for sexhab and create matrix of covariates
sexhab.Z <- DUMMY(sexhab, catnames=sexhab.names)
vars     <- cbind(logmass, sexhab.Z)
# Perform RDA on centred logratios
require(ca)
fish.rda <- RDA(CLR(fishm), cov=vars)
# Plot results 
# (for more options see Appendix of Compositional Data Analysis in Practice)
PLOT.RDA(fish.rda, map="contribution", rescale=0.05, indcat=2:5, 
         colrows=rainbow(4, start=0.1, end=0.8)[sexhab], cexs=c(0.8,0.8,1))

Amalgamation (summed) logratio

Description

Computation of a single amalgamation (summed) logratio

Usage

SLR(data, numer=NA, denom=NA, weight=TRUE)
SLR(data, numer=NA, denom=NA, weight=TRUE)

Arguments

`data`	A compositional data frame or matrix
`numer`	Vector of parts in the numerator
`denom`	Vector of parts in the denominator
`weight`	Logical indicating if a varying weight is returned (default:`TRUE`). If `FALSE`, a weight based on equally-weighted parts is returned. Alternatively a positive weight can be specified.

Details

The function SLR computes a single amalgamation logratio based on the specified numerator and denominator parts that define the two summations in the ratio.

Value

`LR`	The amalgamation (summed)) logratio (SLR)
`LR.wt`	The weight assigned to the SLR

Author(s)

Michael Greenacre

References

Aitchison, J. (1986), The Statistical Analysis of Compositional Data, Chapman & Hall.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC Press.

Examples

data(veg)
SLR(veg, numer=1, denom=2:3)
data(veg)
SLR(veg, numer=1, denom=2:3)

Stepwise selection of logratios

Description

Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.

Usage

STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)
STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)

Arguments

`data`	A data frame or matrix of compositional data on which pairwise logratios are computed
`datatarget`	A matrix of interval-scale data, with as many rows as `data`, which serves as the target matrix whose variance is to be explained (by default it is the same matrix as data, in which case total logratio variance is to be explained)
`previous`	A vector or matrix of variables to be forced in before logratios are sought
`previous.wt`	Possible weights of the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed)
`weight`	`TRUE` (default) when weights are in data list object, `FALSE` for unweighted analysis, or a vector of user-defined part weights
`random`	`TRUE` if a random selection is made of tied logratios; `FALSE` (default) if logratio that maximizes Procrustes correlation is chosen
`nsteps`	Number of steps to take (by default, one less than the number of columns of data and of datatarget, whichever is smaller)
`top`	Number of top variance-explaining logratios returned after last step (by default, 1, i.e. the best)

Details

The function STEP sequentially computes the logratios in a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix. By default, the target matrix is the same matrix, in which case the logratios that best explain the logratio variance in the same matrix are computed. In this case, weights for the data matrix are assumed by default, proportional to part means of the compositional data matrix. For the unweighted logratio variance, specify the option weight=FALSE. User-specified weights on the columns of the data matrix (usually compositional parts) can be provided using the same weight option.

If the target matrix is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix will be in the next release.

If nsteps > 1 and top=1 the results are in the form of an optimal set of logratios that sequentially add maximum explained variance at each step. If top>1 then at the last step the ordered list of top variance-explaining logratios is returned, which allows users to make an alternative choice of the logratio based on substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure will move only one step, but list the top 10 logratios for that step. If top=1 then all results with extension .top related to the top ratios are omitted because they are already given.

Value

`names`	Names of maximizing ratios in stepwise process
`ratios`	Indices of ratios
`logratios`	Matrix of logratios
`R2max`	Sequence of maximum cumulative explained variances
`pro.cor`	Corresponding sequence of Procrustes correlations
`names.top`	Names of "top" ratios at last step
`ratios.top`	Indices of "top" ratios
`logratios.top`	Matrix of "top" logratios
`R2.top`	Sequence of "top" cumulative explained variances (in descending order)
`pro.cor.top`	Corresponding sequence of "top" Procrustes correlations
`totvar`	Total logratio variance of target matrix

Author(s)

Michael Greenacre

References

Van den Wollenbergh, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC

Examples

# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)
# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)

Stepwise selection of pairwise logratios for generalized linear modelling

Description

Three different algorithms for selecting pairwise logratios that best explain/predict a response variable, which could be continuous, binary or count

Usage

STEPR(data, y, method = NA, family = "gaussian", nsteps = ncol(data)-1, 
      top = 1, previous = NA, criterion = "Bonferroni", alpha = 0.05, 
      previousparts=NA, denom=NA)
STEPR(data, y, method = NA, family = "gaussian", nsteps = ncol(data)-1, 
      top = 1, previous = NA, criterion = "Bonferroni", alpha = 0.05, 
      previousparts=NA, denom=NA)

Arguments

`data`	A data frame or matrix of compositional data on which the pairwise logratios will be constructed and selected
`y`	The response variable: a numeric variable for regression (default)), a binary factor for logistic regression or a numeric count for Poisson regression
`method`	The selection method: 1 (unrestricted selection of logratios), 2 (restricted to non-overlapping parts), 3 (additive logratios)
`family`	The distribution used in the generalized linear model family: "gaussian" (default, for multiple regression), "binomial" (for logistic regression of binary response), or "poisson" (for Poisson regression)
`nsteps`	The maximum number of steps taken, by default one less than the number of parts
`top`	When one step is taken (nsteps=1), the ordered list of top logratios with the highest improvements in the likelihood function, for selection based on domain knowledge
`previous`	For specifying variable(s) to be included before stepwise selection takes place; these can be non-compositional variables and/or specific pairwise logratios computed in previous runs of STEPR or by hand; the matrix (or vector for a single variable) of values must be supplied
`criterion`	Criterion for stopping the stepwise selection: "Bonferroni" (default), "AIC", "BIC", or NA for no stopping until maximum specified or permissible logratios entered
`alpha`	Overall significance level (default is 0.05)
`previousparts`	(For method 2) The sequence numbers of the logratios, if any, forced in using the previous option
`denom`	(For method 3) The sequence number of the part used in denominator; for use when additive logratios are forced in using previous option or to select a set of additive logratios with specific reference from the start

Details

The function STEPR performs stepwise selection of pairwise logratios with the objective of explaining/predicting a response variable, in the framework of generalized linear modelling where the response can be numeric continuous (regression analysis), or a binary factor (logistic regression), or a numeric count (Poisson regression). The corresponding family option has to be indicated if the regression is logistic or Poisson. The different method options for the stepwise selection are method = 1 (unrestricted selection of logratios, any logratios can be selected irrespective of the previous ones), method = 2 (restricted to non-overlapping parts, each part participates at most in one logratio, so that parts in previously selected logratios are excluded in subsequent steps; logratio effects can be interpreted as under orthogonality), method = 3 (additive logratios, only logratios with the same denominator as the first selected logratio are c onsidered; the result is an additive logratio transformation on a subcomposition) Three alternative stopping criteria can be specified, otherwise the procedure executes as many steps as the value of nsteps. These are (in increasing strictness), "AIC", "BIC" and "Bonferroni" (the default).

Value

`rationames`	Names of the selected logratios
`ratios`	The sequence numbers of the selected parts in each ratio
`logratios`	Matrix of selected logratios
`logLik`	The -2*log-likelihood sequence for the steps
`deviance`	The deviance sequence for the steps
`AIC`	The AIC sequence for the steps
`BIC`	The BIC sequence for the steps
`Bonferroni`	The Bonferroni sequence for the steps
`null.deviance`	The null deviance for the regression

(Notice that for logLik, AIC, BIC and Bonferroni, the values for one more step are given, so that the stopping point can be confirmed.)

And the following if top > 1:

`ratios.top`	The top ratios and the sequence numbers of their parts
`logratios.top`	The matrix of top logratios
`logLik.top`	The set of top -2*log-likelihoods
`deviance.top`	The set of top deviances
`AIC.top`	The set of top AICs
`BIC.top`	The set of top BICs
`Bonferroni.top`	The set of top Bonferronis

Author(s)

Michael Greenacre

References

Coenders, G. and Greenacre, M. (2021), Three approaches to supervised learning for compositional data with pairwise logratios. aRxiv preprint. URL:https://arxiv.org/abs/2111.08953
Coenders, G. and Pawlowsky-Glahn, V. (2020), On interpretations of tests and effect sizes in regression models with a compositional predictor. SORT, 44:201-220
Greenacre, M. (2021), Compositional data analysis, Annual Review of Statistics and its Application, 8: 271-299

Examples

# For the fish morphometric data, first close (normalize, although not necessary) 
# then loop over the 26*25/2 = 325 possible logratios stepwise
data(fish)
habitat <- fish[,2]
morph <- CLOSE(fish[,4:29])
# predict habitat binary classification from morphometric ratios
fish.step1 <- STEPR(morph, as.factor(habitat), method=1, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step1$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with selected logratios
fish.glm   <- glm(as.factor(habitat) ~ fish.step1$logratios, family="binomial")
summary(fish.glm)
fish.pred1  <- predict(fish.glm)
table(fish.pred1>0.5, habitat)
#     habitat
#        1  2
# FALSE 56 11
# TRUE   3  5
# (Thus 61/75 correct predictions)
#
# force the sex variable in at the first step before selecting logratios
# and using more strict Bonferroni default
sex <- as.factor(fish[,1]) 
fish.step2 <- STEPR(morph, as.factor(habitat), method=1, previous=sex, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step2$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with sex and selected logratios
fish.glm   <- glm(as.factor(habitat) ~ sex + fish.step2$logratios, family="binomial")
summary(fish.glm)
# (sex not significant)
#
# check the top 10 ratios at Step 1 to allow domain knowledge to operate
fish.step3 <- STEPR(morph, as.factor(habitat), method=1, nsteps=1, top=10, family="binomial")
cbind(fish.step3$ratios.top, fish.step3$BIC.top)
#        row col         
# Bac/Hg   8  19 67.93744
# Bp/Hg    7  19 69.87134
# Jl/Hg    6  19 70.31554
# Jw/Bp    5   7 71.53671
# Jw/Jl    5   6 71.57122
# Jw/Bac   5   8 71.69294
# Fc/Hg   10  19 72.38560
# Hw/Bac   1   8 73.25325
# Jw/Fc    5  10 73.48882
# Hw/Bp    1   7 73.55621
# Suppose 5th in list, Jw/Jl (Jaw width/Jaw length), preferred at the first step
fish.step4 <- STEPR(morph, as.factor(habitat), method=1, 
                    previous=fish.step3$logratios.top[,5], family="binomial")
# [1] "Criterion increases when 2-th ratio enters"
fish.step4$names
# [1] "Bac/Hg"
# So after Jw/Jl forced in only Bac/Hg enters, the best one originally
# For the fish morphometric data, first close (normalize, although not necessary) 
# then loop over the 26*25/2 = 325 possible logratios stepwise
data(fish)
habitat <- fish[,2]
morph <- CLOSE(fish[,4:29])
# predict habitat binary classification from morphometric ratios
fish.step1 <- STEPR(morph, as.factor(habitat), method=1, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step1$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with selected logratios
fish.glm   <- glm(as.factor(habitat) ~ fish.step1$logratios, family="binomial")
summary(fish.glm)
fish.pred1  <- predict(fish.glm)
table(fish.pred1>0.5, habitat)
#     habitat
#        1  2
# FALSE 56 11
# TRUE   3  5
# (Thus 61/75 correct predictions)
#
# force the sex variable in at the first step before selecting logratios
# and using more strict Bonferroni default
sex <- as.factor(fish[,1]) 
fish.step2 <- STEPR(morph, as.factor(habitat), method=1, previous=sex, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step2$names
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with sex and selected logratios
fish.glm   <- glm(as.factor(habitat) ~ sex + fish.step2$logratios, family="binomial")
summary(fish.glm)
# (sex not significant)
#
# check the top 10 ratios at Step 1 to allow domain knowledge to operate
fish.step3 <- STEPR(morph, as.factor(habitat), method=1, nsteps=1, top=10, family="binomial")
cbind(fish.step3$ratios.top, fish.step3$BIC.top)
#        row col         
# Bac/Hg   8  19 67.93744
# Bp/Hg    7  19 69.87134
# Jl/Hg    6  19 70.31554
# Jw/Bp    5   7 71.53671
# Jw/Jl    5   6 71.57122
# Jw/Bac   5   8 71.69294
# Fc/Hg   10  19 72.38560
# Hw/Bac   1   8 73.25325
# Jw/Fc    5  10 73.48882
# Hw/Bp    1   7 73.55621
# Suppose 5th in list, Jw/Jl (Jaw width/Jaw length), preferred at the first step
fish.step4 <- STEPR(morph, as.factor(habitat), method=1, 
                    previous=fish.step3$logratios.top[,5], family="binomial")
# [1] "Criterion increases when 2-th ratio enters"
fish.step4$names
# [1] "Bac/Hg"
# So after Jw/Jl forced in only Bac/Hg enters, the best one originally

Dataset: TimeBudget

Description

This data set consists of the average percentage breakdown of time use into six categories, for 16 countries, split by males and females.

Usage

data(time)data(time)

Format

Data matrix containing the 32 x 6 matrix. Row samples are labelled by the two-character country code and m (male) or f (female).

Source

Greenacre M., Compositional Data Analysis in Practice, Chapman & Hall / CRC, 2018.

Variance of a vector of observations, dividing by n rather than n-1

Description

This function computes the usual variance but divides by n, not by n-1.

Usage

VAR(x)
VAR(x)

Arguments

`x`	Vector of values for which variance is computed

Details

To think of each of n observations weighted by 1/n this function VAR computes squared deviations from the mean and averages them. Thus, the sum of squared deviations is divided by n rather than by n-1, as for the unbiased estimate of the variance.

Value

The value of the variance.

Author(s)

Michael Greenacre

References

Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

Examples

data(cups)
cups <- CLOSE(cups)

# variances using base R function var
apply(cups, 2, var)

# variances using easyCODA function VAR
apply(cups, 2, VAR)
data(cups)
cups <- CLOSE(cups)

# variances using base R function var
apply(cups, 2, var)

# variances using easyCODA function VAR
apply(cups, 2, VAR)

Dataset: Vegetables

Description

This data set consists of the protein, carbohydrate and fat compositions of 10 different vegetables. Compositions are expressed as percentages.

Usage

data(veg)data(veg)

Format

Data frame containing the 10 x 3 matrix.

Source

US Department of Agriculture, https://ndb.nal.usda.gov/ndb/nutrients/index

Ward clustering of a compositional data matrix

Description

This function clusters the rows (or the columns, if the matrix is transformed) of a compositional data matrix, using weighted Ward clustering of the logratios.

Usage

WARD(LRdata, weight=TRUE, row.wt=NA)
WARD(LRdata, weight=TRUE, row.wt=NA)

Arguments

`LRdata`	Matrix of logratios, either a vector or preferably the logratio object resulting from one of the functions ALR, CLR, PLR or LR (usually CLRs will be used))
`weight`	`TRUE` (default) for weighted analysis (in which case weights are in the logratio object), `FALSE` for unweighted analysis, or a vector of user-defined column weights
`row.wt`	Optional set of row weights (default is equal weights when `row.wt=NA`)

Details

The function WARD performs a weighted WARD hierarchical clustering on the rows of an input set of logratios, usually CLR-transformed. (This would be equivalent to performing the clustering on all pairwise logratios). If the columns of the logratio matrix are unweighted, specify the option weight=FALSE: they will then get equal weights. The default weight=TRUE option implies that column weights are provided, either in the input list object LRdata, as LRdata$LR.wt, or as a vector of user-specified weights using the same weight option.

Value

An object which describes the tree produced by the clustering process on the n objects. The object is a list with components:

`merge`	an n-1 by 2 matrix. Row i of `merge` describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in `merge` indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
`height`	a set of n-1 real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.
`order`	a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches

Author(s)

Michael Greenacre

References

Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

Examples

# clustering steps for unweighted and weighted logratios
# for both row- and column-clustering
data(cups)
cups <- CLOSE(cups)

# unweighted logratios: clustering samples
cups.uclr  <- CLR(cups, weight=FALSE)
cups.uward <- WARD(cups.uclr, weight=FALSE)   # weight=FALSE not needed here,
                                              # as equal weights are in object
plot(cups.uward)
# add up the heights of the nodes
sum(cups.uward$height)
# [1] 0.02100676
# check against the total logratio variance
LR.VAR(cups.uclr, weight=FALSE)
# [1] 0.02100676

# unweighted logratios: clustering parts
tcups <- t(cups)
tcups.uclr  <- CLR(tcups, weight=FALSE)
tcups.uward <- WARD(tcups.uclr, weight=FALSE)  # weight=FALSE not needed here,
                                               # as equal weights are in object
plot(tcups.uward, labels=colnames(cups))
sum(tcups.uward$height)
# [1] 0.02100676
LR.VAR(tcups.uclr, weight=FALSE)
# [1] 0.02100676

# weighted logratios: clustering samples
cups.clr <- CLR(cups)
cups.ward <- WARD(cups.clr)
plot(cups.ward)
sum(cups.ward$height)
# [1] 0.002339335
LR.VAR(cups.clr)
# [1] 0.002339335

# weighted logratios: clustering parts
# weight=FALSE is needed here, since we want equal weights
# for the samples (columns of tcups)
tcups.clr  <- CLR(tcups, weight=FALSE)
tcups.ward <- WARD(tcups.clr, row.wt=colMeans(cups))
plot(tcups.ward, labels=colnames(cups))
   sum(tcups.ward$height)
# [1] 0.002339335
LR.VAR(tcups.clr, row.wt=colMeans(cups))
# [1] 0.002339335
# clustering steps for unweighted and weighted logratios
# for both row- and column-clustering
data(cups)
cups <- CLOSE(cups)

# unweighted logratios: clustering samples
cups.uclr  <- CLR(cups, weight=FALSE)
cups.uward <- WARD(cups.uclr, weight=FALSE)   # weight=FALSE not needed here,
                                              # as equal weights are in object
plot(cups.uward)
# add up the heights of the nodes
sum(cups.uward$height)
# [1] 0.02100676
# check against the total logratio variance
LR.VAR(cups.uclr, weight=FALSE)
# [1] 0.02100676

# unweighted logratios: clustering parts
tcups <- t(cups)
tcups.uclr  <- CLR(tcups, weight=FALSE)
tcups.uward <- WARD(tcups.uclr, weight=FALSE)  # weight=FALSE not needed here,
                                               # as equal weights are in object
plot(tcups.uward, labels=colnames(cups))
sum(tcups.uward$height)
# [1] 0.02100676
LR.VAR(tcups.uclr, weight=FALSE)
# [1] 0.02100676

# weighted logratios: clustering samples
cups.clr <- CLR(cups)
cups.ward <- WARD(cups.clr)
plot(cups.ward)
sum(cups.ward$height)
# [1] 0.002339335
LR.VAR(cups.clr)
# [1] 0.002339335

# weighted logratios: clustering parts
# weight=FALSE is needed here, since we want equal weights
# for the samples (columns of tcups)
tcups.clr  <- CLR(tcups, weight=FALSE)
tcups.ward <- WARD(tcups.clr, row.wt=colMeans(cups))
plot(tcups.ward, labels=colnames(cups))
   sum(tcups.ward$height)
# [1] 0.002339335
LR.VAR(tcups.clr, row.wt=colMeans(cups))
# [1] 0.002339335

Package 'easyCODA'

Help Index

Compositional Data Analysis in Practice

Description

Details

Author(s)

References

See Also

Examples

Amalgamation clustering of the parts of a compositional data matrix

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Additive logratios

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Compositional bar plot

Description

Usage

Arguments

Details

Author(s)

References

See Also

Examples

Correspondence analysis

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Bivariate confidence and data ellipses

Description

Usage

Arguments

Details

Author(s)

References

See Also

Examples

Closure of rows of compositional data matrix

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Centred logratios

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Dataset: RomanCups

Description

Usage

Format