Computes Pearson's linear correlation or partial correlation with p-values
Arguments
- data
The data set. It understands grouped data passed from dplyr::group_by().
- ...
Variables to use in the correlation. If no variable is specified, all the numeric variables from data are used.
- type
The type of correlation to be computed. Defaults to "linear". Use type = "partial" to compute partial correlation.
- method
A character string indicating which partial correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".
- use
An optional character string giving a method for computing covariances in the presence of missing values. See stats::cor for more details.
- by
One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). This is especially useful, for example, to compute correlation matrices by levels of a factor.
- verbose
Logical argument. If verbose = FALSE the code is run silently.
Details
The partial correlation coefficient is a technique based on matrix operations that allows us to identify the association between two variables after removing the effects of a set of other variables (Anderson 2003). A general way to estimate the partial correlation coefficient between two variables (i and j) is through the simple correlation matrix that involves these two variables and the m other variables whose effects we want to remove. The estimate of the partial correlation coefficient between i and j, excluding the effect of the m other variables, is given by:
$$r_{ij.m} = \frac{-a_{ij}}{\sqrt{a_{ii} a_{jj}}}$$
where $r_{ij.m}$ is the partial correlation coefficient between variables i and j, without the effect of the other m variables; $a_{ij}$ is the element of order ij of the inverse of the simple (linear) correlation matrix; and $a_{ii}$ and $a_{jj}$ are the elements of orders ii and jj, respectively, of that same inverse.
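The formula above can be sketched in a few lines of base R. This is only an illustration of the matrix operation, not the code used internally by corr_coef(): the helper name partial_cor is hypothetical, and the built-in mtcars data set is used here as an assumption so the example runs without metan.

```r
# Hypothetical helper (not part of metan): partial correlations obtained
# from the inverse of the simple correlation matrix.
partial_cor <- function(x) {
  r <- stats::cor(x)                       # simple (linear) correlation matrix
  a <- solve(r)                            # inverse of the correlation matrix
  p <- -a / sqrt(outer(diag(a), diag(a)))  # -a_ij / sqrt(a_ii * a_jj)
  diag(p) <- 1                             # by convention, the diagonal is 1
  p
}

# Illustration with four numeric variables from the built-in mtcars data
x <- mtcars[, c("mpg", "wt", "hp", "disp")]
pc <- partial_cor(x)
pc
```

Each off-diagonal element of pc is the correlation between two variables after removing the linear effect of the remaining ones. It should match the correlation between the residuals of the two variables regressed on the others, which is a convenient way to check the result.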
References
Anderson, T. W. 2003. An introduction to multivariate statistical analysis. 3rd ed. Wiley-Interscience.
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
# All numeric variables
all <- corr_coef(data_ge2)
# Select variables
sel <-
  corr_coef(data_ge2,
            EP, EL, CD, CL)
sel$cor
#>           EP        EL        CD        CL
#> EP 1.0000000 0.2634237 0.1750448 0.3908239
#> EL 0.2634237 1.0000000 0.9118653 0.2554068
#> CD 0.1750448 0.9118653 1.0000000 0.3003636
#> CL 0.3908239 0.2554068 0.3003636 1.0000000
# Select variables, partial correlation
sel <-
  corr_coef(data_ge2,
            EP, EL, CD, CL,
            type = "partial")
sel$cor
#>            EP         EL         CD         CL
#> EP  1.0000000  0.2938850 -0.2418441  0.3856626
#> EL  0.2938850  1.0000000  0.9110035 -0.1549749
#> CD -0.2418441  0.9110035  1.0000000  0.2454591
#> CL  0.3856626 -0.1549749  0.2454591  1.0000000
# }