Pearson’s correlation coefficient: what it is and how to use it.

A type of coefficient used in descriptive statistics to see the relationship between variables.

When researching in psychology, descriptive statistics are frequently employed, offering ways to present and evaluate the main characteristics of data through tables, graphs, and summary measures.

In this article we will learn about Pearson's correlation coefficient, a measure used in descriptive statistics. It measures the linear relationship between two quantitative random variables and tells us the intensity and direction of that relationship.

Descriptive statistics

Pearson's correlation coefficient is a type of coefficient used in descriptive statistics. Specifically, it is used in descriptive statistics applied to the study of two variables.

Descriptive statistics (also called exploratory data analysis) is a set of mathematical techniques designed to obtain, organize, present and describe a set of data, with the purpose of facilitating its use. In general, it uses tables, numerical measures or graphs as support.

Pearson's correlation coefficient: what is it for?

Pearson's correlation coefficient is used to study the relationship (or correlation) between two quantitative random variables measured on at least an interval scale; for example, the relationship between weight and height.

It is a measure that gives us information about the intensity and direction of the relationship. In other words, it is an index that measures the degree of covariation between two linearly related variables.
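As a minimal sketch of how the coefficient can be computed, the snippet below divides the sum of cross-products of the deviations from the means by the square root of the product of the sums of squared deviations. The function name pearson_r and the height and weight figures are assumptions made up for illustration; they do not come from any study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical measurements for five people (illustration only)
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 84]

r = pearson_r(height_cm, weight_kg)
print(f"r = {r:.3f}")
```

With these invented figures the result is about 0.996: a strong, positive linear relationship, since taller people in the sample are also heavier.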

We must be clear about the difference between relationship, correlation or covariation between two variables (that is, joint variation) and causality, as they are different concepts. A correlation can be used for forecasting (prediction or regression), but on its own it does not show that one variable causes the other.

How is it interpreted?

Pearson's correlation coefficient takes values between -1 and +1. Thus, depending on its value, it will have one meaning or another.

If the Pearson correlation coefficient is equal to 1 or -1, we can consider that the correlation between the variables studied is perfect.

If the coefficient is greater than 0, the correlation is positive ("the more, the more, and the less, the less"). On the other hand, if it is less than 0, the correlation is negative ("the more, the less, and the less, the more"). Finally, if the coefficient is equal to 0, we can only state that there is no linear relationship between the variables; there may still be some other type of relationship.
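The three cases can be checked with a small sketch. The data are invented for illustration and pearson_r is a hypothetical helper name: an exact increasing line, an exact decreasing line, and a perfect but non-linear (quadratic) relationship whose coefficient is 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]  # made-up values, symmetric around 0

r_pos = pearson_r(x, [2 * v + 1 for v in x])  # exact increasing line
r_neg = pearson_r(x, [-3 * v for v in x])     # exact decreasing line
r_quad = pearson_r(x, [v ** 2 for v in x])    # Y depends on X, but not linearly

print(r_pos, r_neg, r_quad)
```

In the last case Y is completely determined by X, yet the coefficient is 0, which is why a value of 0 only rules out a linear relationship.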

Considerations

Pearson's correlation coefficient tends to increase (in absolute value) if the variability of X and/or Y (the variables) increases, and to decrease in the opposite case, for example when the sample covers only a narrow range of X. On the other hand, to affirm whether a value is high or low, we must compare our data with other investigations that used the same variables in similar circumstances.
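The effect of variability can be illustrated with a small sketch under stated assumptions: the data are invented (Y equals X plus a fixed pattern of deviations), and the coefficient is computed once for the full range of X and once for a narrow band of X. It shows the tendency for one constructed dataset; it is not a general proof.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data: X from 1 to 12, Y = X plus a fixed, repeating deviation
x_full = list(range(1, 13))
noise = [2, -2, -2, 2] * 3
y_full = [a + e for a, e in zip(x_full, noise)]

# Restrict the sample to a narrow band of X (5 <= X <= 8)
restricted = [(a, b) for a, b in zip(x_full, y_full) if 5 <= a <= 8]
x_restricted = [a for a, _ in restricted]
y_restricted = [b for _, b in restricted]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)
print(f"full range: {r_full:.2f}, restricted range: {r_restricted:.2f}")
```

The underlying relationship is the same in both cases, but the coefficient drops from about 0.87 to about 0.49 when the variability of X is reduced.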

To represent the relationships of different variables that combine linearly, we can use the so-called variance-covariance matrix or the correlation matrix; on the diagonal of the former we will find variance values, and on the diagonal of the latter we will find ones (the correlation of a variable with itself is perfect, =1).
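A minimal sketch of both matrices follows. The three variables and their values are hypothetical, and the covariance divides by n - 1 (the sample version), which is an assumption; dividing by n changes the covariances but not the correlations.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data for five people (illustration only)
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 77, 84],
    "age": [31, 25, 40, 28, 36],
}
names = list(data)
k = len(names)

# Variance-covariance matrix: variances on the diagonal
cov_matrix = [[covariance(data[a], data[b]) for b in names] for a in names]

# Correlation matrix: each covariance divided by the two standard deviations
corr_matrix = [
    [cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j])
     for j in range(k)]
    for i in range(k)
]

for name, row in zip(names, corr_matrix):
    print(name, [round(v, 3) for v in row])
```

Both matrices are symmetric, because the covariance of X with Y equals the covariance of Y with X.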

Coefficient squared

When we square Pearson's correlation coefficient, its meaning changes, and we interpret its value in relation to forecasts of one variable from the other (it describes how well one variable predicts the other, not causality). In this case, it can have four interpretations or meanings:

1. Associated variance

It indicates the proportion of the variance of Y (one variable) associated with the variation of X (the other variable). Therefore, "1 - squared Pearson coefficient" is the proportion of the variance of Y that is not associated with the variation of X.

2. Individual differences

If we multiply the squared Pearson correlation coefficient by 100, it gives the percentage of individual differences in Y that are associated with / depend on / are explained by the variations or individual differences in X. Therefore, "(1 - squared Pearson coefficient) x 100" is the percentage of the individual differences in Y that is not associated with / does not depend on / is not explained by the variations or individual differences in X.
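A short numerical sketch of these first two interpretations follows, assuming a hypothetical coefficient of 0.6. The percentage comes from the squared coefficient, so 0.6 corresponds to 36%, not 60%.

```python
# Hypothetical Pearson coefficient (not taken from any study)
r = 0.6

r_squared = r ** 2                           # proportion of variance of Y associated with X
pct_associated = 100 * r_squared             # % of individual differences in Y associated with X
pct_not_associated = 100 * (1 - r_squared)   # % not associated with X

print(f"{pct_associated:.0f}% associated, {pct_not_associated:.0f}% not associated")
```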

3. Error reduction index

Pearson's correlation coefficient squared can also be interpreted as an index of the reduction of error in forecasts, i.e., it is the proportion of the mean squared error eliminated by using Y' (the regression line, constructed from the results) instead of the mean of Y as the forecast. In this case the coefficient is also multiplied by 100 (to express it as a percentage).

Therefore, "1 - squared Pearson coefficient" is the proportion of the error that is still made when using the regression line instead of the mean (again multiplied by 100 to express it as a percentage).

4. Approximation index of the points

Finally, the last interpretation of the Pearson correlation coefficient squared would indicate the approximation of the points to the regression line. The higher the value of the coefficient (closer to 1), the closer the points are to Y' (to the line).
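The last two interpretations can be checked numerically. The sketch below assumes that Y' is the ordinary least-squares line and uses five invented data points. It compares the mean squared error of forecasting every case with the mean of Y against the error of forecasting with the line.

```python
import math

# Made-up data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares regression line Y' = intercept + slope * X (assumed form of Y')
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
y_pred = [intercept + slope * a for a in x]

# Mean squared error of each forecasting rule
mse_mean = sum((b - mean_y) ** 2 for b in y) / n            # forecast = mean of Y
mse_line = sum((b - p) ** 2 for b, p in zip(y, y_pred)) / n  # forecast = Y'

reduction = 1 - mse_line / mse_mean  # proportion of error eliminated by the line
print(f"r squared = {r ** 2:.2f}, error reduction = {reduction:.2f}")
```

Both quantities come out at 0.60 for these data: the line removes 60% of the squared error made with the mean, and the remaining 40% reflects how far the points still lie from the line.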

(Updated at Apr 13 / 2024)