PPMCC (Pearson product-moment correlation coefficient; describes only the linear relationship between two continuous (interval / ratio scale) variables):
[The correlation coefficient (rxy) is a statistic]
The correlation coefficient (rxy) is a measure of the strength of association between two continuous (interval / ratio scale) variables. It reflects how closely scores on the two variables go together: the more closely they go together, the stronger the association between them and the more extreme the correlation coefficient (the closer it is to -1 or +1).
Mathematically, PPMCC (rxy) is defined as the ratio of the covariance of the two variables to the product of their standard deviations. It measures the strength of the linear relationship between normally distributed variables. When the variables are not normally distributed, or the relationship between them is not linear, PPMCC may not be the appropriate method (the Spearman rank correlation would be more appropriate).
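rxy = Covxy / (Sx Sy) = [Σ(X – Mx)(Y – My) / (n – 1)] / (Sx Sy), where Mx and My are the means of X and Y.
The sum of the products of the deviations (errors) from the means is divided by n – 1 to give the covariance. Dividing the covariance by Sx Sy then standardizes the two variables against their variability (it equalizes the contribution of both).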
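As a rough illustration of the definition above, the following Python sketch computes rxy step by step as the covariance divided by the product of the standard deviations. The five score pairs and the function name pearson_r are invented for this example and do not come from the course material.

```python
import math


def pearson_r(x, y):
    """Pearson r from its definition: Cov(x, y) / (Sx * Sy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of the deviations ("errors") from the two means
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    cov_xy = sum_products / (n - 1)  # sample covariance
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    # Note: a variable with no variability (s = 0) makes r undefined
    return cov_xy / (s_x * s_y)


# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Here the sum of products is 6, the covariance is 6 / 4 = 1.5, and Sx Sy is about 1.936, which gives r of about 0.775.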
Important points about PPMCC:
1. rxy is a scaled (normalized) measure of the covariance.
2. It shows how closely the rank orders (relative standings) of scores on the two variables correspond, and it is used to establish the validity and reliability of an instrument. A high rxy means the rank orders of the two variables are close to each other, and vice versa (disruption in rank order means a low r).
3. Covariance (Covxy: how the two variables vary together) is different from the coefficient of determination (r2: the proportion of variance in Y that can be accounted for by knowing X).
4. rxy summarizes two variables (interval / ratio scale) jointly; in effect, it compares the rank order (relative standing) of cases on one variable with their rank order on the other.
5. The correlation coefficient (rxy) indicates magnitude or intensity (0 to 1 in absolute value) and direction (negative or positive). If the data points fall in a random pattern, the correlation is equal to (or close to) zero.
6. The outcome variable is called the response or dependent variable (Y); risk factors and confounders are called the predictors, or explanatory or independent variables (X).
7. It makes no difference which variable is plotted on which axis as long as no prediction is to be made. If a prediction is to be made then, by convention, predictor variables are plotted on the x-axis and outcome variables on the y-axis.
8. The X and Y variables can be measured on entirely different scales. A change of scale (a change of units, such as centimeters to inches) does not affect the correlation because PPMCC does not depend on the scales (rxy has no unit of its own; it is a ratio).
9. The Pearson correlation coefficient, r, does not represent the slope of the line of best fit. Its sign shows only the direction of the relationship (uphill or downhill), and its size shows how closely the points cluster around that line.
10. rxy has nothing to do with mean differences. Two sets of scores having the same mean tells us nothing about the relationship (rxy) between them.
11. rxy = ryx (the scatterplot is the same apart from swapping the axes, but the slope ‘b’ will change).
12. Every correlation (rxy) has two slopes and two intercepts (one pair for Y as a function of X and one pair for X as a function of Y).
13. A correlation of 0 does not mean zero relationship between two variables; rather, it means zero linear relationship. (It is possible for two variables to have zero linear relationship and a strong curvilinear relationship at the same time.)
14. Correlation does not imply causation: two variables may be related to each other, but this doesn’t mean that one variable causes the other.
15. Two variables show a linear correlation because they are paired through a linear equation, which is a logical (not necessarily causal) relation between X and Y.
16. A larger sample size makes the correlation more stable and is a good reason to trust it. A small sample does not give an accurate picture of the correlation: a single outlier can make a huge difference to r.
17. Correlation can be understood in various ways: scatterplots, the slope of the regression line, and variance interpretation (the squared correlation coefficient (r2) is the proportion of variance in Y that can be accounted for by knowing X; conversely, it is the proportion of variance in X that can be accounted for by knowing Y).
18. The correlation coefficient is the slope (b) of the regression line when both the X and Y variables have been converted to z-scores. The larger the absolute size of the correlation coefficient, the steeper that slope.
19. A linear relationship is described by a statement such as: for every one-point increase in one variable, you get a four-point increase in the other variable.
20. A PPMCC is also appropriate for describing a relationship in which, when X increases, Y decreases by the same amount (a negative linear relationship).
21. Pearson Product Moment Correlation can be used to express the degree of relationship for:
   1. For every extra year of growth in a pine forest, you can expect an increase of 10,000 board feet.
   2. Strenuous exercise results in large weight loss, moderate exercise maintains weight at current levels, and no exercise produces gains in weight.
22. The higher the correlation between X and Y, the more accurate the resulting predictions are.
23. We can have a strong relationship between two variables but still have a low correlation coefficient when the relationship is non-linear or when the variances are truncated (the range of scores is cut off).
24. We can’t say that the correlation coefficient is improper for data where r = 0: from r alone we don’t know whether a relationship is absent or merely non-linear.
25. Potential problems with Pearson correlation: PPMCC cannot tell the difference between dependent and independent variables. For example, if we are trying to find the correlation between a high-calorie diet and diabetes, we might find a high correlation of .8. However, we would get the same coefficient with the variables switched around. In other words, the result could equally be read as saying that diabetes causes a high-calorie diet, which obviously makes no sense.
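Several of the points above (8, 11, 12, 13 and 18) can be checked numerically. The Python sketch below is only an illustration under assumed data: the height and weight scores, the centimeter-to-inch conversion, and all helper names are invented for the example.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    return covariance(x, y) / (sd(x) * sd(y))


def slope(x, y):
    """Least-squares slope b for predicting y from x: Cov(x, y) / Var(x)."""
    return covariance(x, y) / sd(x) ** 2


def z_scores(v):
    m, s = mean(v), sd(v)
    return [(a - m) / s for a in v]


# Hypothetical paired scores, invented for illustration only
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 79]
r = pearson_r(height_cm, weight_kg)

# Point 8: re-expressing height in inches leaves r unchanged
height_in = [h / 2.54 for h in height_cm]
r_inches = pearson_r(height_in, weight_kg)

# Point 11: rxy = ryx
r_swapped = pearson_r(weight_kg, height_cm)

# Point 12: one correlation, two different regression slopes
b_y_on_x = slope(height_cm, weight_kg)
b_x_on_y = slope(weight_kg, height_cm)

# Point 18: on z-scores, the regression slope equals r
b_z = slope(z_scores(height_cm), z_scores(weight_kg))

# Point 13: a perfect curvilinear relationship (y = x squared) with r = 0
x_sym = [-2, -1, 0, 1, 2]
y_sq = [a ** 2 for a in x_sym]
r_curve = pearson_r(x_sym, y_sq)

print("r:", r, "| in inches:", r_inches, "| swapped:", r_swapped)
print("slopes:", b_y_on_x, b_x_on_y, "| z-score slope:", b_z)
print("curvilinear r:", r_curve)
```

The two regression slopes differ from each other, but their product equals r2, which ties the slope view of correlation to the variance interpretation in point 17.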
Guilford’s Interpretation:
< 0.20 – Slight, almost negligible relationship
0.20 – 0.40 – Low (weak) correlation, definite but small relationship
0.40 – 0.70 – Moderate correlation, substantial relationship
0.70 – 1.00 – Very high (strong) correlation, very dependable relationship (their rank orders might be close to each other; scores on one variable grow with the other)
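The bands above can be turned into a small lookup. This is a sketch only: the function name guilford_label is hypothetical, the cut-offs are copied from the list above, and assigning a boundary value such as exactly 0.40 to the higher band is an assumption, since the notes do not say which band it belongs to. The label is based on the absolute value of r, so the direction of the relationship is ignored.

```python
def guilford_label(r):
    """Map a correlation coefficient to the verbal bands listed in these notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the bands describe magnitude, not direction
    if size < 0.20:
        return "slight, almost negligible relationship"
    if size < 0.40:  # assumption: exactly 0.20 falls in this band
        return "low (weak) correlation"
    if size < 0.70:  # assumption: exactly 0.40 falls in this band
        return "moderate correlation"
    return "very high (strong) correlation"


print(guilford_label(-0.55))  # moderate correlation
print(guilford_label(0.80))   # very high (strong) correlation
```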
These notes were written by S C Joshi during the EPSY 635 course, Fall 2015, Texas A&M University. Acknowledgements to Dr. Bob Hall, Professor, EPSY, Texas A&M University, for his assistance in understanding these terms during the course.