Pearson r-correlation Coefficient - University of Strathclyde



For example, we might want to know whether the relationship between maths achievement and parental social background is stronger or weaker than that between maths achievement and cognitive ability. Graphs could give us an indication, but will not give us an exact indication of the strength of the relationship. To do that, we need to calculate the correlation coefficient, which is a numerical indicator of the strength and direction (positive or negative) of the linear relationship between two variables. When working with continuous variables, as we have so far in this chapter, the correlation coefficient to use is Pearson’s r.

What does Pearson’s r do? Pearson’s correlation coefficient for two variables, x and y, is computed as:
r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{(N-1) s_x s_y}   (1)
where:
x_i and y_i are individual observations (e.g. the grade of a child in English (x_i) and the grade of the same child in maths (y_i));
\bar{x} and \bar{y} are the means for variables X and Y (e.g. the mean grades in English and maths);
N is the number of cases;
and s_x and s_y are the standard deviations of the two variables (English and maths) respectively.
Looking at formula (1), we see that for each case the difference between the individual score and the mean is calculated on each variable.

These two differences are then multiplied for each individual case. The product is positive if both differences have the same sign, that is, if the respondent scores above the mean on both variables or below the mean on both variables. If the respondent scores above the mean on variable X and below the mean on variable Y (or the reverse), the product is negative. All these individual products are then summed to get a total, which is divided by N - 1 and by the product of the standard deviations of both variables to scale it. This gives us the Pearson r correlation coefficient.
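The steps just described can be sketched in a few lines of Python. This is only an illustration: the grades are made-up data, the function name `pearson_r` is hypothetical, and the sketch assumes sample standard deviations (N - 1 denominator), as returned by `statistics.stdev`.

```python
from statistics import mean, stdev

# Hypothetical grades for five children (illustrative data only).
english = [52, 61, 70, 45, 67]
maths = [48, 65, 72, 50, 60]

def pearson_r(x, y):
    """Pearson's r from deviation products, assuming sample standard deviations."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    # One product per case: positive when both scores lie on the same side
    # of their means, negative when they lie on opposite sides.
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    # Sum the products, then scale by (n - 1) and both standard deviations.
    return sum(products) / ((n - 1) * stdev(x) * stdev(y))

r = pearson_r(english, maths)
print(round(r, 3))
```

In this made-up data set every child is on the same side of the mean in both subjects, so all the products are positive and r comes out strongly positive.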
Pearson’s r varies between -1 and +1, with +1 indicating a perfect positive linear relationship (high scores on variable X go with high scores on variable Y), -1 a perfect negative linear relationship (high scores on X go with low scores on Y), and 0 no linear relationship. The coefficient therefore tells us two things:
  • The direction of the relationship: a positive sign indicates a positive direction (high scores on X mean high scores on Y), a negative sign a negative direction (high scores on X mean low scores on Y).
  • The strength of the relationship: the closer to 1 (+ or -), the stronger the relationship.
When we want to make inferences from a sample to a population, we want to calculate the statistical significance of the correlation coefficient as well as the effect size, as discussed in previous units. For Pearson’s r this is usually done with a t-test on N - 2 degrees of freedom, which is equivalent to the F-test in a regression with a single predictor. Fortunately, we do not need to do this by hand, as Pearson’s r correlation coefficients, and the associated test of statistical significance, can easily be calculated in SPSS.
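For readers curious about what the software computes, the test statistic for the null hypothesis of zero correlation is commonly written as t = r * sqrt(N - 2) / sqrt(1 - r^2), and its square is the F statistic with 1 and N - 2 degrees of freedom. The sketch below only computes the statistic; the values of `r` and `n` are hypothetical, and the function name `t_statistic` is an assumption made for illustration. The p-value would still have to be looked up in the t distribution (or taken from SPSS).

```python
import math

def t_statistic(r, n):
    """t value for testing whether a sample correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.5, 30          # hypothetical sample correlation and sample size
t = t_statistic(r, n)   # compared against a t distribution with n - 2 df
f = t ** 2              # the same test expressed as F with (1, n - 2) df
print(round(t, 3), round(f, 3))
```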

Correlation and Variance

The amount of dispersion or spread in a set of scores can be described by the variance (the standard deviation squared). If one set of scores (X, e.g. intelligence quotients) is correlated with another set (Y, e.g. maths scores), the squared correlation can be expressed as the percentage of variance in Y which is predicted by the variance in X. For example:
  • A correlation of 0.7 means that the variance in X predicts 49% of the variance in Y. [(0.7 x 0.7) x 100]
  • A correlation of 0.5 means that the variance in X predicts 25% of the variance in Y. [(0.5 x 0.5) x 100]
  • A correlation of 0.3 means that the variance in X predicts 9% of the variance in Y. [(0.3 x 0.3) x 100]
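The arithmetic in these examples is simply the correlation squared, multiplied by 100. A minimal sketch, with a hypothetical helper name `shared_variance_pct`:

```python
def shared_variance_pct(r):
    """Percentage of variance in Y predicted by X for a given correlation r."""
    return r ** 2 * 100

for r in (0.7, 0.5, 0.3):
    # Rounded for display, because floating-point squares are not exact.
    print(f"r = {r}: {shared_variance_pct(r):.0f}% of the variance")
```

Note how quickly the percentage falls: halving a correlation cuts the predicted variance to a quarter.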
For a sample
Pearson's correlation coefficient when applied to a sample is commonly represented by the letter r and may be referred to as the sample correlation coefficient or the sample Pearson correlation coefficient. We can obtain a formula for r by substituting estimates of the covariances and variances based on a sample into the population formula (the covariance of X and Y divided by the product of their standard deviations). That formula for r is:
r = \frac{\sum ^n _{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum ^n _{i=1}(X_i - \bar{X})^2} \sqrt{\sum ^n _{i=1}(Y_i - \bar{Y})^2}}
An equivalent expression gives the correlation coefficient as the mean of the products of the standard scores. Based on a sample of paired data (X_i, Y_i), the sample Pearson correlation coefficient is
r = \frac{1}{n-1} \sum ^n _{i=1} \left( \frac{X_i - \bar{X}}{s_X} \right) \left( \frac{Y_i - \bar{Y}}{s_Y} \right)
where
\frac{X_i - \bar{X}}{s_X}, \quad \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i, \text{ and } s_X=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2}
are the standard score, sample mean, and sample standard deviation, respectively.
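The two expressions for r are algebraically the same, because the n - 1 terms inside s_X and s_Y cancel against the leading 1/(n - 1). As a quick numerical check, the sketch below implements both forms and compares them on made-up data; the function names and the data are assumptions for illustration only.

```python
import math
from statistics import mean, stdev

def r_from_sums(x, y):
    """First form: sum of deviation products over the root sums of squares."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return numerator / (math.sqrt(ss_x) * math.sqrt(ss_y))

def r_from_standard_scores(x, y):
    """Second form: products of standard scores, summed and divided by n - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations (n - 1 denominator)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

# Hypothetical paired data (illustrative only).
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 6.0, 6.0, 10.0]

r1 = r_from_sums(x, y)
r2 = r_from_standard_scores(x, y)
print(math.isclose(r1, r2))
```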
Please read full article from Pearson r-correlation Coefficient - University of Strathclyde
