4/15
University of Huddersfield
SPX1005 Foundation Maths
Objectives
On completion of this session you will be able to:
Understand the strength of the relationship between two variables
Conduct correlation calculations
Produce correlation statistics and graphs in excel
Definition
Correlation is the term used to describe factors which vary together. In the early years of child development, height and weight are positively correlated (as one increases, so does the other ie, ‘the more…the more’). Interestingly, as infants/juveniles grow up (or age) the proportion of the body that the human head constitutes actually decreases - a feature known as negative correlation (as one increases the other decreases ie. ‘the more…the less’.
Spearman’s Rank Correlation Coefficient (Rs) is a simple calculation developed to show the direction (positive or negative) and the strength of the relationship between two variables. It answers the questions ‘as one variable increases, does the other increase likewise?’ or ‘as one variable increases, does the other decrease?’ The statistic Rs varies between -1 and +1 through zero.
+1 means a perfect positive correlation
-1 signifies a perfect negative correlation
Calculated values can be checked in statistical tables for ‘significance’. In general, values between +0.5 and +1.0, and between –0.5 and –1.0 are worth checking although calculations using large data sets may be ‘significant’ at lower values.
Spearman’s Rank Correlation can be applied to any type of data which can be ranked.
The formula for Spearman’s Rank Correlation Coefficient is Rs = 1 - ( 6 x d2 ) n( n2 – 1 )
Sample Data
The data set shown in Table 1 has been derived from measurements made of the mean radius of lichens growing on gravestones in Ripon, North Yorkshire (June, 2004).
Table 1: The mean radius of lichens growing on gravestones in Ripon, North Yorkshire (June 2004)
In this instance we want to know if there is a significant correlation between the age of the gravestone and the mean radius of the lichen. In other words, as the gravestones get older do they have more lichen?
Before the statistical test is performed the hypotheses must be stated
Null Hypothesis - H0, there is NO significant correlation between the gravestones and the lichens
Alternative Hypothesis – H1, there is a significant correlation between the gravestones and the lichens
Ranking
The first step is to rank the two variables of Age and Lichen. We rank each variable from highest to lowest in Microsoft Excel by adding extra columns called Rank1 (rank of Age data) and Rank2 (rank of mean radius data).
To rank these values you need to use the Ranking Formulae.
To rank the Age values, you must enter the following values in to Excel:
=RANK(number, ref, [order])
The number is value want to rank. In this case it is the value in cell B2 (32)
The ref is the range of all the values you are comparing your rank to. In this case, the values in B2:B9. Press the F4 key to store these
The order is either ascending (1), or descending (0). In this case it is 0 for descending.
Once you have calculated first rank, drag the formulae down to the remaining cells. Calculate the ranking for all the mean radius of lichen values the same way. Remember not to change the order of the data, it should stay exactly as it is for the next stage of calculations.
Rank Differences
Create 2 new columns and label them d (difference) and d2 (difference squared). This is to work out the difference of the two ranks for each gravestone.
To calculate d, use the formula:
=sum(Rank1 – Rank2)
To calculate d2, use the formula:
=sum(d*d) (Click on the cell containing the d value, press star then click on the d cell again)
Highlight both cells and drag them down to fill in the formulae for all the remaining cells.
You are now ready to perform the next stage of calculations.
The Equation
The