SPEARMAN'S RANK:- A test for the relationship between two variables
---> Not only does this test highlight a relationship but it also indicates the strength and statistical significance of the relationship, if any
---> You should always get a value between 1 and -1, if not then you have gone wrong somewhere! The closer to 1 the stronger the positive relationship, the closer to -1 the stronger the negative relationship.
---> Remember you have to compare your result to the critical values (I believe we get given these in the exam when appropriate) to see whether or not the relationship is statistically significant enough to accept and use to formulate conclusions regarding the variables investigated.
Worked Example using Column C and Column F in AIB
The first thing you need to do is to set a positive and null hypothesis; the easiest way to do this is to say, 'There will be a relationship between x and y' and 'There will not be a relationship between x and y'. Although it seems a bit strange that you have to set a null hypothesis, it is actually more statistically reliable to disprove a null hypothesis than to prove the alternative hypothesis.
Positive Hypothesis: There will be a relationship between the number of owner occupied dwellings as a % of total dwellings and the number of people in higher managerial and professional occupations as a % of total economically active.
Null Hypothesis: There will not be a relationship between the number of owner occupied dwellings as a % of total dwellings and the number of people in higher managerial and professional occupations as a % of total economically active.
Again, the easiest way to do this calculation is within a table. What you need to do is rank each variable independently, giving 1 to the highest and so on. When two or more values are the same, you add together the places they would occupy, then divide by how many values are tied (e.g. if three values would rank 13, you add together 13, 14 and 15 and divide by 3, which gives 14, and assign each one 14). You then skip the ranks you have used up, so that the lowest value ends up with a rank equal to n (the number of values), unless it is itself tied. Sorry, this is really hard to explain in words but hopefully will become a bit clearer in a minute!
Hopefully, the ranking part is quite logical. Within the first variable, for example, there were two values of 73 that would both rank 14, so you need to add 14 and 15 together, then divide by 2.
14 + 15 = 29
29/2 = 14.5 so, rank both 14.5 then skip rank 15 and move onto 16 for the next ranking number
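If it helps to see the tie rule written out as steps, here is a minimal sketch in Python. The function name average_ranks and the numbers fed into it are made up for illustration (they are not the AIB data), and it assumes rank 1 goes to the highest value, as described above.

```python
def average_ranks(values):
    """Rank values with 1 = highest; tied values share the mean of the places they span."""
    # Positions of the values, sorted so the largest value comes first
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of tied values that starts at pos
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Places are 1-based, so this run occupies places pos+1 to end+1
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        # Skip the places already used up by the tie
        pos = end + 1
    return ranks


# Made-up values, not the AIB data
print(average_ranks([80, 73, 73, 60]))  # [1.0, 2.5, 2.5, 4.0]
```

The two 73s would have taken places 2 and 3, so they both get 2.5 and the next value moves straight on to rank 4, which is the same logic as the 14.5 example.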
After you have independently ranked both data sets, you need to work out the difference between the ranks.....
Basically all you do is subtract one rank from the other; using the first one as an example: 2 - 3 = -1
Then you need to square the difference, hence why it does not matter if the difference is negative.....
Σd² = 423 (this is all of the squared differences added together)
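The difference, square and add-up steps can be sketched in a few lines of Python. The ranks below are hypothetical (only five places, chosen so the first pair matches the 2 - 3 = -1 example), so the total is not the 423 from the real table.

```python
# Hypothetical ranks for five places, not the ranks from the AIB table
ranks_x = [2, 1, 4, 3, 5]
ranks_y = [3, 1, 5, 2, 4]

# d = rank of first variable minus rank of second variable
differences = [rx - ry for rx, ry in zip(ranks_x, ranks_y)]
# Squaring removes the minus signs
squared = [d * d for d in differences]
# The sum of the squared differences is the value that goes into the equation
sum_d2 = sum(squared)

print(differences)  # [-1, 0, -1, 1, 1]
print(squared)      # [1, 0, 1, 1, 1]
print(sum_d2)       # 4
```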
Now you have all the values you need, so you can just plug the numbers into the equation
Rs = 1 – (6 x Σd²) / (n³ – n)
Rs = 1 – (6 x 423) / (18³ – 18)
Rs = 1 – 2538/5814
Rs = 1 – 0.436532507
Rs = 0.56
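As a quick way to check the arithmetic, the equation can be written as a small Python function. The name spearman_rs is just something I have made up for this sketch; the numbers are the ones from the worked example (sum of squared differences = 423, n = 18). Bear in mind that this version of the formula is only exact when there are no tied ranks, so with ties it is an approximation.

```python
def spearman_rs(sum_d2, n):
    """Spearman's rank coefficient from the sum of squared rank differences and sample size."""
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Values from the worked example
rs = spearman_rs(423, 18)
print(round(rs, 2))  # 0.56
```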
That is the calculation, itself, completed but now you have to see whether or not you can accept the relationship shown using critical values.
When n = 18
Critical Values:
95% = 0.4014
99% = 0.5501
---> Data reliability is related to the size of the sample. The more data you collect, the more reliable your result.
I think normally you are given these in the exam, but I suppose if the examiner was not feeling that nice, they could ask you to work out the 'degrees of freedom', then use a graph to work out the values yourself!
Degrees of Freedom = n - 2
So, in this example 18 - 2 = 16, so you draw a line up from 16 on the x-axis
---> After doing this, I can say that I can accept the positive hypothesis, as my value of 0.56 is above the 99% critical value of 0.5501, meaning that there is a statistically significant relationship between the two variables. Looking at the Spearman's Rank value, I can then say that there is a relatively strong positive correlation within the data, so as the number of people in higher managerial and professional jobs increases so does home ownership in Poole. Without even knowing the first thing about Poole, this kind of makes sense in a general context, but it is important to remember that just because the two variables correlate, it does not prove that one causes the other - only further research can show that one thing actually affects the other.
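The comparison against the critical values can also be sketched in Python. This is only an illustration using the numbers quoted in this post for n = 18; the variable names are my own, and I am assuming you compare the size of Rs (ignoring any minus sign) against each critical value.

```python
n = 18
rs = 0.56

# Degrees of freedom, as used when reading critical values off a graph
degrees_of_freedom = n - 2
print(degrees_of_freedom)  # 16

# Critical values quoted in this post for n = 18
critical_values = {"95%": 0.4014, "99%": 0.5501}

results = {}
for level, critical in critical_values.items():
    # The relationship is significant at this level if Rs reaches the critical value
    results[level] = abs(rs) >= critical
    print(level, "significant" if results[level] else "not significant")
```

Here both checks pass, which is why the positive hypothesis can be accepted at the 99% level.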
So, that is the last of the stats tests that AS students need to know for their skills exam - I hope these posts have been helpful to you as well. There is one on fieldwork sampling on its way which might also be useful, and I hope the revision for your first A-level Geography exam is going well - unfortunately, for us A2 students, the stats tests don't stop there; Mann Whitney U and Chi Squared are on their way!