Use Scatter Diagrams & Correlation To Determine The Relationship Between Variables

Posted on Dec 22, 2019 in Correlation Coefficient, Scatter Diagram

A scatter diagram is a useful tool for showing the relationship between two variables, and in particular whether that relationship is linear. One variable is plotted on the vertical axis (y-axis) and the other on the horizontal axis (x-axis).

Let’s take a simple example. Suppose we want to determine whether there is a linear relationship between the amount of money we make each month and the amount we spend each month. We’ll make two columns, one for the amount we make and one for the amount we spend. The amount made goes on the x-axis and the amount spent goes on the y-axis.

Amount Made, x ($)    Amount Spent, y ($)
5500                  3900
4900                  3750
5300                  3870
5000                  3800
6000                  4850
5800                  4350
5500                  4020
6200                  4900
5950                  4500
6475                  5275
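To see how the table turns into a scatter diagram, here is a minimal Python sketch that places each (amount made, amount spent) pair on a character grid, with the amount made running left to right and the amount spent running bottom to top. The function name `text_scatter` and the 40-by-15 grid size are hypothetical, illustrative choices; a real tool such as Minitab draws a proper graph with labeled axes.

```python
# Text-mode scatter diagram of the monthly data (amount made vs. amount spent).
# Illustrative only: text_scatter and the grid size are hypothetical choices,
# not part of Minitab or any plotting library.

made = [5500, 4900, 5300, 5000, 6000, 5800, 5500, 6200, 5950, 6475]    # x-axis ($)
spent = [3900, 3750, 3870, 3800, 4850, 4350, 4020, 4900, 4500, 5275]   # y-axis ($)


def text_scatter(xs, ys, width=40, height=15):
    """Return the lines of a character plot with '*' at each (x, y) point.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x onto a column (left to right) and y onto a row (bottom to top).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top of the plot
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the y-axis
    lines.append("+" + "-" * width)                   # bottom row marks the x-axis
    return lines


for line in text_scatter(made, spent):
    print(line)
```

In the output, the stars rise from the lower left toward the upper right: the month with the lowest income also has the lowest spending, and the month with the highest income has the highest spending.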
A scatter diagram would look as follows:
As you can see, as the amount of money we make increases, the amount we spend increases too. But how strong is the relationship between the amount made and the amount spent? To answer that, we need to understand something called correlation.
The scatter diagram suggests a strong positive relationship between our two variables. To see it more clearly, we’ll add a best-fit line to the graph, as shown in the second scatter diagram below.
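The post doesn’t say how the best-fit line is calculated. Assuming it is the ordinary least-squares line y = a + bx (a common default), here is a minimal Python sketch that computes its slope and intercept from the table. The function name `least_squares_line` is hypothetical.

```python
# Least-squares best-fit line y = a + b*x for the monthly data.
# Assumption: "best-fit line" means the ordinary least-squares line.

made = [5500, 4900, 5300, 5000, 6000, 5800, 5500, 6200, 5950, 6475]    # x ($)
spent = [3900, 3750, 3870, 3800, 4850, 4350, 4020, 4900, 4500, 5275]   # y ($)


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope b = Sxy / Sxx: how x and y vary together, relative to how x varies alone.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


a, b = least_squares_line(made, spent)
print(f"spent ~ {a:.1f} + {b:.4f} * made")
```

On Python 3.10 or later, the standard library’s `statistics.linear_regression(made, spent)` returns the same slope and intercept and can serve as a cross-check.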


Now, to measure how strong the relationship is, we use the Pearson correlation coefficient, r. The formula for r is rather complex, so I’ll spare you that agony and explain it in simple terms. The value of r always falls between -1 and +1, inclusive. A value of r near +1 reflects a strong positive linear correlation between x and y, while a value near -1 reflects a strong negative linear correlation, in which one variable decreases as the other increases. If r is close to 0, there is little or no linear correlation between x and y. Using Minitab, the correlation coefficient for our two variables is 0.955, which is a very strong positive correlation.
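For readers who do want to see the arithmetic, here is a minimal Python sketch of the standard Pearson formula, r = Sxy / sqrt(Sxx * Syy). Here Sxy is the sum of the products of each pair’s deviations from the means, and Sxx and Syy are the sums of squared deviations of x and y. The function name `pearson_r` is hypothetical; the sketch illustrates the formula and is not Minitab’s implementation.

```python
# Pearson correlation coefficient r, computed directly from its definition:
#   r = Sxy / sqrt(Sxx * Syy)
# where the S terms are sums over deviations from the means.
import math

made = [5500, 4900, 5300, 5000, 6000, 5800, 5500, 6200, 5950, 6475]    # x ($)
spent = [3900, 3750, 3870, 3800, 4850, 4350, 4020, 4900, 4500, 5275]   # y ($)


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # variation in x
    s_yy = sum((y - y_bar) ** 2 for y in ys)                       # variation in y
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(made, spent)
print(f"r = {r:.3f}")  # prints r = 0.955, matching the Minitab value above
```

Swapping the arguments, `pearson_r(spent, made)`, gives the same value, which is one way to see that r by itself says nothing about which variable, if either, drives the other. On Python 3.10 or later, `statistics.correlation(made, spent)` computes the same coefficient.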
Now I want to caution you about something when using r, the correlation coefficient. Even though the relationship is strong, cause and effect cannot be inferred; that is, r alone does not tell us that earning more money causes you to spend more. To determine cause and effect, other tools such as Design of Experiments would be needed.
