Tuesday, March 27, 2018

Scatter Plots and Correlations



Scatter plots show how two variables behave in relation to each other.  Sometimes the two may not appear to be related conceptually, while in other cases there may indeed be something going on.

The fun part is to hypothesize whether there is indeed some sort of relationship between two things.  Next, you have to go out and collect the data to test your hypothesis.  The goal is to collect as many data points as possible, so that your findings are statistically meaningful.



As a rule of thumb, you want at least 100 data points (pairs of values, in this case) to find a meaningful relationship between the two items.  With only a few pairs, an apparent pattern can easily be due to chance.  In other words, you can’t generalize based on a small set of cases.

Above is an example that plots the relationship between GDP per capita and life expectancy, among countries of the world.  Notice that there are two data sets, from 1952 and from 2007.   


1.    Is there a sufficient number of data points to draw any conclusions, and how closely do the points follow a trend line? (R-squared: 0 to 1, the share of the variation in one item that is explained by the other)
2.   Is there a positive, negative, or zero slope to the data points, or are the results inconclusive? (Correlation coefficient: -1 to +1)
3.   Does correlation imply causation? 
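
To make the two measures in the questions above concrete, here is a minimal Python sketch that computes the correlation coefficient and R-squared for a handful of data pairs.  The helper name `pearson_r` and the homework/grade numbers are made up purely for illustration; they do not come from any study.  For a straight-line fit between two items, R-squared is simply the correlation coefficient squared.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient (-1 to +1) for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance before dividing by n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours of homework per week vs. grade (illustration only)
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 71, 75, 88]

r = pearson_r(hours, grades)
r_squared = r ** 2  # valid for a straight-line fit between two variables

print(f"correlation coefficient r = {r:.3f}")
print(f"R-squared = {r_squared:.3f}")
```

Five pairs are far too few to draw conclusions from; the same function works unchanged on a list of 100 or more pairs.  Note also that a high value here says nothing about question 3: the calculation cannot tell you which item, if either, causes the other.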
  

Some Interesting Correlation Scatter Plot Questions 

1.   Are higher temperatures positively correlated with higher sales? 
2.   Is the number of hours spent doing homework positively correlated with higher grades? 
3.   Is the number of hours spent on social media per person positively correlated with GDP per capita? 
4.   Is a person’s height positively correlated with age? 
5.   Is the probability that you will need a complete tooth extraction positively correlated with a person’s annual income level? 
6.   Is income inequality positively correlated with political revolutions?
7.   Is the number of cigarettes smoked by an individual positively correlated with the incidence of cancer?
8.   Is the quantity of alcohol imbibed by an individual positively correlated with dementia (Alzheimer’s, Parkinson’s, severe amnesia, etc.)?
9.   For males, is the amount of marijuana or hashish consumed positively correlated with sperm count? 
10.   Is the level of academic certification or degree attained positively correlated with a person’s annual income level? 
11. Is the number of languages spoken by an individual positively correlated with annual income levels? 

When answering these questions, provide a link to, or name the source of, the study that you used to answer each question.