Originally Posted by MrRoboto
Based on your plot, if we include only the states with the highest IQ and those with the lowest IQ, it definitely does show that states with higher IQ tend to be center/left (less religious) while those with the lowest IQ tend to be center/right (more religious).
Well, if we did that, then the trend line would look like so:
Let's call the highest IQ 104 and the lowest 94-97 (there is only one data point at 94, and 7 for the rest). I can't tell which states those are from the plot alone, so I'll just pick the highest and lowest values. Well... I guess I could figure them out by matching the numbers against the data set, but that'd be incredibly tedious.
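For anyone who wants to try the same exercise, here's a rough Python sketch of "keep only the extremes" (my own illustration, not what I ran in SAS). The `(IQ, religiosity)` pairs and the cutoffs below are made-up placeholders, not the real state data.

```python
from typing import List, Tuple

# One data point: (state-average IQ, religiosity score).
Point = Tuple[float, float]


def select_extremes(points: List[Point], low_max: float, high_min: float) -> List[Point]:
    """Keep points with IQ <= low_max or IQ >= high_min; everything in between is dropped."""
    return [p for p in points if p[0] <= low_max or p[0] >= high_min]


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    data = [(94, 0.80), (96, 0.70), (97, 0.75), (99, 0.60), (101, 0.50), (104, 0.30)]
    kept = select_extremes(data, low_max=97, high_min=104)
    print(kept)
    print(f"dropped {len(data) - len(kept)} of {len(data)} points")
```

Notice what gets thrown away: the whole middle of the distribution.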
It looks pretty similar as far as the trend line goes, but there is a huge gap in the data. As someone who does statistics, I'd look at that and say, "Well, obviously data are missing here and we aren't getting the full picture. Either they didn't take enough samples, or someone is leaving out data on purpose." Then I'd have to figure out whether any of those points are outliers and try to explain them; it'd be a big ol' mess.
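A quick way to put a number on that gap (a sketch; the IQ values are placeholders, and the function name is my own): sort the IQs and find the widest space between neighboring values. This only tells you *where* the hole is, not *why* the data are missing.

```python
from typing import Sequence, Tuple


def largest_gap(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (gap size, value below the gap, value above the gap) for the widest
    gap between neighboring values once they are sorted."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    lo, hi = max(zip(xs, xs[1:]), key=lambda pair: pair[1] - pair[0])
    return hi - lo, lo, hi


if __name__ == "__main__":
    # Hypothetical IQ values after keeping only the extremes.
    iqs = [94, 95, 96, 96, 97, 97, 104, 104]
    print(largest_gap(iqs))
```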
Let's check the ANOVA (Analysis of Variance):
The trend line explains only 24.17% of the variance in the data (R² = 0.2417), and the relationship is no longer statistically significant: the p-value is above the generally accepted 0.05 cutoff.
So no, even if one were to bias the data by picking and choosing what to show, the relationship isn't statistically significant, nor is it described very well by a trend line.
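If you want to see where numbers like R² and that p-value come from, here's a minimal pure-Python sketch of the ANOVA for a one-predictor trend line (in SAS, PROC REG prints the same kind of Analysis of Variance table along with R-Square). The p-value uses the F distribution with 1 and n - 2 degrees of freedom, computed through the regularized incomplete beta function with the standard continued-fraction method. The function names are my own, and the toy numbers are made up, not the state data.

```python
import math
from typing import Dict, Sequence

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued-fraction part of the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd term of the fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_value: float, df1: float, df2: float) -> float:
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    if f_value <= 0.0:
        return 1.0
    if math.isinf(f_value):
        return 0.0
    return _reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_value))


def regression_anova(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Least-squares trend line y = intercept + slope*x plus its ANOVA summary."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    ss_model = slope * sxy                # variation explained by the line
    ss_error = max(syy - ss_model, 0.0)   # leftover (residual) variation
    df_error = n - 2
    f_value = math.inf if ss_error == 0 else ss_model / (ss_error / df_error)
    return {
        "slope": slope,
        "intercept": my - slope * mx,
        "r_squared": ss_model / syy,      # share of the variance explained
        "f_value": f_value,
        "p_value": f_sf(f_value, 1, df_error),
    }


if __name__ == "__main__":
    # Toy numbers, not the state data.
    print(regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For the toy data, R² works out to 0.6 and F = 4.5 on 1 and 3 degrees of freedom, giving a p-value of about 0.12, so it would also fail the 0.05 cutoff. R² is the share of the *variance* the line explains, not the share of data points it "describes".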
Picking and choosing which data to show is a serious form of bias; to get the full picture, all data must be shown and no data in the middle should be left out. That is why I said above that I would just choose the highest and lowest IQ. But what counts as the highest and lowest? Are we going off subjective intuition about what we consider high, or off what is statistically high or low relative to the mean IQ of all states? As you can see, it gets very fuzzy and can change the results big time.
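One less subjective way to answer "what counts as high or low" is to measure each state's distance from the overall mean in standard deviations (a z-score) and pick a cutoff. Here's a small Python sketch; the IQ values are hypothetical, and the default cutoff `k = 1` is my own arbitrary choice for illustration, which kind of proves the point: somebody still has to pick it.

```python
import statistics
from typing import List, Sequence


def classify_by_zscore(iqs: Sequence[float], k: float = 1.0) -> List[str]:
    """Label each value 'high', 'low', or 'middle' by how many sample standard
    deviations it sits from the mean. k is the (arbitrary) cutoff."""
    mean = statistics.mean(iqs)
    sd = statistics.stdev(iqs)  # sample standard deviation; needs >= 2 values
    labels = []
    for v in iqs:
        z = (v - mean) / sd
        if z >= k:
            labels.append("high")
        elif z <= -k:
            labels.append("low")
        else:
            labels.append("middle")
    return labels


if __name__ == "__main__":
    # Hypothetical state-average IQs.
    print(classify_by_zscore([94, 97, 100, 103, 106]))
```

Drop `k` from 1 to 0.5 and the "extremes" suddenly include a lot more states, which is exactly how that choice changes the results.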
Sometimes data are lost and can be interpolated from the trend line; a good example would be all the divorce stuff. I can tell SAS to go ahead and fill in the missing divorce-rate data through interpolation if I want to, but I'd have to record that that's how I got it. If a few data points are lost, it's not too big of a deal. But if entire sections are lost, it's a very big deal and introduces a tremendous amount of bias.
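For the interpolation idea, here's a bare-bones Python sketch of linear interpolation between the nearest observed neighbors. It's a swapped-in illustration, not SAS's own routine, and the function names and yearly numbers are made up. It keeps a flag for every value that was filled in, so the imputation can be recorded, and it reports the longest missing stretch, since that's the "entire sections are lost" warning sign.

```python
from typing import List, Optional, Tuple


def interpolate_missing(values: List[Optional[float]]) -> Tuple[List[float], List[bool]]:
    """Fill interior gaps (None) by straight-line interpolation between the nearest
    observed neighbors. Returns the filled series and a parallel list of
    'was imputed' flags so the imputation can be documented."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values")
    if known[0] != 0 or known[-1] != len(values) - 1:
        # Interpolation needs an observed value on both sides of every gap.
        raise ValueError("cannot interpolate a gap at the start or end")
    filled = list(values)
    imputed = [v is None for v in values]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled, imputed


def longest_missing_run(values: List[Optional[float]]) -> int:
    """Length of the longest consecutive stretch of missing values."""
    best = run = 0
    for v in values:
        run = run + 1 if v is None else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Hypothetical yearly rates with some years missing.
    rates = [4.0, None, 4.4, None, None, 5.0]
    filled, imputed = interpolate_missing(rates)
    print(filled, imputed, longest_missing_run(rates))
```

A long missing run means the filled-in values are mostly the interpolation talking, not the data.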
This is really good SAS programming practice.
It's always good to review statistical concepts and such using real-world examples.