Correlation Doesn't Equal Causation: Crash Course Statistics #8
- Added on 13. 03. 2018
- Today we’re going to talk about data relationships and what we can learn from them. We’ll focus on correlation, which is a measure of how two variables move together, and we’ll also introduce some useful statistical terms you’ve probably heard of like regression coefficient, correlation coefficient (r), and r^2. But first, we’ll need to introduce a useful way to represent bivariate continuous data - the scatter plot. The scatter plot has been called “the most useful invention in the history of statistical graphics” but that doesn’t necessarily mean it can tell us everything. Just because two data sets move together doesn’t necessarily mean one CAUSES the other. This gives us one of the most important tenets of statistics: correlation does not imply causation.
Crash Course is on Patreon! You can support us directly by signing up at / crashcourse
Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
Mark Brouwer, Justin Zingsheim, Nickie Miskell Jr., Jessica Wode, Eric Prestemon, Kathrin Benoit, Tom Trval, Jason Saslow, Nathan Taylor, Divonne Holmes à Court, Brian Thomas Gossett, Khaled El Shalakany, Indika Siriwardena, Robert Kunz, SR Foxley, Sam Ferguson, Yasenia Cruz, Daniel Baulig, Eric Koslow, Caleb Weeks, Tim Curwick, Evren Türkmenoğlu, Alexander Tamas, D.A. Noe, Shawn Arnold, mark austin, Ruth Perez, Malcolm Callis, Ken Penttinen, Advait Shinde, Cody Carpenter, Annamaria Herrera, William McGraw, Bader AlGhamdi, Vaso, Melissa Briski, Joey Quek, Andrei Krishkevich, Rachel Bright, Alex S, Mayumi Maeda, Kathy & Tim Philip, Montather, Jirat, Eric Kitchen, Moritz Schmidt, Ian Dundore, Chris Peters, Sandra Aft, Steve Marshall
--
Want to find Crash Course elsewhere on the internet?
Facebook - / youtubecrashcourse
Twitter - / thecrashcourse
Tumblr - / thecrashcourse
Support Crash Course on Patreon: / crashcourse
CC Kids: / crashcoursekids
This needs to be mandatory viewing for EVERYONE.
I second that!
"Correlation does not equal causation" was my old stats teacher's favourite phrase along with "always interpolate, never extrapolate." :)
Extrapolation is actually necessary in certain circumstances though - for example predicting growth of global human population, economic forecasts, environmental forecasts regarding climate change.... anything that has to do with the future.
Post hoc ergo propter hoc!
Bah. I know my rock keeps away tigers because I have never seen a tiger for as long as I have had it.
SilortheBlade Makes sense to me
Nicolas Cage movies and drownings are correlated through yet another unmentioned variable: summer. Nicolas Cage is an action movie star. Action movies are generally targeted for summer releases. Summer is also hot, which is the cause behind air conditioner sales and swimming, the latter of which is of course the cause of drowning.
Pfhorrest Or it could be that people who have endured a Nicolas Cage movie are more likely to drown themselves ...
That's true, but the data shows a close correlation over multiple years, not just over the seasons of a given year. It just so happens that the summers of years with more Nicolas Cage movies also happen to have more drownings.
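The lurking-variable idea raised at the start of this thread (heat driving both air conditioner sales and swimming) can be sketched with a small simulation. Everything here is made up for illustration: the temperature range, the noise levels, and the names `ac_sales` and `swimmers` are assumptions, not data from the video, and the sketch says nothing about the multi-year Nicolas Cage pattern.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(xs, ys):
    """What is left of ys after removing a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.uniform(0, 35) for _ in range(5000)]  # hypothetical daily highs, deg C

# Both outcomes depend on temperature only; neither depends on the other.
ac_sales = [2.0 * t + rng.gauss(0, 10) for t in temperature]
swimmers = [3.0 * t + rng.gauss(0, 15) for t in temperature]

r_raw = pearson_r(ac_sales, swimmers)
# Strip out the part of each variable explained by temperature, then correlate the rest.
r_controlled = pearson_r(residuals(temperature, ac_sales),
                         residuals(temperature, swimmers))
print(f"raw r = {r_raw:.2f}, r after controlling for temperature = {r_controlled:.2f}")
```

In this toy setup the raw correlation is strong even though neither variable influences the other, and it largely disappears once temperature is accounted for.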
*Me:* I used to think correlation implied causation.
*Me:* Then I watched this video. Now I don't.
*Friend:* Sounds like the video helped.
*Me:* Well, maybe.
lol. Well, probably.
The video explains that just because two elements are correlated does not mean that one is the cause of the other. One *can* be the cause, but it's not logical to infer that just from their correlation. It was not the floor itself that broke the glass, even though it is related to the breaking; it was its impact with the glass, *caused* by gravity.
XKCD is a pretty good comic :)
Kachimbo somebody missed the joke
Herodotus Von 8428 no, someone got the joke, but felt the need to expand our knowledge.
This has been my favorite CrashCourse season by far. Really enjoying the material and the host!
A class on non linear relationships would be FANTASTIC :) And more classes in general (e.g., on general versus mixed effects models; GAMs etc...) Thank you for your dynamism!
Crash Course, thank you so much. This awesome course is definitively above the curve!
Puppy cat! I didn't know that they'd made a stuffed animal of him. This has greatly improved my day.
Better explanation than my university level stats class. 👍
Everyone needs to see this! Just because things seem connected on the surface doesn’t mean they’re related and Visa Versa!
psst. its vice versa, not visa versa
and if they're not connected then they DON'T CORRELATE. this shit's a red herring.
@improover113 talking specifically about causal relationships, as the phrase states explicitly
Thank you so much for sharing. You're so much better at explaining than my professor.
Thank you!!!
Learned so much from this video.
Excellent video! Thank you!!!
“Mr. Fluffy misses you.”
*pouts thinking of the cat I don’t have missing me*
Love the series!!!
I wish all my scatterplots ended up making pictures of dinosaurs.
I haven't watched Nicolas Cage movies, AND I haven't drowned. Aha!
Thank you for thissss!!
Watching Stat for fun again.
Loved it!
How wonderfully you explain things! There's no need to even translate anything! (originally written in Russian, deliberately)
Love this upload 😍
When she apologises for using imperial units......
"Air Cons, and Con Airs"
Amazing
"..if people blink more when they're lying!"
Our Professor: 😳
Great video, very helpful for learning statistics
wow! Thank you
this was an awesome video
Great video!!!😊
I was JUST reading up on this in class! 😂
Thank You.
Anecdotally, after playing Simpsons: Hit & Run (a GTA clone), I genuinely drove more recklessly for a little while. Not like I got into an accident, but like I was cutting corners tighter, and being a little heavier on the pedal. I had to work at it to knock it off. Really really good game though.
EXCELLENT!
Gain in my knowledge is perfectly correlated with the number of crash course videos I watch and shows the value of absolute +1 as the correlation coefficient #CrashCourse ..... 😁😁😁
Every time I see one of these videos I look at the view count and know that there are that many more people out there who are better educated about this topic, and that makes me very optimistic for the future. Keep up the great work, guys!
Without you guys I would not pass my exams. Thank you so much!
...how do you fit a regression line through a circle (or fat ellipse) on a 2D-scattered, plot...
...how do you define accuracy where there are fewer data points, even though the fitted-curve looks similar, (do you overlay random information certitude measure sigma bars)...
*_...(in case you missed the first question: flip the plot axes for a different regression line...)_*
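On the parenthetical about flipping the axes: ordinary least squares minimizes vertical distances, so regressing y on x and regressing x on y generally give two different lines, and the product of the two slopes equals r^2. A minimal sketch with made-up points (the data and function names are illustrative assumptions only):

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 2.0, 6.0, 4.0, 10.0, 6.0]  # made-up, noisy upward trend

slope_y_on_x = ols_slope(xs, ys)  # predicts y from x
slope_x_on_y = ols_slope(ys, xs)  # predicts x from y (axes flipped)
r = pearson_r(xs, ys)

# Drawn on the original axes, the flipped regression has slope 1 / slope_x_on_y.
flipped_slope_on_original_axes = 1.0 / slope_x_on_y

print(slope_y_on_x, flipped_slope_on_original_axes, r ** 2)
```

The two lines coincide only when the points fall exactly on a line (r = 1 or r = -1). For a round blob with r near 0, the y-on-x line is nearly horizontal and the x-on-y line is nearly vertical.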
Thank u Crash Course
Islam xDDDD
Watching this video at work, miss my cat. Burst into tears
I feel some people go so far in this argument that they seem to argue that correlation disproves causation.
E.g. "That's only correlation, it doesn't prove causation, obviously you are wrong."
Yes, correlation doesn't prove causation, but it most definitely does not disprove causation either. Further, it might suggest causation, or that a third factor is causing both phenomena to occur. It's frustrating to give data in an argument, only to have the other side counter with, "That's only correlation, it doesn't prove causation, you are wrong."
EasySnake 100% agree
i've seen this too! It irks me to no end.
This is crash course statistics and statistics is all about probability?
i love this
Love this video and the channel, also - @1:43 You've spelled eruptions wrong...
Good episode, but some things would need exercise and ‘usage’ in order to be memorized well and longer-term, like r and r squared.
Hello great video
The example of changing the units on the y-axis is only relevant if you're not doing your dimensional analysis properly. If the slope of the feet-feet plot is 0.5, then the slope of the meter-feet plot is 0.15 m/ft, which is still 0.5 once the units are converted (1 ft = 0.3048 m).
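The units point can be checked numerically: rescaling one axis rescales the slope by the same factor but leaves the correlation coefficient untouched. A minimal sketch, assuming made-up heights in feet (chosen so the feet-feet slope comes out at 0.5) and the conversion 1 ft = 0.3048 m:

```python
FT_TO_M = 0.3048  # metres per foot

def slope_and_r(xs, ys):
    """Least-squares slope (ys on xs) and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

x_feet = [4.0, 5.0, 6.0, 7.0]          # hypothetical x values, in feet
y_feet = [5.1, 5.4, 5.9, 6.6]          # hypothetical y values, in feet
y_metres = [y * FT_TO_M for y in y_feet]  # same y values, in metres

slope_ft, r_ft = slope_and_r(x_feet, y_feet)
slope_m, r_m = slope_and_r(x_feet, y_metres)

print(f"feet-feet:  slope = {slope_ft:.4f} ft/ft, r = {r_ft:.4f}")
print(f"metre-feet: slope = {slope_m:.4f} m/ft,  r = {r_m:.4f}")
```

The slope carries units, so it changes with the axis scale; r is unitless, which is why it is the quantity used to compare the strength of relationships.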
This was the funniest Crash Course video I've ever seen. Her comedic timing is excellent. Though I still don't know if that clever mayor was a man or a woman.
I love this channel
This needs to be essential viewing for EVERYONE.
You mentioned that a steep line can have a strong correlation, but there was no graphic to support it. Worth emphasizing for viewers: slope and correlation are different things.
So this was great. You are definitely one of my favorite crash course hosts. And I took statistics back in 1994. I have one question that boggles me. When and who is right, who determines the reality or that there is causation?
Example .... cigarette smoking and lung health. The negative effects are clearly visible, the correlation is there ... but is it really the cause? When and how do we get to a positive causality?
Or is it left to the interpreter? Or is it just all relative? Or by the end of the day it's meaningless and everyone can make the statement "correlation doesn't equal causality" and your data and beautiful charts and correlations just fizzle out?
That's the tricky part! Ultimately they all need to be interpreted. Overall, there is no true "proof", just higher levels of confidence. I am confident that the city of Paris exists, even though I've never been there. The process generally starts by asking "is this even possible?" and "Does this make some sense?" Then you can go back and try to find some other cause of the data you got. Eventually, you have to do experiments carefully. But even well-planned experiments can have hiccups and biases (there have been many cases of seemingly high-confidence experiments not being repeatable by other professionals). Often, multiple experimenters need to come up with the same results on their own (and usually with their own equipment) before the scientific community is convinced. Overall, it's a difficult and time-consuming process.
In health data like the lung example, there is a set of criteria called the Bradford-Hill criteria. Google it. These are criteria for determining if something can be considered causation. It is not a checklist: you still need to do your own scientific interpretation. But it's a good way to get an idea of whether the data you're looking at implies causation or not. The criteria are: effect size, consistency, specificity, temporality, biological plausibility, dose-response relationship, coherence, analogous results. Interestingly, Bradford Hill, who came up with this list, is the same Hill who co-authored the original Doll and Hill paper that established the link between smoking and lung cancer!
I've seen people both conflate correlation with causation in situations that are clearly coincidence and insist that correlation does not equal causation when the pattern of cause and effect is obvious.
Just noticed puppycat on her table! 💗
Please do more literature!!
@crash course team, not all the graphs in the Datasaurus Dozen shown at the end seem to have the same correlation coefficient. A few look like they have r=1, a few r=0. Please correct me if I'm wrong
I love this series! However, you made one small lie: R^2 does not have to be between zero and one, but can in fact be negative.
You spoke of mx + b, but failed to mention how b should be determined (and if it is chosen horribly wrong, it can give you negative R^2 values, because the estimated model is worse than random).
Keep up the series! :)
Squares of real numbers are always nonnegative, by definition. They can never be less than zero -- the square of -5 is 25, for example.
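Both comments can be right, depending on which quantity is meant. The square of the correlation coefficient r can never be negative, but the coefficient of determination computed as R^2 = 1 - SS_res/SS_tot can drop below zero when the line is not the least-squares fit, for example when the intercept b is chosen badly. A sketch with made-up data (the `bad_b` offset is an arbitrary assumption for illustration):

```python
def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]  # made-up, nearly linear data

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

m = sxy / sxx                    # least-squares slope
b = my - m * mx                  # least-squares intercept
r = sxy / (sxx * syy) ** 0.5     # correlation coefficient

r2_fitted = r_squared(ys, [m * x + b for x in xs])

bad_b = b + 10.0                 # deliberately poor intercept (assumption)
r2_bad = r_squared(ys, [m * x + bad_b for x in xs])

print(f"r^2 = {r ** 2:.4f}, R^2 (least squares) = {r2_fitted:.4f}, R^2 (bad b) = {r2_bad:.4f}")
```

For the least-squares line the two quantities agree, which is why they are often treated as the same thing. A negative R^2 just means the chosen line predicts worse than always guessing the mean of y.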
Those movie computer tick noises (when charts are presented) drive me mad, and I don't even have EQ in my setup to damp them down. Good vid though!
I was TRICKED into watching this by the title. How hard would it be to add, "WARNING! THIS IS STATISTICS, DWEEB" to what appears on my temptation screen?
It was really good.
While taking my stats course I started sleep talking and explained the empirical rule to my mom
Cool-Cage Act; hilarious.
It was hilarious, the data presented by the reporter.
We don't predict the temperature in Fahrenheit, we calculate it using the formula (c*9/5)+32
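A deterministic conversion like this one is the limiting case of correlation: because Fahrenheit is an exact linear function of Celsius with a positive slope, the correlation coefficient is 1 and a fitted line simply recovers the formula. A small sketch with arbitrary example temperatures:

```python
celsius = [-10.0, 0.0, 12.5, 25.0, 37.0, 100.0]   # arbitrary example values
fahrenheit = [c * 9 / 5 + 32 for c in celsius]     # exact conversion, no noise

n = len(celsius)
mc, mf = sum(celsius) / n, sum(fahrenheit) / n
scf = sum((c - mc) * (f - mf) for c, f in zip(celsius, fahrenheit))
scc = sum((c - mc) ** 2 for c in celsius)
sff = sum((f - mf) ** 2 for f in fahrenheit)

slope = scf / scc                 # recovers 9/5
intercept = mf - slope * mc       # recovers 32
r = scf / (scc * sff) ** 0.5      # 1 up to floating-point rounding

print(slope, intercept, r)
```

Regression earns its keep when there is scatter around the line; with none, "prediction" and "calculation" are the same thing.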
Me: focus, you have a test this week
Also me: OMG PUPPYCAT!!
now go teach the media this so they can stop blaming video games for all the world's problems
The Bee and Puppy-cat doll in the back is sooo cute (๑>◡
Squared correlation r^2
Line of regression
Can anyone explain standard deviation a little more in depth? I'm still not sure what information it tells us in a scatter plot
I am also looking for that :/
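One way to see what standard deviation contributes to a scatter plot: it measures the spread of each variable along its own axis, and the correlation coefficient is the covariance rescaled by both spreads, r = cov(x, y) / (s_x * s_y). The least-squares slope is then r * s_y / s_x. A sketch using Python's `statistics` module on made-up data (the numbers are illustrative, not from the video):

```python
from statistics import mean, stdev

xs = [61.0, 64.0, 66.0, 68.0, 70.0, 72.0, 75.0]  # made-up x values
ys = [63.0, 65.0, 66.0, 69.0, 69.0, 72.0, 73.0]  # made-up y values

n = len(xs)
mx, my = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)

# Sample covariance: how x and y vary together, in units of x times units of y.
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

r = cov / (sx * sy)            # dividing by both spreads makes r unitless
slope_from_r = r * sy / sx     # slope rebuilt from r and the two spreads

# Same slope computed directly from the least-squares formula, for comparison.
slope_direct = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                / sum((x - mx) ** 2 for x in xs))

print(f"s_x = {sx:.2f}, s_y = {sy:.2f}, r = {r:.3f}, slope = {slope_from_r:.3f}")
```

So s_x and s_y describe how wide and how tall the cloud of points is, while r describes how tightly the cloud hugs a line once those widths are factored out.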
wish you'd touch on poker. Math and Data is very important in poker
Any chance of crash course architecture (history of?)
*_...there'd be a negative-correlation where reducing air conditioning increases swimming..._*
*_...or, an overriding 'cause' leading to watching-speeding or doing-it, another, negrelation..._*
*_...so...what's the mathematically-concisely-stated-statistical-rule for causality-guessing..._*
*_...(making statistics, like modulo arithmetic: where compounded moduli may get better)..._*
At 0:27, it must have taken everything you had to not blink.
Correlation does not necessarily mean that there is causation between two variables.
However, don't walk away thinking correlation disproves causation. This isn't politics. There are more than two possibilities. (There are in politics too, but ignore that.) Thanks, and have a good day.
As a final note: Time taken to get from point a to point b is negatively correlated with speed. There is (by definition no less) causation there.
Sango, that's a good tip. But I fear that addressing people as the "scientifically illiterate" might not be the best way to get your message across. (What I would give for Crash Course: Rhetoric).
Everyone was illiterate (scientific and otherwise) at one point. It is one's duty to make sure they do not continue to be.
There is no causation only chaos.
It is absolutely true that everyone begins illiterate, and there should be no shame in that. However, referring to people as such can cause them to misinterpret your message as being condescending, even though you had no intention to be that way. Regardless, they are now slighted, and in retaliation, they ignore your advice, no matter how reasonable it was.
kaizersabre, there is no Dana, only ZOOL.
PuppyCattttt!!!! so cute
The first eruption scatter plot has a typo
If A caused B then there is a correlation between A and B.
The rising of the Sun caused the eating of an ice cream by John.
Therefore, there is a correlation between the rising of the Sun and the eating of an ice cream by John.
My question is, how would you quantify those events and plot the correlation between them on a graph? Would I count the number of times these events occurred? What if an event only causes another once? What if John died after the first ice cream? Can we still say that there was a correlation?
3:12
> Hummer, the epitome of in-your-face Americanness
> Russian license plate
Comment containing the word EVERYONE in caps lock.
Child Fs but why
Ok, you talked me into it.
containing the word EVERYONE in caps lock
In Pearson's study, did he take into account that people often shrink as they get older?
Does anyone know how to interpret a Bland-Altman analysis?
omg! PUPPYCAT 😭💗
Hmm. I may have needed this video 2 years ago when I was toiling in the halls of grad school
That's not the graph Jim Carrey and Jenny McCarthy showed me.
y = mx + b , is this some American standard? In Sweden it's y=kx+m
It doesn't really matter either way. The general consensus is that the last letters of the Latin alphabet, i.e. x, y and z, are used as placeholders for unknown quantities, whereas letters from the beginning (e.g. a, b and c) or middle (e.g. k, l, m and n) are used as placeholders for known quantities (to be supplied or deduced when doing a specific example). The placeholders for known quantities may be different in different countries for many reasons (ease of pronunciation, legibility, tradition, etc.). Tradition usually also means that the same equation often uses different placeholders in math and physics. Example: in Math class they may use y = ax + b, in Physics class they may use y = mx + c, just because ... (and then of course in the kinematic equations this becomes e.g. v = at + v0, representing physical quantities).
Wow this is my doctor and his funny science
This was very interesting...though, I wonder, just how significant it is ? Can you give me a chi squared on that ?
1:20 They spelt eruptions wrong on the y-axis...
Beginning and end were in a circle.
The other shoe never dropped! So what does equal causation? :)
I’ll have you know that my cat, Mr. Whiskers, loves me.
Wait... Technically everything is connected. Maybe 2 variables are correlated even though it doesn't make sense that they cause each other, and that happens because these 2 variables are connected to other variables that we didn't observe, yet those variables can indirectly influence the relationship between the main 2 variables we are comparing. So I guess that means, one way or another, correlation DOES imply causation. Error 404
Are regression lines ever parabolic?
What would be some examples if so?
One example: the optimum launch angle for maximum range. Range as a function of angle has a turning point around 45 degrees, where it reaches its maximum and then goes back down.
you can use a parabola to fit data. It would be polynomial regression where your x's are taken to various powers. Sometimes it's really useful to do so, since often data isn't perfectly linear.
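The two replies above can be combined into a small sketch: fit y = a*x^2 + b*x + c by least squares to launch-angle versus range data and read the turning point off the fitted parabola. The data are generated from the idealized no-air-resistance formula range = v^2 * sin(2*angle) / g with an assumed launch speed of 20 m/s, and `fit_quadratic` is a hypothetical helper written for this example, not a library function.

```python
import math

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c. Returns (a, b, c)."""
    # Centre x first; this keeps the normal equations well conditioned.
    x0 = sum(xs) / len(xs)
    rows = [[(x - x0) ** 2, x - x0, 1.0] for x in xs]
    # Normal equations (X^T X) beta = X^T y, stored as an augmented 3x4 matrix.
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           + [sum(row[i] * y for row, y in zip(rows, ys))]
           for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, 3):
            factor = aug[k][col] / aug[col][col]
            aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(aug[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (aug[i][3] - known) / aug[i][i]
    a_u, b_u, c_u = beta
    # Convert coefficients from the centred variable back to the original x.
    return a_u, b_u - 2 * a_u * x0, c_u - b_u * x0 + a_u * x0 ** 2

SPEED = 20.0    # assumed launch speed in m/s
GRAVITY = 9.81  # m/s^2

angles = [float(deg) for deg in range(15, 80, 5)]  # 15, 20, ..., 75 degrees
ranges = [SPEED ** 2 * math.sin(math.radians(2 * deg)) / GRAVITY for deg in angles]

a, b, c = fit_quadratic(angles, ranges)
best_angle = -b / (2 * a)  # vertex of the fitted parabola

print(f"a = {a:.4f}, fitted optimum angle = {best_angle:.2f} degrees")
```

This is still linear regression in the statistical sense, because the model is linear in the coefficients a, b and c; only the relationship between x and y is curved.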
You can "learn" more spurious correlations here:
www.tylervigen.com/spurious-correlations
and even discover new correlations here!
tylervigen.com/discover
Can we do a talk on how you DO identify causation, not just rule out plausible causal relations? Or are we taking a Humean view of causation and saying there is no real force of causation at all, just a fixed regularity that humans imagine happens?
I think it requires an experimental study
Mr. Fluffy does not miss me.
Mr. Fluffy ran away...
does the "r²=0.7" mean that we could predict accurately by 70% ?
yes
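A caveat on the answer above: r^2 = 0.7 is usually read as the linear fit accounting for 70% of the variance in y, which is not the same as predictions being 70% accurate. One way to see the difference is that the typical prediction error only shrinks to sqrt(1 - r^2) of the original spread, about 55% when r^2 = 0.7. A sketch on simulated data (the slope, noise level and seed are arbitrary assumptions):

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.5 * x + rng.gauss(0, 3) for x in xs]  # linear trend plus noise

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)  # total variation in y

slope = sxy / sxx
intercept = my - slope * mx
r2 = sxy ** 2 / (sxx * syy)           # squared correlation coefficient

# Variation left over after using the line to predict y.
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

explained_fraction = 1.0 - ss_res / syy   # share of variance accounted for
error_ratio = (ss_res / syy) ** 0.5       # leftover spread relative to original spread

print(f"r^2 = {r2:.3f}, explained = {explained_fraction:.3f}, error ratio = {error_ratio:.3f}")
```

The explained fraction matches r^2, while the error ratio shows how much uncertainty remains in any single prediction.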
I don't know, Nic Cage may be dragging people to the deep after they see his movies. The evidence is there.
Is that a puppycat omg i wish to have it too
But if you have a time turner. . . ?
I watched this video without having seen the previous ones, and spent a considerable amount of time wondering "what the heck is an 'old faithful eruption' ?"
(For those who have the same problem: "Old Faithful" seems to be the name of a geyser. (I don't know where it is, but when an English YouTube show refers to a location, person, event or sports ritual you have never heard of, you can be pretty sure it's in North America.))
The narrator is very easy to listen to. Even I understood the content
I would like to confirm: is the equation of a line y = mx + b or y = mx + c?