Normal Probability Plots Explained (OpenIntro textbook supplement)

OpenIntroOrg

119,751 views

Added 16 Dec 2016
Our accompanying textbooks are available at openintro.org/books, and all of them are free to download. Hard copies are also priced to be affordable for students. (We price our books in a way that we hope ensures all students can get a hard copy if they want one.)
Topics covered in this video:
- Probability basics
- Disjoint / mutually exclusive
- Probability Distributions
- Complement
- Independence and probability
Video author, voice, and editor: David Diez.

Comments • 40

@Mahmoud-li2xn 3 years ago ⁺⁹
Best explanation on CZcams for this topic, thank you.
@navjotsingh2251 1 year ago ⁺⁴
I loved this video. A nice follow-up would be a video where you go much deeper into the theory and explain the math behind these kinds of plots. Thank you.
@rffairchild 3 years ago ⁺¹
I agree with the other comments. This is the best explanation of this topic on CZcams.
@Valerie-ws3zr 3 years ago
just what I was searching for ....... Nice job !!
@kittyxing 3 years ago
Thanks for the video. How do you generate the line for non-normally distributed data? I understand that for normally distributed data the line has a slope equal to the SD and an intercept equal to the mean, the x-axis value is the z-score, and the y-axis value is the actual data value. But what about a non-normal data set? How exactly do you calculate the x-axis value for each data point, and how do you calculate the y values for the straight line?
@Outlier_G 1 year ago
I can't explain how good the video was. Thanks 😊
@riccardomattea1240 4 years ago ⁺³
Probably the best explanation video out there
@jamesrobertson9149 4 years ago
I thought so too.
@rishisingh6111 2 years ago
Simply awesome! Thanks for sharing this!
@ankmeyester 7 years ago ⁺⁴
so, the x axis here is the z score values and the y axis is the actual values? and plotting it against one another as seen here, we should see how it lines up? the better the linearity, the more 'normal' the distribution?
@OpenIntroOrg 7 years ago ⁺⁹
Basically yes :) The x-values are the Z-scores we would expect if the population and sample were as perfectly normal as they could be. So the straighter the line, the more encouraging it is that the data are nearly normal. That said, no population is perfectly normal, and even a sample from a truly normal distribution will not be perfectly straight, just due to random sampling. The main goal of this type of plot is often a basic check that nothing too wonky is going on and the population is roughly normal.
@ricardofraser4243 5 years ago
seems like it
@gunning6407 6 years ago ⁺⁸
Recipe for QQ-plot (quantile-quantile) in R:
## In R, a key observation is that the "pnorm" and "qnorm" functions are inverses of each other.
## To construct a QQ-plot of N observations (random samples here):
##
## Number of observations
nn
@RobvanMechelen 1 year ago ⁺¹
Here is exactly what I was looking for. Thank you very much!
@allanmuganga7075 3 years ago
Thanks for the video, it's been helpful. Kudos
@aCllips 5 years ago ⁺⁶
Right vs. left skewness is depicted the opposite way. The picture on the left is skewed to the left, and the picture on the right is skewed to the right.
@OpenIntroOrg 5 years ago ⁺²
Are you talking about the plots at about 8:30? The left plot has fewer observations strung out at higher values, which corresponds to right skewed (skew goes in the direction of the long tail). The reverse is true for the plot on the right.
@aCllips 5 years ago ⁺¹
@OpenIntroOrg Thanks for the response. I am sorry, I was wrong. It seems one cannot judge skewness from the histogram that could be drawn from the first examples in this video, because the value axis in those histograms runs from high values to low. They would need to be "mirrored" first in order to decide skewness.
@maxrkmrose 4 years ago ⁺¹
@OpenIntroOrg Skewness specifically indicates that the MEAN of the data set is not equal to the MEDIAN of the data set. Side note for others: on the histogram, lower values of the data are to the left, with higher values of the data to the right.
So a RIGHT skewed data set means that the MEAN of the data set is higher than the MEDIAN of the data set. There will be a higher density of observations to the left on the histogram. This concept seems opposite to what the histogram looks like, but the skewness is determined by the calculations from the data set.
A LEFT skewed data set means that the MEAN is lower than the MEDIAN. There will be a higher density of observations to the right on the histogram.
In a perfectly normal data set, the MEAN and MEDIAN will be approximately equal.
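The mean-versus-median comparison described above can be checked on a small example. This is only a sketch: the two data sets are made up for illustration, `mean_minus_median` is a hypothetical helper name, and the comparison is a common rule of thumb for skew rather than a formal definition of skewness.

```python
from statistics import mean, median


def mean_minus_median(values):
    """Positive suggests a long right tail, negative a long left tail (rule of thumb)."""
    return mean(values) - median(values)


# Made-up data: most values are small, one is strung out at the high end.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image: the long tail now points toward low values.
left_skewed = [-x for x in right_skewed]

print("right-skewed:", mean_minus_median(right_skewed))
print("left-skewed: ", mean_minus_median(left_skewed))
```

The single large value pulls the mean above the median in the first data set, and mirroring the data flips the sign.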
@ilanlivne4472 4 years ago
@aCllips Thanks for this explanation
@araujopsy 3 years ago
Very instructive, thanks
@robert8552 4 years ago
So, my data is skewed and non-normally distributed - What's to be done?
Do I perform some transformation to force normality, or do I rather just perform non-parametric tests?
@OpenIntroOrg 4 years ago ⁺²
Unfortunately, it's easier to say "something might be risky or broken here" than it is to say "this is how to fix it". What is required will be highly dependent on the circumstances, both the data and the goals of the analysis:
- If the sample is large enough and / or the skew isn't severe enough, then non-normality will not matter for some statistical methods. For example, if all your observations are within ~4 SDs of the mean, there are 30+ observations, and the method being applied is a t-confidence interval for the mean, then the skew isn't much of a concern because the Central Limit Theorem will have kicked in to the point the skew won't matter much.
- A more robust method might help. However, be aware that "nonparametric" does not automatically mean "robust". For instance, the bootstrap percentile method is less robust than t-distribution methods when the sample size is relatively small (
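The example in the first bullet (30+ observations, all within about 4 SDs of the mean, for a t-confidence interval) can be written as a quick screening check. This is a rough sketch of that one rule of thumb, not a formal test: the function name and its default thresholds are hypothetical, and using the sample SD as the yardstick is an assumption.

```python
from statistics import mean, stdev


def passes_t_interval_rule_of_thumb(observations, min_n=30, max_sds=4.0):
    """Return True if there are at least min_n observations and none lies
    more than max_sds sample standard deviations from the sample mean."""
    n = len(observations)
    if n < min_n:
        return False
    center = mean(observations)
    spread = stdev(observations)  # sample SD, divides by n - 1
    if spread == 0:
        return True  # all values identical, so nothing is far from the mean
    farthest = max(abs(x - center) for x in observations) / spread
    return farthest <= max_sds


evenly_spread = list(range(40))
one_extreme_value = [0] * 39 + [1000]
print(passes_t_interval_rule_of_thumb(evenly_spread))
print(passes_t_interval_rule_of_thumb(one_extreme_value))
```

A check like this only flags obvious trouble; whether a flagged data set needs a transformation or a different method still depends on the data and the goals of the analysis.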
@vladimirtorres1181 2 years ago ⁺¹
Very useful!! Thank you
@gunning6407 6 years ago ⁺²
In the textbook, I found the QQ-plot explanation to be lacking. Here, too, a number of key attributes are missing. First off, we must order the empirical observations (y-axis), as noted in previous comments. An explicit definition of "quantile" in earlier lectures would set the stage here, motivating "theoretical quantiles": the quantiles of the standard normal associated with the empirical probabilities (e.g. regularly spaced probabilities).
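The construction described above (order the observations, then pair them with standard normal quantiles at regularly spaced probabilities) might be sketched as follows. This is an illustration in Python rather than the commenter's R recipe: `normal_qq_points` is a hypothetical name, the plotting positions (i - 0.5) / n are one common convention among several, and the sample values are made up. `NormalDist().inv_cdf` plays the role that `qnorm` plays in R.

```python
from statistics import NormalDist


def normal_qq_points(observations):
    """Return (theoretical quantile, observed value) pairs for a normal QQ-plot."""
    ordered = sorted(observations)  # step 1: order the empirical observations
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position, regularly spaced in (0, 1)
        z = standard_normal.inv_cdf(p)  # theoretical quantile for probability p
        points.append((z, value))
    return points


sample = [4.1, 5.6, 3.8, 5.0, 4.7, 6.2, 4.4]  # made-up observations
for z, y in normal_qq_points(sample):
    print(f"{z:6.2f}  {y:4.1f}")
```

Plotting the second column against the first gives the normal probability plot; the closer the points fall to a straight line, the closer the sample is to normal.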
@DavidDiez 6 years ago ⁺²
Hi Gunning, thanks for the feedback. In short, this is a "special topic" that isn't covered in most intro stat courses (though some do cover it), so we breeze through on theory here and get to the practical application of the method. We don't expect anyone to walk away from this video able to reconstruct this type of plot -- only be able to read one.
@gunning6407 6 years ago ⁺¹
I updated my comment to put the "recipe" in a separate comment for curious readers. For context, I'm currently using the text to teach intro stats. This is my first semester with the department, but the department has used this text for several semesters.
I absolutely understand the concern about special topics and coverage. My *personal* feeling is that the text should either include a discussion of QQ-plot along with 3-4 sentences of discussion of construction, or omit it altogether. That said, I would argue that understanding how the plot is constructed is critical to correctly reading it!
@shokhrukhabduahadov3985 5 years ago ⁺³
So why is that? Why don't you explain the reason the points don't fit the line?
@m7mdsaleh523 3 years ago
Can we use the slope of the probability plot to measure the population variance of a sample?
@OpenIntroOrg 3 years ago
The line doesn't quite represent this, especially when the distribution has longer tails than a normal distribution, so it is good to calculate the sample variance separately.
Also, sorry to nitpick, but a clarification to avoid confusion for others: we'd describe "population variance of a sample" as "sample variance", and to further remove any ambiguity, we divide by (n-1) when computing the sample variance (while population variance is often computed by dividing by n).
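The two divisors can be compared directly with Python's standard library, where `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n. The data values below are made up for illustration.

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up observations
n = len(data)

sample_var = variance(data)       # sum of squared deviations / (n - 1)
population_var = pvariance(data)  # sum of squared deviations / n

print("sample variance:    ", sample_var)
print("population variance:", population_var)
# Both are built from the same sum of squared deviations:
print("ratio:", sample_var / population_var, "vs n / (n - 1) =", n / (n - 1))
```

The sample variance is always the larger of the two, by a factor of n / (n - 1), so the difference shrinks as the sample grows.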
@mustafizurrahman5699 1 year ago
Simply splendid
@maxtok414 3 years ago
Thank you!
@tule9213 2 years ago
so touching for an excellent video
@harrygroundwater2590 1 month ago
Very Helpful
@bensonmathew8679 5 years ago
Very helpful!
@garryarvindelgado4107 7 months ago
thank you
@sunilkumarsamji8871 4 years ago
Well, the name is "normal probability plots". a) Why are they called probability plots? b) Why is the plot of the observed data against the z-scores supposed to be a straight line? I can understand that how well the data fit is a measure of goodness of fit; however, I don't understand why this has to be a straight line.
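One way to see why a straight line is expected, as a short sketch rather than anything stated in the video: assume the data come from a normal distribution with mean \mu and standard deviation \sigma > 0 (symbols introduced here for illustration). Any such variable is a shifted and rescaled standard normal, and quantiles are shifted and rescaled the same way:

```latex
X = \mu + \sigma Z, \quad Z \sim N(0, 1)
\quad\Longrightarrow\quad
x_p = \mu + \sigma z_p \quad \text{for every } 0 < p < 1,
```

where x_p and z_p are the p-th quantiles of X and Z. Plotting x_p against z_p therefore gives exactly a line with intercept \mu and slope \sigma. The ordered observations approximate the x_p, so for roughly normal data the points fall close to that line, and systematic bends away from it signal skew or heavy tails.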
@StellaNimas 2 years ago
I'm doing my thesis right now, and the data is not normal. What should I do about this? 😭😭
@OpenIntroOrg 2 years ago
Data is never perfectly normal, so you're in good company. Check out OpenIntro Statistics Section 7.1, which offers a couple of rules of thumb on the bottom of the first page of that section. The book is free online as a PDF from our website, see:
www.openintro.org/book/os
@gentle2005phir 5 years ago
Good one
