How to FIX OUTLIERS in a Distribution (6-9)

  • Added on 6 July 2024
  • The nature of the outlier determines how you should correct it. Some outliers can stay in the dataset. Data entry errors can be corrected. Other outliers can be Winsorized by replacing all outlier values with the highest reasonable value. In certain situations, you may choose to use an alternative non-parametric test. Only if you have no other options should you throw out data. The exception is multivariate outliers, which must be removed.
    Chapters
    0:00 The causes of outliers
    1:52 Outliers caused by data entry errors
    1:52 Outliers caused by sensor malfunctions
    4:25 Outliers caused by sampling errors
    6:06 Jerks in data collection
    9:00 Options for fixing outliers
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    This video is part of a series for an introductory statistics course. It supports an academic course QBA 237 Basic Business Statistics at Missouri State University, College of Business, Department of Information Technology and Cybersecurity. Posted beginning in August 2022.
    This series was designed, written, produced, recorded, edited, and posted by Dr. Todd Daniel, a Ph.D. statistician and researcher with extensive experience in academic statistical research and statistics instruction. Dr. Daniel directed a research institute for many years before doing private consulting for Research by Design, LLC. He remains active in teaching and research.
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Statistics Instructors: you are free to link to this video and the playlist for your seated or online statistics course or for other educational purposes.
    Edited in Camtasia 2022
    Visual and audio content from DigitalJuice.com
    Music: Digital Juice Royalty Free Music
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Link to a Google Drive folder with any files that I use in the videos.
    drive.google.com/drive/folder...
    To download, hover your cursor over the file icon and a blue download icon will appear. You do not need to request access to a file.
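
The Winsorizing option mentioned in the description can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the video: the function name `winsorize_iqr`, the 1.5 × IQR fences used to decide what counts as an outlier, and the sample `scores` are all assumptions made here. Values beyond the fences are replaced with the most extreme observation that is still inside them, which plays the role of the "highest reasonable value".

```python
import statistics


def winsorize_iqr(values, k=1.5):
    """Clamp values outside the IQR fences to the nearest in-fence observation.

    The fence rule (k * IQR beyond the quartiles) is an assumption of this
    sketch; any other definition of "reasonable" could be substituted.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    # Observations that are not flagged as outliers.
    inside = [v for v in values if low_fence <= v <= high_fence]
    lowest_ok, highest_ok = min(inside), max(inside)

    # Replace each outlier with the closest non-outlier value.
    return [min(max(v, lowest_ok), highest_ok) for v in values]


# Hypothetical data: nine plausible scores and one extreme value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
print(winsorize_iqr(scores))  # the 95 is replaced by 20
```

Unlike deleting the case, this keeps the sample size intact while limiting the pull of the extreme value on the mean and standard deviation.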

Comments • 1

  • @gabriellemaheux1184 1 year ago

    Hello, I'm new to the world of statistical analysis, so sorry if my question seems basic! Also, English is not my first language, so sorry for any errors in my writing.
    I was wondering: when analyzing variables that are naturally not normally distributed, for example variables related to criminal acts, what is the best option? For example, I created a variable representing the level of exposure to violence in the community. It was created by combining the results of 6 questions from a scale (one intended to measure this variable) to obtain a continuous variable to use in my linear regression later. When I run the analyses to identify outliers in this new continuous variable, SPSS detects a lot of them. However, after verification, they are all legitimate... The same thing happens with all my variables related to crime (e.g., the level of victimization for a specific crime, attitude towards crime, etc.). In short, I'm not sure how to handle outliers for variables that are naturally not normally distributed. If I leave them in, I have way too many outliers to delete when I look for multivariate outliers (with Mahalanobis), and I think it's going to impact the rest of my analysis way too much. Videos about outliers in crime statistics seem to be rare, so any advice is more than welcome :)
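
    For readers unfamiliar with the Mahalanobis screening mentioned in this comment, here is a minimal two-variable sketch in plain Python. It is an illustration under stated assumptions, not the commenter's SPSS procedure: the function names, the variable names `exposure` and `victimization`, and the data are hypothetical, and the p < .001 chi-square cutoff is a common convention assumed here rather than something stated in the video or the comment.

```python
import math
import statistics


def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample centroid."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = statistics.variance(xs)  # sample variance of x
    syy = statistics.variance(ys)  # sample variance of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    det = sxx * syy - sxy ** 2     # determinant of the 2x2 covariance matrix

    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d, written out for the 2x2 case.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_cutoff_df2(alpha=0.001):
    """Upper-tail chi-square critical value for 2 degrees of freedom.

    With 2 df the survival function is exp(-x / 2), so the cutoff has a
    closed form. Other df would need a statistics library.
    """
    return -2.0 * math.log(alpha)


# Hypothetical scores on two crime-related scales.
exposure = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 9.5, 1.5]
victimization = [1.0, 2.0, 0.5, 2.5, 1.5, 2.0, 0.5, 1.0]

cutoff = chi2_cutoff_df2()
d2 = mahalanobis_sq_2d(exposure, victimization)
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(cutoff, flagged)
```

    Note that the chi-square cutoff assumes the variables are multivariate normal, which is exactly the assumption the commenter says these crime variables do not meet, so a large number of flagged cases in skewed data is not by itself evidence of bad data.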