Missingno Python Library | Visualising Missing Values in Data Prior to Machine Learning

  • Added 7 September 2021
  • Missing data is probably one of the most common issues when working with real datasets. Data can be missing for a multitude of reasons, including sensor failure, data vintage, improper data management, and even human error. Missing data can occur as single values, as multiple values within one feature, or as entire features that are missing.
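A quick way to tell these three cases apart in pandas is to count the missing values in each column. This is a minimal sketch: the column names (GR, RHOB, DTS) only mimic well-log curves, and the values are invented for illustration rather than taken from the video's dataset.

```python
import numpy as np
import pandas as pd

# Invented example data; column names mimic well-log curves.
df = pd.DataFrame({
    "GR":   [45.0, 60.2, np.nan, 80.1, 75.3],      # a single missing value
    "RHOB": [2.35, np.nan, np.nan, np.nan, 2.41],  # several missing values in one feature
    "DTS":  [np.nan] * 5,                          # an entirely missing feature
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Columns in which every value is missing.
fully_missing = df.columns[df.isna().all().to_numpy()].tolist()

# Columns with some, but not all, values missing.
partial_mask = (df.isna().any() & ~df.isna().all()).to_numpy()
partially_missing = df.columns[partial_mask].tolist()

print(missing_per_column.to_dict())
print(fully_missing, partially_missing)
```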
    It is important that missing data is identified and handled appropriately prior to further data analysis or machine learning. Many machine learning algorithms can’t handle missing data, so any row containing even a single missing value must either be deleted or have that value replaced (imputed) with a new value.
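The two options described here, deleting incomplete rows or imputing the gaps, look roughly like this in pandas. This is a minimal sketch with invented column names and values; filling with the column median is just one simple imputation choice, not necessarily the approach used in the video.

```python
import numpy as np
import pandas as pd

# Invented example data.
df = pd.DataFrame({
    "GR":   [45.0, 60.2, np.nan, 80.1],
    "RHOB": [2.35, np.nan, 2.50, 2.41],
})

# Option 1: delete every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: impute, here by replacing each missing value with its column's median.
imputed = df.fillna(df.median())

print(len(df), len(dropped))        # row count before and after dropping
print(imputed.isna().sum().sum())   # total missing values left after imputation
```

Dropping is simple but can discard a large share of the data when gaps are scattered across many columns; imputation keeps every row at the cost of introducing estimated values.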
    If you haven't already, make sure you subscribe to the channel: / @andymcdonald42
    ----
    The notebook for this video can be found on my GitHub repository at: github.com/andymcdgeo/Andys_Y...
    There is a written version of this video available at: towardsdatascience.com/using-...
    Libraries used in this video:
    pandas: pandas.pydata.org
    missingno: github.com/ResidentMario/miss...
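missingno's `msno.bar(df)` draws one bar per column showing how complete that column is, and `msno.matrix(df)` shows where in each column the values are missing. As a rough sketch, the numbers behind the bar chart can be reproduced with pandas alone, which is handy for checking a plot or logging completeness in a script. The data below is invented, and this is not code from the video's notebook.

```python
import numpy as np
import pandas as pd

# Invented example data; in practice df would be the loaded dataset.
df = pd.DataFrame({
    "GR":   [45.0, 60.2, np.nan, 80.1],
    "RHOB": [2.35, np.nan, np.nan, 2.41],
    "NPHI": [0.21, 0.25, 0.30, 0.18],
})

# Number of non-null values in each column.
counts = df.notna().sum()

# Fraction of rows present in each column (0 = all missing, 1 = complete).
completeness = counts / len(df)

# Boolean nullity matrix: True marks a missing cell, i.e. the gaps
# that a nullity matrix plot highlights in each column.
nullity = df.isna()

print(completeness.sort_values())

# With missingno installed, the corresponding plots are:
#   import missingno as msno
#   msno.bar(df)
#   msno.matrix(df)
```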
    Data Used in this video:
    Bormann, Peter, Aursand, Peder, Dilib, Fahad, Manral, Surrender, & Dischington, Peter. (2020). FORCE 2020 Well well log and lithofacies dataset for machine learning competition [Data set]. Zenodo. doi.org/10.5281/zenodo.4351156
    Books I Recommend:
    As an Amazon Associate I earn from qualifying purchases. If you buy through any of the links below, I will earn a commission at no extra cost to you.
    PYTHON FOR DATA ANALYSIS: Data Wrangling with Pandas, NumPy, and IPython
    UK: amzn.to/3HNycJ9
    US: amzn.to/3DL7qPv
    FUNDAMENTALS OF PETROPHYSICS
    UK: amzn.to/3l1PgSf
    PETROPHYSICS: Theory and Practice of Measuring Reservoir Rock and Fluid Transport Properties
    UK: amzn.to/30UNWZS
    US: amzn.to/3DNqBbd
    WELL LOGGING FOR EARTH SCIENTISTS
    UK: amzn.to/3FHsbfn
    US: amzn.to/3CILAuE
    GEOLOGICAL INTERPRETATION OF WELL LOGS
    UK: amzn.to/3l2v2HV
    US: amzn.to/30UOTkU
    -----
    Thanks for watching! If you want to connect, you can find me at the links below:
    / andymcdonaldgeo
    / geoandymcd
    / andymcdonaldgeo
    www.andymcdonald.scot/
    #missingdata #petrophysics #machinelearning #geoscience #missingno #python
  • Science & Technology

Comments • 19

  • @akshaths2092
    @akshaths2092 · 11 months ago

    This is very useful. Concise and clear without digressing into other topics. Thank you.

  • @nimaparsa2686
    @nimaparsa2686 · 1 year ago

    It was really helpful. Your explanations were also crystal clear! Thanks

  • @mohammadkeshtkar9655
    @mohammadkeshtkar9655 · 2 years ago

    Thanks, Andy, for the useful videos.

  • @texasfossilguy
    @texasfossilguy · 2 years ago

    I just discovered this package myself while doing a project for my AI/ML cert. Very cool stuff.

    • @AndyMcDonald42
      @AndyMcDonald42 · 2 years ago

      It’s great. It’s one of my go-to libraries when doing EDA.

  • @sk45293
    @sk45293 · 2 years ago

    This is a cool library. I’ll check it out. Thanks!

    • @AndyMcDonald42
      @AndyMcDonald42 · 2 years ago

      No problem. It is a very small but powerful library.

  • @khaldazabel1036
    @khaldazabel1036 · 2 years ago

    Thank you very much, Andy, it’s helpful.

  • @kelvinmacharia3715
    @kelvinmacharia3715 · 2 years ago

    What's the scale on the left of the dendrogram? What does it represent for this dataset? Are those percentages or actual numbers of missing values? Or are they correlation values, as in the heatmap? Kindly explain some more.
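On the dendrogram scale: as the missingno README describes it, the dendrogram clusters columns hierarchically (via SciPy) by their nullity pattern, so the vertical axis is the linkage distance at which clusters merge rather than a percentage, a count, or a correlation value. Columns joined at a distance of zero have identical missing patterns. The sketch reproduces that idea with SciPy directly; using average linkage on the binary nullity matrix is my understanding of missingno's default and should be treated as an assumption.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy

# Invented data: A and B are missing in exactly the same rows, C differs.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [5.0, np.nan, 7.0, np.nan],
    "C": [np.nan, 2.0, 3.0, 4.0],
})

# One row per column: 1 where the value is missing, 0 where it is present.
x = df.isna().astype(int).to_numpy().T

# Assumption: average linkage with SciPy's default Euclidean metric,
# intended to mirror missingno's default dendrogram settings.
z = hierarchy.linkage(x, method="average")

# Column 2 of the linkage matrix holds the merge distances,
# i.e. the heights read off the dendrogram's vertical axis.
heights = z[:, 2]
print(heights)
```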

  • @ismailmohamed1019
    @ismailmohamed1019 · 2 years ago

    Good job, bro.

  • @javeda
    @javeda · 2 years ago

    Can we change the rotation to 90 degrees in msno?

    • @AndyMcDonald42
      @AndyMcDonald42 · 2 years ago

      I am not sure if this is possible as all of the plotting is handled by missingno.

    • @javeda
      @javeda · 2 years ago

      @AndyMcDonald42 Hopefully this is resolved in a couple of months.
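missingno draws its plots with matplotlib. Assuming the msno function you call returns its matplotlib Axes (msno.bar does in the versions I have seen), the tick labels can be rotated afterwards with an ordinary matplotlib call. The sketch uses a plain pandas bar chart as a stand-in for msno.bar so that it runs without missingno installed; it may need adjusting for other msno plots, such as the matrix, whose column labels sit along the top axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Invented data with a few missing values.
df = pd.DataFrame({
    "GR":   [45.0, np.nan, 80.1],
    "RHOB": [2.35, 2.40, np.nan],
    "NPHI": [0.21, 0.25, 0.30],
})

# Stand-in for msno.bar(df): fraction of present values per column,
# drawn with labels at 45 degrees to start with.
ax = (df.notna().sum() / len(df)).plot.bar(rot=45)

# Rotate the x-axis tick labels to 90 degrees after the plot is drawn.
ax.tick_params(axis="x", labelrotation=90)

rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```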

  • @helenb8070
    @helenb8070 · 1 year ago

    Enjoying your YouTube video series. Great explanations, and I'm very grateful for your generosity in sharing code and notebooks.
    missingno is certainly very powerful for uncovering the promising correlations! I'm keen to use it but I keep getting an error...
    It's perhaps a long shot, but do you know how to deal with this error when I run msno functions, or could you suggest how I might find out?
    I am a beginner at programming and Python, working on Mac.
    This one is when I run msno.heatmap(df)
    /Users/hb/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/seaborn/matrix.py:305: UserWarning: Attempting to set identical left == right == 0 results in singular transformations; automatically expanding.
    ax.set(xlim=(0, self.data.shape[1]), ylim=(0, self.data.shape[0]))
    /Users/hb/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/seaborn/matrix.py:305: UserWarning: Attempting to set identical bottom == top == 0 results in singular transformations; automatically expanding.
    ax.set(xlim=(0, self.data.shape[1]), ylim=(0, self.data.shape[0]))
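One plausible cause of these warnings, offered as an assumption rather than a confirmed diagnosis: msno.heatmap leaves out columns that are either completely filled or completely empty, since their nullity correlation carries no information. If no column in the DataFrame is only partially missing, the heatmap ends up with zero rows and columns, and seaborn then warns about setting identical axis limits of 0. The sketch below checks which columns would remain and computes the nullity correlation (the correlation between columns' missing/present patterns) using pandas only; `nullity_correlation` is a hypothetical helper, not part of missingno.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: correlation between the missing/present
    patterns of columns that are only partially missing."""
    nullity = df.isna().astype(float)  # 1.0 = missing, 0.0 = present
    # Keep columns whose nullity varies, i.e. neither fully present nor fully missing.
    varying = nullity.columns[(nullity.var() > 0).to_numpy()]
    if len(varying) == 0:
        # Nothing to correlate: this is the empty-heatmap situation.
        return pd.DataFrame()
    return nullity[varying].corr()


# A fully populated frame: no column is partially missing.
complete = pd.DataFrame({"GR": [45.0, 60.2, 80.1], "RHOB": [2.35, 2.40, 2.41]})
print(nullity_correlation(complete).shape)

# A frame in which GR and RHOB are always missing together.
gappy = pd.DataFrame({
    "GR":   [45.0, np.nan, 80.1, np.nan],
    "RHOB": [2.35, np.nan, 2.41, np.nan],
    "NPHI": [0.21, 0.25, 0.30, 0.18],
})
print(nullity_correlation(gappy))
```

If the first check returns an empty result for your data, the warning is likely harmless and simply means there is nothing for the heatmap to show.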