Data Analysis 8: Classifying Data - Computerphile

  • Added 24. 08. 2024

Comments • 37

  • @Computerphile
    @Computerphile  5 years ago +5

    Check out the full Data Analysis Learning Playlist: czcams.com/play/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba.html

  • @zerokelvin3626
    @zerokelvin3626 5 years ago +6

    Great video! This training, validation and testing is relevant for modeling and simulation in general, and you would be surprised how many scientists and practitioners get this wrong.

  • @potatoMaster-wr3jz
    @potatoMaster-wr3jz 8 months ago

    You explained so many machine learning concepts easily within 15 minutes in this video. But this video ain't as popular as your cryptography and cybersecurity stuff, which shows what the general audience likes.

  • @jurietheron
    @jurietheron 4 years ago +2

    What a fantastic series! Will definitely rewatch it. I would love a video about image classification and validating results, confusion matrices, etc.

  • @onuktav
    @onuktav 5 years ago +17

    Computer says no 😁

  • @heyandy889
    @heyandy889 5 years ago +2

    that's pretty wild that you can automatically create a reasonable decision tree to classify arbitrary data towards an arbitrary target attribute.
    likewise one could imagine targeting the decision tree towards gender, or income; it sounds like the algorithm doesn't care, it just uses clustering techniques to best group the data to predict the target attribute.
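
A minimal sketch of that idea, not the method from the video: it picks the single attribute that best separates a chosen target, scoring candidates by weighted Gini impurity (an assumed criterion; real tree learners may use other measures and then recurse). The toy rows and the names `gini` and `best_split_attribute` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split_attribute(rows, target):
    """Return the attribute whose values give the purest groups of `target`."""
    best_attr, best_score = None, float("inf")
    for attr in (a for a in rows[0] if a != target):
        # Group the target labels by the value this attribute takes.
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[target])
        # Weighted average impurity of the groups; lower is better.
        score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr

# Hypothetical toy data, invented purely for illustration.
rows = [
    {"income": "high", "owns_home": "yes", "approved": "yes"},
    {"income": "high", "owns_home": "no", "approved": "yes"},
    {"income": "low", "owns_home": "yes", "approved": "no"},
    {"income": "low", "owns_home": "no", "approved": "no"},
]

print(best_split_attribute(rows, "approved"))  # target: loan approval
print(best_split_attribute(rows, "income"))    # same code, different target
```

The same function runs unchanged whichever column is named as the target, which is the point the comment makes.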

  • @randomnessgameful
    @randomnessgameful 5 years ago +5

    Love this series!

  • @Fractus
    @Fractus 4 years ago +2

    The use of 'precision' here sounds more like 'accuracy' in a truly scientific sense, that being how well it reflects a 'true' or correct outcome. In this vein 'precision' would be more like the ability of the system to repeatedly classify similar data, or the same sets, to the same outcome.

    • @jlopezg8
      @jlopezg8 4 years ago

      In classification, the definitions for precision and accuracy differ from those commonly used in science. Precision is defined as the proportion of instances correctly classified as positive (true positives) among all the instances classified as positive (true positives + false positives). Accuracy, on the other hand, is defined as the proportion of instances classified correctly (true positives + true negatives) among all instances. So, for example, imagine 100 people take a medical test. 20 are diagnosed with a disease, and among those, 15 do have the disease. Furthermore, of the 80 people not diagnosed with the disease, 5 do have the disease, so 75 people are correctly classified as not having the disease. As a result, the precision of the test is 15/20 = 75%, while the accuracy of the test is (15+75)/100 = 90%.
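
The arithmetic in that example can be checked with a short sketch. The function name `precision_and_accuracy` is made up for illustration; the four counts are derived from the figures in the comment (20 diagnosed of whom 15 are ill, 80 not diagnosed of whom 5 are ill).

```python
def precision_and_accuracy(tp, fp, fn, tn):
    """Return (precision, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # correct positives / all predicted positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct / all instances
    return precision, accuracy

# 20 diagnosed, 15 truly ill     -> tp=15, fp=5
# 80 not diagnosed, 5 truly ill  -> fn=5,  tn=75
precision, accuracy = precision_and_accuracy(tp=15, fp=5, fn=5, tn=75)
print(f"precision = {precision:.0%}, accuracy = {accuracy:.0%}")
# prints "precision = 75%, accuracy = 90%"
```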

  • @andresg3110
    @andresg3110 1 year ago

    You are absolutely handsome and brilliant! I'm so happy to learn from you such a smart and kind soul thank you for sharing your talent with the world

  • @4.0.4
    @4.0.4 5 years ago +2

    I really want a video just on Support Vector Machines! (Example: why would a traditional neural network outperform it?)

  • @DerDieDasBoB
    @DerDieDasBoB 5 years ago +9

    Love the videos! He is really a good teacher - thanks for all the good explanations. But when I see the paper he draws on, it reminds me of 80s printer paper... Is it still in use, or what is it for?

    • @WhompingWalrus
      @WhompingWalrus 5 years ago +5

      Idk if it's true or not, but I've heard that some universities bought a quinjabillion metric clucktonnes of that paper way back when it was expected to be used massively for a long time, so they hand it out gladly to whoever has a use for it now.

    • @jasonspence
      @jasonspence 5 years ago +3

      That's exactly what it is, and it's the standard Computerphile paper in all of the videos

    • @veeek8
      @veeek8 2 years ago

      Yeah nice touch isn't it, makes me feel like it's the 80s again 😂

  • @leantide7880
    @leantide7880 5 years ago +12

    So if the data set contains such attributes as gender, race, religion, languages spoken, etc., the machine learning could make modeling decisions on loan approvals for instance heavily based on such factors. Interesting.

    • @SiddharthPrabhu1983
      @SiddharthPrabhu1983 5 years ago +6

      Yes. That's precisely why ethics in AI is such a growing concern. Many organizations are working to ensure that these kinds of biases do not inadvertently (or intentionally) make their way into ML-driven decision engines.

    • @snippletrap
      @snippletrap 4 years ago +1

      Only if those attributes are positively correlated with, say, debt default.

  • @synchro-dentally1965
    @synchro-dentally1965 3 years ago

    I'm not sure what the majority of medical doctors would have to say, but I do hear apprehension on the use of AI to aid in diagnosing patients. Which is interesting, because wouldn't it just be another useful tool at their disposal, such as a stethoscope?

  • @jfg31416
    @jfg31416 2 years ago

    I loved the series, but I got a bit lost with this video. How does the content of video #8 relate to what was explained up to now? Does video #8 continue where video #7 left off, or does it take its output as an input in some way?

  • @ramixnudles7958
    @ramixnudles7958 5 years ago +3

    How is "validation" different from "testing"?

    • @MusicBent
      @MusicBent 5 years ago +5

      Ramix Nudles here is how I imagine it.
      The training data was used for training your model (obviously) so running the model on training data will always show 100% accuracy.
      The testing data is used by the model developer to analyze the performance. The developer can look into the results, see any obvious mistakes, and try to correct for them.
      The validation data would remain invisible to the developer, and would represent ‘new’ data points that the model would see in the real world after it has been developed and deployed. It should also perform well on this with 0 developer interaction or knowledge of the data.

    • @MusicBent
      @MusicBent 5 years ago +3

      Also, nice profile pic 👌🏻

    • @ramixnudles7958
      @ramixnudles7958 5 years ago +1

      @@MusicBent :-D

    • @jlopezg8
      @jlopezg8 4 years ago

      @@MusicBent Pretty much, but you mixed up test and validation data. Validation data is used to evaluate the model after training, or even while it's training on the training data, and see if it needs tweaking to improve its performance. But to make sure we ourselves don't overfit the model to the validation data, we evaluate the model on data unseen by the model (test data) to give a final unbiased assessment of its performance.
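
A rough sketch of the three-way split described in that reply, using only the standard library. The function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not values taken from the video.

```python
import random

def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, validation, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_test = int(len(rows) * test_fraction)
    n_val = int(len(rows) * val_fraction)
    test = rows[:n_test]                      # held back for the final assessment
    validation = rows[n_test:n_test + n_val]  # used while tuning the model
    train = rows[n_test + n_val:]             # used to fit the model
    return train, validation, test

# Stand-in data: 100 integer "rows".
train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))  # prints "60 20 20"
```

The test portion is set aside first and looked at only once, after all tuning against the validation portion is finished.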

  • @abhishektyagi4428
    @abhishektyagi4428 5 years ago +1

    Sir, could you please make a video explaining the resources you use to learn or enhance your programming skills?

  • @Acampandoconfrikis
    @Acampandoconfrikis 3 years ago

    I'm passing this exam thanks to you lol

  • @grainfrizz
    @grainfrizz 5 years ago +2

    Neural network is gonna beat KNN, Tree, and SVM. But, no, I don't watch Siraj Raval anymore.

  • @KilgoreTroutAsf
    @KilgoreTroutAsf 5 years ago +5

    So data classifiers are a new way of building uncompromising bureaucratic rules that escape peer-review and public oversight and not even their creators understand.
    Got it.

    • @4.0.4
      @4.0.4 5 years ago +3

      And that can be demonstrably (statistically) fairer (more likely to predict if you'll pay back your debt or not) than any human who decides based on emotion.

    • @KilgoreTroutAsf
      @KilgoreTroutAsf 5 years ago +2

      @@4.0.4 What a wonderfully naive response.

    • @clarkkentglasses6443
      @clarkkentglasses6443 4 years ago +1

      @@4.0.4 who says the training data isn't biased?

    • @quillaja
      @quillaja 3 years ago

      I love this comment.

  • @hammad8707
    @hammad8707 5 years ago

    lol ok