Data Analysis 1: What is Data? - Computerphile

  • Added on 24 Aug 2024

Comments • 157

  • @Computerphile
    @Computerphile  5 years ago +18

    Check out the full Data Analysis Learning Playlist: czcams.com/play/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba.html

  • @neumdeneuer1890
    @neumdeneuer1890 5 years ago +208

    Please make a video where Mike just goes around the campus pointing at things and people and asking the camera whether this is data ;)

  • @GoatzAreEpic
    @GoatzAreEpic 5 years ago +143

    HAAHHAHAHA THAT EDITING OF THE RACE IS AMAZING I LOVE THIS CHANNEL

    • @JesusisAlive_33
      @JesusisAlive_33 5 years ago +4

      That's one part of this channel I really admire.

    • @MrFeupinha
      @MrFeupinha 5 years ago +2

      Freaking love Mike and Computerphile.

  • @DarrinCullop
    @DarrinCullop 5 years ago +42

    Shout out to Sean for saying "01" instead of "1" so that it sorts properly with episode "10".

  • @returnexitsuccess
    @returnexitsuccess 5 years ago +197

    4:56 "Degrees kelvin"
    *several physicists are typing*

    • @raining1975
      @raining1975 5 years ago +1

      returnexitsuccess, can’t you convert Celsius to Kelvin and change it from interval to ratio?

    • @returnexitsuccess
      @returnexitsuccess 5 years ago +29

      @@raining1975 Yes you can, my comment was referring to the fact that Kelvin is not a "degree" scale, it is an absolute scale.

    • @nahco3994
      @nahco3994 5 years ago +9

      @@returnexitsuccess Isn't it the same with his pH example? The pH scale is logarithmic, not linear, and it measures the concentration of hydrogen ions. So I got the feeling that it wasn't really the best example for what he was trying to say.

    • @harrysvensson2610
      @harrysvensson2610 5 years ago +22

      @@returnexitsuccess I am so furious with you because you taught me that Kelvin is not referred to with degrees.
      I hate this.
      I want to write 3° K. But because of you I won't. I'll write 3 K instead.
      Thanks a lot for ruining my day.

    • @misterhat5823
      @misterhat5823 5 years ago +1

      Don't forget the Rankine scale.

  • @valudax00
    @valudax00 5 years ago +35

    That race animation was just pure gold! And as always, Dr Mike Pound is great. Keep up the awesome work!

  • @fritsrits7591
    @fritsrits7591 5 years ago +169

    You promised to use

    • @flurki
      @flurki 4 years ago +2

      I think he said he was going to try to, that's not a promise :P

  • @goagalg784
    @goagalg784 4 years ago +5

    When the four data types were enumerated, it should have been said that the third, interval data, allows two main operations:
    1. Differences (or deltas): 30 degrees minus 20 degrees = 10 degrees is a meaningful calculation.
    2. Ratios of such differences. E.g. (50-30)/(30-20) = 2 is meaningful in the sense that it takes twice as much heat energy to change a body’s temperature from 30 to 50 degrees as to change it from 20 to 30 (assuming a constant heat capacity).
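
A minimal sketch of the two operations listed above, assuming the temperatures are in degrees Celsius. The helper `c_to_f` and the variable names are illustrative, not from the video. It suggests why plain ratios are unreliable on an interval scale (they change with the unit) while ratios of differences are not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# The three temperatures from the comment, in degrees Celsius.
low, mid, high = 20.0, 30.0, 50.0

# A plain ratio depends on where the scale puts its zero.
plain_ratio_c = high / mid                    # 50 / 30
plain_ratio_f = c_to_f(high) / c_to_f(mid)    # 122 / 86

# A ratio of differences cancels both the offset and the unit size.
diff_ratio_c = (high - mid) / (mid - low)
diff_ratio_f = (c_to_f(high) - c_to_f(mid)) / (c_to_f(mid) - c_to_f(low))

print(plain_ratio_c, plain_ratio_f)  # two different numbers
print(diff_ratio_c, diff_ratio_f)    # the same number in both units
```

The plain ratio changes when the unit changes, so it says nothing about the temperatures themselves; the ratio of differences survives the conversion.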

  • @aungthuhein007
    @aungthuhein007 5 years ago +8

    Computerphile doing what the best stats books couldn't do in a 10-minute video! Thanks a whole ton Mike! This is priceless for me!

  • @harodl6663
    @harodl6663 5 years ago +28

    Love the Netflix approach of releasing the whole season at once

  • @userou-ig1ze
    @userou-ig1ze 5 years ago +170

    he's so upbeat it's hard to believe he's in academia

    • @MrCmon113
      @MrCmon113 5 years ago +7

      Fortunately I swallowed my drink before reading that comment.

    • @userou-ig1ze
      @userou-ig1ze 5 years ago +1

      @@MrCmon113 but why? Too much gritty disillusionment?

    • @MrFeupinha
      @MrFeupinha 5 years ago +3

      I love how much passion he has, Mike is my favorite!

    • @ethank5681
      @ethank5681 4 years ago +1

      Toxic academics are the worst

    • @userou-ig1ze
      @userou-ig1ze 4 years ago +2

      @@ethank5681 They're the new 'normal'

  • @spicybaguette7706
    @spicybaguette7706 4 years ago +3

    I can imagine him going around campus asking about random objects: "is this data?"
    "What's wrong with Dr. Pound?"
    - Nothing, he's just like that

  • @steamyjungleman
    @steamyjungleman 5 years ago +13

    As a data analyst, I see a lot of people get really caught up in R and Python. They are great tools for sure, but if you have access to a database, then nothing beats good old-fashioned SQL for sorting, cleansing, transforming and quick analysis.

    • @HenrikDalsager
      @HenrikDalsager 5 years ago +1

      Except SQL and big data may be difficult to combine if there is more data than SQL is designed for. In cases where your data is stored in something like Hadoop, you may not be able to produce a schema that allows you to utilize SQL.

    • @steamyjungleman
      @steamyjungleman 5 years ago +2

      @@HenrikDalsager Fair point. My point is that not all data analysis involves such massive data sets, and when discussing data analysis for new people, SQL should be mentioned as well.

    • @SwiftUIByExample
      @SwiftUIByExample 1 year ago

      MapReduce != SQL
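
For readers curious what "sorting, cleansing and transforming" in SQL might look like on a small data set, here is a minimal sketch using Python's built-in `sqlite3` module. The `readings` table, its columns and its values are made up for illustration and are not from the video's data.

```python
import sqlite3

# In-memory database with a small, deliberately messy table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(" Nottingham", 14.5), ("nottingham ", 16.0), ("Leeds", None), ("Leeds", 12.0)],
)

# Cleanse (trim and lower-case names, drop missing values),
# transform (aggregate per city) and sort, all in one query.
rows = conn.execute(
    """
    SELECT LOWER(TRIM(city)) AS city_clean,
           AVG(temp_c)       AS mean_temp,
           COUNT(*)          AS n
    FROM readings
    WHERE temp_c IS NOT NULL
    GROUP BY LOWER(TRIM(city))
    ORDER BY mean_temp DESC
    """
).fetchall()

for city, mean_temp, n in rows:
    print(city, mean_temp, n)
```

This only illustrates the single-machine case discussed in the thread; it makes no claim about data stored in systems such as Hadoop.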

  • @VladVladislav790
    @VladVladislav790 5 years ago +38

    Now I kinda want to know the average hex value of people's favorite colors

  • @JTedam
    @JTedam 1 year ago +1

    Mike. You are phenomenal. You explain like you understand, as you do. Clear and precise.

  • @davesextraneousinformation9807

    What is Data? Our cat is Data. Lieutenant Commander Data is his full name.
    Thanks for this series on Data Analysis!

  • @jasonmaritz6269
    @jasonmaritz6269 4 years ago +2

    It's great to know that an 11 min video taught me more about statistics than my stats module did...
    It cleared up a lot of confusion they left me with xD

  • @EDoyl
    @EDoyl 5 years ago +23

    I'm a bit disappointed that the intro didn't continue for the whole "what is data?" video, with Mike holding up different things and asking "is this data?" for up to 12 minutes

    • @AbCd-kq3ky
      @AbCd-kq3ky 5 years ago

      People are gonna have a field day counting the 'data's in this playlist

  • @kirasguardian6328
    @kirasguardian6328 5 years ago +3

    This is excellent, thanks Dr Mike Pound and Computerphile! Really looking forward to drilling into this.

  • @Lightn0x
    @Lightn0x 5 years ago +8

    4:47 I think we CAN tell if a pH is double another one. A pH of 6 is exactly 10 times more acidic than a pH of 7 (i.e. the molar concentration of hydrogen ions in the solution is 10 times higher).

    • @Lightn0x
      @Lightn0x 5 years ago +7

      Also, about degrees Celsius: sure, you can't directly say that 100 is 2 times hotter than 50, but if you convert them to kelvin, you get 373 and 323, so indirectly you can say that 100 degrees Celsius is 15% hotter than 50, right?

    • @Lefaseer
      @Lefaseer 5 years ago +2

      @@Lightn0x This is what I was wondering too. If you can convert an interval feature to a ratio feature so trivially then what's the point?
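
The arithmetic in this thread can be checked with a short sketch. The function name `celsius_to_kelvin` is illustrative. It suggests that the ratio only becomes meaningful after the values are moved to a scale with a true zero, which is one possible answer to "what's the point": the scale type describes the numbers as stored, not the underlying quantity.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the kelvin scale, whose zero is absolute zero."""
    return celsius + 273.15

hot_c, cool_c = 100.0, 50.0

# Dividing the stored Celsius numbers gives 2.0, which depends on the arbitrary zero.
naive_ratio = hot_c / cool_c

# Dividing the kelvin values gives a ratio of thermodynamic temperatures.
absolute_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

print(naive_ratio)               # 2.0
print(round(absolute_ratio, 3))  # about 1.155, i.e. roughly 15% higher
```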

  • @atrijitdas1704
    @atrijitdas1704 5 years ago +18

    "Is This Data?" should've been an absurd comedy skit on the Eric Andre Show or something

  • @JTedam
    @JTedam 1 year ago

    You need a Udemy course, Mate. You are an excellent teacher. You get into the minds of people.

  • @HesderOleh
    @HesderOleh 5 years ago +5

    Isn't pH a ratio type, but on a log scale? Yes, there is no "zero", but you can calculate what is double a pH value, or more easily what is ten times a given pH value.

    • @zachhugo7424
      @zachhugo7424 3 years ago +1

      I came here to say the same thing. I think you're right

  • @mathiaswittig5249
    @mathiaswittig5249 2 years ago +1

    Proudly supported by thinkpad :D

  • @MrKZee
    @MrKZee 1 year ago

    If you make a course - I will probably sign up to that platform and go through it, because I love the channel and I love how people talk about things.

  • @hrly.d5745
    @hrly.d5745 4 years ago

    Well, thank you, Doctor. I don't understand at all when my teacher explains, but now I understand. THANKS A LOT

  • @manarlab84
    @manarlab84 3 years ago +3

    Thank you for the great learning series. How can I get the csv files for this lesson and each lesson?

  • @sillybuttons925
    @sillybuttons925 5 years ago +1

    Nice series. Hope to see more like this on many topics.

  • @GuruEvi
    @GuruEvi 5 years ago

    Data analysis in Excel is generally not done because it has lots of quirks once you go deep into the formulas. Microsoft has kept a lot of bugs for the sole purpose of staying backwards compatible. Things like dates and references introduce mistakes that are often hard to spot.

  • @suleimanmustafa1473
    @suleimanmustafa1473 5 years ago +3

    Data is a Lieutenant Commander on the USS Enterprise-D

  • @tonalddrumpboe5151
    @tonalddrumpboe5151 2 years ago +1

    NOIR data types (nominal, ordinal, interval, and ratio):

  • @vapourmile
    @vapourmile 5 years ago +1

    This is two videos. Up to 5:24, you could just cut that and have it as a separate video.
    It's weird having been in computing and never having had it explained this way to me before. They didn't even teach this on my computer science degree.
    Before I saw this video, data to me was just anything in a format that enables it to be processed by machine.
    Because of that, though, I'm suspicious: does data really always fall neatly into one of these categories? Colours, for example, seem to fall into more than one: yes, you can have nominal red, green, blue, cyan, magenta, yellow, white, but you can also have them stored as RGB values, or as HSV/HSB values. You can have them represented on a CIE graph, and you can have them as radiation frequencies.
    In the real world, does data really always fall neatly into one of the jars?
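
A small sketch of the point about colours, using the standard-library `colorsys` module. The `named` dictionary and the helper `to_hsv` are illustrative. The same three colours appear once as nominal labels, once as RGB triples and once as HSV triples, so the category seems to depend on the representation chosen rather than on the colour itself.

```python
import colorsys

# Nominal view: just names, with no order and no arithmetic.
# Numeric view: each name mapped to 8-bit RGB channel values.
named = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
}

def to_hsv(rgb):
    """Convert an 8-bit RGB triple to HSV with every component in [0, 1]."""
    r, g, b = (channel / 255 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)

hsv = {name: to_hsv(rgb) for name, rgb in named.items()}

for name, (h, s, v) in hsv.items():
    # Hue is an angle around the colour wheel, printed here in degrees.
    print(f"{name}: hue={h * 360:.0f} deg, saturation={s:.1f}, value={v:.1f}")
```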

  • @ButzPunk
    @ButzPunk 5 years ago

    It seems like the line between a nominal attribute and an ordinal one is a little blurry. I'd argue that you can order weather types by how conducive they are to outdoor activity (e.g., sunny > overcast > rainy).

  • @Bolt6265
    @Bolt6265 5 years ago

    The beginning of this video is very good out of context

  • @jamboort
    @jamboort 5 years ago +2

    Data is a character in Star Trek

  • @astropgn
    @astropgn 5 years ago

    If you have a kind of interval data like pH, you can't say that pH 10 is twice as much as pH 5, OK. But pH is a kind of data that can be converted to ratio data (since pH = -log([H+])). So, in a sense, since you can convert easily with a formula, pH itself might not be rational, but you could rationalize it with just a couple more steps.
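
The conversion mentioned above can be sketched as follows, assuming the textbook definition pH = -log10([H+]) with [H+] in mol/L. The function names are illustrative. The concentrations form a ratio scale even though the pH numbers themselves do not.

```python
import math

def ph_to_concentration(ph):
    """Hydrogen ion concentration in mol/L, from pH = -log10([H+])."""
    return 10.0 ** (-ph)

def concentration_to_ph(concentration):
    """Inverse of ph_to_concentration."""
    return -math.log10(concentration)

# Dividing the pH numbers gives 2, but the concentrations differ by a factor of 10^5.
ph_number_ratio = 10 / 5
concentration_ratio = ph_to_concentration(5) / ph_to_concentration(10)

# Doubling the concentration lowers the pH by log10(2), roughly 0.301.
ph_after_doubling = concentration_to_ph(2 * ph_to_concentration(7))

print(ph_number_ratio, concentration_ratio, ph_after_doubling)
```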

  • @MrNateDD
    @MrNateDD 5 years ago +2

    I was really hoping he would hold a picture of Brent Spiner in the beginning

  • @HappyAccident06
    @HappyAccident06 5 years ago +1

    Massive missed opportunity was not holding a pic of Commander Data... “is this data??”

  • @ethank5681
    @ethank5681 4 years ago +1

    Is this data?
    No...
    This is... SPARTAAAA

  • @velocirapture89
    @velocirapture89 4 years ago +1

    Pretty sure I asked this same question during Star Trek: First Contact.

  • @TomHarrisonJr
    @TomHarrisonJr 5 years ago

    Does ratio data necessarily have a linear scale? That is, is 2 necessarily half of 4? Some measures are expressed as a logarithm, such as the Richter scale, but having a zero that means there's none, so in this case Richter 6 is 10X Richter 5. Of course a logarithm can always be expanded to a unitary value, so my question is more about practicality.

  • @ginavong401
    @ginavong401 5 years ago +1

    Are the datasets used in this series available for download if you want to follow along? I can't find a link in the description...

  • @gamzeakkus6020
    @gamzeakkus6020 3 years ago

    Can I use primary and secondary data together?

  • @klemenkobau1380
    @klemenkobau1380 5 years ago

    Isn't all interval data also ratio? For example if you subtract the left most value in the interval representation to get the zero? Or is the problem here that you don't always know the left most variable, since you only know a sample that represents the underlying truth.
    I was just confused since he said that degrees Celsius is interval data, while it can be easily translated to Kelvin that is ratio data.

  • @AlphaCrucis
    @AlphaCrucis 1 year ago +1

    He should've held up a picture of Data from Star Trek at the beginning. Is this data?

  • @zerokelvin3626
    @zerokelvin3626 5 years ago +3

    9:02: Do R dataframes start at 1? 😨

  • @luketaylor5175
    @luketaylor5175 3 years ago

    Thanks. Bro

  • @MAJNKRAFTXJASI
    @MAJNKRAFTXJASI 5 years ago +1

    0°C + 273.15 = 273.15K
    ?

  • @atrijitdas1704
    @atrijitdas1704 5 years ago

    Why can't we calculate the mean of ordinal data, when we can for interval data?
    Why does a mean temperature make sense but a mean star rating not?
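
One way to look at this question, as a sketch rather than a definitive answer: on an ordinal scale only the order of the codes carries meaning, so any order-preserving relabelling is as valid as 1 to 5. The ratings and the `recode` mapping below are made up. The mean-based comparison flips under the relabelling, while the median-based one does not. Temperature units differ only by a linear rescaling (such as Celsius to Fahrenheit), which cannot flip a comparison of means.

```python
from statistics import mean, median

# Star ratings coded 1..5 (ordinal: only the order is meaningful).
film_a = [3, 3, 3, 3, 3]
film_b = [1, 1, 2, 5, 5]

# Hypothetical recoding that keeps the order but stretches the top level.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
film_a_recoded = [recode[r] for r in film_a]
film_b_recoded = [recode[r] for r in film_b]

# Comparing by mean: the verdict depends on the arbitrary codes.
mean_prefers_a = mean(film_a) > mean(film_b)
mean_prefers_a_recoded = mean(film_a_recoded) > mean(film_b_recoded)

# Comparing by median: the verdict only uses the order, so it is stable.
median_prefers_a = median(film_a) > median(film_b)
median_prefers_a_recoded = median(film_a_recoded) > median(film_b_recoded)

print(mean_prefers_a, mean_prefers_a_recoded)
print(median_prefers_a, median_prefers_a_recoded)
```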

  • @SwiftUIByExample
    @SwiftUIByExample 1 year ago

    won't negative wind mean no atmosphere?

  • @Vulcorio
    @Vulcorio 5 years ago

    I don't know why, but every time I see Dr. Pound I just want him to have the best life possible. So charismatic.

  • @alenpaul2523
    @alenpaul2523 5 years ago

    Awesome channel

  • @alexschott9567
    @alexschott9567 5 years ago

    well you can't have less than -273.15 degrees celsius so wouldn't that make it ratio if kelvin is?

  • @skk20I
    @skk20I 5 years ago

    Is it possible to analyze logarithmic data? Like decibels?

  • @outside8312
    @outside8312 5 years ago +3

    2:30 I think Safiya Nygaard would beg to differ 😂

  • @michelnormandin8068
    @michelnormandin8068 5 years ago

    Omg... Sean Riley has the same name as my Data Anonymous sponsor.

  • @derstreber2
    @derstreber2 5 years ago +1

    3:01 "The Rimmer experience"

  • @MrTridac
    @MrTridac 5 years ago +1

    Oh look, Mike Pou- CLICK

  • @isabellabihy8631
    @isabellabihy8631 5 years ago +1

    I'd also distinguish discrete and continuous data (for the lack of a better word). Discrete data can take only certain values, like Yes, No, Maybe ( 😁 ), or integers. It makes no sense to calculate an arithmetic mean, aka average. Frequency counting is OK. A median is also appropriate. A typical misuse of actually discrete data is when they publish that the typical family has 2.3 children. WHAT? This includes nominal, ordinal, and interval data.
    Continuous data: I'd put the data you mentioned as "ratio" in this group.

  • @matthewfyson6809
    @matthewfyson6809 5 years ago +1

    You actually can determine what "double the pH" of something is, since pH is actually just the negative base ten logarithm of the concentration of hydronium ions.
    Edit: tl;dr it's logarithmic, so double the pH is equal to pH*log2

  • @fiiiie9398
    @fiiiie9398 4 years ago

    I love g-easy teaching

  • @Andmunko
    @Andmunko 4 years ago

    negative wind is just wind in the opposite direction

  • @polares8187
    @polares8187 5 years ago +6

    Question is Am i data?

    • @harrysvensson2610
      @harrysvensson2610 5 years ago

      Are we data? :o

    • @argeon87
      @argeon87 5 years ago

      We are all data biologically speaking. Some sources prompt somewhere around 150 Zettabytes (10^21) of data. And I guess it's mostly nominal one))

  • @BAMBAMBAMBAMBAMval
    @BAMBAMBAMBAMBAMval 1 year ago

    Mike, Red > Blue.

  • @laurendoe168
    @laurendoe168 4 years ago

    I realize you used temperature merely as an example of the interval data set type, but this particular measurement can be converted to a ratio data type (by converting C or F to K). Would there be any rational (pun intended) reason to do this?

  • @andretheron1833
    @andretheron1833 5 years ago

    This is good.

  • @lindhe
    @lindhe 5 years ago

    Is NOIR considered an exhaustive list?

  • @ShadiMuhammad
    @ShadiMuhammad 4 years ago

    Every time I open a video on this channel, I get trapped in it!! 😍

  • @misterhat5823
    @misterhat5823 5 years ago +2

    The example of pH isn't a good one. There is a mathematical difference between each number in the scale.

    • @Hexanitrobenzene
      @Hexanitrobenzene 5 years ago

      I rewound the video and I think it's correctly placed in the "interval data" category. It can't be used for ratios directly, though. If you divide pH 1 by pH 0, you get infinity, a nonsensical answer. In order to get a meaningful ratio, you must convert it to the concentration of H+, then you get 0.1. However, you are now working with DIFFERENT numbers, which are now on a ratio scale: their zero is "true".

    • @kevinolson9940
      @kevinolson9940 4 years ago

      Hexanitrobenzene, that argument doesn’t hold. I could just as easily say that if you divide 1 child by 0 children you get an undefined result. That doesn’t mean it isn’t a type of ratio data. The rational numbers themselves are ratios of integers, and they also break if you try to have a ratio of anything over 0.

  • @skydrow4523
    @skydrow4523 5 years ago +3

    I see Dr. Pound, I click faster than *data*

  • @grainfrizz
    @grainfrizz 5 years ago +1

    7:38 Lenovo Thinkpad X1 Extreme winks

  • @minxythemerciless
    @minxythemerciless 5 years ago +1

    What *ARE* Data! Data is plural. Datum is singular.

  • @MrBitchtits500
    @MrBitchtits500 2 years ago

    8:00 even the pros can't avoid those r syntax errors : )

  • @RuilinLinRyan
    @RuilinLinRyan 5 years ago

    I don't understand why can't Celsius be considered ratio if there is a 0 C ? Why would it be wrong to say 20 C is twice as hot as 10 C?

    • @TheMacfruit
      @TheMacfruit 5 years ago

      Ruilin Lin because that would imply you can say how much hotter 10° is compared to -10°.
      The math works out to -1, but that doesn't make any sense.

  • @eurekal1903
    @eurekal1903 4 years ago

    What is data is a question. How do you collect your data is another question.

  • @iamkapilkalra
    @iamkapilkalra 4 years ago

    can we get the course files, so that we can follow along?

  • @chickenshieee
    @chickenshieee 5 years ago

    You are awesome

  • @casperes0912
    @casperes0912 5 years ago

    Are examples like this standardised or something? In my uni Databases course I made a decision tree for whether or not tennis would be played based on information like weather...

    • @harrysvensson2610
      @harrysvensson2610 5 years ago

      I don't think it's standardized, it's more like use the most basic algorithm that solves the problem for you and call it a day. In your case a decision tree solved your problem, and you called it a day.

  • @murk1e
    @murk1e 5 years ago +1

    Problem with a series on big data, drop all the vids at once and the algorithm hides them all.

  • @Arthur-mj2vd
    @Arthur-mj2vd 5 years ago

    Right?

  • @user-tq6my4sx1g
    @user-tq6my4sx1g 5 years ago

    3:00

  • @AmexL
    @AmexL 2 years ago

    The real question is: what is Numberwang?

  • @christianmagnus1003
    @christianmagnus1003 4 years ago

    I'm happy because that Peter Parker has the same laptop as mine, a ThinkPad T470

  • @gabeaze
    @gabeaze 3 years ago

    So what IS data?

  • @winsonjacob3554
    @winsonjacob3554 5 years ago

    Isn't the pH table a ratio?

  • @leon24xxx
    @leon24xxx 3 years ago

    Ratio data... 100 kelvin is 2 times higher than 50 kelvin... OK. How many times is 100 kelvin higher than 0 kelvin, or how many times is 200 kelvin higher than 0 kelvin.

  • @tensortab8896
    @tensortab8896 5 years ago

    What are data?

  • @fawal.1997
    @fawal.1997 5 years ago

    Finally

  • @alexandersmith4796
    @alexandersmith4796 4 years ago

    WHAT IS DATA

  • @kevintrainor5838
    @kevintrainor5838 2 years ago

    Perhaps “What are data?” would be the more appropriate question…

  • @VictorCaldo
    @VictorCaldo 5 years ago

    godsend

  • @jsabra89
    @jsabra89 4 years ago

    Someone please get the man a better pen.

  • @MrAnonyM00SE
    @MrAnonyM00SE 5 years ago +2

    tsk tsk Dr Pound didn't know that pH is a logarithmic scale.

    • @userou-ig1ze
      @userou-ig1ze 5 years ago

      I'm sure he does? What minute?

    • @Hexanitrobenzene
      @Hexanitrobenzene 5 years ago

      I think the point is that you can't use pH for ratio directly, you must convert it to concentration of H+ ions. The same for degrees Celsius - there is a well known relationship between Celsius and Kelvin scales, however, degrees Celsius can't be used for ratios because their zero is "not true".

    • @userou-ig1ze
      @userou-ig1ze 5 years ago

      @@Hexanitrobenzene You're confusing this with Fahrenheit

    • @Hexanitrobenzene
      @Hexanitrobenzene 5 years ago

      @@userou-ig1ze
      How so ? 0 degrees Celsius is 273,15 degrees Kelvin, the molecules still have a fair amount of thermal energy.

  • @thenorup
    @thenorup 5 years ago

    ones and zeroes!
    There! I just saved you 12 minutes.

  • @GammabitFilms
    @GammabitFilms 5 years ago +1

    It's not what is data, it's who is data ;)

  • @tdot33367
    @tdot33367 4 years ago

    excited to see 54 year old John running down the wing

  • @luis_landazuri
    @luis_landazuri 5 years ago +1

    Is it just me, or do you get the feeling that he thinks Python is not free? It made me double-check after many years of using it

  • @Dewlyth
    @Dewlyth 5 years ago

    Huh ? But the wavelength are.. oww.. ok :'(

  • @Pavastes
    @Pavastes 5 years ago

    I am data.

  • @frankweiler7121
    @frankweiler7121 5 years ago

    The mean temperature is 0K ? What? ;)