OpenAI's Sora Made Me Crazy AI Videos-Then the CTO Answered (Most of) My Questions | WSJ

  • Published May 11, 2024
  • OpenAI’s new text-to-video AI model Sora can create some very realistic scenes. How does this generative AI tech work? Why does it mess up sometimes? When will it be released to the public? What data was it trained on?
    WSJ’s Joanna Stern sat down with OpenAI CTO Mira Murati to find out.
    Chapters:
    0:00 OpenAI’s Sora
    1:08 Reviewing Sora’s videos
    4:03 Optimizing and training Sora
    6:42 Concerns around Sora
    8:55 Identifying Sora videos
    Tech Things With Joanna Stern
    Everything is now a tech thing. In creative and humorous videos, WSJ senior personal tech columnist Joanna Stern explains and reviews the products, services and trends that are changing our world.
    Watch the full, wide-ranging interview with OpenAI CTO Mira Murati here: on.wsj.com/3UbPyaR
    #AI #Sora #WSJ

Comments • 1.3K

  • @wsj
    @wsj  Před měsícem +1

    Watch the full, wide-ranging interview with OpenAI CTO Mira Murati here: on.wsj.com/3TNv2Mu

  • @elithecomputerguy
    @elithecomputerguy Před měsícem +2021

    The CTO can't talk about where the training data came from...

    • @blackspetnaz2
      @blackspetnaz2 Před měsícem +248

      They cannot talk about the fishy things they did to achieve this.

    • @tiromandal6399
      @tiromandal6399 Před měsícem +89

      @@blackspetnaz2 Lots of cute animals died very painful deaths for the meds that have been keeping everyone you ever knew alive, but you're not complaining about that. What a hypocrite!

    • @neon9165
      @neon9165 Před měsícem +52

      @@tiromandal6399 Not to mention that a lot of their data comes from their partnership with Shutterstock; that's why a lot of the videos look so stock-footage-like.
      But I do imagine there are still some copyrighted things in there, so it makes sense she isn't answering a question that could be used in court. They (and maybe Google) are perhaps the closest to non-copyright-infringing AI.

    • @3dus
      @3dus Před měsícem

      Well... are you comparing dead animals to copyright royalties? Is it OK for a foreign company to steal OpenAI's undisclosed science and use it to create a better model that advances AI?

    • @christiancrow
      @christiancrow Před měsícem +13

      Closed AI, but publicly posted images are free for the taking?

  • @DevineLuLinvega
    @DevineLuLinvega Před měsícem +1271

    "I can neither confirm nor deny that we trained on youtube videos"

    • @Test-ny6uh
      @Test-ny6uh Před měsícem +39

      Even if it was trained on them, these are public videos. There's nothing wrong with that.

    • @krishhhhhhhhhhhhhhhh
      @krishhhhhhhhhhhhhhhh Před měsícem

      @@Test-ny6uh YouTube is many people's livelihood. If they're training on that data, they're stealing their future income. We need to start banning AI for some things. Or else it's just one job after another.

    • @Rosscifer
      @Rosscifer Před měsícem +123

      @@Test-ny6uh No. That's not how it works. There are very strict laws concerning exactly what constitutes fair use, and despite this, copyrighted material gets uploaded to YouTube a million times a day. It constantly needs to be taken down. Just because a character from SpongeBob was visible in some fair-use clips and some illegally uploaded episodes doesn't mean the character can be plagiarized at will.

    • @andyou2327
      @andyou2327 Před měsícem +14

      @@Test-ny6uh But if you re-upload a YouTube video, you'll be hit with a Content ID claim.

    • @stuckonearth4967
      @stuckonearth4967 Před měsícem +39

      @@Rosscifer Oh, c'mon you copyright people are annoying. AI learns just like how humans learn. Its training is like watching movies for us. We also know how certain characters look and so does AI. It can copy from other videos just like how humans copy from other creators. I think good examples of copying are elves, gnomes, orcs etc.

  • @EggEggEggg1
    @EggEggEggg1 Před měsícem +1497

    The fact that she dodged that question about the training data isn't very "Open"AI of her.

    • @anirudhnarla4711
      @anirudhnarla4711 Před měsícem +44

      Honestly, that's every AI chatbot, image generator, and LLM from every company, including Google, Facebook, Microsoft, etc. A training model requires huge amounts of data, which isn't just available on hand, so they scoop up videos and text from the public internet.

    • @BenGrimm977
      @BenGrimm977 Před měsícem +22

      It's understandable that they wouldn't want to specify beyond publicly available data. It was a bad question.

    • @SublimeMind
      @SublimeMind Před měsícem +92

      @@BenGrimm977 No, it was a terrible answer. "Do you use YouTube to train Sora?" The CTO of OpenAI: "Durrrr, I dunno, maybe."

    • @slavloli8755
      @slavloli8755 Před měsícem +49

      She was clearly scared to answer it lol
      Surprised that she wasn't prepped for the interview; this question was bound to be asked, and she fumbled it.
      Maybe OpenAI doesn't care about the public's perception, but that's something that can be detrimental later on, given the rise of luddites and how it may lead to policies that impede AI's growth.

    • @andifadeaway11
      @andifadeaway11 Před měsícem +29

      @@slavloli8755 People who are wary of AI aren't just luddites. They just have a sense of consequence and what letting the whole world be run by companies who own image data rather than individual creatives might do. Sorry you can't think more than one second into the future to understand why AI restrictions are incredibly important to the future well being of our society.

  • @punk3900
    @punk3900 Před měsícem +219

    It's like asking a farmer: "Where do you get your apples from? Do you pick them from trees?"
    "I don't know. If they grow on trees, perhaps yes."

    • @gokuldastvm
      @gokuldastvm Před měsícem +41

      Don't forget - they were 'publicly visible' apples.

    • @jwilder2251
      @jwilder2251 Před měsícem +9

      This is one of the funniest comments I’ve ever read 😂

    • @throwawaydude3470
      @throwawaydude3470 Před měsícem +23

      She picked them from her neighbors' trees. Not the same.

    • @kamu747
      @kamu747 Před měsícem +7

      This is very funny. You've got me cracking up.
      So the question would have been phrased more like "Whose trees do you pick them from?", asked of a farmer who owns an apple farm spanning thousands of acres, neighbouring another farmer who also grows apples.
      So her answer is: only trees I'm permitted to pick from. From our farms, yes. Which other trees? I cannot say, but if I was permitted, then yes.

    • @dlebensgefahr
      @dlebensgefahr Před měsícem +1

      No, it is not like that.

  • @jolness1
    @jolness1 Před měsícem +533

    The vagueness of the CTO's response about what data was used shows she definitely knows they're using copyrighted content, imo.

    • @mistressfreezepeach
      @mistressfreezepeach Před měsícem +15

      or she really doesn't and is just there as a pretty head

    • @Korodarn
      @Korodarn Před měsícem +12

      Copyright is automatic. Unless they were training on videos from the early 20th century, they couldn't be training on videos with no copyright, except for the very few people who bother to put on a CC0 or similar license.

    • @billf1748
      @billf1748 Před měsícem +32

      If you watch a video and learn something from it, does that mean you copied it, and thus, violated copyright? Of course not. What people don't seem to understand is that AIs are utilizing digital neural nets; they do not need to copy to learn--they simply adjust their weights and biases. Nothing has been copied. Learning has happened. This is the way it will be argued in courts.

    • @neon9165
      @neon9165 Před měsícem +7

      @@Korodarn They have a partnership with Shutterstock (32 million videos, if Shutterstock is making their whole catalogue available).
      And even then, some countries have already established that machine learning isn't copyright infringement.

    • @SahilP2648
      @SahilP2648 Před měsícem +6

      @@billf1748 well then laws need to change, and fast

  • @TripleMachine
    @TripleMachine Před měsícem +553

    4:50 She's smart enough to know where the models came from and she's hesitant to say because of potential lawsuits.

    • @mack-uv6gn
      @mack-uv6gn Před měsícem +7

      Copyright?

    • @anirudhnarla4711
      @anirudhnarla4711 Před měsícem +46

      They scoop up the data from literally everything public on the internet, which is a lot. And not just Sora, but every LLM, chatbot and image generator too. It's the same for everything.

    • @SublimeMind
      @SublimeMind Před měsícem +22

      @@anirudhnarla4711 Oh, so the "everyone does it so it's legal" defense.

    • @blackspetnaz2
      @blackspetnaz2 Před měsícem +5

      She knows

    • @mistressfreezepeach
      @mistressfreezepeach Před měsícem +10

      is she smart? it's not obvious to me

  • @official_youtube
    @official_youtube Před měsícem +198

    "My prompt for this crab said nothing about a mustache" is the most 2024 sentence ever.

    • @KDashCR
      @KDashCR Před měsícem +5

      I don't expect regular people to be impressed by this. For those of us who work in computer science this is like a miracle; can't wait to dig into this.

    • @VivekChandra007
      @VivekChandra007 Před měsícem

      AI models can't cook things up out of thin air; they are always based on the data they were trained on. If you want to know what data a model was really trained on, look at the subtle things it puts in that weren't mentioned in the prompt.

    • @VejmR
      @VejmR Před měsícem +5

      @@KDashCR By what? Sora, or the mustache specifically?

    • @matheusdemoura5529
      @matheusdemoura5529 Před měsícem +4

      But it's clear that Sora copied the SpongeBob character.

    • @quandovoceleroscomentarios9622
      @quandovoceleroscomentarios9622 Před měsícem +1

      @@VejmR The mustache, obviously.

  • @discoverymoi
    @discoverymoi Před měsícem +340

    You should've asked Sora to make 10 people playing rock paper scissors at a fast pace. 😂

    • @geaca3222
      @geaca3222 Před měsícem +8

      😅🤣

    • @WildVoltorb
      @WildVoltorb Před měsícem +8

      😂

    • @pinch254
      @pinch254 Před měsícem +5

      😂

    • @edism
      @edism Před měsícem +8

      This is why it's not free, silly people wasting resources. Why can't you just increase the speed of a normal video for this rubbish? Lol

  • @acer6049
    @acer6049 Před měsícem +584

    Remember the Super Phrase: *Publicly available data*

    • @MrQWUK
      @MrQWUK Před měsícem +58

      Exactly my thoughts after watching this. Barely scratching the surface in respect of rigorous query into attribution. Copyright and IP law is being left in the dust, with creative works being flagrantly stolen for marketing hype, stock inflation and profit. It's a sad path "we're" headed down. Another race to the bottom.

    • @sirdiealot53
      @sirdiealot53 Před měsícem +30

      She’s right. It’s publicly available. If you don’t want everyone to see your videos don’t post them on the internet. Durr

    • @JorisGriffioen
      @JorisGriffioen Před měsícem +25

      @@sirdiealot53 yes, "see", that's not the contention here.

    • @Uday_shirbhate
      @Uday_shirbhate Před měsícem +11

      Yeah, we need a new definition of "publicly available data".

    • @carcolevan7102
      @carcolevan7102 Před měsícem +19

      @@JorisGriffioenIf one can 'see' something, one might also 'learn' from it. So that is the contention. Imagine if you had to get permission before studying anyone else's art. It would be a red-tape paperwork nightmare and will essentially destroy the social function of art. You say copyright law is being violated with "creative works being flagrantly stolen" but copyright law does not require anyone to get permission to view or study anyone else's publicly-available art. Your moral outrage is palpable, but there is no substance to it because copyright law does not require the permission you seem to think it does.

  • @revestrek1
    @revestrek1 Před měsícem +45

    I'm so glad this interviewer asked some real questions; I'm sick of all the shallow questions over naive smiles. AI can be good, but also dangerous. It's important we understand how serious it is, and train and use it carefully.

  • @zivzulander
    @zivzulander Před měsícem +129

    I'm not going to lie, I spent half the video with my eyes darting to Mira's hands just to make sure she's real 😅

  • @sdf8085
    @sdf8085 Před měsícem +418

    I cannot express how wild it seems to me that the *CTO* of one of the biggest genAI companies, promoting their latest genAI product couldn't answer what data the model was trained on.

    • @tarcus6074
      @tarcus6074 Před měsícem +19

      It doesn't matter.

    • @blackspetnaz2
      @blackspetnaz2 Před měsícem +80

      She got caught; she knows there are lawsuits left and right. What was incredible is that she was not prepared for the question. But since they know they engaged in some pretty bad practices, she got nervous.

    • @ayushnews5735
      @ayushnews5735 Před měsícem

      ​@@tarcus6074 it does matter.

    • @Nainara32
      @Nainara32 Před měsícem +24

      I wouldn't expect the CTO to know the current status of any given contract with a vendor with enough confidence to be quoted in the WSJ. These things change all the time, and they probably have hundreds of sources of training data just for Sora, let alone all the other products she oversees, like ChatGPT and DALL·E.

    • @i20010
      @i20010 Před měsícem

      It's all stolen from us.

  • @Outcaster88
    @Outcaster88 Před měsícem +94

    The way she dodged the question about the training data over and over makes it clear they are not using only publicly available data.

    • @madshader
      @madshader Před měsícem +3

      Or, for legal reasons, she is not allowed to make statements on it yet. Not everything is so black and white. One could argue that humans train on publicly available data when they make their own creations, pulling inspiration from everywhere. AI is just much more efficient at this task than a human is.

    • @shredd5705
      @shredd5705 Před měsícem +18

      Publicly available doesn't mean you can freely use it. Copyrighted work can be publicly available. Just because artwork or content is online doesn't mean you have the rights to use it.

    • @madshader
      @madshader Před měsícem +4

      @@shredd5705 Are you serious? Of course a human can use it as inspiration, the same as an AI. The problem arises when you try to claim the exact copyrighted work as your own. But artists use all kinds of art to draw from and create something new. This has been the practice of every artist throughout history.

    • @diyamond
      @diyamond Před měsícem

      @@madshader As an artist, I agree that they shouldn't, and it's disgusting. But from a business perspective… it's all free to them, because technically it is on the internet for anyone to see. It's not morally correct whatsoever, but it happens.

    • @Mobri
      @Mobri Před měsícem

      @@madshader One could argue that, but completely miss the point or fail to understand why this argument is patently idiotic on the face of it.
      Please, go back to speculating about the social importance of dogecoin and leave the rest of us alone.

  • @mrtang18
    @mrtang18 Před měsícem +65

    Joanna Stern is the best! Love how she grilled her about the training data - super awkward response 😂

  • @unmonged
    @unmonged Před měsícem +29

    This is a product I made that I am very proud of. I'm the spokesperson for the product and was sent here to speak specifically about the product only.
    "How did I make my product," you ask? I don't know.

  • @ishaankelkar
    @ishaankelkar Před měsícem +187

    the questions asked in this interview were really good -- concise and hitting all of the important points. thank you for this informative video

    • @i20010
      @i20010 Před měsícem +9

      Yes, it's rare these days to have intelligent press asking the right questions.

    • @buttofthejoke
      @buttofthejoke Před měsícem +6

      I was just thinking that. All questions that I wanted answered. precise and terse

    • @BlackParade01
      @BlackParade01 Před měsícem +6

      Joanna is always a hit

    • @unawakeful6931
      @unawakeful6931 Před měsícem +4

      She didn't push back on the jobs subject when the interviewee stated that it will just be a tool.
      Yes, a tool that anyone can use, which will make making movies so trivial that it will undoubtedly have a significant effect on jobs.

    • @brandnaqua
      @brandnaqua Před měsícem +5

      Joanna Stern is the best of the best in the business! 🙌 I love how she's honest without ever being rude. She's top tier! 🎉

  • @marttiekstrand4879
    @marttiekstrand4879 Před měsícem +97

    Since Sora can create "cartoon animation," OpenAI should show their list of animation production companies that have licensed their films for use in the machine-learning model. There's not much animation available in the public domain, especially 3D animation.

    • @shredd5705
      @shredd5705 Před měsícem

      It's obviously stolen stuff. They steal. Most AI companies do. Instead of OpenAI, PirateAI would be more accurate

    • @bruno2010087
      @bruno2010087 Před měsícem +3

      although maybe it's possible for it to learn particular styles from static images also, no? then i would say there are many more sources of training data

    • @marttiekstrand4879
      @marttiekstrand4879 Před měsícem +10

      Styles of movement in animation can only be modelled from animation. Cartoon characters don't move anywhere close to how real humans do.

    • @Vincent-lg2jh
      @Vincent-lg2jh Před měsícem +1

      Good note, I think they also specifically made a contract with set studios to output "basic" animated movements for the model to learn easily.

  • @xiphoid2011
    @xiphoid2011 Před měsícem +271

    In my 40s now, looking back, I am amazed at the acceleration of technological innovation. It's now almost impossible to imagine what computing will be like in just 2-3 years.

    • @Jay-eb7ik
      @Jay-eb7ik Před měsícem +11

      1 million x more powerful by 2030

    • @Vikasslytherine
      @Vikasslytherine Před měsícem +9

      We still don't have flying cars

    • @_ShaDynasty
      @_ShaDynasty Před měsícem +1

      wheres my flying car and television watch?@@Vikasslytherine

    • @ibrahimalharbi3358
      @ibrahimalharbi3358 Před měsícem

      Democracy is a big joke!
      Taxation is a theft!
      Laws is only for citizens not owners, for example Copyright.
      God is real
      God is not a dead man

    • @mistressfreezepeach
      @mistressfreezepeach Před měsícem +5

      @@Vikasslytherine we do have many things out of 1984, though

  • @Jay-rr6me
    @Jay-rr6me Před měsícem +22

    What's the point of replacing creativity? We don't want to live in a world where nothing is creative.

    • @winniethepoohxi1896
      @winniethepoohxi1896 Před měsícem +6

      If an AI can be creative why does that stop you from being creative separately? Having an AI be creative doesn’t mean you can’t be creative as a hobby all you want. Most artists, authors, and musicians just do it as a hobby already. No AI is going to say you are no longer allowed to be creative. You might not be as creative as an AI but you can still do your own thing.

    • @midnightjayy
      @midnightjayy Před měsícem +11

      @@winniethepoohxi1896 AI would not exist without the thousands of artists' names, works, etc. being fed into its database. All without anyone's consent, and the corporations are profiting off others' work, not the artists. That's where the problem is.
      Nothing is stopping anybody from actually learning how to draw; there are thousands of videos on the internet showing people how… I just see it as people being extremely lazy and jumping to the end result. It's cheap.

    • @stardust6870
      @stardust6870 Před měsícem

      Unfortunately, no one cares. Most people are lazy and happy to say they created something even though it was AI making it for them. They don't care about the creation process. Instead, people care about the output and profit. That's what capitalism did to us. And I'm saying this as a writer who lost work to ChatGPT a year ago.

  • @TurnRacing
    @TurnRacing Před měsícem +75

    Ouch, those training-data questions hit HARD!

  • @SerenityReceiver
    @SerenityReceiver Před měsícem +70

    The CTO isn't sure if youtube training data was used???

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem +13

      well she could've purposely chosen to not look at the dataset to avoid lying in a lawsuit.

    • @RyanMichero
      @RyanMichero Před měsícem +12

      She knows, and can't say.

  • @kakaeriko
    @kakaeriko Před měsícem +187

    not sure of sources??

    • @jolness1
      @jolness1 Před měsícem +52

      They’re definitely just feeding every freely usable or copyrighted video around.

    • @bikedawg
      @bikedawg Před měsícem +26

      She absolutely knows what the sources are. She just doesn't want to open OpenAI up to another lawsuit.

    • @benny-schmidt
      @benny-schmidt Před měsícem +1

      @@bikedawg She doesn't know jack, DEI hire with 0 experience

    • @shredd5705
      @shredd5705 Před měsícem +2

      They are sure, but they don't want to tell because it's stealing copyrighted work. Just like every other AI company training their AIs

    • @vborovikov
      @vborovikov Před měsícem

      torrents

  • @Fiscotte
    @Fiscotte Před měsícem +28

    ClosedAI

  • @Jay-rr6me
    @Jay-rr6me Před měsícem +39

    They always say AI will change us in good ways and we will be much better but as you know it’s actually gonna go the other way

  • @arnavprakash7991
    @arnavprakash7991 Před měsícem +6

    Really good reporting. The interviewer asked the right questions, and Mira definitely knows what data they used; she doesn't want to reveal anything because they'll get crucified.

  • @alvarortega2
    @alvarortega2 Před měsícem +21

    That interview took a quick turn! Hehehe

  • @krakenj5237
    @krakenj5237 Před měsícem +50

    Really good interview. Grilled her with sensible questions

  • @aser12104
    @aser12104 Před měsícem +8

    Basically, they "openly" "take" whatever is available online to train the AI.

  • @itsmebk6820
    @itsmebk6820 Před měsícem +5

    Yikes, that was one bomb of an interview… they are definitely training it on YouTube/X/FB/IG, everything 🤣

  • @AkshaySinghJamwal
    @AkshaySinghJamwal Před měsícem +4

    Let me get this straight: they've created a model that generates videos indistinguishable from reality, and their genius idea to make that distinction is: a watermark! Wow!

    • @xlrbossshorts
      @xlrbossshorts Před měsícem +1

      There's really nothing they could do as a safety procedure, unless they want to put a huge watermark stamp on the video or limit the things you can create.

  • @TanakaTsikira
    @TanakaTsikira Před měsícem +17

    Hmmm. The training data issue is going to haunt OpenAI for a while. The law has not caught up with the technology. US lawmakers are going to have a tough time with this one.

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem +2

      It's not going to haunt it, a vital part of every copyright law in the world is that any infringing material has to be substantially similar to the original.

    • @NoobNoob339
      @NoobNoob339 Před měsícem

      mmm yes cope@@dibbidydoo4318

    • @salifyanjisimwanza9679
      @salifyanjisimwanza9679 Před měsícem +3

      @@dibbidydoo4318 I agree with you on that part. But I think there's still a significant legal challenge here.
      The videos/images produced may be substantially different from the copyrighted material, but copyrighted material was used to train the models all the same. The consequences of that act of training are what's haunting the law courts at the moment.
      As a human being, I can observe or read something and reproduce what I saw or read in a somewhat different way. AI models have not yet earned that status. Moreover, there are potential data-protection issues here.
      Whatever the case, IP laws are about to undergo possibly the biggest change in their history.

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem

      ​@@salifyanjisimwanza9679
      I don't think this will impact IP laws much unless someone believed that the "property" in "intellectual property" was meant to be taken literally.

    • @Mobri
      @Mobri Před měsícem

      ​@@dibbidydoo4318 In the case of humans making art, yes.
      In the case of a program creating art, maybe no.
      But if it is legal to make and sell, then you still have no copyright and I can sell your AI art with no consequence.
      It can't be both, though.

  • @PeterDrewSEO
    @PeterDrewSEO Před měsícem +1

    Great interview, thanks. I'm liking the timeframe mentioned....

  • @therealsimdan
    @therealsimdan Před měsícem +19

    “It’s very difficult to simulate the motion of hands.”
    Every VFX studio from the last 25 years: “Do I mean nothing to you?”

    • @unknown-fd1yz
      @unknown-fd1yz Před měsícem +1

      Doing it with AI and doing it manually are really different things. And AI does it quickly, unlike VFX studios, so there is that too.

  • @ropro9817
    @ropro9817 Před měsícem +112

    Lol, wow, that interview turned really sketchy @4:24... 😆

    • @ropro9817
      @ropro9817 Před měsícem +14

      How does the CTO--and former CEO... for 2 days--not know details of what content was used for model training when that is such a controversial hot topic today? 🤔 I call bull💩...

    • @vetboy627
      @vetboy627 Před měsícem +7

      @@ropro9817 Because that's the magic behind how the models work, and she's not going to reveal that to competitors. Also, it's not really the CTO's concern exactly what data is used to create a model, as long as it's legal and gets results.

    • @IcyyDicy
      @IcyyDicy Před měsícem +9

      Obviously. Why shoot yourself in the foot by answering truthfully?
      Credit is due to WSJ for asking the tough questions and letting viewers judge the non-answers for themselves. I'd argue that tells a lot more.

    • @teachusmore
      @teachusmore Před měsícem

      Future lawsuit here… she intentionally concealed the data source. Discovery of their internal communications will likely reveal that they know they are taking copyrighted material without permission.

    • @lifemarketing9876
      @lifemarketing9876 Před měsícem

      @@vetboy627 Finally someone in the comments using critical thinking instead of dogpiling Mira, when they have absolutely no clue what's going on.

  • @christiancrow
    @christiancrow Před měsícem +24

    Uh oh where that data coming from 😂

    • @christiancrow
      @christiancrow Před měsícem

      If it's public, clarify that please?

  • @fuu812
    @fuu812 Před měsícem +4

    4:34 GOLDEN reaction

  • @freeabt9916
    @freeabt9916 Před měsícem +4

    Will they be using this video as training data to generate a video when the prompt is: "Did you pay for the data you used?"

  • @anirudhnarla4711
    @anirudhnarla4711 Před měsícem +33

    People are really trying to find faults in it to cope, but honestly the flaws are so minor that you have to look closely to notice them (except the hands part), and the model is still in its beta phase, so this is literally a game changer.

    • @abdulrazack1683
      @abdulrazack1683 Před měsícem

      So true

    • @jolness1
      @jolness1 Před měsícem

      There are a lot of goofy things that pop up, not just hands. It is super neat but also unbelievably power and compute hungry.

    • @huckleberryfinn6578
      @huckleberryfinn6578 Před měsícem +4

      @@jolness1 Just look at static image generation from early 2022 versus recent images.

    • @ayushnews5735
      @ayushnews5735 Před měsícem +4

      The flaw is not in the generated videos; that can be solved.
      The real problem is her not telling us the source of the training data despite being the CTO of "Open" AI.

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem +2

      @@ayushnews5735 Well, it's obvious that stuff isn't supposed to be spoken about until the lawsuits conclude.

  • @makuetebeatrice3203
    @makuetebeatrice3203 Před měsícem +2

    A powerful AI tool like Sora, when used with intricate technical prompts, could potentially be susceptible to certain vulnerabilities. Will they be able to deal with this?

  • @peterjohansson739
    @peterjohansson739 Před měsícem +12

    I can confirm that the models were trained on all data, publicly available, licensed and non-licensed etc.

    • @SlimTK
      @SlimTK Před měsícem

      yeah maybe meta is privately selling them all our data.

  • @ktktktktktktkt
    @ktktktktktktkt Před měsícem +41

    I feel like a CTO should know where the data came from lol

    • @Hannan_1325
      @Hannan_1325 Před měsícem +10

      She knows; she is just not supposed to make announcements about their secrets on a news channel.

    • @ktktktktktktkt
      @ktktktktktktkt Před měsícem +12

      @@Hannan_1325 it's hardly a trade secret if it's publicly available/licensed data and she later confirmed the data includes shutterstock. If anything, she didn't want to say the answer on air because it would sound really bad. AI art has caught a ton of flak for using artists art without permission. I would bet they did actually use youtube videos without permission.

    • @Korodarn
      @Korodarn Před měsícem +1

      @@ktktktktktktkt It's a certainty they used things without permission. There is absolutely no way they could get the permission for all of this, and that's fine, because permission is not required for you to view it, it shouldn't be required for them to view it either.

    • @ibrahimalharbi3358
      @ibrahimalharbi3358 Před měsícem

      Democracy is a big joke!
      Taxation is a theft!
      Laws is only for citizens not owners, for example Copyright.

    • @carcolevan7102
      @carcolevan7102 Před měsícem

      @@ktktktktktktkt Right. She knows where the data come from and she knows that there is a widespread belief that allowing an AI to study publicly-available images and videos without permission is illegal. It's not, but that doesn't diminish the moral outrage of those who wrongly believe that it is. So the question "where did the training data come from?" is a loaded one premised on the idea that using images and videos scraped from the internet is illegal, even though it isn't. There was no way to educate the public enough during this short interview that an honest answer wouldn't just pour fuel on the misinformation fire that's already blazing around this topic.

  • @ethanvance3834
    @ethanvance3834 Před měsícem +6

    I stole a car because I saw it on the street and thought it was public property. So by this logic, isn't your car publicly available for me to use for my own benefit?

  • @shmookins
    @shmookins Před měsícem +14

    Why was she avoiding answering where the source data came from? They absolutely know, since there are only two major parts: the code and the learning source. She avoided answering about the Shutterstock deal as well, even though they had a deal with them and she confirms it later in the interview.
    This doesn't look good.

    • @user-gt2ro6ml6w
      @user-gt2ro6ml6w Před měsícem +3

      It looks fine. It is obvious she isn't answering because of lawsuits.

    • @shmookins
      @shmookins Před měsícem

      @@user-gt2ro6ml6w She could have simply said as much. I've heard other businesses reply the same way, something like "we can't comment on ongoing cases" or some such. But even the Shutterstock deal, which she confirmed later, was also given a vague response at first.
      It's just odd, that's all. Maybe this person was thrown in to simply fill the seat and they don't have experience?
      Oh, well. Humans gonna human.

    • @user-gt2ro6ml6w
      @user-gt2ro6ml6w Před měsícem +3

      She obviously has experience; she has been there since 2018. She also basically explicitly said that she cannot comment on it, so idk where the confusion is coming from. @@shmookins

    • @koumorichinpo4326
      @koumorichinpo4326 Před měsícem

      @@user-gt2ro6ml6w If they need to be hush-hush to not be sued, maybe your little brain could entertain for a moment that it's because they are doing the wrong thing.

    • @NoobNoob339
      @NoobNoob339 Před měsícem +1

      ​@@user-gt2ro6ml6w "It looks fine. It is obvious she isn't answering because of lawsuits." I see a massive contradiction there

  • @godmisfortunatechild
    @godmisfortunatechild Před měsícem +11

    "I think it's definitely worth trying," i.e. the profit motive is too great not to.

    • @teebu
      @teebu Před měsícem

      That's really what keeps her up at night.

  • @Willibeolder
    @Willibeolder Před měsícem +3

    in a distant future I can imagine cops having to shout "hands! show me your hands! Hands in the air!" for more than one reason

  • @generativeresearch
    @generativeresearch Před měsícem +5

    Lawsuits incoming

  • @colabear4343
    @colabear4343 Před měsícem +1

    Q: "Videos on youtube?"
    CTO: "I'm actually not sure about that."

  • @artbyeliza8670
    @artbyeliza8670 Před měsícem +7

    Some of my art was used for training AI - I didn't give permission. I'm not sure where they got them from.

  • @rubes8065
    @rubes8065 Před měsícem +10

    She knows that videos on YouTube were used as training data. She is not a good liar lol, it's her job to know. OpenAI doesn't want to get sued, that's why.
    Sam Altman likes to move fast. They have the LLM models, but they don't have access to enough data. So they take it without paying for it.

    • @aussiepawsborne9056
      @aussiepawsborne9056 Před měsícem +3

      I don’t think they legally have to pay for data that is publicly available. The laws haven’t really been established around neural networks yet

    • @user-fj3wk7mi2n
      @user-fj3wk7mi2n Před měsícem

      Lol, youtube is not entitled to payment.

    • @choptop81
      @choptop81 Před měsícem +3

      Sam Altman liked to move fast into his four year old sister’s bed too (read her SA allegations against him)

    • @coreyjblakey
      @coreyjblakey Před měsícem

      @@user-fj3wk7mi2n No one here gives 2 fs about YT getting a cent, dude. We want the video creators to either get paid or have the option to not be in the data. It should be opt-in; it's currently not even opt-out.

  • @amortalbeing
    @amortalbeing Před měsícem +1

    thanks, good job there.

  • @stealcase
    @stealcase Před měsícem +5

    Thank you for asking about the data! This is what people care about: whether their data was ingested to train profitable AI without our consent.

  • @Peeps7468
    @Peeps7468 Před měsícem +4

    Good for the interviewer pushing for the source of the data.
    It was disappointing that the interviewee didn't seem to know where the data comes from.

  • @sherriffs2554
    @sherriffs2554 Před měsícem +6

    Not going into the details of what trained Sora because the NYT is currently suing... among others.

  • @xxxxx409
    @xxxxx409 Před měsícem +2

    I can literally take watermarked screenshots and remove the watermarks within 5 minutes because of AI (which is probably what they do when they scrape the web for training data).

  • @atlanta2076
    @atlanta2076 Před měsícem +6

    CTO: «Sora is based on text prompts». NO! It is NOT. It is based on stolen art. The biggest art robbery in history! And it gets even worse. She pretends (!) that she doesn't know whether they stole from YouTube creators (which they definitely did). She talks about "licensed data," as if OpenAI had any official license from the SpongeBob rights holders to feed their wicked machine with their property. I'm so disgusted! She says "publicly available" as if it meant "public domain." Any Disney DVD is publicly available. Doesn't mean I can use that data as I please. A CTO that is "not sure" where they stole the data from. Give me a break. Furthermore, she keeps saying they're in the very early stages, yet keeps emphasizing that it'll be ready in a few months with most kinks ironed out. Which one is it? WSJ: why did you not call this woman out?

  • @crypticvisionary
    @crypticvisionary Před měsícem +16

    "Im not going to go into the details of the data that was used" = We stole most of the data and wont admit it

  • @ariwilsun
    @ariwilsun Před měsícem +1

    @3:32 Joanna's smug smile here has me cracking up. 😆

  • @goodtechdoor
    @goodtechdoor Před 29 dny

    Nice, reliable humans making reliable decisions with far-reaching consequences. This is how we got here... and further we shall go!

  • @adrianmunevar654
    @adrianmunevar654 Před měsícem +6

    "Reliable" as the "Open"AI guys, reliable as a politician, reliable as the most of human egos... Profit, that's the word.

  • @pikaso6586
    @pikaso6586 Před měsícem +4

    The data fell from one of YouTube's trucks

  • @seanhardman1964
    @seanhardman1964 Před měsícem +2

    Reading between the lines and the expressions of all the top executives, they seem to be saying people are going to be able to do whatever they want.

  • @TarikTruth
    @TarikTruth Před měsícem

    This was some incredible reporting. Well done to her;

  • @sushienjoyer
    @sushienjoyer Před měsícem +30

    4:34 People are gonna make fun of that answer. However, clearly, the purpose is to dodge, and that answer achieved that.

    • @MnMEminem
      @MnMEminem Před měsícem

      She is just a pretty head they put there to reduce the hate they get from the public and reduce the concerns! This girl doesn't have the technical brilliance real CTOs have!

    • @erikouwehand
      @erikouwehand Před měsícem +6

      Must mean they are stealing other people's content, otherwise you would not hide it.

    • @Joel-kw7jf
      @Joel-kw7jf Před měsícem +2

      Thanks captain obvious, what would we have done without you?

    • @joelface
      @joelface Před měsícem

      @@erikouwehand I disagree. There are lawsuits and a lot of contention around the training data, and saying a single wrong word could be used against her and the company and end up costing billions of dollars. Of course the interviewer wants details, but that doesn't mean it's smart for her to answer it even if everything seems above board and legal to her.

    • @the_nows
      @the_nows Před měsícem

      I think she legit didn't know what the training data specifically is, because she's a bad CTO. Also because that's being kept secret for many

  • @davedsilva
    @davedsilva Před měsícem +15

    The interviewer forgot to ask if Sora can replace her 😂

  • @Felttipfuzzywuzzyflyguy
    @Felttipfuzzywuzzyflyguy Před měsícem +1

    Where did they get this data to train on? Must be fine. /s

  • @eugenedw
    @eugenedw Před měsícem +2

    At 5 minutes in, the Chief Technology Officer of a multi-billion-dollar "non-profit" corporation says she's not sure if their flagship video GPT was trained on YouTube videos? Seriously? Sounds like something SBF would say. Is she hallucinating? We've been struggling to create prompts that'll prevent GPT-4 from making things up. Now, knowing its parents, I see why it lies so much.

  • @BrianHill
    @BrianHill Před měsícem +4

    Two smart women having a hard-headed conversation about an extraordinarily important topic. Nice vid.

  • @lelouchlamperouge8560
    @lelouchlamperouge8560 Před měsícem

    It's a good thing they've somewhat broken the 'creativity' of the AI down into bits and pieces. Different AIs for different applications. At the end of the day, it's business as usual.

  • @ByrdNick
    @ByrdNick Před měsícem +1

    Sora exhibits/elicits a human-like inattentional blindness: it produces fluent motion that seems normal at a glance, but upon closer inspection it does stuff that's as weird as a gorilla walking across the frame (from Chabris and Simons' famous experiments).

  • @EbenezerNimh
    @EbenezerNimh Před měsícem +11

    AI + Sora = Job Killer

  • @prilep5
    @prilep5 Před měsícem +7

    Perfect timing: releasing the best fake-video AI maker just in time for the biggest election year in the world.

  • @legatodi3000
    @legatodi3000 Před měsícem

    These are great questions! It's a bit of a shame that the interviewer almost never followed up on the responses. For instance, a follow-up on "public" data could have asked about compliance with usage conditions. A follow-up on how to distinguish AI-generated content could have asked how it's already done with released products.

  • @shailong3254
    @shailong3254 Před měsícem +1

    Everything will have to be "taken with a grain of salt"; we will have to question everything and trust nothing in the future.

  • @PankajDoharey
    @PankajDoharey Před měsícem +11

    It's an open secret that OpenAI used a lot of pirated data from "The Eye" for ChatGPT, so I would assume they used a lot of videos and movies, not YouTube.

    • @OscarJuarezRomero
      @OscarJuarezRomero Před měsícem +4

      What is that? I can't find anything online about "The Eye".

  • @raj5669
    @raj5669 Před měsícem +5

    The interesting conversation starts at 4:42.

  • @ModiPooja
    @ModiPooja Před měsícem

    The lack of transparency about the data utilized for model training, along with the inability to trace its knowledge origins accurately, is one of the primary shortcomings of AI. It is strange how humans are constantly prompted to reflect on the basis of their knowledge, whereas similar scrutiny is often overlooked in the realm of AI.

  • @thebicycleman8062
    @thebicycleman8062 Před měsícem +1

    I bet you 100% from this interview onwards OPENAI will have a SPECIALIZED TEAM dedicated ONLY TO TRAIN OPENAI on HOW TO RESPOND TO THAT VERY QUESTION regarding TRAINING DATA - They will be like a pro at it lol

  • @andrewkranger
    @andrewkranger Před měsícem +4

    This is like when people were pointing out all the flaws in the original iPhone.

  • @SlavaAloha
    @SlavaAloha Před měsícem +3

    This is iPad 12.9 or 11?

  • @huymaivan8671
    @huymaivan8671 Před měsícem +1

    The same vibe as "You smuggler, where did you hide those drugs?" -> "I don't know."

  • @bludirlyeh
    @bludirlyeh Před měsícem +1

    So now, does "publicly available data" mean "public domain"?

    • @TheKnightOfVenus
      @TheKnightOfVenus Před měsícem +1

      That's what she's insinuating.. but I doubt the courts would agree!

  • @punk3900
    @punk3900 Před měsícem +8

    The data was shady :D

  • @AndreaDoesYoga
    @AndreaDoesYoga Před měsícem +6

    Wow, Sora's capabilities are mind-blowing! 😮

    • @Keenan686
      @Keenan686 Před měsícem

      what a time to be alive

    • @NoobNoob339
      @NoobNoob339 Před měsícem +3

      All the hard work by the people they stole from is amazing, yes.

  • @jaymacpherson8167
    @jaymacpherson8167 Před měsícem +1

    “I’m not sure…” if there are dangers using this tool. Regardless, creativity is paramount. For instance, what new ways can someone gaslight others?

  • @PixelsVerwisselaar
    @PixelsVerwisselaar Před měsícem +2

    Now it's just waiting for the whistleblower 😂🤭

  • @Korodarn
    @Korodarn Před měsícem +5

    I really want to know what kind of answer to the training data question would satisfy people asking it.
    Give an example of an answer, and remember that you
    1) Don't agree with the copyright assertions that say you have no right to use the data for training
    2) Have expensive lawyers that tell you to be mostly mum or lack specificity on this matter
    3) Actually try to empathize with her position and the fact a canned "good" answer would likely be just as suspicious to people who are in opposition to your views as a "bad" answer that leads them to assuming "the worst" (in their opinion).

    • @Korodarn
      @Korodarn Před měsícem

      I'll give my example, although I don't love it, it is the truth to what they are arguing so far. It isn't evasive or relying on plausible deniability, but asserting a clear boundary.
      "Our position is that training is fair use just as using a work as inspiration by humans is fair use, and we are not inclined to open up on the matter of what we used to train because it is our right to keep that to ourselves and we are involved in ongoing court cases looking to set precedent against fair use."

    • @jake66664
      @jake66664 Před měsícem +2

      @@Korodarn Humans don't get visibly uncomfortable when you ask them what their learning sources were. In fact, humans are usually very happy to tell you where or who they learned from. Also people are also usually very pleased to hear that you learned something from them. The same can't be said for the AI situation.

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem +2

      @@jake66664 If someone is willing to sue you based on where you learned it from, then of course you would be uncomfortable. Being sued doesn't have anything to do with whether you're wrong or right, but it still costs a lot of money regardless of the outcome.

    • @jake66664
      @jake66664 Před měsícem +1

      @@dibbidydoo4318 The problem is these models don't "learn" in any meaningful sense, as they are not intelligent. The companies themselves understand this. Fair use is an affirmative defense; it's a statement that you are aware you used material in a way that violated copyright. If these models really were comparable to human learning, there would be no need to invoke fair use; the cases should all have been thrown out before having their day in court. Instead, multiple AI companies are utterly buried under a mountain of copyright lawsuits, and none of them have been dismissed.

    • @dibbidydoo4318
      @dibbidydoo4318 Před měsícem +3

      @@jake66664
      Whether AI learns or not is irrelevant legally.
      None of the lawsuits the AI companies are in have ever gotten to court. Plaintiffs are at the pleadings, discovery, or motions stage. Dismissals happen after the motions stage.
      You think them being sued is evidence that they're doing something wrong, but you realize you can literally sue anyone for anything; believing this proves a crime is like believing someone is guilty until proven innocent.

  • @sapphyrus
    @sapphyrus Před měsícem +3

    Looking forward to Stable Diffusion eventually doing this. I'd rather not be bound by censorship and would prefer to create locally without subscription and with open models.

  • @blackhorseteck8381
    @blackhorseteck8381 Před měsícem

    The general public is not ready for what's coming up in the AI field, it's genuinely scary.

  • @TheBlessingReport
    @TheBlessingReport Před měsícem +1

    She didn't answer the important questions: how much does it cost and how long does it take?

  • @joshuastanton6731
    @joshuastanton6731 Před měsícem +9

    At what point do we realize that making devices that can mimic humans is not a good idea? Like every dystopian story out of sci-fi, it's pretty much guaranteed to end badly. Isaac Asimov, Robert A. Heinlein, A Space Odyssey, Black Mirror, even the tale of Icarus all warn of our hubris and the dangers of moving forward without thinking. It really is like we're playing with fire in a gas station and guaranteeing everyone that it's "going to be okay."

    • @littledudefromacrossthestr5755
      @littledudefromacrossthestr5755 Před měsícem +1

      Fr

    • @TheListeningParty_TLP
      @TheListeningParty_TLP Před měsícem +1

      Well said

    • @winniethepoohxi1896
      @winniethepoohxi1896 Před měsícem

      Because stories have to be interesting they generally include conflict. A story of perfect utopia singularity exploring the stars isn’t a fun story. The bad outcomes are just more interesting entertainment. Lord of the rings would have been boring if it was Frodo flew over mount doom on an eagle and dropped the ring into the fire. The end. Utopia novels and movies are boring. Dystopian ones are interesting. That’s why almost every instance of AI in entertainment focuses on the worst case outcomes.

    • @joshuastanton6731
      @joshuastanton6731 Před měsícem

      @@winniethepoohxi1896 And the deep fake videos of Biden, the altered photos of Princess Kate? Are those stories?

  • @JoshuaFinancialPL
    @JoshuaFinancialPL Před měsícem +9

    WSJ you should explore the copyright infringement they're doing

  • @collinmartin3589
    @collinmartin3589 Před měsícem

    She certainly did not expect the question about the source of the training data... I wish we had journalists like this in South Africa.

  • @fromscratch8774
    @fromscratch8774 Před měsícem +1

    She knows, just like everyone else, where the data came from.
    Being CTO is just hard.

  • @arshaddamree6950
    @arshaddamree6950 Před měsícem +2

    Lawsuits incoming!

  • @VREmirate
    @VREmirate Před měsícem +4

    'Bull in a china shop' was very impressive

  • @savagefoox3524
    @savagefoox3524 Před 29 dny

    It's a very interesting video. Thank you for making it. I did notice something disturbing, though. When you asked about the origin of the data used to teach Sora, it was quite obvious that she was evasive and then seemingly outright lying about the data used to train it. I am very aware of the dynamic occurring right now in the industry, namely the lack of data. AI will need more and more data to feed into its learning models, but it's also clear that up until now they have just been accessing the data without permission and breaking copyright law. They thought no one would notice or care, but it has become an issue, and clearly she is not being forthright about it. Clearly, they are using everything they can get their hands on and ignoring copyright law. That bull did seem like Ferdinand, and it seemed like the crab was a SpongeBob character. I personally think it's very disturbing that they can so crassly lie about what they've done here. If they're willing to lie about where they're getting the data and their use of copyrighted data, then they are also willing to compromise much more on safety. They're driven by profit, and I really didn't like seeing her lie about this. Thanks again for the video.

  • @AntonioAponte00
    @AntonioAponte00 Před měsícem +2

    We are basically taking everything we can get our hands on.

    • @PHlophe
      @PHlophe Před měsícem

      Toninho, my God! What a depressing situation.

  • @billf1748
    @billf1748 Před měsícem +7

    Unpopular opinion: It doesn't matter that they are using copyrighted material to train their models. Neural nets, whether they are in your brain or in OpenAI's models, do not copy information. They use the information (the stimuli) to adjust their weights and biases using their amazing algorithms (backprop, etc.). This is analogous to how we learn. We have not copied the video into our brains; nonetheless, we have learned from it. This is how these cases will be argued in court.

    • @federicoaschieri
      @federicoaschieri Před měsícem +4

      That's just false. Several scientific papers, like "Speak, Memory," have proved that AI models store thousands of copyrighted works, for example the entire Harry Potter and Game of Thrones sagas, word for word. Indeed, our brain does have photographic memory; that's why you can recognize your mum 😆

    • @carcolevan7102
      @carcolevan7102 Před měsícem

      @@federicoaschieri Can you provide a link to this paper? A quick internet search on "Speak, memory AI" turns up a lengthy magazine article about creating an AI version of a recently-deceased friend, but it is not a scientific paper and doesn't say anything about memorizing texts.

    • @IconoclastX
      @IconoclastX Před měsícem +1

      @@user-fj3wk7mi2n Nah, you're definitely in the minority. You're right on the scientific point, but most people think that human societies should be for human beings and that we should regulate things if they are harmful to us (shocking). Irrespective of what the technology does, it's bad for society and therefore it needs regulation. Same with how we handle nuclear weapons.

    • @benny-schmidt
      @benny-schmidt Před měsícem

      They literally copy information down to the image watermark or exact sentences, sounds like you've never used this massively overhyped "AI"
      This channel is deleting comments like crazy, here was my last one. Other ones are gone:
      BookCorpus, which is just 1 of the datasets GPT used, has a license and it says the content may not be "reproduced, copied and distributed for either non-commercial or commercial purposes" - that means you cannot repackage or resell it in any way, even for non-commercial. You can try to sweep it under the rug of "training" but GPT lifts entire sentences, pages, chapters, titles, characters. On my github I have an "ArthasGPT" that allows you to create any fictional persona you want - you think Blizzard won't care if I sold access to an Arthas chatbot? Is that my content? Is it OpenAI's content to sell to me? No of course not.

    • @billf1748
      @billf1748 Před měsícem +1

      @@benny-schmidt So if I download Stable Diffusion onto my desktop computer and get it working, I've also downloaded every image it was ever trained on? My computer cannot hold petabytes of data. The reason you see a watermark is because the AI was trained on many images with watermarks. It has learned, incorrectly, that all images, or a certain category of images, should have watermarks. Humans make similar mistakes, like thinking a watermarked image from GenAI means it was copied.

  • @kobi2187
    @kobi2187 Před měsícem +3

    the bull is very sweet