Another Hit Piece on Open-Source AI

  • Published 22 December 2023
  • Stanford researchers find problematic content in LAION-5B.
    Link: purl.stanford.edu/kh752sm9123
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 166

  • @EdFormer
    @EdFormer 5 months ago +266

    It's truly mind-boggling that this could be seen as a stick to beat open source development with. How do we know that Dall-E 3 hasn't been trained on problematic images, or that GPT-4 hasn't been trained on problematic text? The fact that we are able to check LAION-5B and other open source datasets and help to clean them is a strength of open source.

    • @AP-dc1ks
      @AP-dc1ks 5 months ago +26

      In fact, in private models it's arguably worse, isn't it?

    • @billykotsos4642
      @billykotsos4642 5 months ago +35

      This is quite literally an argument FOR OPEN SOURCE AI

    • @baz813
      @baz813 5 months ago

      Absolutely, @billykotsos4642! There couldn't be a stronger case for transparency on training data sets. How else can we be expected to trust a model when its training set is not open to analysis by anyone, with their own methodologies, which can be critiqued in the public domain?

    • @egalanos
      @egalanos 5 months ago +8

      It may not have been intended as a stick, but it will be *used* by private model providers as a stick.
      IBM already has a YouTube video about the corporate risks of using open weight text LLMs because of not knowing what they have been trained on.
      You can bet that this work will be cited for FUD about open image generation models.

    • @clray123
      @clray123 5 months ago

      It is basically the good old anti-Linux argument originally cooked up by Microsoft. Just a coincidence that it is Microsoft again with its dirty paws over the most popular/advanced closed-source model. Also a coincidence that the company was founded by a pedo.

  • @WaluigiisthekingASmith
    @WaluigiisthekingASmith 5 months ago +119

    This is very, very obviously an argument for open source training sets imo. Sure, there are awful things in open source sets, but the only reason we could even find that out is because it's open.

    • @heyman620
      @heyman620 5 months ago +3

      They clearly try to Gates us.

  • @Houshalter
    @Houshalter 5 months ago +64

    I checked the paper. They don't say it outright. But in several places they do reveal they have a strong focus on illustrated cartoon style images. Which is not what people are interpreting from the news articles, headlines, and discussion about it.

    • @khaoscero
      @khaoscero 5 months ago +4

      this is the most important aspect

  • @apoorvumang
    @apoorvumang 5 months ago +14

    When the main argument is "0.00002% of the dataset is CP" rather than measuring any actual harm caused by the dataset, it's clear that their intention was not to reduce harm but something else (probably clout, anti open source, etc.). Anyone writing such an article in good faith would at least try to measure the harm caused, or perform some experiments on the effect of including bad material in pretraining datasets.
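    As a rough sanity check of that figure (a minimal sketch; the ~1,000 flagged entries and the ~5.85 billion total are approximate numbers taken from the surrounding discussion, not exact counts):

        # Back-of-the-envelope share of flagged entries in LAION-5B
        flagged = 1_008            # suspected entries cited in the discussion (approximate)
        total = 5_850_000_000      # approximate number of image-text pairs in LAION-5B
        print(f"{flagged / total:.7%}")  # prints 0.0000172%, i.e. roughly the quoted 0.00002%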

  • @liam9519
    @liam9519 5 months ago +14

    Isn't LAION-5B basically every image on the internet? Shouldn't the title of this report then be "Turns Out, There Is CSAM On The Internet"? Who knew?!

  • @superironbob
    @superironbob 5 months ago +17

    Stanford Internet Observatory has been a glowing beacon of how not to responsibly disclose sensitive information, and how to critically erode trust for their sole benefit.
    Thank you for a discussion that helps bring that further to light.

  • @timeTegus
    @timeTegus 5 months ago +8

    Them not notifying LAION before they published shows that they don't care about the children.

  • @yeetyeet7070
    @yeetyeet7070 5 months ago +73

    I bet the billionaires at Microsoft, X, and Meta hate looking at such stuff...
    Closed source datasets are likely to have much, much more of this and no accountability.

    • @clray123
      @clray123 5 months ago

      Yes, especially a certain divorced billionaire who used to attend parties involving young trafficked women organized by his fortunately deceased friend.

    • @gr8ape111
      @gr8ape111 5 months ago +2

      Oh they absolutely hate it

  • @nitroyetevn
    @nitroyetevn 5 months ago +14

    Well said Yannic. Just reiterating/agreeing:
    - It's highly suspect that the people involved wrote a paper as a hit piece (citing the Verge, wut) instead of first contacting the companies, trying to fix the problem, then sharing the solutions for others to use in future. Or just sharing the solutions framed as "hey, we found a problem, here's the solution." They make it kind of clear that it's at least partially about point scoring, rather than just working together to solve the problem.
    - Closed source datasets may also have these problems, but who knows? Luckily, in a normal measured rational response, you get punished …

  • @ulamss5
    @ulamss5 5 months ago +10

    The best thing about true open source: nobody can just decide, for whatever reason, that a tool built by the hard work of thousands of people is suddenly "deprecated".

  • @tobiasfischer1879
    @tobiasfischer1879 5 months ago +33

    One of the big reasons for people not switching from SD 1.5 to SD 2.x that was not mentioned is the cost of switching architecture. People thought 2.0 was worse at launch than it was since people had already built up an understanding of "how to prompt stable diffusion" and that all got switched up with SD 2.x. But even if we assume that 2.x is as good as 1.5, most people already had workflows, fine tunes, textual inversions, inference frameworks, etc. already built on top of 1.5, so switching to 2.x would have a switching cost of starting most of that from scratch, so when the new model is just as good or slightly worse, no one wants to do a bunch of extra work for no reason. We even saw this with SDXL, whose generation is waaaay better than 1.5, and yet the adoption rate amongst the community has still been slow due to lack of infrastructure around the new model (as well as higher resource requirements and time to generate).
    Overall agree with the points in the video and glad you are calling things like this out, as most people would find it unacceptable for a security research lab to drop a zero-day with no preemptive disclosures. Just wanted to share a bit more context around SD 2.x since I was fairly engaged with that community at the time.

    • @4.0.4
      @4.0.4 5 months ago +5

      SD 2.x was a total flop. SDXL, when fine-tuned by the community, is indeed better than 1.5 (and getting faster now).

  • @TheEbbemonster
    @TheEbbemonster 5 months ago +7

    100% agree! Sam Altman has advocated several times for big companies handling all big models. It is distasteful! Their company was literally built on top of open source and open research! Mistakes will be made!

  • @cherubin7th
    @cherubin7th 5 months ago +84

    Makes you wonder how much secret stuff is inside closed data sets. The only way to remove all such stuff from data sets is to make all datasets mandatorily open source.

    • @sharannagarajan4089
      @sharannagarajan4089 5 months ago

      Yeah great idea. Sarcasm

    • @Raphy_Afk
      @Raphy_Afk 5 months ago +2

      That's actually a genuinely great argument!

    • @Trahloc
      @Trahloc 5 months ago

      @pewpew1010 the problem is private data is more valuable to an organization trying to achieve their own goals. Plus all the effort used to gather and categorize that data has economic value. If you force people to work for free they usually opt not to work. It's only mission driven volunteer actions that have any traction and those folks are usually trying to control society (for good or ill).

    • @macaquinhopequeno
      @macaquinhopequeno 5 months ago +1

      I'm 100% with your opinion. Not only is CSAM a terrible problem, but there is also:
      * leaked private data from stolen accounts all over the world
      * the risk that an unknown dataset might produce code that intentionally comes with back doors (not obvious backdoors, but bad code that facilitates exploitation)
      This should, in my opinion, be enforced by law: you want to build an LLM? Then you should open your dataset.
      It's sad that they needed to find CSAM to open their eyes.

    • @clray123
      @clray123 5 months ago

      The "solution" will of course be a special congressional censorship commission that is granted access to the secret proprietary data. You can't have such harmful data out in the open endangering the public, after all. The Stanford Pedo Group will get a huge grant to participate in and aid such efforts.

  • @sevret313
    @sevret313 5 months ago +10

    They mentioned that they searched based on punsafe 0.995 and above, while Stable Diffusion is trained on a subset with a lower punsafe level. So Stable Diffusion was probably not trained on these images.
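    To illustrate those thresholds (a minimal sketch, assuming a LAION metadata shard in Parquet form that exposes a punsafe column; the file name and the 0.1 training cutoff are illustrative, only the 0.995 search threshold comes from the comment above):

        import pandas as pd

        # Load one (hypothetical) shard of LAION metadata with safety scores
        df = pd.read_parquet("laion5b_metadata_shard_0000.parquet")

        # The researchers reportedly searched entries scored as almost certainly unsafe
        searched = df[df["punsafe"] >= 0.995]

        # A training subset filtered at a stricter (lower) punsafe cutoff
        # would already exclude those same rows
        train_subset = df[df["punsafe"] < 0.1]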

  • @TiagoTiagoT
    @TiagoTiagoT 5 months ago +12

    It's very telling they're only going after the open-source ones...

    • @baz813
      @baz813 5 months ago +8

      It raises the question of where their funding is coming from.

  • @MariuszWoloszyn
    @MariuszWoloszyn 5 months ago +4

    There's a "Responsible Disclosure Policy" that's been used by security researchers for like two decades already. We know how to disclose such things in a responsible way. The authors clearly chose not to follow that path.

  • @charlestherealboy
    @charlestherealboy 5 months ago +6

    PSA that this was discovered and published several months ago by another research group whose institution happens not to be named 'Stanford'. Food for thought; let's stop upholding these institutions.

  • @geldverdienenmitgeld2663
    @geldverdienenmitgeld2663 5 months ago +6

    The problem can never be what an LLM knows. In fact, the ideal LLM should know all about the world, the good things and the bad things as well. If there is a problem at all, it is only about which use cases should be allowed with these models.

  • @AncientSlugThrower
    @AncientSlugThrower 5 months ago +11

    1000 sounds like a lot, but it is a drop in the bucket compared to the full sample size. I don't want that content in my image generation, so I make sure to specify that in my negative prompting. But the scale of these things needs to be considered before we sharpen pitchforks.

  • @AP-dc1ks
    @AP-dc1ks 5 months ago +6

    Oh no! We better force ClosedAI to show us training data so we can help clean it up!

    • @clray123
      @clray123 5 months ago

      ClosedAI is probably already sponsoring these same "researchers" to help clean up their data.

  • @amafuji
    @amafuji 5 months ago +28

    Children must have whiplash the way they're constantly being thrown back and forth between political opponents

    • @leonfa259
      @leonfa259 5 months ago +2

      Children would like it to be known that they would like the UN Convention on the Rights of the Child to finally be ratified by the US, as all other countries outside of Iran and North Korea have done, and corporal punishment in the US to be outlawed.

    • @clray123
      @clray123 5 months ago

      Don't worry, they have mandatory masks to protect them from abrasions.

  • @clray123
    @clray123 5 months ago +3

    As for the last part of the video, "you don't have to possess the questionable content to find out it's forbidden"... I hope you realize the ramifications of this? This means that there is some sort of trusted censorship oracle entity sitting out there somewhere, telling you what is questionable and what is not, without you being able to verify its verdicts in any way without punishment for that attempt. This is exactly like Kafka's court accusing you of an unspecified crime. In a sense it's even worse than the Holy Inquisition, where evidence was fabricated, because in this case no evidence needs to be produced, just a claim that it exists.

  • @diga4696
    @diga4696 5 months ago +5

    I believe that this paper, despite its intentions, won't bring any significant change. It seems like just noise coming from an organization that lacks a distinct presence.
    The real issue lies with humanity, not the data we produce. Data, whether recorded or generated, lacks any inherent purpose or intent. It’s the evolution of our intelligent, multifaceted society that gives rise to negative intentions. Without complete transparency and a unified intelligence encompassing all sentient systems, identifying and addressing malevolent elements remains a daunting task. It feels like a witch hunt. Continuing to work in silos is problematic because each person's perspective is like a hidden layer, not fully understood by others. At best, what we have is a convergence of information guided by a select group of experts. However, this often leads to policies and regulations influenced by cultural biases, traditions, and other forms of unverified and prejudiced data.

    • @clray123
      @clray123 5 months ago

      The real issue lies with the people who believe that information as such is harmful. The people who can't tell apart a horror movie producer from a warmonger in high office (or a porn director from a rapist).

  • @baz813
    @baz813 5 months ago +6

    Surely the logical conclusion of research like this will be to enforce open source data sets with regulation. While government regulators are too slow to catch up, community regulated distributed AI networks such as #bittensor have already processed some governance issues around this area, and will continue to evolve.

  • @kenselvia5641
    @kenselvia5641 5 months ago +7

    For some reason the audio was very low on this video. I had to turn my PC and monitor volumes all the way up to hear it.

    • @iDerJOoker
      @iDerJOoker 5 months ago +1

      Was all good in my case

    • @Dr.Trustmeonthisone
      @Dr.Trustmeonthisone 5 months ago

      Same here, had to double my Windows volume to make out what was said

  • @evennot
    @evennot 5 months ago +2

    I bet it was some cartoons. The rest were probably images of breastfeeding and such (the most common thing that gets flagged in cloud image storage).
    And what about gore? Catastrophes, violent crimes, war footage, cults, starvation and other nasty stuff. These are just as harmful to children, and sometimes more harmful.
    I get it. Seeing evil is illegal.

  • @freedom_aint_free
    @freedom_aint_free 5 months ago +7

    Maybe a "poisoning the well" attack by regulatory capture folks?

  • @malikrumi1206
    @malikrumi1206 5 months ago +5

    What a great service bringing this to our attention!

  • @tomski2671
    @tomski2671 5 months ago

    I foresee law enforcement using models trained to identify such materials to catch the perpetrators.

  • @Veptis
    @Veptis 5 months ago +6

    "Removal of reference downloads"... by giving a list of the explicit material and metadata (now removed) to all the people that used the dataset?
    So you go from a massive dataset with a really low percentage of such content to giving everyone a shortlist of it?

  • @clray123
    @clray123 5 months ago +1

    Regarding the proposed improved model training procedures, I think it is safest to generally just pretend that (1) kids don't exist, (2) we have never been young ourselves, and (3) shut your eyes and run away whenever you encounter one of the non-existing children in public. If we adopt such wise precautionary behaviors ourselves, chances are that our AI models will also be trained accordingly.
    P.S. Should you find yourself living with one of the non-existing young people under your own roof, the best bet is to force it to wear a mask at all times, so that it is less recognizable and cannot infect you with any terrible child-transmitted disease.

  • @pawelkubik
    @pawelkubik 5 months ago

    It wasn't all ill will with the unsafe disclosure.
    Security researchers put a lot of work and expertise into finding those exploits, so they can be fairly confident they are way ahead of other labs when they decide to postpone the report.
    You don't really have that comfort when you just try to pick the low-hanging fruit.

  • @zrebbesh
    @zrebbesh 5 months ago +5

    Have they published the result of applying the same examination and tests to their own datasets?

  • @para-be4bf
    @para-be4bf 5 months ago +5

    The open-source AI situation is continuously reminding me of the crypto war and open source scare.

  • @herp_derpingson
    @herp_derpingson 5 months ago +7

    The video is too quiet in this video

    • @clray123
      @clray123 5 months ago

      The secret word is audio.

  • @usercurious
    @usercurious 5 months ago +2

    Thank you, they will do anything to destroy any open source alternative, just to appear righteous

    • @clray123
      @clray123 5 months ago

      And they will fail again, just like they failed with corporate adoption of Linux.

  • @pookienumnums
    @pookienumnums 5 months ago

    Regarding the seeping through of bad or unwanted data, and to the person who wrote this 'hit piece':
    you go do something 1 million or 1 billion times without making a mistake.

  • @krimdelko
    @krimdelko 5 months ago +1

    Technology is not the problem, it’s the solution and open source works better at finding solutions. This issue imo is not about open source, it’s about developing tools to avoid damaging content.

  • @isaac10231
    @isaac10231 5 months ago

    2 thoughts on this...
    First, I think this is possibly what led to some of the turmoil inside OpenAI; maybe they discovered this stuff in their training set, because they probably have it too.
    Second, in the paper they mentioned a large amount of illustrated cartoons. That sounds like hentai to me, which is a different debate on its own, but I think it needs to be clearly distinguished from REAL people who actually get affected by the distribution of actual abuse.

  • @vfclists
    @vfclists 5 months ago

    1000 images out of how many?

  • @rolyantrauts2304
    @rolyantrauts2304 5 months ago

    The capitalisation of AI grows apace.

  • @swiftpawtheyeet6648
    @swiftpawtheyeet6648 1 month ago

    "David Thiel"....
    Because of course it is

  • @Will-kt5jk
    @Will-kt5jk 5 months ago +1

    7:48 - it’s an excellent point on “reasonable disclosure”
    Assuming no malice on the part of the dataset creators, I think it’s appropriate to view the inadvertent inclusion of abuse material (*) as akin to a software vulnerability. Now that potential abusers know the data is in there, they can go through copies & find the CSAM, or target models trained on it to generate new abuse images.
    It would be quite hard to reduce the number of dataset copies which include the abuse material, but at least if there were a period of time to update and re-propagate a new version of the dataset, models/products which use it could have some assurance & the number of instances of the abusive version would reduce somewhat before disclosure.
    Something along the lines of the CVE process + reasonable disclosure seems like an obvious practice to be adopted by the industry/subject.
    (*) obviously primarily CSAM, but also non-consensual adult material etc., and who’s to say private info/doxxing is not in such datasets.

    • @Will-kt5jk
      @Will-kt5jk 5 months ago +1

      Note:
      I use “reasonable disclosure” because “responsible disclosure” puts the “responsibility” part on the security researcher, not on the vendor. The researcher should act “reasonably” and give “reasonable” opportunity to make the product safe before disclosure, but if the vendor fails to act in a sensible timeframe, it’s completely “reasonable” to release the research to allow users & consumers to take action themselves.

  • @Sven_Dongle
    @Sven_Dongle 5 months ago +1

    Open source tends to vet rather than abet.

  • @thedoctor5478
    @thedoctor5478 5 months ago

    How much do you think you can find in Google search? I bet plenty.

  • @lucidraisin
    @lucidraisin 5 months ago +7

    Yannic, always the voice of reason

    • @clray123
      @clray123 5 months ago +1

      It's actually simple to be the voice of reason nowadays - whatever comes from government circles, just do and claim the opposite.

  • @TiagoTiagoT
    @TiagoTiagoT 5 months ago +2

    Is it really actual photos of real children being harmed, or just bullshit like CGI, cartoons, dummies etc?

  • @DRKSTRN
    @DRKSTRN 5 months ago

    To me this just demonstrates that there is a use case for advanced diffusion models to be context aware and restrict such outputs in the first place. The ill would always be the same ill as any artist who attends figure drawing events: that person can at any time reproduce the most unsightly imagery, by virtue of having a traditional education.
    If we fearmonger that any model may be fine-tuned, and the basis of releasing such models becomes a point of potential distribution of those materials, then we would also have to ban artistry as a trade.
    I wouldn't be worried about the narrative. It isn't based on good faith, and the fundamental issue with that camp is attempting to continue infinite growth after hitting market saturation. Expect rocks in general.

  • @-Jason-L
    @-Jason-L 5 months ago +4

    I don't see a problem with this being in the training set, as long as it is not in the output.

    • @MattHudsonAtx
      @MattHudsonAtx 5 months ago +1

      It's a felony to possess CSAM in the first place, so they're actually being nice to write a paper about it. People could go to prison over it.

    • @heyman620
      @heyman620 5 months ago +1

      @MattHudsonAtx Are you a grad of trustme-bro-law-school?

    • @clray123
      @clray123 5 months ago +1

      @MattHudsonAtx Yes, that's why I deposited some pics onto your phone a couple of days ago.

    • @andybrice2711
      @andybrice2711 5 months ago +1

      Here's a complicated ethical question: Should these images be deliberately added to a "negative dataset" in order to train models _not_ to generate such material?

    • @heyman620
      @heyman620 5 months ago

      @andybrice2711 Amazingly smart question, to be honest.

  • @asimuddin3222
    @asimuddin3222 5 months ago

    Keep it up....🎉🎉🎉

  • @Neomadra
    @Neomadra 5 months ago +1

    Just remove the identified images from the dataset, problem solved

  • @SanjayVenkat-ce1gj
    @SanjayVenkat-ce1gj 5 months ago +1

    Please keep stating your purpose. Open source is the way forward. 1008 in 5 billion: 1008 too many, but for research. Let's propagate research. Let's think, use some utilitarian reasoning.
    Current politics in ML is questionable. Yes, 1008 is too many.

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 5 months ago +4

    I still wonder how Blue Lagoon or Pretty Baby are legal with these levels of hysteria.

  • @MasamuneX
    @MasamuneX 5 months ago +4

    I want everything in my training dataset including crime statistics..... and evil books

  • @dinoscheidt
    @dinoscheidt 5 months ago

    Isn't there a center that collects this garbage so it can be fingerprinted, and which was, e.g., used for a little while by Apple's iCloud? We should really train an AI classifier on that disgusting material and open source the detector, so everyone can filter it out of their data sets, be they proprietary or not. The open source approach should be reversed here.
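    The rough shape of such a shared cleaning pass might look like this (a minimal sketch; detect_abusive_image is a hypothetical stand-in for a PhotoDNA-style fingerprint match or a learned detector, not a real API):

        from pathlib import Path

        def detect_abusive_image(path: Path) -> bool:
            """Hypothetical open-source detector; returns True if the image should be dropped."""
            raise NotImplementedError

        def clean_dataset(image_dir: str) -> list[Path]:
            # Keep only images that the shared detector does not flag,
            # regardless of whether the dataset itself is open or proprietary
            return [p for p in Path(image_dir).glob("*.jpg")
                    if not detect_abusive_image(p)]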

    • @Houshalter
      @Houshalter 5 months ago

      They don't share the hashes with the public. The paper also mentions that they didn't like it because it mostly focuses on real images, not cartoons.

    • @isaac10231
      @isaac10231 5 months ago

      @Houshalter What? That makes no sense. Why would they WANT to focus on hentai? Wouldn't it make more sense to, you know, focus on _real_ people, instead of drawn characters?

    • @clray123
      @clray123 5 months ago +1

      @isaac10231 I suspect they simply had to focus on hentai because real images had already been filtered out or were hard to find.

    • @isaac10231
      @isaac10231 5 months ago

      @clray123 Fair point.

  • @Veptis
    @Veptis 5 months ago

    I got a good take on this, but my comment gets removed directly. Not sure what I wrote wrong.

  • @knutjagersberg381
    @knutjagersberg381 5 months ago +1

    Honestly, I'm still wondering what I should think about this... I agree this has been used for political purposes, with bad intent, too. Also, for one thing, weights are still a legal gray area, but is it the right thing to deal with the burden of proof in this way?
    I find this very difficult. Another aspect I find difficult is the issue of any potential of a generative model to generate this content. In principle, it is possible to use 3D engines and create this shit. Yet we don't regulate access to 3D game engines like this, do we? There are also other models which can upscale an image. A human can draw an image of this content and then make an AI version; that's even more difficult to control. I feel more caution is needed, but we can also overshoot. Is it the capacity to generate this or the distribution that is the problem? This needs nuanced views; we need great care in the reasoning about this.

    • @leonfa259
      @leonfa259 5 months ago +1

      Do pixels have an age? Is anybody harmed by pixels? Does a virtual person look 17 or 18? At least from my perspective, any generated output cannot by definition be that, since no one was harmed through the creation of that material. We can continue to find it abhorrent, but even the Supreme Court will most likely see it covered by the First Amendment.

    • @knutjagersberg381
      @knutjagersberg381 5 months ago

      @leonfa259 I don't know. The content could still be illegal, or at least its distribution. This needs deeper reflection. Someone into the legal and ethical aspects should really think about this for a while to facilitate sense-making.

    • @knutjagersberg381
      @knutjagersberg381 5 months ago

      @wbs_legal could say something on the legal aspects

    • @leonfa259
      @leonfa259 5 months ago

      @knutjagersberg381 What criminal legal system are you talking about? The US one or another?
      Ethics are an interesting topic, but they depend on whom you are asking.
      In the end, all legal systems and people agree that real children should not be hurt; after that, opinions diverge and marriage ages all over the US vary. Criminal systems usually worry about the extreme clear cases, while NGOs like the one above have broader opinions.

    • @knutjagersberg381
      @knutjagersberg381 5 months ago

      @leonfa259 My point is I'm not a legal expert. I'm also not an ethicist, although I think I have some good intuition about ethics. I'd like to hear more opinions.

  • @louis3195
    @louis3195 5 months ago +6

    It's easier to trash-talk others' work than to do the work.

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 5 months ago

      Exactly like YouTube commenters... 🤔 Journalism is still useful; what is your excuse?

    • @heyman620
      @heyman620 5 months ago

      You mean this shitty low effort paper?

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 5 months ago

      @heyman620 I don't know what low effort means, but you seem to...

    • @heyman620
      @heyman620 5 months ago

      @Robert_McGarry_Poems I think the paper presented here is shitty.

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 5 months ago

      @heyman620 Oh hey, there it is. An idea that stands on its own! I knew you could do it. What makes it bad, in your opinion? I think any effort to combat CP is pretty positive, even if the paper itself is low effort, but that's just an opinion.

  • @hurktang
    @hurktang 5 months ago +1

    The fact that they conclude we should delete all SD 1.5 models makes me think twice about whether it's really CSAM at all in the first place.
    I just took a look, and PhotoDNA makes no claim that their database is sexual abuse. They actually seem to be hashing any submitted images. It seems it just takes someone offended on the internet to get an image into that database. This means it WILL hit girls at the beach, taking their baths, an accidental pantie shot, wearing something a little bit too tight, or a girl who turned out to be 17 years old after publication of photos she took herself... Not all countries in the world have the same exact standards.
    Don't get me wrong, I'm absolutely okay with cleaning all those from the dataset when we find them. But acting all offended by it and asking for the deletion of SD 1.5 because 1 in 5 million photos could offend someone seems absurd to me.

    • @isaac10231
      @isaac10231 5 months ago +2

      They mentioned a large portion was "illustrated cartoons"... So basically hentai lol.

    • @clray123
      @clray123 5 months ago

      @isaac10231 At least there were no pictures of the prophet Mohammed...

  • @cerealpeer
    @cerealpeer 5 months ago +1

    Yeah! And what's all this about the cops having literally tons of cocaine???? Where are they keeping the cocaine, and why are only the cops allowed to have it?

  • @javrin1158
    @javrin1158 5 months ago +2

    Shows they don't have the children's best interest at heart when they so recklessly abandon responsible disclosure before publishing such material.

  • @ChuckBaggett
    @ChuckBaggett 5 months ago

    Volume is too low.

  • @marshallmcluhan33
    @marshallmcluhan33 5 months ago

    Cash and carry. Isn't open source all owned by a16z anyway? 💰😎
    Censor reality; make it yours.

  • @Sven_Dongle
    @Sven_Dongle 5 months ago

    CSAM - child sexual abuse material.

  • @cerealpeer
    @cerealpeer 5 months ago +4

    "we need to make the streets safer. too many violent crimes."
    "ok weve rounded all the guns up, and theyre locked away from the criminals"
    "HEY THEYVE GOT GUNS! GETTEM!"

    • @tedchirvasiu
      @tedchirvasiu 5 months ago +5

      what the hell are you talking about, Jesus?

    • @cerealpeer
      @cerealpeer 5 months ago

      @tedchirvasiu Tell me what you think about the issue.

    • @be12
      @be12 5 months ago +1

      What

    • @cerealpeer
      @cerealpeer 5 months ago

      @be12 Exactly.

    • @cerealpeer
      @cerealpeer 5 months ago

      I wish I had a sock account so I could repeatedly not understand things.

  • @murtazanasir
    @murtazanasir 5 months ago

    What idiocy to frame this as an open source issue. What guarantees do bad faith actors like you have that closed models and their datasets don't have these problems? This is purely FUD to benefit private corporations. Personally, I can't take anyone who records CZcams videos in sunglasses seriously anyway.

    • @NBK-ro4sz
      @NBK-ro4sz 5 months ago

      What kind of idiot is against open source? This is purely FUD to benefit private corporations.

  • @TheRev0
    @TheRev0 5 months ago +1

    Fuck... I agree with the message, but, bruh... I wish you weren't delivering it. And I hate myself for that.
    Just the way you were so hesitant about the unacceptability of CSAM in training data kept me cringing. I kept imagining bad faith actors using against us the ever so slight lack of the complete and utter denunciation from you that we're all used to in media. The worst part is that I have no idea how accurate my perception is.
    Bad faith actors have poisoned the well. Everything is shit and I hate it.

    • @discipleofschaub4792
      @discipleofschaub4792 5 months ago +22

      He did completely denounce such material. Did he not virtue signal enough for you? Sad state of affairs if you have to do some performative 20 minute speech about how you absolutely despise it in order not to be seen as a p. sympathiser...

    • @clray123
      @clray123 5 months ago +1

      But this sort of witch-hunt and black-and-white thinking is exactly what the "bad actors" want you to adopt. Any normal thinking person has the capability to weigh the crimes we are talking about against other crimes and act accordingly. The virtue signaling hysteria is a new thing, which has not existed in humanity previously, even though the crimes most certainly have. We need to think about why it is necessary in the first place and whose interests it serves, rather than blindly support it.

  • @mkamp
    @mkamp 5 months ago +1

    Great to see that you keep shining a light on the societal aspects, and kudos for your bravery. I am still wondering if this will serve you well in the long run. Maybe this will even become your brand, like with the AI ethics people, and people will only see you as that? How about you mix it up and do the next video on Mamba? ;) just saying. ;) Have great holidays! Looking forward to more of your thoughts, whatever avenue you choose.

    • @mkamp
      @mkamp 5 months ago +1

      Well, thank you. Just as the doctor ordered and timely too! 😂