Adam Gleave - Vulnerabilities in GPT-4 APIs & Superhuman Go AIs

  • Added 20 June 2024
  • This is a special crosspost episode in which Adam Gleave is interviewed by Nathan Labenz from The Cognitive Revolution. At the end, I also discuss Nathan Labenz's takes on AI with him.
    Adam Gleave is the founder of Far AI. He and Nathan discuss finding vulnerabilities in GPT-4's fine-tuning and Assistants APIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through novel adversarial strategies, naive developers accidentally jailbreaking models during fine-tuning, and more.
    OUTLINE
    00:00 Intro
    02:57 Far.AI's Mission
    05:33 Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
    11:48 Divergence Between The Growth Of System Capability And The Improvement Of Control
    13:15 Finding Substantial Vulnerabilities
    14:55 Exploiting GPT-4 APIs: Accidentally Jailbreaking a Model
    18:51 On Fine Tuned Attacks and Targeted Misinformation
    24:32 Malicious Code Generation
    27:12 Discovering Private Emails
    29:46 Harmful Assistants
    33:56 Hijacking the Assistant Based on the Knowledge Base
    36:41 The Ethical Dilemma of AI Vulnerability Disclosure
    46:34 Exploring AI's Ethical Boundaries and Industry Standards
    47:47 The Dangers of AI in Unregulated Applications
    49:30 AI Safety Across Different Domains
    51:09 Strategies for Enhancing AI Safety and Responsibility
    52:58 Taxonomy of Affordances and Minimal Best Practices for Application Developers
    57:21 Open Source in AI Safety and Ethics
    01:02:20 Vulnerabilities of Superhuman Go-Playing AIs
    01:23:28 Variation on AlphaZero-Style Self-Play
    01:31:37 The Future of AI: Scaling Laws and Adversarial Robustness
    01:37:21 Start of Michael Trazzi interviewing Nathan Labenz
    01:37:33 Nathan’s background
    01:39:44 Where Nathan falls on the Eliezer-to-Kurzweil spectrum
    01:47:52 AI in biology could spiral out of control
    01:56:20 Bioweapons
    02:01:10 Adoption Accelerationist, Hyperscaling Pauser
    02:06:26 Current Harms vs. Future Harms, risk tolerance
    02:11:58 Jailbreaks, Nathan’s experiments with Claude
    The Cognitive Revolution: www.cognitiverevolution.ai/
    Exploiting Novel GPT-4 APIs: far.ai/publication/pelrine202...
    Adversarial Policies Beat Superhuman Go AIs: far.ai/publication/wang2022ad...
  • Science & Technology

Comments • 1

  • @TheInsideView
    1 month ago · +1

    Timestamps of Adam Gleave interview:
    02:57 Far.AI's Mission
    05:33 Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
    11:48 Divergence Between The Growth Of System Capability And The Improvement Of Control
    13:15 Finding Substantial Vulnerabilities
    14:55 Exploiting GPT-4 APIs: Accidentally Jailbreaking a Model
    18:51 On Fine Tuned Attacks and Targeted Misinformation
    24:32 Malicious Code Generation
    27:12 Discovering Private Emails
    29:46 Harmful Assistants
    33:56 Hijacking the Assistant Based on the Knowledge Base
    36:41 The Ethical Dilemma of AI Vulnerability Disclosure
    46:34 Exploring AI's Ethical Boundaries and Industry Standards
    47:47 The Dangers of AI in Unregulated Applications
    49:30 AI Safety Across Different Domains
    51:09 Strategies for Enhancing AI Safety and Responsibility
    52:58 Taxonomy of Affordances and Minimal Best Practices for Application Developers
    57:21 Open Source in AI Safety and Ethics
    01:02:20 Vulnerabilities of Superhuman Go-Playing AIs
    01:23:28 Variation on AlphaZero-Style Self-Play
    01:31:37 The Future of AI: Scaling Laws and Adversarial Robustness
    Michael Trazzi interviews Nathan Labenz:
    01:37:33 Nathan’s background
    01:39:44 Where Nathan falls on the Eliezer-to-Kurzweil spectrum
    01:47:52 AI in biology could spiral out of control
    01:56:20 Bioweapons
    02:01:10 Adoption Accelerationist, Hyperscaling Pauser
    02:06:26 Current Harms vs. Future Harms, risk tolerance
    02:11:58 Jailbreaks, Nathan’s experiments with Claude