New INSANE AI Chip, GPT4o Voice Update, Claude 3.5 Dominates, SpaceX Double Landing, AI Video Games

Matthew Berman

76,170 views

Added on 3 July 2024
Big AI News this week!
Be sure to check out Pinecone for all your Vector DB needs: www.pinecone.io/
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com
Need AI Consulting? 📈
forwardfuture.ai/
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
👉🏻 Instagram: / matthewberman_ai
👉🏻 Threads: www.threads.net/@matthewberma...
👉🏻 LinkedIn: / forward-future-ai
Media/Sponsorship Inquiries ✅
bit.ly/44TC45V
Links:
x.com/etched/status/180562569...
x.com/i/trending/180600446233..., x.com/OpenAI/status/180571639...
x.com/ClementDelangue/status/...
x.com/lmsysorg/status/1805329...
x.com/antonosika/status/18060...
x.com/elonmusk/status/1805804...
x.com/kimmonismus/status/1805...
x.com/kimmonismus/status/1804...
Chapters:
0:00 - Intro
0:13 - Etched AI Chip
3:09 - GPT4o Voice Delayed
4:52 - New LLM Leaderboard
7:13 - Claude 3.5 Dominates
9:16 - SpaceX Double Rocket Landing
9:50 - AI Video Game Future
10:55 - Apple + Meta
Science & Technology

Comments • 335

@matthew_berman 5 days ago ⁺¹⁹
Do you think this new AI performance is real?
@mdrafatsiddiqui 5 days ago ⁺⁴
Apple may not have gone with Meta because of licensing issues. Meta restricts commercial usage by anyone with more than 700 million users, a threshold Apple would surely exceed. That's my guess.
@lawyermahaprasad 5 days ago
Meta doesn’t allow self hosting above 700m users
@minecraftermad 5 days ago ⁺¹
It's premature. There are so many more methods to make AI than just transformers, and even transformers are constantly changing. Depending on how this chip is designed, it could be obsolete before it's ever out of the factory. What Tenstorrent is doing is a much smarter tactic for this field.
@minecraftermad 5 days ago
The ARC challenge is a fantastic way to test how good any model is at problem solving.
@CreativeEngineering_ 5 days ago ⁺¹
I have been using OpenAI and Anthropic models for some time now, and Claude 3.5's code is better. You especially notice it with large amounts of code or when modifying a large project. Just like with any model, you have to be clear and specific. Take the marble-in-the-cup example: if part of its system prompt were "Consider actions within the request and determine the outcome or most probable outcome of each action and how it affects other actions," or "Identify the actions in the request, predict their outcomes, and understand how these outcomes impact other actions," it would have a much higher likelihood of success.
@bdennyw1 5 days ago ⁺²²
The hardware lottery refers to when a research idea succeeds primarily because it is well-suited to available hardware and software, rather than because it is inherently superior to alternative approaches.
@DailyTuna 5 days ago ⁺²⁸
You are an asset to AI enthusiasts and people who want to be informed!🎉
@lysambodia 5 days ago
I'm like you, I turn to his channel first and foremost for AI news
@jimrhea5484 5 days ago ⁺²¹
You are my AI go-to guy. It's all moving so fast that I can't keep up. Without your channel I would be lost. Thank you for all the work.
@micbab-vg2mu 5 days ago ⁺¹⁹
OpenAI delays, Anthropic delivers. It's great that we now have competition! The monopoly has ended.
@JuliaMcCoy 5 days ago ⁺²²
That new AI chip 🤯🤯. Apple talking to Meta to integrate Llama 3 into Siri is really interesting.
@brandonreed09 4 days ago ⁺¹
They confirmed that is not happening.
@DailyTuna 5 days ago ⁺⁸
You've got to watch a launch at the Cape. You can set up on the west side of the water at numerous spots. The most exciting part is the sound wave that hits after the visual. The ground rumbles! Freaking awesome!
@joannot6706 5 days ago ⁺⁸
Just imagine AI agents running on future generations of that chip... Crazy.
Imagine that kind of chip running AI in a phone...
@fynnjackson2298 5 days ago ⁺¹
It's 1,000 pages a second. It's freaking bonkers...
@brodyalden 5 days ago ⁺⁴
Thanks
@andreimaimas489 5 days ago ⁺¹
Thanks Matthew for all the amazing content you are making!
@MaJetiGizzle 5 days ago ⁺⁶
There is a reason why they couldn’t just host the Llama models themselves. Meta’s Llama license specifically prevents big tech companies like Apple from using their models for commercial purposes without express permission from Meta.
So I’m assuming that they weren’t able to come to an agreement that would allow Apple to host the Llama models themselves.
@ArnaldoGomes 5 days ago
You are right. I was about to remind @Matthew about this…
@wingflanagan 5 days ago ⁺²
The problem with benchmarks is like the problem with any development methodology based on tests: you start targeting the tests instead of the desired functionality.
@yourneighborhood 4 days ago
You have an awesome channel Matthew!! Really informative and timely. Keep up the good work!!
@TechRenamed 5 days ago ⁺¹
Wow this is big news thanks bud :)
@02280228 4 days ago ⁺¹
It would be great if your AI rubric would test for the ability to use functions. It would require a bit more setup, but I think it would be great to see how good of an agent a model can be.
@MoDs_3 5 days ago
Thank you my dear friend! ❤
@DaveShap 5 days ago ⁺⁵
Regarding procedurally generated AI videogames, it makes me think that maybe reality is a neural network.
@dievas_ 5 days ago ⁺¹
Well, hate to break it to you, but your brain is exactly a neural network.
@zionsky3342 5 days ago
Yes.
@Weirdgeek83 4 days ago
I'm feeling this way more and more. We're all just AI agent instances put in a simulation so we don't go mad.
@zionsky3342 4 days ago
@Weirdgeek83 🤣👌 doesn't work very well though... like a 10% success rate
@OculusGame 5 days ago ⁺⁷
I'll never be able to comprehend people's stupidity. Imagine hyping Etched after they basically only said: "yeah bro, we're 20x better than Nvidia, we can do at least 500k tokens/s". Most people have no critical thinking, even when all they have to do is compare a new, unknown company that shows no demo, only a render, to a multi-TRILLION $ company. Pathetic.
@CrypticConsole 4 days ago ⁺²
To even get close to that level of performance you would be relying on HEAVY batching, so it's not even worthwhile for most people.
@NateAde 4 days ago ⁺¹
They're not the only company doing it. I think George Hotz's company is doing something similar. What you both wrote sounds like what people were saying about crypto ASICs... they were all wrong. Specialized hardware, software and agents will be the future. Nvidia will probably just outbuild them... the next iteration won't be GPUs.
@generichuman_ 2 days ago
Imagine being so stupid that you can't even listen to what's being said in a YouTube video. They explain that the reason they can get this performance is that the chip is highly specialized... It can only run transformers. You can get huge performance gains if you design for specific algorithms. Nvidia is not playing this game. Their GPUs are universal and probably always will be. It's a tradeoff between speed and universality.
@stanisd 5 days ago ⁺¹
Awesome thank you
@kevinvillalobos2757 5 days ago ⁺⁴
It's incredible that Phi 3 Medium, with just 14B parameters, ranks highly on the open leaderboard compared to 70B-parameter models :O
@blisphul8084 4 days ago
Interestingly enough, phi-3 mini at 3.8b beats Qwen2 7b from a benchmark perspective. This would seem to indicate that Phi may be fundamentally the best model around. I'd like to see a Phi 1.5b, 0.5b, 70b, and maybe a 300b model.
@jonnhanks8274 5 days ago ⁺⁵
SpaceX has been landing those boosters and reusing them for years now :D 22 reuses is the current best for a booster
@wonmoreminute 5 days ago ⁺³
Yeah, I’m not sure why this is news. Was there something unique about this landing?
@mav3818 4 days ago
Typical Falcon Heavy launch. They've done it close to a dozen times
@Mohammad-nv1wv 4 days ago
One of the best news videos
@MrAluntus 4 days ago ⁺¹
Matt, SpaceX dual booster landings and offshore barge landings have been a thing for a while now. What you need to follow is how they are going to catch their next-generation Starship: on the next launch they plan to catch the heavy booster when it returns, and it is really massive compared to Falcon 9 rockets.
@my_permaculture 5 days ago ⁺¹
Yes. Claude in an agent setup test.
@Charvak-Atheist 5 days ago ⁺⁵
What's the use of demonstrating Sora and GPT-4o if they are not releasing them?
@TheCaphits 5 days ago ⁺⁵
It got me to subscribe for a month, only to be rug pulled.
Give me my twenty bucks back Sam.
@TripleOmega 5 days ago
It's useful for them, not us. They just want to pull attention away from other AI companies.
@generichuman_ 2 days ago
That's like asking what's the point of broadcasting the moon landing if I can't take a trip there myself. Isn't showcasing technological improvements valuable in its own right?
@voodoochild420ai 4 days ago
great vid
@claudio2081 4 days ago
yes would appreciate seeing Claude being tested in agent setting as a next video
@Ding63 4 days ago
Yes, I'd like you to test Claude in an agent setting!
@TheStuntman81 5 days ago ⁺¹
Khm khm, Matthew, Falcon Heavy booster landings have been a thing for 5 years now :)
@Mohammad-nv1wv 4 days ago
You are the best AI guy 😊❤
@TJTHEFOOTBALLPROPHET 5 days ago
I told you last year - you are the STANDARD!!!! Now let's make it make money!
@ScottVanKirk 4 days ago ⁺¹
The idea of managing a programming project without access to the internet and without access to the latest documentation is ludicrous. Until they build that in, it's an interesting toy.
@patrickjreid 5 days ago ⁺¹
Honestly, any time you say "if you want me to test this" the answer is yes!
@alexlavertyau 4 days ago
Can’t wait for AI to enter the VR space, want to walk around exploring a beautiful environment
@e11e7en 5 days ago
I would like to see a comparison of which models perform best in an agent setting
@metonoma 5 days ago ⁺⁷
The video game video is just a video created by Sora. The realism is solely due to generative video and has nothing to do with gaming graphics or a functional game. A game would still need a game engine. That said, game engines could be improved by specialized models, and overall the graphics output will get an AI pass to look realistic. But this video is no more impressive than a guy eating a hamburger, because it's no different.
@SpragginsDesigns 5 days ago ⁺¹
I have been using this new FUTO keyboard, and it's amazing, and it's powered by LLMs. So the voice to text is offline.
@mathlinq 4 days ago
Thank you Matthew for a great channel! If you want a more challenging math/physics problem, I came up with this one the other week, so it is perhaps new. When I tested it, ChatGPT was pretty far from solving it, while Claude did somewhat better but still did not get the right answer. Here it is:
"Picture something which looks like a skateboard ramp. At the bottom there is a plane section with a length L. At both sides there are curved sections in the form of quarter circles, each having radius R. For the ideal case, disregarding frictional forces et cetera, calculate the fraction of L that R should have in order to minimize the round trip time in the ramp."
@user-pn8te8tl1t 3 days ago
Working to ensure "good speak."
@Grahfx 5 days ago ⁺²
if you have NVIDIA shares, sell asap.
@IrishSkeleton 4 days ago
9:09 We’re very interested in Agentic use cases, LangGraph, SelfRAG, etc. Would love to see Claude Sonnet testing in agent settings. Love the channel, thanks! 🙏
@rghughes 4 days ago
One thing I've noticed about the "AI Game" footage is that the observer never turns around or retraces their footsteps at any point.
@StuartJ 5 days ago
The big question about Etched is how it scales. My suspicion is it can't, hence why they are not making Groq comparisons.
@ryzikx 5 days ago
It's transformer-only, so once transformers are outdated it's antiquated. However, for now it's massive.
@uaint1stulast323 5 days ago
I watch almost all your videos. I think putting together an agent-based test framework would be of value, although I do realize that making one that is not tool-specific is a difficult task.
@user-fd1ps4eb1p 5 days ago
Wish you'd find a model and / or explain how to use AI to do a company's intrinsic value analysis.
@RonLWilson 5 days ago
Here is a prompt that I tried in ChatGPT 4o that failed. You might try this as one of your test cases.
Generate a Form using Pascal that has 5 tabs. When one tab is selected it highlights and the other four un-highlight. Make the form so that its size can be altered in height and width by clicking and dragging on its handles. Make the code suitable for the Lazarus IDE.
@user-ud3rv5xo6z 5 days ago
Nice one, thank you. Start doing interview calls with guests from the industry.
@gold-818 5 days ago ⁺¹
9:18 To think they have been doing that since 2018 is amazing. Falcon Heavy is wild, but too bad it's going to be retired soon and is, you know, old technology compared to Starship.
@joshross4 5 days ago ⁺¹
There should be some type of benchmark to see how much money the agent can make you in a given time.
@rRobertSmith 3 days ago
Etched and Qualcomm seem to have better chips than Blackwell. The question is what kind of timetable and volume they will be able to manage before year end.
@fire17102 5 days ago
Hard Q Suggestion: ❤
On a piece of paper there are 4 marked points, each representing a corner of a square. Points are labeled A, B, C, D.
Using these 4 points, draw a box with an X inside it ⛝, using one continuous line. You cannot draw over previously drawn segments or raise the pen from the page until you're done. Provide the correct sequence of continuous line segments needed. Start from point A.
@fire17102 5 days ago
Right answer: Trick question, it's impossible to draw a box with an X inside it without lifting the pen or retracing previous segments.
Bonus points: This is a challenge from graph theory, and is solvable if there are 2 or fewer odd-degree vertices. Since this shape has 4 odd corners, it cannot be drawn under these limitations. But if you allow adding another point (E), this can be solved:
A-C,C-E,E-B,B-D,D-C,C-B,B-A,A-D
None of these segments repeat. A has 3 connections, D has 3 connections, B & C have 4 connections, and E has 2 connections. Since there are no more than 2 odd-degree vertices, this is now solvable.
Extra bonus: if it tries, fails, understands why it failed, explains why it's impossible, and gives a solvable variation.
All the best
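The degree argument in this comment can be checked mechanically. Below is a minimal Python sketch, assuming the box with an X is modeled as the six segments between the four corners (the crossing point in the middle is not treated as a vertex). The function names are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

def odd_vertices(edges):
    """Return the vertices with an odd number of incident segments."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def has_euler_trail(edges):
    """A connected figure can be drawn in one stroke only if it has 0 or 2 odd vertices."""
    return len(odd_vertices(edges)) in (0, 2)

def is_one_stroke(segments, edges):
    """Check that the segments chain end to start and use every edge exactly once."""
    chained = all(a[1] == b[0] for a, b in zip(segments, segments[1:]))
    used = Counter(frozenset(s) for s in segments)
    wanted = Counter(frozenset(e) for e in edges)
    return chained and used == wanted

# Assumed model: four sides plus two diagonals, i.e. every pair of corners.
box = list(combinations("ABCD", 2))
# The variation from the comment: extra point E joined to C and B.
house = box + [("C", "E"), ("E", "B")]

# The sequence given in the comment, written as directed segments.
proposed = [("A", "C"), ("C", "E"), ("E", "B"), ("B", "D"),
            ("D", "C"), ("C", "B"), ("B", "A"), ("A", "D")]

print(odd_vertices(box), has_euler_trail(box))      # ['A', 'B', 'C', 'D'] False
print(odd_vertices(house), has_euler_trail(house))  # ['A', 'D'] True
print(is_one_stroke(proposed, house))               # True
```

Under this model the proposed sequence starts at one odd vertex (A) and ends at the other (D), which is what a one-stroke drawing with two odd vertices requires.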
@darnellarford2439 2 days ago
Was this your first time seeing Falcon Heavy boosters land at the same time?
@fynnjackson2298 5 days ago ⁺¹
INSANE 500,000 tokens a second is about 1,000 pages.
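The "about 1,000 pages" figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes roughly 0.75 English words per token and 375 words per page. Both factors are rough rules of thumb chosen for illustration, not numbers from the video, and the result shifts with either assumption.

```python
# Assumed conversion factors (rules of thumb, not from the video).
WORDS_PER_TOKEN = 0.75  # rough average for English text
WORDS_PER_PAGE = 375    # assumed page length; real pages vary widely

def pages_per_second(tokens_per_second: float) -> float:
    """Convert a token throughput into an approximate page throughput."""
    words_per_second = tokens_per_second * WORDS_PER_TOKEN
    return words_per_second / WORDS_PER_PAGE

print(pages_per_second(500_000))  # 1000.0
```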
@WhyteHorse2023 4 days ago
See if the AI can solve the traveling salesman problem. "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?"
@nikolajankovic9573 5 days ago
Good content as always Matthew, but I'm surprised you didn't know about the limitation of the Llama 2 and 3 licences that prevent corporations with hundreds of millions of monthly active users from running the models locally.
@H3liosphan 5 days ago ⁺³
Interesting, so Hugging Face has identified effective cheating in the corporate AI world: companies kind of 'hard coding' benchmarks into their models to improve their scores. Time to try out different game-making challenges then, no more snake. Maybe even go so far as to come up with esoteric games and describe the game to the AI for it to code.
@hastyscorpion 5 days ago
It’s not cheating. It’s designing to the test. Which makes it better at the test but not “better”.
@H3liosphan 5 days ago
@hastyscorpion The reason I think it's cheating is that real money is invested in and made from these AI models. By appearing better at these insufficient tests, they might make more money from them.
@neelmehta9092 5 days ago
Hey Matthew, here are some test ideas:
1. Needle in a haystack
2. 5-6 shot on extremely hard research problem. Like coding a transformer from scratch or any hard leetcode problems not in the training set
3. Some more logical physics problem, maybe try some medium elasticity problems
4. Chemistry tests, some very difficult organic chemistry interactions
5. Music theory, generation of the best possible note and time composition one bar at a time. Once done it can be passed on to irl or to a model like suno. This will be more interpretive
@alansmithee419 5 days ago
My only issue with this is how he would verify the tests?
He's already not great at telling whether or not a response demonstrates correct understanding of a problem, and most of these are niche technical science topics. He has no ability to verify the AI's outputs.
@chrisahunter 3 days ago
Maybe the privacy concerns are regarding the data Llama 3 was trained on.
@MeinDeutschkurs 4 days ago
Claude follows instructions best when the system prompt uses XML.
@truehighs7845 1 day ago
If it looks like science fiction it’s maybe because it is. We had a vertical landing lunar module on Texas Instruments some 35 years ago…
@ToonamiAftermath 4 days ago
I read that Etched is coming to market Q3 2024! There is an eye-opening 2-hour interview with the founders on X, it's so good
@robertheinrich2994 5 days ago
If that chip is reasonably priced and capable of running any LLM based on transformers, that future might be even more interesting.
Maybe there is a smaller version of that chip for home appliances. 50 t/s is useful for lots of cases.
@DefaultFlame 5 days ago ⁺¹
Do please test Claude in an agent framework.
@nftawes2787 1 day ago
I've been thinking for awhile that, because of how popular you are, your tests need randomness, so the tests are fundamentally stable while being situationally chaotic
@tobiashadlich7082 4 days ago
Can you add an ultrasound report question? Or is it too difficult for a non-doctor? I use LLMs to get good ultrasound reports out of one sentence or diagnosis, and I can see huge differences in LLM performance and in the need for editing.
@kirpaS 2 days ago
Etched.
Thinks it can compete with NVDA?
@Kutsushita_yukino 5 days ago
Claude 3.5 Sonnet is really good! It just doesn’t have the emotional intelligence Opus had, so I would use it for general tasks but not for conversations, because it only gives GPT-like robotic responses.
@dg-ov4cf 5 days ago
maybe it means llama has secret backdoors
@crayzeape2230 2 days ago
A good test is to ask an AI model to double a temperature. The correct result involves converting the temperature in degrees (C or F) to kelvin, doubling it, and converting back to degrees. Any AIs I've tested this on just double the initial degrees value, which is incorrect. For instance, double 12 degrees C is actually 297.15 degrees C.
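The conversion the comment describes can be sketched in a few lines of Python. This assumes "double" is meant in the thermodynamic sense (doubling the absolute temperature), which is the commenter's interpretation; the function names are hypothetical.

```python
KELVIN_OFFSET = 273.15  # 0 degrees C expressed in kelvin

def double_celsius(temp_c: float) -> float:
    """Double a Celsius temperature on the absolute (kelvin) scale."""
    kelvin = temp_c + KELVIN_OFFSET
    return 2 * kelvin - KELVIN_OFFSET

def double_fahrenheit(temp_f: float) -> float:
    """Same idea for Fahrenheit: convert to Celsius, double in kelvin, convert back."""
    temp_c = (temp_f - 32) * 5 / 9
    return double_celsius(temp_c) * 9 / 5 + 32

print(round(double_celsius(12.0), 2))  # 297.15
```

Naively doubling the Celsius value would give 24 degrees C, which is the answer the comment says models tend to return.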
@euginium1539 4 days ago
I have a feeling they are delaying the release to coincide with the iPhone 16 announcement, potentially aligning it with Siri for a greater wow factor.
@Mimic4Gold 4 days ago
Can you test Claude Sonnet in an Agent setting?
@richielavey1565 4 days ago
Use complex math problems similar to the Monty Hall problem, but less well known, for your tests
@giovform 5 days ago ⁺⁴
Nvidia shares dropping in 3.. 2.. 1..
@user-wm2vi7jn2k 4 days ago
Here is a good one:
say I have 4 points on a coordinate grid: Point a (3|3) point b (6|6), point c (6|3) and point d (2|7). Say I connect point a to point b and point c to point d, and then rotate the shape by 135 degrees by its midpoint. What kind of shape is it, and in what direction would it be?
You could add the "Think step by step to get to the right answer." at the end to make it easier, but still no LLM is able to get this in my testing.
Answer: The resulting shape is a Latin cross, like the Christian cross. The shape would be in the normal upright position, so the short line is perfectly horizontal and the long line is perfectly vertical.
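The stated answer can be checked numerically. The sketch below makes two assumptions that the puzzle leaves open: the rotation is counterclockwise, and "its midpoint" means the point (4.5, 4.5), which is the midpoint of a-b and also lies on c-d. The helper name `rotate` is hypothetical.

```python
import math

def rotate(point, center, degrees):
    """Rotate a 2D point counterclockwise about a center by the given angle."""
    theta = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(theta) - dy * math.sin(theta),
            center[1] + dx * math.sin(theta) + dy * math.cos(theta))

a, b, c, d = (3, 3), (6, 6), (6, 3), (2, 7)
center = (4.5, 4.5)  # assumed pivot: midpoint of a-b, which also lies on c-d

# Rotate every endpoint by 135 degrees counterclockwise (assumed direction).
a2, b2, c2, d2 = (rotate(p, center, 135) for p in (a, b, c, d))

for name, p in zip("abcd", (a2, b2, c2, d2)):
    print(name, round(p[0], 3), round(p[1], 3))
```

Under these assumptions a and b end up at the same height (the short bar is horizontal), while c and d share the same x coordinate with c above the crossing point and d further below it, which matches the upright Latin cross in the answer. A clockwise rotation would instead leave the long line horizontal.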
@ronbridegroom8428 4 days ago
SpaceX has achieved a two rocket landing before. It is impressive. If they are able to catch their super booster with the arms of the launch tower it will be beyond impressive
@punk3900 5 days ago ⁺¹
Etched is from far-fetched... The text on their webpage reads like a typical scam story. It seems they have an idea and a 3D rendering of some fancy-looking card modeled after NVIDIA... If ASICs for transformers were feasible, NVIDIA would already be selling them.
@RichBira 5 days ago
Matt, most likely with the Apple / Meta talks, privacy could very well be a “requirement” that was in the negotiation that they didn’t come to terms with. Meaning Meta could have had a stipulation that certain data of the users needed to be passed through with the prompt. (Just like Apple pointed out that OpenAI will never get the ip address of user’s iPhone with prompts)
@actepukc 4 days ago
MuSR are your questions from the rubric - about the drying time, the killers, apple as last word in sentences and I think you removed the hole in the latest versions :)
So you can sort your rubric in those fields/themes now :D
Thanks to someone in HF now you might not need to run the test manually - but just execute the test by typing in console -
"run (something) GPQA -o test_data.csv"
(I'm just guessing, but who knows) :D
@user-pn8te8tl1t 3 days ago
How does Cerebras fit in?
@Repz98 5 days ago
Not sure if you covered it, but OpenAI just bought a company that does remote computer control. Maybe they will implement AI that controls your computer if given permission? OpenAI has already made ChatGPT software for the Mac that you can download, so I wouldn't be surprised if they continue with off-website products, like software.
@hotlineoperator 5 days ago ⁺¹
- Etched can only be judged once the chip exists; otherwise it's just a rumor to lure investors into a lottery
- for AGI we do not even know how to test that; probably model makers create tests that are good for them
- render video and games: I think that AI first creates a 3D layer, then renders it again with AI generation to make it realistic
@discardedparticles 5 days ago
I get the sense the delay for the advanced voice mode has something to do with their new partnership with Apple. Seems like something Apple would like to tie into the release of iOS 18 and the new iPhone in a few months.
@Tenly2009 5 days ago
The GPT4o model is driving me crazy. It insists on providing lengthy responses no matter how many times I tell it to keep its responses brief, or to just give me the first step, or to just give me the changed lines of code. Even if it only changes or adds 1 line of code in an 80-line script, it will ALWAYS regenerate the entire function. After fighting with it over and over, I’ve finally switched back to GPT4.
@fynnjackson2298 5 days ago
Switch to Claude, it's where the cool kids hang out, come join us.
@irafuchs 5 days ago
I would very much like to see you add the New York Times Connections puzzle as one of your LLM benchmarks.
@DailyTuna 5 days ago ⁺¹
Saw that news the other day, but can the chip train a model?
@milutinke 5 days ago
Hello, where is the link for Hugging Face benchmark article?
@coenkuijpers1800 5 days ago
Claude 3.5 is indeed phenomenal at coding. I was able to take a Python project with a Qt6 GUI to an MVP in no time. It made only minor bugs, and a few iterations were necessary. But today it completely went rogue on me. It didn't fully 'read' the attached documents. When adding new functions, it destroyed old ones, it didn't understand what I wanted, and it just felt like it didn't want to 'work' during the weekend :)
@azhuransmx126 5 days ago ⁺¹
Sohu would change everything and push NVIDIA to accelerate even more. This is perfect because individual AI chips are closer than ever to reaching the human brain's TDP, and this could be the cherry on the cake.
If Blackwell's jump was amazing, Sohu's jump, yep, "Will Change Everything"😂
@kaleimamahu 4 days ago
Isn't ChatGPT quite poor at translating between code and math equations written in general notation?
@paulorodriguez6288 5 days ago
The transformer ASIC sounds too good to be true atm, but we'll see. Hopefully it's also very energy efficient, I want to finetune models for cheap!
@MikaelMurstam 4 days ago
2:56 too, not tool
@lupker 5 days ago
What is the link to this Chubby game ai render video ? Thanks
@giupidelloglio 4 days ago
As you can’t use Llama 3 for commercial use, I guess Apple can’t just host it and make it available to everyone
@Utoko 5 days ago
I think it is more about concerns over the perception of privacy. We saw how the OpenAI solution already had a backlash. Apple's image is very important to them, and getting close to Meta is not a good move in that regard.
@erb34 5 days ago
Where is Qwen2 7B in the list??
@marcsaintjour3384 4 days ago
If they cancel the deal with Meta over privacy concerns, it means that you were wrong and Elon is right: there are still major privacy concerns about the OpenAI deal, regardless of how they described it.
@gabedude68 5 days ago
You got/use/pay for "Claude 3.5" but.. How? Where? So much to absorb, but can't find how to try it out. Also, benchmarks are great, but what does it show the winner can DO? Process Chats quicker? Mining? Or is it smarter?
@IceMetalPunk 5 days ago
Claude 3.5 Sonnet is available to anyone on Anthropic's website. You can then upgrade to a pro plan for more tokens per day.
@gunterstrubinsky9452 5 days ago
Claude 3.5 cannot process uploaded images
@rocketPower047 5 days ago ⁺¹
Fairst
@I_am_a_human_not_a_commodity 5 days ago
Apple talking about privacy concerns is cute.
