I used the first AI Software Engineer for a week. This is happening.

Underfitted

16,738 views


Published on May 2, 2024
I teach a live, interactive program that'll help you build production-ready Machine Learning systems from the ground up. Check it out here:
www.ml.school
To keep up with my content:
• Twitter/X: / svpino
• LinkedIn: / svpino
Science & Technology

Comments • 89

@tonywhite4476 1 month ago ⁺¹⁶
It’s not going to replace all software engineers but they won’t need as many.
@paulocacella 1 month ago
That is the correct point.
@moozooh 1 month ago ⁺³
The fact that it's disproportionately disrupting the entry-level jobs first is much more dangerous than simply removing the need for a percentage of jobs per se because it creates a barrier for entry in the profession that will affect every future generation (the further in the future, the more it will), to the point where it's just not financially viable for newcomers to keep investing in it (because you won't get your money back until after you've reached senior level).
@manuelmaxgonzalez2432 1 month ago
I think it is still a little too early to tell. It depends on how fast these things get better. A drastic improvement in productivity per SWE might enable a lot of projects that were too expensive before and end up increasing demand. But if these tools improve very fast, then supply will flood demand.
@javaparainiciantes 1 month ago ⁺²
02:45 - This is Devin - 1st test - MNIST digit classification
04:59 - Devin asks for help
05:48 - Deploy to Heroku
06:28 - Devin said it deployed but it didn't
07:34 - Completed exercise, but lots of dead code
09:15 - Second project: tic-tac-toe
10:21 - Ask Devin to move the button below the board
11:55 - Devin deployed to Netlify
12:05 - The third project: Lunar Lander Project
13:21 - Devin figured out that he had to migrate the TF version
15:30 - Impressive but disappointing. Devin broke the code
16:10 - Python Backend Implementation - take-home assessment
17:54 - Improve the UI
18:21 - Final example - RAG example - almost worked, but he had closed the session
21:10 - Second try - complete failure
23:00 - Devin feels very slow
23:40 - Opinion: Biggest value of Devin
24:10 - Conclusion
@pabloarroyo7952 1 month ago ⁺²
Very good video. One to look back on in a couple of years' time.
@scretney1 1 month ago
Thanks, Santiago - excellent review of Devin. Appreciate you.
1 month ago ⁺⁵
The new kind of programmers are the ones who program AIs to program better for us!
@ShpanMan 1 month ago ⁺²
Yea, for 1-2 years. Then AI could do that too.
@demianclarke 1 month ago ⁺²
Thanks, Santiago, for such valuable content. A hug from Barcelona.
@underfitted 1 month ago ⁺¹
Thanks!
@francescociulla 1 month ago
Thanks for sharing, Santiago!
@lokeshsharma4177 1 month ago ⁺¹
I second every single word you said. I have a Computer Engineering background with 28 years in the industry (although the tech part was only the first few years) and have seen the transformation through the on-prem, self-service, and cloud journeys. As you rightly say, chief, this is nothing but a marketing stunt at this time and an ambition of where we (the humans) want to be in the future. God bless you.
@germainrodrigue367 1 month ago ⁺¹
Santiago, you're amazing 🎉
@rsivakanth 1 month ago
Good one, Santiago. You put Devin to the test, for sure 🙂 Albeit, this is reassuring, and SW developers/engineers aren't under threat, yet ;-) Thanks.
@divyapadhiyar9470 1 month ago
What a proper explanation. It helps us learn about AI.
@pensiveintrovert4318 1 month ago ⁺³
Summary: it is not usable for what was claimed. I have now spent 4 days playing with gpt-pilot with Llama 3 70b. It goes around and around, making mistakes, trying to correct mistakes, then doing this infinitely.
@24-7gpts 1 month ago
Awesome video!
@doshin2019 1 month ago
Thanks for your review! I have a question regarding the LLM model used. While I understand these models are typically trained on open-source data (please correct me if I'm mistaken), I'm curious about the potential future implications.
What if, down the line, LLMs are trained on massive amounts of proprietary code? What kind of outcomes might we expect from such a shift? I'm interested in your thoughts on this.
@henrymaddocks984 1 month ago ⁺¹
This is a great video. "Some weird things inside" is not OK though. This is why we have senior developers
@moozooh 1 month ago ⁺¹
This is the issue, though. Senior developers didn't start off senior; they were students, then possibly interns, juniors, middles, seniors. If AI disrupts this chain of skill cultivation by removing any need for internships and like 90% of juniors and some middles, how are they going to become seniors in the future? In fact, how would a future software engineer even enter the market and prove their competitive advantage?
@felixronnoh 1 month ago
Nice review. Are you the first person to create the lunar lander?
@villanianalytics 1 month ago ⁺²⁷
This just goes to show that while AI can complete many tasks, right now there is a huge dependency on the user being knowledgeable about what is being requested. You got as far as you did because you were able to help point the AI in the right direction. Someone with no coding background wouldn't even be able to get a fraction of the progress you were able to get
@goatpepperherbaltea7895 1 month ago ⁺⁶
Yeah but rn computers take up a large room but one day they’ll fit in your pocket and be thousands of times faster
@RavishankarAyyakkannu 1 month ago ⁺¹
The same applies for generative music or image generation. You should be more proficient as an artist or musician to get what you want instead of some random cute generation.
@zedmor 1 month ago ⁺⁵
First customers of systems like this would be developers.
@Yomi4D 1 month ago
That's rn. This will change.
@malartbecomes236 1 month ago ⁺²
You'd be surprised what beginner coders can get out of models with enough specificity, especially if you provide it with the right context. The issue is that the models aren't adept at finding, or more importantly, recognizing the correct, up-to-date and actionable information via search and RAG; without very specific instructions, they lack the sufficiently complex, robust memory and reasoning skills that humans do. I don't think we are ever going to get to the point where a human can provide a non-specific prompt and have the model intuit exactly what the human left out, unless we do something ludicrous like training models to be lifelong companions and pairing models and humans at birth. The whole approach is wrong.
We should be encouraging hallucinations and handling them differently. Not sure exactly how, but I know the FLARE framework tries to assess when a model is unsure about a token and uses that as an opportunity to perform RAG generation, but I think a much more effective method would probably be to allow the model to follow the alternative thought path (tangent), with some sort of way to summarize and classify the contents of the tangent, have another model attempt to verify the information, return the model to before the state where the tangent started, inject the information (along with the verification attempt) into some sort of internal thought register, so the model can 'register' the thought without compromising the current output, and then reassess the model's confidence in the next token. I know variations of this are already implemented elsewhere, but I don't think anyone is doing exactly this. It would be sort of similar to the tree of thoughts, but probably more robust, because it would bring up all sorts of other considerations to keep in mind, based on the problems the model ran into on each tangent.
This would obviously get very expensive so it's probably a crappy idea, but I like thinking of stuff like this.
I'm a beginner coder, if it wasn't obvious.
@AndrejsKarpovs 1 month ago ⁺¹
Would definitely use Devin in its current form to boost my learning!
@ndrcntrl 1 month ago ⁺²
Excellent, thanks for the detailed preview of Devin! It’s definitely the real deal. Now I can begin to understand the incredible valuation of such an early stage company. So many tasks from my current dev backlog could be assigned to multiple instances of Devin running in parallel. I can dream of being freed up from many of those mundane dev tasks to pursue the fun and interesting aspects of projects with the help of an AI assistant like Devin. Super excited to get access, hopefully in the not too distant future. Great video, love your content 🤩
@carinebruyndoncx5331 1 month ago ⁺¹
I feel the same way. I think I am going to invest in a multisession setup to multitask with Devin, Devika, ... The future software engineer's desk will look more like a control room, I think.
@prasadghumare 1 month ago
Amazing!
@tomas0413 1 month ago
Hey, Santiago, great video! I’m still on a waiting list for Devin, but I looked at OpenDevin a few weeks ago. It was perhaps a bit too early and I plan to have a look at OpenDevin again. Any thoughts / plans on making a Devin vs OpenDevin comparison?
@underfitted 1 month ago ⁺²
That’s a good idea!
@abudhabi9850 1 month ago ⁺⁵
So it kinda can solve somewhat easy problems, while the solutions it creates are likely hard to maintain and change. Nice for solutions which just have to work somewhat; however, when you require certainty you wouldn't want it to write your code.
Maybe Devin would really benefit from a "project cleanup" command before it delivers a project?
@davidcrocombe1322 1 month ago
I think these AIs should always do a cleanup automatically; however, if they don't, then we need to ask for it as standard procedure.
Come to think of it, we probably need to be specific about what cleanup we need - remove dead code, runtime performance, human readable code style & comments, dependencies allowed.
@carinebruyndoncx5331 1 month ago ⁺¹
As soon as you have a working program with automated tests, you can start refactoring and improving. Look at the focus area of Codium.
@jofus521 1 month ago
Do you think for the lunar lander, it would be useful to have it write tests first, then refactor the code afterwards? Would it be capable of writing the tests based on its understanding of the code without running it?
@underfitted 1 month ago
I’m not sure. For the lunar lander, it’s a neural network that powers everything, so it would be very hard to test it with unit tests. More generally, tests can definitely help a tool like Devin.
@wwkk4964 1 month ago ⁺¹
Looks like Devin wrote India's 2019 lunar lander code too, it crashed!
@brucerosner3547 1 month ago ⁺⁶
I think this misses the whole point. Coding is a mechanical process, readily automated. Software engineering comprises first generating requirements, that is, defining what is to be done, and then selecting the most appropriate solution to meet the requirements. Defining requirements requires knowledge of the problem space, not just computer knowledge.
@raymond_luxury_yacht 1 month ago ⁺¹
Yup. It's concept vs. production. Production is just a factory. And only robots work in factories now. Yup. High-level conceptual work is the high-value work. Which means you need an imagination, just like Einstein said.
@hansu7474 1 month ago
Add to that, coding is not a mechanical process. It's a ridiculous statement. I think if you're a software engineer you'd know it's not.
@bjrc 1 month ago
This is exactly what I've concluded over the past few months. But it applies to many domains, not just coding. Retail for example: existing LLMs can give a lot of high level information about how to optimise a retail organisation, but without being spoon-fed very carefully constructed reports and tools, it won't get anywhere. I hope it will get better with future LLMs, but for now they need a lot of guidance.
@patrickwhite9902 1 month ago
Soz if I missed it, but what LLM is behind the demo? I think the Devin mechanism is good but its capability is model-bound, right?
@underfitted 1 month ago
I’m not sure what LLM they use. I don’t know if they disclose that.
@24-7gpts 1 month ago ⁺¹
It's the GPT-4 Turbo 2024-04-09 version
@BhargavSolankisolankibhargav 1 month ago
Do you usually ask such remarkably grammatically correct prompts?
@underfitted 1 month ago ⁺¹
Only when I’m drunk
@FergusMeiklejohn 1 month ago
What did it cost? I remember swyx said that Devin is expensive. I wonder what the cost/performance would be if it used Llama 3 70B through Groq.
@underfitted 1 month ago
I got free access to it.
@middle-agedmacdonald2965 1 month ago
Thanks, first video I've seen. I don't share your optimism about the future. The idea is to eliminate paying for labor, or to get it as cheaply as possible.
We're all guilty of wanting things cheap, so it's all of our faults.
@tarekabiramia913 1 month ago
How long did they take to give you access?
@underfitted 1 month ago
I reached out to them directly on social media. They probably gave me access because I have a large audience.
@tarekabiramia913 1 month ago
@underfitted I highly appreciate your quick reply. So I need to wait in the queue 😅
@tsaminamina_eheh 1 month ago
Do they use their own LLM or an existing one under the hood?
@riderjohnny5117 1 month ago
They use GPT-4
@underfitted 1 month ago
Personally, I don’t know.
@raymond_luxury_yacht 1 month ago
It's all about the fine tune. I expect ppl are working on really getting specific models expert in specific languages to write apps for particular contexts.
@davidcrocombe1322 1 month ago
It changed your request from recognising the numbers 0 to 10 to recognising 0 to 9.
@underfitted 1 month ago
Yup
@goldmanguyok66292 1 month ago
Add an agent to remove unused code.
An agent for judging technology, which will be faster and easier.
All your comments are easily solvable.
@henrymaddocks984 1 month ago ⁺¹
After everything you saw I don't get why you think the quality of software will improve using these tools.
@underfitted 1 month ago ⁺¹
Because today is Day 1. How much do you think this will change in 5 years?
@raymond_luxury_yacht 1 month ago
What quality software? All the SW I use is crap: bugs, design issues, poor UX, worse UI. It can't be any worse than the nonsense we already have.
@goldmanguyok66292 1 month ago
@underfitted You can make an agent to remove unused code, an agent to judge technology... All your comments in the video are easily fixable. In 1 year or less it will already be perfected.
@henrymaddocks984 1 month ago
@raymond_luxury_yacht Then make better choices.
@greg-guy 1 month ago
Can you share how much you paid in tokens for each of the projects Devin worked on?
@underfitted 1 month ago
I got free access to Devin.
@avi7278 1 month ago
Can you ask Devin to integrate Branch deep linking into a cross-platform Flutter application for iOS, Android and macOS? Their documentation is notoriously sh** and I want to see how it handles it. I must admit that your examples are closer to real-world tasks than those of most people out here trying to hype this thing, which is something that has always bothered me. The people trying it seem to have little to no real professional development experience. I'm not looking for a junior dev that I have to babysit.
@ShpanMan 1 month ago ⁺¹
Haha, you are in the right direction but you don't appreciate how much smarter than humans AI will be in the coming years.
There will be no need for a human anywhere in the flow (well except for setting the goal). Give me an example of something a human would be needed for and recognize that future AI will do that faster, better, and cheaper.
Devin is just the beginning, it's cool, but you did recognize that improvements are a simple action of replacing the brain behind it with the smarter model - that's it.
The singularity is near.
@surajm.s8561 1 month ago
That's a lot of tokens.
@T___Brown 1 month ago
Now you know how frustrating it is to be a BA. Lol, maybe Devin should fix the BA first.
@dfsadsaaad 1 month ago
Well done. However, these tools will only improve over time, and eventually, humans will not need to write the code; they will only need to test it, assess it, and determine its usefulness. I have been writing code for more than 30 years, across many industries and applications. Easy software jobs will disappear and only real "engineers" will remain. Self-taught techies or graduates from code academies should think about the trades. MORE software will not be needed in the future. AIs will do all of this on the fly, on demand, within 5 years. I have built a Devin-like system with CrewAI and it works better than Devin. Wait until GPT-5.
@raymond_luxury_yacht 1 month ago
The web is dead, thank god. In the future, publishing content will mean uploading data to an embedding model, which is borged into an LLM for RAG. The output will be generated on the fly to suit the question, e.g. spoken, generated video, text, music, etc. No more web interfaces. Web designers better start retraining.
@robertosolari__ 1 month ago ⁺²
So guys, learn maths, learn code, learn AI...
@ShpanMan 1 month ago ⁺³
More like plumbing..
@robertosolari__ 1 month ago
@ShpanMan Yeah, also. I was thinking about agriculture.
@chimwemwechinamale6716 1 month ago
You really don't need to learn AI. There's no need for that; only a handful of individuals, mostly in research and at big corps, matter.
@EduardsRuzga 1 month ago
AI will do math. AI will do code. AI will even do parts of entrepreneurship. AI already does AI :D Aka generates evals, synthetic data sets, picks a model to fine-tune, does that, runs evals, picks winners :D
But as with Devin, the question is speed and price. Some tasks, like large-scale software engineering, have to deal with a lot of uncertainties: UX, GDPR, a lot of random variables from hardware to infra, to OS, to software and frameworks and dependencies in the project, and 3rd-party modules. There is a lot of work to go through even for a team of expert humans.
I do think we will get there in the next 5 years. It feels like we are moving hard from imperative to declarative, and not just in coding. It's about figuring out what to do, not how.
The question then is: what is not a commodity, where are the costs, what is valuable?
Chips, compute, energy? Intelligence will be exchangeable for those. Kinda like now you can spend money to get back time by making other humans do things.
There are things AI will not change, though. Like it can't do anything about land. Land is a finite resource and we care more about some places than others, and AI will not change that drastically. Aka AI will not change the laws of physics.
Weird times. I wonder if we can get to net 0, where tech brings the cost of basic necessities like food/shelter/health/education close to 0.
