🇫🇷 Mistral AI's NEW 22B Coding Model with Code Inpainting 🎨 Beats DeepSeekCoder 33B!

  • Added on 28 May 2024
  • Meet Codestral, the game-changing code generation model by Mistral AI! This powerful tool assists developers with code completion and interaction through an easy-to-use API. Codestral surpasses the competition, even beating DeepSeek Coder 33B and Llama 3 70B! Unlock your coding potential and boost your productivity with Codestral.
    Tell us what you think in the comments below!
    Maxime Tweet: x.com/maximelabonne/status/17...
    Mistral Blog Post: mistral.ai/news/codestral/
    La Plateforme (use Codestral FREE): chat.mistral.ai/chat/feec47ed...
    Hugging Face Card (weights): huggingface.co/mistralai/Code...
    -----------------
    This video contains affiliate links, meaning if you click and make a purchase, I may earn a commission at no extra cost to you. Thank you for supporting my channel!
    My 4090 machine:
    amzn.to/3QMvE4s - MSI 4090 Suprim Liquid X 24G (best linux compatibility)
    amzn.to/3V5R0My - Corsair 1500i PSU
    amzn.to/4dIwybZ - 12VHPWR cables that DON'T MELT!
    Tech I use to produce my videos:
    amzn.to/4bN5eaR - Samsung T7 2TB SSD USB-C
    amzn.to/4dJFHky - SanDisk 32 GB USB-C flash drive
    amzn.to/44LHZeG - Blue XLR Microphone
    amzn.to/3ULTT3N - Focusrite Scarlett Solo USB-C to XLR interface
  • Science & Technology

Comments • 84

  • @ppbroAI · 1 month ago · +10

    Yup, it's really good. Tried it in 4-bit; I like its explanations so far.

    • @aifluxchannel · 1 month ago

      Great to hear! I can't wait to try 8-bit quants once I get back to my GPU machine! :)))

    • @OMGanger · 1 month ago

      Any suggestions for something better than gpto? I feel like it's not that hard to run tree, then retrieve and dump context at each node along it.

  • @JakubHohn · 1 month ago · +6

    I really like the coding AIs, but what feels like a big downside is that none of them can CRUD (create, read, update, delete) files directly. Once they can do that, I think they will be radically more useful.

    • @aifluxchannel · 1 month ago · +3

      Good point! I'll add this in the next video. I have noticed these models even struggle to string together relatively simple TypeScript / React apps.

  • @onoff5604 · 1 month ago · +2

    Many thanks for the detailed coverage of the topic.

  • @QuickTechNow · 1 month ago · +1

    This helped me a lot with my C++ project; I'd thought these companies translate "code" to "Python". Thanks!

  • @cd92606 · 1 month ago · +2

    Excellent overview. Personally my goal is ultimately to only use locally running models, so this is an exciting step!

    • @aifluxchannel · 1 month ago

      Which models are you planning to run locally!?

  • @southcoastinventors6583 · 1 month ago · +7

    Nice video and test of Codestral, but if you're going to do a Snake implementation or some other visual program, please run it. It needs some pizzazz. Also, it's great to have some competition from Europe; I always look forward to what Mistral releases.

    • @aifluxchannel · 1 month ago · +4

      Thanks for the feedback! I wanted to keep the video under 20 min! Will do a full demo next time.

  • @justindressler5992 · 1 month ago · +2

    Cool, a Mandelbrot set. That's the only use case I have for code gen. Literally the most useful code ever. In my entire career of 30 years, I can't say I ever needed, or even felt the urge, to write a Mandelbrot set.
    Why don't people use real-life tasks, like writing a React login form with unit tests and e2e tests, plus backend verification with a Node/Express server and database, again with unit tests? Have it explain the security techniques used to protect against hacking and to protect credentials. This is needed in almost every app.
    Until these things can be done flawlessly (passwords encrypted in the DB, TLS-enabled connections, data validation, avoiding code injection, 2FA, CORS, SSO with Google, secure sessions, DB account schema, RBAC and so on), they won't be replacing anyone.

    • @aifluxchannel · 1 month ago · +3

      I generally like to stick to tasks that a human could do, but that also don't take too much time to demo. I find that a lot of coding models will "explain away" things they're unsure how to actually implement with pseudocode or explanations of "best practices", largely because they're just regurgitating documentation when that happens. What else would you like me to focus on or change in future videos when I'm evaluating coding performance?

  • @PythonAndy · 1 month ago · +1

    thanks for the vid ♥

    • @aifluxchannel · 1 month ago

      You bet! Let us know what you'd like to see more of!

  • @maloukemallouke9735 · 1 month ago · +1

    thanks for this experience

  • @JoeBrigAI · 1 month ago · +5

    Looks good. Let's see it in a real workflow.

    • @aifluxchannel · 1 month ago · +2

      What would you like to see? Web dev, Solidity / web3? I'm all ears!

  • @AaronALAI · 1 month ago · +2

    I've been having great success with Wizard's Mixtral 8x22B model for coding.
    My workflow is pretty simple: I use text-generation-webui to talk to my models, with the Spyder IDE in another window, and just talk to the LLM like a normal person.

    • @aifluxchannel · 1 month ago · +1

      It'll be interesting to see how similar the evals for those two models are. Given they're the same size, I wonder if this is just a super-sampling of one of the "experts" from their 8x22B model.

    • @AaronALAI · 1 month ago

      @@aifluxchannel Ooh, interesting hypothesis. I noticed it was a 22B model they released and wondered if it was related in some way to their 8x22B model.

  • @hobologna · 1 month ago · +1

    Code inpainting is a brilliant concept!

    • @aifluxchannel · 1 month ago

      I think it could become a really popular way to interact with coding models, especially if you could point / direct where you want it to focus in a codebase with comments.
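Code inpainting is generally implemented as fill-in-the-middle (FIM): the model is given the code before and after a gap and generates only the missing middle. The sketch below shows how such a prompt could be assembled and how the answer is spliced back into the file. It is a hedged illustration only: the sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the helper names `build_fim_prompt` and `splice` are hypothetical, and Codestral's actual FIM format and Mistral's API may differ.

```python
# Minimal fill-in-the-middle (FIM) prompt sketch.
# ASSUMPTION: these sentinel strings are hypothetical placeholders;
# a real model defines its own special tokens and ordering.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def build_fim_prompt(source: str, gap_start: int, gap_end: int) -> str:
    """Cut out source[gap_start:gap_end] and build a prompt asking the
    model to regenerate ("inpaint") only that span."""
    if not 0 <= gap_start <= gap_end <= len(source):
        raise ValueError("gap must lie inside the source text")
    prefix = source[:gap_start]  # code before the hole
    suffix = source[gap_end:]    # code after the hole
    # Prefix-suffix-middle order: the model continues generating after MID.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def splice(source: str, gap_start: int, gap_end: int, middle: str) -> str:
    """Put the model's generated middle back into the original file."""
    return source[:gap_start] + middle + source[gap_end:]


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n"
    start = code.index("a + b")
    end = start + len("a + b")
    print(build_fim_prompt(code, start, end))
    # Pretend the model answered "a + b"; splicing restores the file.
    print(splice(code, start, end, "a + b") == code)
```

Pointing the model at a spot in a codebase then amounts to choosing `gap_start` and `gap_end`, for example at a marker comment.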

  • @pn4960 · 1 month ago · +1

    super cool!

  • @peterwood6875 · 1 month ago · +1

    I like to use Claude 3 Haiku for coding. I can always use Opus for things like coming up with the coding project itself, or for tricky technical questions. I talk to Haiku about the implementation and plan with it, then get it to come up with some unit tests, then get it to write the code. Getting it to think a bit before generating the code seems to make it generate good code.

    • @aifluxchannel · 1 month ago · +1

      Thanks for sharing! Have you used the new Phi-3 as well? I'm curious what kind of coding you're using this for.

    • @peterwood6875 · 1 month ago

      @@aifluxchannel I often have conversations with Claude about maths and physics. Writing some code to do some calculations is a good way to familiarise oneself with relevant concepts, and more fun than doing calculations by hand with a pen and paper. A recent project was to implement a homomorphism and representations of Lie groups that are related to quantisation of spin. I haven't tried Phi-3. It looks like some versions have a decent context length, but I find that Claude's context length isn't quite enough for the way I use it.

  • @moak4052 · 1 month ago · +1

    Which AI do you recommend for coding?

    • @aifluxchannel · 1 month ago

      I generally use DeepSeek Coder 33B and GPT4 ;)

  • @siegfriedcxf · 1 month ago · +1

    They didn't include CodeQwen1.5-7B-Chat; it actually scores higher on HumanEval than Codestral and is way smaller (7B vs 22B). I tried both, and CodeQwen is actually better.

    • @aifluxchannel · 1 month ago

      I haven't tried CodeQwen yet, but I've definitely been impressed with Qwen 1.5 - what kind of coding do you do with this model?

  • @dkracingfan2503 · 1 month ago · +1

    Yes, it beats it!

    • @aifluxchannel · 1 month ago

      Pretty exciting, isn't it? What kind of finetunes do you want to see done to this Mistral model?

  • @VastCNC · 1 month ago · +1

    I'd like to see a model tuned to a specific language other than Python and JS derivatives. Elixir is a prime candidate, with an excellent documentation library (HexDocs).

    • @jonmichaelgalindo · 1 month ago · +1

      Base model training literally needs hundreds of millions of lines of code.

    • @aifluxchannel · 1 month ago

      It would be interesting to train the model with as little documentation / English commentary and context as possible, to see if a more accurate or actionable model would come out of it.

    • @VastCNC · 1 month ago

      @@aifluxchannel Do you think fine-tuning would be sufficient? I think with Elixir, outside of the documentation, open-source repositories would be of higher quality because of the skill it takes to become productive, compared with Python and JS.

  • @OMGanger · 1 month ago · +1

    Phi has 128k context and is only 4B?

    • @aifluxchannel · 1 month ago · +1

      It's more about how you use the context window than its length ;)

  • @garrettbates2639 · 1 month ago · +3

    I feel like I missed something about Devin.

    • @aifluxchannel · 1 month ago · +3

      Devin's makers turned out to have faked their demo, and in reality it was quite far from "replacing software engineers" with AI ;)

    • @garrettbates2639 · 1 month ago · +1

      @@aifluxchannel Ahhh. Makes sense. Not much better than repeatedly prompting other models, I imagine?
      That's unfortunate, but at least it spawned some open source projects to try and do what they pretended to do, I suppose.

  • @m12652 · 1 month ago · +1

    There have been so many changes in JavaScript, HTML and CSS in the last couple of years; why would a web dev want to use a tool that is only trained up to 2001...

    • @aifluxchannel · 1 month ago

      Base reasoning is key, because it means fine-tuning on top of newer JavaScript docs / code is even easier and translates to solid performance after the fact.

    • @m12652 · 1 month ago

      @@aifluxchannel And yet every coding AI model I've tried has produced such flaky code that it hurts to read, even taking into account that they might not be trained on new functionality.

  • @lel7531 · 1 month ago · +1

    Why aren't you running the code?

    • @aifluxchannel · 1 month ago · +1

      I can do this in livestreams, but for model review videos it takes too much time. Thanks for the suggestion.

  • @tapu_ · 1 month ago · +1

    You should test out if it can write and run DreamBerd, the greatest language ever.

    • @aifluxchannel · 1 month ago

      Hahaha, I can't tell if this is a joke or a real programming language.

  • @sevilnatas · 1 month ago · +2

    Wait, what happened to Devin?

    • @aifluxchannel · 1 month ago · +2

      The demo was fake; it wasn't actually as capable as its creators claimed.

    • @sevilnatas · 1 month ago · +1

      @@aifluxchannel Ah, crazy! I guess it was good enough for Microsoft.

  • @Arcticwhir · 1 month ago · +1

    In my testing it can be quite lazy and its creativity is low, although its coding abilities are definitely sharp and I have yet to get any bugs. I would use it for autocomplete and pseudocode (you have to be quite detailed).

    • @aifluxchannel · 1 month ago

      Interesting, thanks for sharing your results. I'm curious what terms / attributes you use to measure how "creative" a coding LLM is. This might help me improve how I test models in the future!

  • @jonmichaelgalindo · 1 month ago · +1

    But can it write ffmpeg commands?

    • @mirek190 · 1 month ago · +1

      Yes.
      Also, you can paste in the newest documentation; then it works even better.

    • @jonmichaelgalindo · 1 month ago

      @@mirek190 Have you tried? I guarantee you haven't. Not even GPT-4 can do anything more complicated than mp3 -> ogg, and it even struggles with something that simple.

    • @aifluxchannel · 1 month ago · +1

      GPT-4 and Mixtral 8x7B are particularly good with these commands. This was one of the first things that really impressed me about these models.

    • @aifluxchannel · 1 month ago · +1

      It can do things much more complicated! You should try it out.

    • @jonmichaelgalindo · 1 month ago

      @@aifluxchannel We must be prompting it differently then. :-/
      For example (real example): I wanted to input my two camera videos, convert them from fisheye to equirectangular, combine them with one on the left and the other on the right (stereo), crop 120 pixels from left and right of both, move the right down 180 pixels (bad lens alignment from manufacturer), then scale the entire output to no more than 8K. GPT-4 was nowhere near being able to write the command. (I never did figure it out. I'm doing those operations manually in Blender.)
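For reference, the steps described map onto ffmpeg's `v360`, `crop`, `pad`, `hstack` and `scale` filters. The sketch below only assembles a `-filter_complex` string and an argument list; it has not been run against real footage. The fisheye field of view (190°), the file names, treating "8K" as a maximum width of 7680 px, and implementing the 180 px downward shift by padding both eyes to equal height are all assumptions to tune.

```python
import shlex

# ASSUMPTIONS (not from the comment): both cameras share one resolution,
# each lens covers ~190 degrees, and "8K" means at most 7680 px wide.
FISHEYE_FOV = 190
EDGE_CROP = 120    # pixels removed from the left and right of each eye
RIGHT_SHIFT = 180  # pixels the right eye is moved down
MAX_WIDTH = 7680


def eye_chain(index: int, label: str, shift_down: int) -> str:
    """Filter chain for one camera: fisheye -> equirect, trim edges, pad."""
    return (
        f"[{index}:v]"
        f"v360=input=fisheye:output=equirect:ih_fov={FISHEYE_FOV}:iv_fov={FISHEYE_FOV},"
        f"crop=iw-{2 * EDGE_CROP}:ih:{EDGE_CROP}:0,"
        # Pad both eyes to the same height; the y offset moves this eye down.
        f"pad=iw:ih+{RIGHT_SHIFT}:0:{shift_down}"
        f"[{label}]"
    )


def build_command(left: str, right: str, out: str) -> list[str]:
    """Return an ffmpeg argument list (suitable for subprocess.run)."""
    graph = ";".join([
        eye_chain(0, "l", 0),
        eye_chain(1, "r", RIGHT_SHIFT),
        # Side-by-side stereo, then shrink only if wider than MAX_WIDTH.
        f"[l][r]hstack=inputs=2,scale=w='min(iw,{MAX_WIDTH})':h=-2[out]",
    ])
    return ["ffmpeg", "-i", left, "-i", right,
            "-filter_complex", graph, "-map", "[out]", out]


if __name__ == "__main__":
    print(shlex.join(build_command("left.mp4", "right.mp4", "stereo.mp4")))
```

Padding (rather than overlaying) keeps both streams the same height, which `hstack` requires.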

  • @firstlast493 · 1 month ago · +1

    How about AutoCoder 33b?

    • @aifluxchannel · 1 month ago

      We can test this soon! Is this your go-to coding model?

    • @firstlast493 · 1 month ago

      @@aifluxchannel No. There are just very few videos about this model.

  • @hjups · 1 month ago · +1

    An interesting model, but unimpressive in my testing. Although, it seems to be dependent on the language and problem difficulty - high resource languages with simpler problems are more likely to succeed.
    Coming from the computer architecture side (hardware design), I always test the models on low-level C and Verilog problems (relatively simple due to low expectations). GPT-3.5 and Llama3-70B succeeded more often than not, but Codestral failed all of my test cases. In fact, Codestral broke math by insisting that a*b == a+b if b is odd, and otherwise a random number (whatever was previously stored). When I pointed out the contradiction, it only doubled down. Llama3-70B and GPT-3.5 have never failed that badly for me.

    • @aifluxchannel · 1 month ago

      It's been a while since I've written Verilog, but it's definitely an interesting edge case to test Codestral with. What kind of work do you generally use Llama3-70B to assist / accelerate?

    • @hjups · 1 month ago

      @@aifluxchannel It's a fun yet frustrating language.
      I haven't been using LLama3-70B to assist with any hardware tasks, it still fails on anything useful (only succeeds at simple tasks).
      GPT4 can sometimes generate more complicated Verilog, but usually requires manual correction. It's mostly useful for generating sub-function behavior in tooling (C and python). That still requires manual guidance, but speeds up development by ~10x. I would be more hopeful of LLama3-400B, but I guess that won't be released.

  • @linklovezelda · 1 month ago · +2

    Check your title, bro.

  • @AI-Wire · 1 month ago · +2

    "We all know what happened with Devin." Nice engagement bait. Just tell us what you mean. But instead, you bait us for engagement.

    • @aifluxchannel · 1 month ago

      Thanks for the feedback, I assumed it was well known that Devin was caught faking their demo about a week after announcing their model.

  • @pigeon_official · 1 month ago · +7

    But GPT-4o is barely decent at coding. I can't imagine the open-source stuff will be remotely useful if GPT-4o can't do 90% of coding tasks more complex than, like, an intro-to-coding-course type of thing.

    • @aifluxchannel · 1 month ago · +5

      I do generally agree that GPT-4o (outside of OpenAI's demo) is basically useless for real coding tasks, especially as a copilot.

    • @brulsmurf · 1 month ago

      The "open-source stuff" isn't lagging behind. And yes, there are a lot of problems with using LLMs for coding tasks; you need to be very careful.

    • @yongamamkolokotho9904 · 1 month ago

      I was creating a BFS-generated maze using 4o, and so far it's impressive for me.

    • @handsanitizer2457 · 1 month ago

      @@yongamamkolokotho9904 He means for anything complex.