Rethinking the Pipelines API

  • Added on 16 May 2024
  • We're experimenting with live streams!
    In this live-stream Vincent will do something pretty experimental: he'll explore a totally new way to define scikit-learn pipelines. It'll be a DSL and it will be a fun one. We're going to see how you can write custom classes to handle scikit-learn on your behalf.
    More information about the project can be found here:
    github.com/koaning/scikit-pla...
  • Science & Technology
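
The description promises a DSL built from custom classes that assemble scikit-learn pipelines on your behalf. As a rough illustration only, here is a minimal sketch of that idea using Python operator overloading. The names Feats, num and cat are hypothetical and are not the API of scikit-playtime or scikit-learn: "+" combines two feature sets into another feature set (a closed operation), and "|" attaches an estimator and returns an ordinary scikit-learn Pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class Feats:
    """Hypothetical DSL node holding (name, transformer, columns) specs."""

    def __init__(self, specs):
        self.specs = list(specs)

    def __add__(self, other):
        # Closed operation: feature set + feature set -> feature set.
        return Feats(self.specs + other.specs)

    def to_transformer(self):
        # Suffix each name with its position so ColumnTransformer names stay unique.
        return ColumnTransformer(
            [(f"{name}_{i}", trans, cols) for i, (name, trans, cols) in enumerate(self.specs)]
        )

    def __or__(self, model):
        # Attaching an estimator yields a regular scikit-learn Pipeline.
        return Pipeline([("features", self.to_transformer()), ("model", model)])


def num(*cols):
    # Hypothetical helper: standardize numeric columns.
    return Feats([("num", StandardScaler(), list(cols))])


def cat(*cols):
    # Hypothetical helper: one-hot encode categorical columns.
    return Feats([("cat", OneHotEncoder(handle_unknown="ignore"), list(cols))])


df = pd.DataFrame({
    "temp": [10.0, 12.0, 15.0, 20.0, 22.0, 25.0],
    "store": ["a", "b", "a", "b", "a", "b"],
    "sales": [100.0, 110.0, 130.0, 160.0, 170.0, 200.0],
})

# "+" binds tighter than "|", so this reads as (num + cat) | model.
pipe = num("temp") + cat("store") | LinearRegression()
pipe.fit(df[["temp", "store"]], df["sales"])
preds = pipe.predict(df[["temp", "store"]])
```

Because the DSL compiles down to a plain Pipeline, everything downstream (cross-validation, grid search, pickling) keeps working unchanged; the custom classes only change how the pipeline is written.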

Comments • 7

  • @dsds-rj9rg
    15 days ago +1

    This is a really great idea: essentially a grammar of graphics/ggplot setup for pipelines :) You kind of build an algebra of pipelines in which all the operations are closed (each one takes a pipeline and returns a pipeline).
    Also, thanks for putting up these videos; your channel is an awesome ML resource.

  • @probabl_ai
    16 days ago +1

    Today we learned that "playtime" is an unavailable project name on PyPI, so we renamed this project to scikit-playtime. Sorry for the confusion, folks; we should've claimed it earlier. Our bad!

    • @probabl_ai
      16 days ago

      This is the new repository: github.com/koaning/scikit-playtime

  • @brodriguesco
    16 days ago +2

    I was going to comment that it looked like you were re-implementing formulas from R, but right at the end you explained that the inspiration came from R 😄

    • @probabl_ai
      16 days ago

      There are some subtle differences actually, but yeah, the Venn diagrams certainly overlap.
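
Since R formulas come up in this thread, a rough sketch of how a formula string might map onto a scikit-learn pipeline can make the comparison concrete. The helper formula_to_pipeline below is hypothetical (it is not part of scikit-playtime, scikit-learn, or R) and handles only a tiny subset: a target, "~", and terms joined by "+", with C(name) marking a categorical column, a convention borrowed from the patsy library.

```python
import re

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def formula_to_pipeline(formula, model):
    """Hypothetical helper: turn 'y ~ a + C(b)' into (target, Pipeline).

    Plain terms pass through as numeric columns; C(name) marks a
    categorical column to one-hot encode. Only '+' is supported.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    numeric, categorical = [], []
    for term in (t.strip() for t in rhs.split("+")):
        match = re.fullmatch(r"C\((\w+)\)", term)
        if match:
            categorical.append(match.group(1))
        else:
            numeric.append(term)
    features = ColumnTransformer([
        ("num", "passthrough", numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    return lhs, Pipeline([("features", features), ("model", model)])


df = pd.DataFrame({
    "temp": [10.0, 15.0, 20.0, 25.0],
    "store": ["a", "b", "a", "b"],
    "sales": [100.0, 130.0, 160.0, 190.0],
})
target, pipe = formula_to_pipeline("sales ~ temp + C(store)", LinearRegression())
# Columns not named in the formula (including the target) are dropped by the ColumnTransformer.
pipe.fit(df, df[target])
preds = pipe.predict(df)
```

Real R formulas also support interactions (":" and "*"), dropping the intercept with "- 1", and inline transformations such as log(x); none of that is handled in this sketch.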

  • @armanboyaci
    16 days ago

    Vincent, this looks super fun! A couple of months ago I was asking myself why we don't have a "modeling" layer on top of scikit-learn. And now I am optimistic that we will have one soon :)
    How do you feel about allowing multilevel models? For example, suppose you have store-level sales data and you would like to introduce a separate intercept for each store. Or you have multiple products in different product categories and you want to try two different models: in the first you have a single holiday coefficient, while in the second you try separate holiday coefficients for each product category.
    I think it could be useful to take some inspiration from probabilistic programming libraries in general. For example, I really like STAN's approach, which makes you think directly in mathematical notation.
    Thanks again for this experiment, please keep going. I hope it ends up as a mature project. I would really love to use it in the future!

    • @probabl_ai
      16 days ago

      (Vincent) It was stuff like this that I had in mind when I started exploring this space. My direction won't be to do proper multilevel modelling exactly the way STAN/PyMC might. But my gut feeling is that we can get something that's close enough, and fast to train, by doing cool things with features, preprocessing, and modelling.
      The joke here is that I have a bag of tricks, not just words ;)
      That said, all of this stuff in the video is very experimental and doesn't resemble what might end up in scikit-learn. It could also live on as an idea on top of it, or it might turn out to be a bad idea after trying it on some more serious datasets. Time will tell!
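
The reply suggests approximating multilevel structure with features and preprocessing rather than full Bayesian models. One common way to do that, sketched below, is to one-hot encode the store to get a separate intercept per store, and to add a shared holiday column plus one holiday column per product category so a linear model learns a per-category holiday deviation. The column names (store, category, holiday, sales) and the GroupedSlope transformer are hypothetical and not part of scikit-playtime. Ridge shrinks the per-group coefficients toward zero, which is only loosely analogous to the partial pooling a STAN or PyMC multilevel model would do.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class GroupedSlope(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: one copy of value_col per level of group_col.

    Each output column equals value_col on rows of that group and 0 elsewhere,
    so a linear model fits a separate coefficient for every group.
    """

    def __init__(self, group_col, value_col):
        self.group_col = group_col
        self.value_col = value_col

    def fit(self, X, y=None):
        # Remember the levels seen during training; unseen levels map to all zeros.
        self.levels_ = sorted(pd.unique(X[self.group_col]))
        return self

    def transform(self, X):
        groups = X[self.group_col].to_numpy()
        values = X[self.value_col].to_numpy(dtype=float)
        return np.column_stack([np.where(groups == lvl, values, 0.0) for lvl in self.levels_])


features = ColumnTransformer(
    [
        # Separate intercept offset per store.
        ("store_intercept", OneHotEncoder(handle_unknown="ignore"), ["store"]),
        # Shared holiday effect...
        ("holiday", "passthrough", ["holiday"]),
        # ...plus a per-category holiday deviation.
        ("holiday_by_category", GroupedSlope("category", "holiday"), ["category", "holiday"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
model = Pipeline([("features", features), ("ridge", Ridge(alpha=1.0))])

df = pd.DataFrame({
    "store": ["a", "a", "b", "b", "c", "c"],
    "category": ["toys", "food", "toys", "food", "toys", "food"],
    "holiday": [0, 1, 1, 0, 1, 1],
    "sales": [10.0, 14.0, 15.0, 9.0, 13.0, 16.0],
})
model.fit(df, df["sales"])
```

Keeping the shared holiday column means the penalty pulls each category's holiday effect toward the common one rather than toward zero. Unlike a genuine multilevel model, though, the amount of shrinkage is fixed by alpha instead of being estimated from the data.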