Rethinking the Pipelines API
- Added 16. 05. 2024
- We're experimenting with live streams!
In this live-stream Vincent will do something pretty experimental: he'll explore a totally new way to define scikit-learn pipelines. It'll be a DSL and it will be a fun one. We're going to see how you can write custom classes to handle scikit-learn on your behalf.
More information about the project can be found here:
github.com/koaning/scikit-pla... - Science and technology
This is a really great idea: essentially a grammar-of-graphics/ggplot setup for pipelines :) You build an algebra of pipelines in which all the operations are closed (each one takes a pipeline and returns a pipeline).
Also, thanks for putting up these videos; your channel is an awesome ML resource.
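The "closed operations" point from the comment above can be sketched in a few lines on top of plain scikit-learn. This is only an illustration, not the scikit-playtime API: the `Block` class, the `column` helper, and the choice of `+` and `|` as operators are all made up for this example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, StandardScaler


class Block:
    """Hypothetical wrapper that holds one scikit-learn transformer."""

    def __init__(self, transformer):
        self.transformer = transformer

    def __add__(self, other):
        # Block + Block -> Block: put both feature sets side by side.
        return Block(make_union(self.transformer, other.transformer))

    def __or__(self, other):
        # Block | Block -> Block: feed the left output into the right step.
        return Block(make_pipeline(self.transformer, other.transformer))


def column(i):
    """Hypothetical helper: select column i and keep it 2D."""
    return Block(FunctionTransformer(lambda X: X[:, [i]]))


# Made-up toy data with two numeric columns.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Every operator returns another Block, so expressions nest freely.
combined = (column(0) + column(1)) | Block(StandardScaler())
out = combined.transformer.fit_transform(X)
print(out.shape)  # (3, 2)
```

Because `+` and `|` both return a `Block`, any expression built from them is itself a valid operand, which is what makes the algebra closed.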
Today we learned that playtime is unavailable as a project name on PyPI, so we renamed this project to scikit-playtime. Sorry for the confusion, folks; we should've claimed it earlier. Our bad!
This is the new repository: github.com/koaning/scikit-playtime
I was going to comment that it looked like you were re-implementing formulas from R, but right at the end you explained that the inspiration came from R 😄
There are some subtle differences, actually, but yeah, the Venn diagrams certainly overlap.
Vincent, this looks super fun! A couple of months ago I was asking myself why we don't have a "modeling" layer on top of scikit-learn. Now I am optimistic that we will have one soon :)
How do you feel about allowing multilevel models? For example, suppose you have store-level sales data and would like to introduce a separate intercept for each store. Or you have multiple products in different product categories and want to try two different models: one with a single holiday coefficient, and another with a separate holiday coefficient for each product category.
I think it could be useful to take some inspiration from probabilistic programming libraries in general. For example, I really like Stan's approach, which makes you think directly in mathematical notation.
Thanks again for this experiment, please keep going. I hope it ends up as a mature project. I would really love to use it in the future!
(Vincent) It was stuff like this that I had in mind when I started exploring this space. My direction won't be to do proper multilevel modelling the way Stan/PyMC might do it. But my gut feeling is that we can do something that's close enough and fast to train by doing cool things with features/preprocessing/modelling.
The joke here is that I have a bag of tricks, not just words ;)
That said, all of this stuff in the video is very experimental and doesn't resemble what might be in scikit-learn. It could also live on as an idea on top of it or it might turn out to be a bad idea after giving it some more serious datasets. Time will tell!
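For the store example in this thread, one "close enough" reading in plain scikit-learn is to one-hot encode the store and fit a linear model without a global intercept, so each store indicator gets its own coefficient. This is a sketch under assumptions, not Vincent's method and not a true multilevel model (there is no pooling across stores); the data, the column layout, and the effect sizes below are invented for illustration.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up data: columns are [store id, holiday flag].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]], dtype=float)
base = np.array([10.0, 20.0, 30.0])  # invented baseline sales per store
y = base[X[:, 0].astype(int)] + 5.0 * X[:, 1]  # shared holiday effect of 5

model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(), [0]),   # one indicator column per store
        remainder="passthrough",  # keep the holiday flag unchanged
        sparse_threshold=0,       # always return a dense array
    ),
    # No global intercept: each store indicator acts as that store's intercept.
    LinearRegression(fit_intercept=False),
)
model.fit(X, y)

# Order: one coefficient per store, then the shared holiday coefficient.
coefs = model[-1].coef_
print(coefs.round(2))  # approximately [10. 20. 30.  5.]
```

The second variant from the question, a separate holiday coefficient per product category, would follow the same pattern with interaction columns (category indicator times holiday flag) in place of the single holiday column.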