MLForecast: Scalable Machine Learning Based Time Series Forecasting | José Morales

  • Added 7 Jul 2024
  • mlforecast is a framework for scalable, machine-learning-based time series forecasting. It performs every step of the process in a distributed way, so it can scale to massive amounts of data. Dask provides the parallelism, so you can run it either on a single machine or on a remote cluster.
    Gradient Boosted Decision Trees (GBDT) can perform well on time series forecasting, as the M5 competition showed: LightGBM was used in some of the top-scoring solutions (1st and 4th place).
    Computing lag-based features for training is embarrassingly parallel and fairly straightforward: you partition the dataset by series id and preprocess each partition in parallel. However, you usually have to concatenate the partitions back together to train a model, which can be expensive. And once the model is trained, you have to update the features somehow to predict the next timestep, which can be hard to do efficiently.
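    The two steps described above (per-series lag computation followed by concatenation, then recursive feature updates at prediction time) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not mlforecast's implementation or API: a thread pool stands in for Dask workers, the lag set `LAGS` and all function names are hypothetical, and `mean_model` is a toy stand-in for a trained GBDT such as LightGBM.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

LAGS = (1, 2)  # hypothetical lag set, chosen only for illustration


def lag_features(values, lags=LAGS):
    """Build (features, target) rows for one series; early rows lacking a full lag window are skipped."""
    max_lag = max(lags)
    return [([values[t - lag] for lag in lags], values[t]) for t in range(max_lag, len(values))]


def build_training_set(records, lags=LAGS):
    """records: (series_id, value) pairs, assumed time-ordered within each series."""
    # Partition by series id: each series' lags depend only on its own history.
    partitions = defaultdict(list)
    for series_id, value in records:
        partitions[series_id].append(value)
    ids = sorted(partitions)
    # Embarrassingly parallel step: one task per series (threads stand in for Dask workers).
    with ThreadPoolExecutor() as pool:
        per_series = list(pool.map(lambda sid: lag_features(partitions[sid], lags), ids))
    # Concatenation step, needed before fitting a single global model.
    rows = [row for series_rows in per_series for row in series_rows]
    return rows, partitions


def predict_recursive(model, history, horizon, lags=LAGS):
    """Forecast `horizon` steps, feeding each prediction back in as the newest observation."""
    history = list(history)
    preds = []
    for _ in range(horizon):
        y_hat = model([history[-lag] for lag in lags])
        preds.append(y_hat)
        history.append(y_hat)  # update the lag features for the next timestep
    return preds


def mean_model(features):
    """Hypothetical stand-in for a trained GBDT: averages its lag features."""
    return sum(features) / len(features)


records = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("a", 4.0),
           ("b", 10.0), ("b", 20.0), ("b", 30.0)]
rows, partitions = build_training_set(records)
forecast_a = predict_recursive(mean_model, partitions["a"], horizon=2)
```

    Here `rows` holds three training rows (two from series "a", one from "b"), and `forecast_a` is `[3.5, 3.75]`: the second step's lag-1 feature is the first step's prediction. That feedback loop is what makes efficient prediction harder than the embarrassingly parallel training-feature step.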
    ===
    The Dask Distributed Summit is where users, contributors, and newcomers can share experiences to learn from one another and grow together. The Dask Distributed Summit provides content, information, and learning opportunities for attendees of all levels of Dask familiarity and expertise.
    summit.dask.org/
  • Science & Technology

Comments • 1

  • @madhu1987ful · 2 years ago

    Can we use mlforecast for distributed computing on a Spark cluster? If so, that would be great. Please share some resources on this.