Time Series Analysis using Python | ARIMA & SARIMAX Model Implementation | Stationarity Handling

  • Added on 7 Dec 2022
  • "What is Time Series Analysis", "How to Make a Time Series Forecasting Model (ARIMA or SARIMAX) in Python", "What is Stationarity in Time Series Analysis and How to Handle It", "What are ACF and PACF in Time Series Analysis"... If you have any questions of this kind and want to understand them from a beginner level, this video will clarify all of these concepts.
    You may also like to watch -
    Time Series Playlist - • Time Series
    Pandas all in one - • Python Pandas Complete...
    Pandas Full Playlist - • Python Pandas Tutorial...
    Numpy Full Playlist - • NumPy
    Matplotlib Full Playlist- • Python Matplotlib Tuto...
    Seaborn Full Playlist - • Seaborn Beginner to Pr...
    You can find the code file here - github.com/LEARNEREA/Data_Sci...
    Tags -
    Time Series Analysis,
    Time Series Modelling,
    Components of Time Series,
    Trending Time Series,
    Cyclic Time Series,
    Seasonal Time Series,
    #DataScience #TimeSeries #PyhonProgramming #Python #learnerea
  • Science and Technology

Comments • 90

  • @iustinatorul7579 11 months ago +11

    One of the best ARIMA implementation tutorials I have seen. I’m a bit frustrated I found it after I had used ARIMA for a project. I can’t even tell you how much time I had wasted going online and on forums, trying to understand how it works.
    But hey, now that I learned it the hard way it better be sticking. 😂
    Appreciate it!

  • @rajaganesh3462 1 year ago +5

    I have come across many blogs and videos to understand the time series process, but I didn't get a clear picture. However, this video gave me a clear understanding of the process. Really great work! Much appreciated.

  • @cvrbcheppali8214 7 months ago +4

    This is one of the best videos on time series on YouTube. Well explained. The content is very nice.

  • @fayezullah655 3 months ago +1

    One of the best videos I have ever seen on time series on YouTube. Thanks for making it.

  • @nothing_to_love 22 days ago

    Thanks for this amazing video!!!

  • @oladayoojekunle1732 10 months ago +3

    You really did justice to this topic. Very well done!

    • @learnerea 10 months ago

      thank you very much

    • @oladayoojekunle1732 10 months ago

      @learnerea Please, can you make a video on how to use the transformed data, especially data obtained using log, sqrt and shift? I have been trying to figure that out. The part that confuses me is how to transform the data back to the original scale. Thank you.
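
A note on reversing the transformations: the sketch below is not from the video. It assumes a log transform followed by a first difference, and the sample values are hypothetical. The idea is to undo the steps in reverse order: a cumulative sum (seeded with the first log value) undoes the differencing, and `exp` undoes the log.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values standing in for the passenger counts.
series = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])

# Forward transform: log to stabilise the variance, then a first difference.
log_series = np.log(series)
log_diff = log_series.diff().dropna()

# Inverse transform: the cumulative sum undoes the differencing (it needs the
# first log value as its starting point), and exp undoes the log.
restored_log = log_diff.cumsum() + log_series.iloc[0]
restored = np.exp(restored_log)

# `restored` covers every point except the first, which differencing dropped.
print(restored.round(1).tolist())
```

The same pattern should carry over to the other transforms mentioned: square the values to undo a square root, and for forecasts start the cumulative sum from the last observed (transformed) value instead of the first.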

  • @pepsibrandambassador 3 months ago +1

    You are great! This helped me with my project at the last minute. Thanks for the video!!

  • @madhuripatel5250 5 months ago +1

    If I am dealing with time series data at hourly frequency, collected over 2 years, what should I take as the lag (shift) value?

  • @PriyeshM-yj8wi 3 months ago

    I have sales data consisting of the time period and other features, including 7-8 different schemes that are active in some months, so basically they are categorical variables containing 0 or 1. Should I go ahead with ARIMA for forecasting? If yes, how do I account for those categorical variables?

  • @rishabhpandey3609 3 months ago +1

    It's really a crazy explanation. I would recommend this in my org, Jio. Keep it up man. God bless you!

  • @Vizia219 5 months ago

    Hi, I was using your tutorial to learn how to implement ARIMA models. I then went about and implemented my own with some of my own data that I'm using for a school project. However, while my model fit my data very well, my forecasts are flat and they're strange. Could you help me in any way?

  • @melainetape 3 months ago +2

    So informative. I do not see the relation between the transformations (Log, Sqrt, Shift) that make the data stationary and the ARIMA model you build. I'm confused at this step. I tried with my own data and noted that the ShiftDiff transformation makes it stationary, but when it comes to building the model, it does not fit well. Thanks in advance.

  • @julianatorressanchez5250 2 months ago

    You are amazing. I love the way you explain. Can you do the same for multidimensional data sets?

  • @vanikmalhotra6586 2 months ago

    Basic question: why did we run the model on the original data set, when towards the end you mentioned running the model on the altered data set (basically diff / square root)?

  • @scientensity 11 months ago +1

    In a SARIMA model, while doing an analysis, I found that for d=0, D=1 (I did one seasonal differencing and no non-seasonal differencing) the prediction fits the whole data except the initial 22 values (it predicts almost 0 for them), and 22 is the seasonal period of my data.
    Can you explain why this is happening?
    I hope you got my question.

    • @learnerea 10 months ago

      Assuming you are using the same data as in the video, please share your code at learnerea.edu@gmail.com so that we can have a look and guide you more specifically. Include the data as well if it's different from the video.

  • @sellamimohamedkhaled4527 10 months ago +1

    really good work👌, keep it up

  • @user-mz2fd1dr9g 1 year ago +1

    Thank you so much for this video. I have been studying for the last 3 years and have taken some expensive courses, and this is the best explanation. It kept me motivated to explore and learn throughout the video. Let us know how we can support you to make more learning videos. Thanks.

    • @learnerea 1 year ago

      You are most welcome, and I'm glad that it was helpful..
      keep watching

  • @razinust2579 1 month ago

    Brother, your work is extremely helpful. I looked for the rolling statistics video link but couldn't find it. Please share it. Thanks in anticipation.

  • @user-ur3iz4em1c 5 months ago +1

    great tutor thanks for the video ❤❤

  • @queenx3572 4 months ago +1

    If you use the time shift method, d will be the interval for the shift. What happens if you use any other method like the log or square root? What will d be?
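
A note on this question: in ARIMA(p, d, q), d is the number of times the series is differenced, not the shift interval. Log and square-root transforms are applied to the values beforehand and do not count toward d. The sketch below uses a hypothetical series with a quadratic trend (not the video's data) to show that one difference still leaves a trend while two differences remove it, which corresponds to d=2.

```python
import numpy as np

t = np.arange(10, dtype=float)
quadratic = 3.0 + 2.0 * t + 0.5 * t**2   # hypothetical series with a quadratic trend

first_diff = np.diff(quadratic, n=1)     # still rising steadily, so d=1 is not enough
second_diff = np.diff(quadratic, n=2)    # constant, so d=2 removes the trend

print(first_diff)
print(second_diff)
```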

  • @surendrabera2878 4 months ago +1

    Your content is too good. I am not able to understand why you have so few views on this video. One suggestion: please make the thumbnail a little more eye-catching.

  • @timetraveller7513 6 months ago +1

    Can't thank you enough 🙏

  • @thegroup3261 6 months ago

    the best tutorials bro

  • @borisgisagara 3 months ago +1

    As you said, you were trying to keep it at a beginner's level, which is why it's about as understandable as it can be. But you got one thing wrong about the model: it's not the ARIMA model that is working badly, it's that you are trying to predict a whole range of values from the same training data. That works well for the first few values but not for all of them. You have to use walk-forward validation, which basically means updating your training set each time you predict a new value. That's my idea.
    And thank you for the good video.
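
The walk-forward idea described in this comment can be sketched as follows. This is a minimal illustration, not the video's code: `walk_forward` and `last_value` are hypothetical helpers, and the naive "repeat the last observation" forecaster stands in for a real model so the example runs with the standard library alone.

```python
from typing import Callable, List, Sequence


def walk_forward(
    series: Sequence[float],
    n_train: int,
    forecast_one: Callable[[Sequence[float]], float],
) -> List[float]:
    """One-step-ahead forecasts, growing the training window after each step."""
    history = list(series[:n_train])
    predictions = []
    for actual in series[n_train:]:
        predictions.append(forecast_one(history))  # predict the next point only
        history.append(actual)                     # then reveal the true value
    return predictions


def last_value(history: Sequence[float]) -> float:
    """Naive stand-in forecaster: repeat the most recent observation."""
    return history[-1]


data = [112, 118, 132, 129, 121, 135, 148, 148]
preds = walk_forward(data, n_train=5, forecast_one=last_value)
print(preds)  # [121, 135, 148]
```

To apply this to ARIMA, `last_value` would be swapped for a function that refits the model on `history` and returns its one-step forecast. Refitting at every step is slower, but each prediction then uses all the data available at that point.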

  • @2380raj 6 months ago +1

    👌

  • @Shiva-zn4nz 1 year ago +1

    This was so informative. Thank you a bunch! I understood time series. Do you have similar videos for regressions? Thank you!
    Subscribed

    • @learnerea 1 year ago +1

      Glad it was helpful. the below one is on linear regression -
      czcams.com/video/IigoyVON0eM/video.html
      here is a problem we solved using the regression and other best fit models -
      czcams.com/video/2YAheiIHNzI/video.html
      I recommend you to have a look at the whole datascience playlist -
      czcams.com/play/PL4GjoPPG4VqOmyh7hQ730evtLaz04LwSf.html

    • @Shiva-zn4nz 1 year ago

      @learnerea Thank you so much. Love you guys!

  • @sanjaisrao484 3 months ago +1

    thanks

  • @user-xp5bx1tw7x 7 months ago +2

    Hi, the content is very good and very well explained, thanks for sharing it. Can you please help me understand: we tried to identify the stationarity but did not use it in modelling, and even the stationarity check was not concluded. We did not get the desired results.

    • @learnerea 7 months ago

      Thank you very much for watching it. Yes, that was primarily because it was a beginner level and hence we did not want to spend a lot of time in reverting it back. Certainly we will make another one where we conclude and utilize the stationarity.

  • @esranurgunay1776 1 year ago +1

    Hello sir, at 35:35 I didn't get the same result as you when I executed the line df.head().

    • @learnerea 1 year ago +1

      >> You may like to revisit the code you have created
      >> You can post the code here as well; we will analyze the difference and can help

  • @user-rz2zl8iz3v 1 year ago +1

    great

    • @learnerea 1 year ago

      thank you very much for watching

  • @abhilashpatel1361 11 months ago

    Hi, can you please help me understand why the lag for the PACF is 20?

    • @learnerea 11 months ago

      It will be great if you can share the time stamp where you spot this point

  • @mattsamelson4975 7 months ago

    I have a situation where I can make reasonable training and predictions with the original (non-stationary) data. When I transform the data, I am able to successfully make it stationary BUT it loses all autocorrelation, so predictions are junk. Have you ever seen this? I have found some things online that say this is possible but that it depends very much on the characteristics of the time series.

    • @learnerea 7 months ago

      Yes, the situation you're describing is not uncommon in time series analysis, and it's often a delicate balance to strike between achieving stationarity and preserving important characteristics like autocorrelation.
      When you difference or transform a time series to achieve stationarity, you are essentially altering the original data to make it more amenable to modeling. However, as you've observed, too aggressive a transformation can result in the loss of autocorrelation, which is crucial for capturing temporal dependencies in the data.
      Here are a few considerations and potential approaches to handle this situation:
      Selective Transformation:
      Instead of applying a uniform transformation to the entire time series, consider selectively applying transformations to specific components. For example, you might difference the data only where it's necessary or apply different transformations to different seasonal components.
      Partial Transformation:
      Rather than making the entire time series stationary, consider transforming only certain parts of it. For instance, you might apply differencing or another transformation to the trend component while leaving the seasonal component untouched.
      Different Models for Different Components:
      If your time series exhibits both trend and seasonality, you might consider using models that can handle each component separately. STL (Seasonal and Trend decomposition using Loess) is one such approach: the time series is decomposed into trend, seasonal, and residual components, and each can be modeled independently.
      Advanced Models:
      Explore advanced models that can handle non-stationary data more effectively. Long Short-Term Memory (LSTM) networks and other recurrent neural networks (RNNs) are known for their ability to capture temporal dependencies in data.
      Ensemble Approaches:
      Combine predictions from models trained on the original data and models trained on the transformed data. Ensemble methods can sometimes capture the strengths of different models.
      Grid Search and Cross-Validation:
      Systematically experiment with different combinations of transformations and models. Use grid search and cross-validation to evaluate the performance of various configurations and find the optimal solution.
      It's worth noting that the ideal approach can vary depending on the specific characteristics of your time series data. Experimentation and a deep understanding of the data's behavior are key. If possible, consider consulting with domain experts or seeking feedback from colleagues who have experience with similar time series patterns.
      Remember that achieving stationarity is a means to an end (better model performance), and the goal is to strike a balance that preserves the essential characteristics of the data while making it amenable to modeling.
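
One possible reading of the situation described in this thread, offered as an assumption rather than a diagnosis: if the original series behaves like a random walk, its high autocorrelation comes from the accumulated level, and differencing leaves only unpredictable shocks. The sketch below simulates such a series (the seed and length are arbitrary) and compares the lag-1 autocorrelation before and after differencing.

```python
import numpy as np


def acf_at_lag(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


rng = np.random.default_rng(0)
steps = rng.normal(size=2000)      # independent shocks
random_walk = np.cumsum(steps)     # non-stationary level series

acf_level = acf_at_lag(random_walk, 1)           # close to 1
acf_diff = acf_at_lag(np.diff(random_walk), 1)   # close to 0
print(acf_level, acf_diff)
```

In that case a model fitted to the original data can look accurate mainly because each prediction sits close to the previous value, so comparing it against a naive last-value forecast is a useful sanity check.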

    • @mattsamelson4975 7 months ago

      Thanks for your detailed reply. How do you conduct a partial transformation? For example, do I difference only a section of the source data that I'm training the model on? How would I then reverse-transform the predictions?

  • @MrDevnandan 3 months ago

    Did you mistakenly plot the PACF of airP['arimaPred'] at time stamp - 1:15:52 ?
    I am not sure why you would plot PACF of predicted values. 😕

    • @srinivasreddy8134 3 months ago

      For that we should use airP['12diff'], since it is the seasonal difference.

  • @ismailhosni7760 1 year ago +1

    Hello Dr, thanks a lot for sharing the information and teaching us.
    I have a little question, with your permission.
    The question is: if we estimate our ARIMA model and find that there is autocorrelation between the residuals of the model, how can we fix this problem?
    Thanks again 🤗🙏🙏🧡❤

    • @learnerea 1 year ago +1

      There are several potential approaches you can take if you find autocorrelation in the residuals of your ARIMA model. Here are a few options you could consider:
      Adding additional AR or MA terms to the model: If the autocorrelation is due to a pattern that has not been captured by the current model, adding additional terms may help to capture this pattern and improve model performance.
      Differencing the data: If the autocorrelation is due to a trend in the data, differencing the data may help to remove this trend and improve model performance.
      Using a different model: If the ARIMA model is not suitable for the data, you may need to consider using a different model altogether. For example, a seasonal ARIMA (SARIMA) model may be more appropriate for data with seasonal patterns.
      Modeling the residuals: If none of the above approaches work, you can try modeling the residuals as a separate time series. This can help to capture any remaining patterns in the data that are not accounted for by the primary model.
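
Before choosing among these options, it helps to measure how much autocorrelation the residuals actually carry. The sketch below computes the Ljung-Box Q statistic by hand on simulated residuals; `ljung_box_q` is a hypothetical helper written for this illustration, and the two residual series are synthetic. In practice, `acorr_ljungbox` from `statsmodels.stats.diagnostic` performs the same test and also reports p-values.

```python
import numpy as np


def ljung_box_q(residuals: np.ndarray, max_lag: int) -> float:
    """Ljung-Box Q statistic over lags 1..max_lag."""
    r = residuals - residuals.mean()
    n = len(r)
    denom = np.dot(r, r)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.dot(r[:-k], r[k:]) / denom   # sample autocorrelation at lag k
        q += rho_k**2 / (n - k)
    return float(n * (n + 2) * q)


rng = np.random.default_rng(1)
white = rng.normal(size=500)          # what well-behaved residuals look like

correlated = np.empty(500)            # AR(1)-style residuals with coefficient 0.8
correlated[0] = white[0]
for i in range(1, 500):
    correlated[i] = 0.8 * correlated[i - 1] + white[i]

q_white = ljung_box_q(white, max_lag=10)
q_corr = ljung_box_q(correlated, max_lag=10)
print(q_white, q_corr)
```

A Q value that is large relative to a chi-square distribution (degrees of freedom roughly `max_lag` minus the number of fitted AR and MA terms) indicates leftover structure, which is the case where adding terms or differencing is worth trying.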

    • @ismailhosni7760 1 year ago

      @learnerea 🥰🥰🥰🥰🥰❤❤🧡💛 thanks a lot 🙏🙏

  • @saurabharbal2684 1 year ago +1

    Hello sir,
    I don't know what your mistake was,
    but I got the desired results using the ARIMA model at 1:13:45.
    Instead of the line at the bottom, I got the desired results.
    And I followed everything you taught.

  • @user-lh4wg2zm4z 1 year ago

    Hi, you did not upload a video where stationary data was used.

  • @anghulingalolop3630 6 months ago

    Can you forecast this?

  • @Cs11-CanhNau 2 months ago

    The original data series is not stationary, and I see you applied some methods to convert it to a stationary series. But why do you use the original data when training the model, when it is not a stationary series?

    • @NutritionandMetabolism-uq5kf 2 months ago

      I have the same query as well. I can understand the section on checking on stationarity, but I don't see how that's getting incorporated into the subsequent training and model fitting. If the original dataset can be used for training rather than the transformed dataset, what's the use of determining if the data is stationary or not? Did I miss something ? Otherwise, excellent video, clearly explained. Would be interested to see videos on Time Series Analysis using other models such as XGBoost, Prophet. Thank you sir.

  • @esranurgunay1776 1 year ago +1

    If we were not going to use the stationarity checks, why did we calculate them?

    • @learnerea 1 year ago +1

      As a data scientist, you have to explore all the possibilities.
      As explained in the video, the decision was based on analysis where it was observed that it would not perform comparatively better. It was also suggested that we will try to make another video where we use the stationary data to see how it performs.
      As a learner, your question makes sense. Keep asking questions for clarity.

    • @user-gp8ww1xf3e 7 months ago

      i was wondering the same

  • @user-xn7lm9to1y 9 months ago

    Suppose the month attribute is missing and you only have the year attribute. In that case, how can you make the data stationary? I mean, you only have the year and passenger attributes. Can you explain, please? Please reply.

    • @learnerea 9 months ago

      Stationarity can be on year basis as well..
      When you're dealing with time series data that only has a yearly frequency, the approach to making the data stationary is similar to what you'd do with more frequent data, but with some specifics to consider.
      Visualizing the Data:
      Start by plotting the data. This will give you an idea of the overall trend, seasonality, and variance. Since the data is yearly, you might not observe any distinct seasonality.
      python code -
      import matplotlib.pyplot as plt
      plt.plot(year, passenger)
      plt.xlabel('Year')
      plt.ylabel('Passenger')
      plt.title('Yearly Passenger Count')
      plt.show()
      Differencing:
      A common approach to making time series data stationary is by differencing the data. Differencing helps to remove trends in the data. You subtract the previous year's observation from the current year's observation.
      python code-
      passenger_diff = passenger.diff().dropna()
      After differencing, plot the data again to see if it appears more stationary.
      Checking for Stationarity:
      The Augmented Dickey-Fuller test is commonly used to check the stationarity of a time series.
      python code -
      from statsmodels.tsa.stattools import adfuller
      result = adfuller(passenger_diff)
      print('ADF Statistic:', result[0])
      print('p-value:', result[1])
      A low p-value (typically ≤ 0.05) lets you reject the test's null hypothesis of a unit root, which suggests that the series is stationary.
      Transformations:
      If differencing isn't enough, consider other transformations like:
      Log transformation: To stabilize variance.
      python code
      import numpy as np
      passenger_log = np.log(passenger)
      Rolling means: To smooth out short-term fluctuations and highlight longer-term trends.
      python code -
      rolling_mean = passenger.rolling(window=5).mean() # 5-year window as an example
      passenger_detrended = passenger - rolling_mean
      passenger_detrended.dropna(inplace=True)
      Decomposition:
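      Even though the data is yearly, if you suspect a multi-year cycle or a strong trend, you can use decomposition. STL (Seasonal and Trend decomposition using Loess) from the statsmodels library can be useful.
      python code -
      from statsmodels.tsa.seasonal import STL
      # STL needs a cycle length of at least 2 observations; `period` here is a
      # placeholder you must set yourself, since yearly data has no built-in season
      stl = STL(passenger, period=period, seasonal=13)
      result = stl.fit()
      trend = result.trend        # smooth long-term component
      seasonal = result.seasonal  # repeating cyclical component
      residual = result.resid     # what is left after removing both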
      You can then work with the residuals from the decomposition process, which should ideally be stationary.

  • @amazonamazon6510 8 months ago

    How should I approach forecasting with the lockdown data?

    • @learnerea 8 months ago

      That's an excellent problem statement to choose. A little more detail would have helped, such as -
      >> what sort of model you want to develop
      >> what the main purpose/scope of the model is, etc.
      Let's assume that you want to build a credit risk model and the data under consideration includes the COVID period as well. (Before starting, make sure the data is reasonably balanced in quantity and period.) Below is an approach you can take -
      Data Collection:
      Gather historical credit risk data, including loan performance, defaults, delinquencies, and relevant economic indicators.
      Include data specific to the COVID-19 period, such as unemployment rates, government stimulus programs, and financial relief measures.
      Data Preprocessing:
      Clean and preprocess the data by addressing missing values, outliers, and data inconsistencies.
      Create relevant features, such as lagged values of credit risk indicators and economic variables, to capture time dependencies.
      Exploratory Data Analysis (EDA):
      Perform EDA to understand the data's characteristics and relationships.
      Explore trends, seasonality, and patterns, paying specific attention to changes during the COVID-19 period.
      Define the Target Variable:
      Define the credit risk metric you want to predict, such as default probability or loan delinquency.
      Feature Selection:
      Identify relevant features that may influence credit risk. This includes economic indicators, loan characteristics, borrower information, and external factors.
      Time Series Decomposition:
      Decompose the time series data to understand underlying trends, seasonality, and residuals, considering the effects of COVID-19.
      Create a Historical Train-Test Split:
      Split the data into training and testing sets, ensuring that the testing set includes the COVID-19 period.
      Model Selection:
      Choose a suitable forecasting model. In this case, time series models like ARIMA, SARIMA, or Prophet may be appropriate.
      Consider using machine learning models like Gradient Boosting, Random Forest, or LSTM if you have sufficient data.
      Model Training:
      Train the selected model on the historical data, excluding the testing period.
      Model Validation:
      Evaluate the model's performance using the testing data, specifically during the COVID-19 period.
      Use appropriate evaluation metrics, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or classification metrics for binary outcomes.
      Model Interpretation:
      Interpret the model's predictions to understand which factors contribute to credit risk during the COVID-19 period.
      Feature Importance:
      Analyze feature importance to identify key drivers of credit risk during the pandemic.
      Model Refinement:
      Fine-tune the model and hyperparameters if the initial model's performance is suboptimal.
      Scenario Analysis:
      Conduct scenario analysis to assess credit risk under different economic conditions related to COVID-19, such as varying levels of unemployment or government interventions.
      Model Deployment:
      Deploy the trained model for ongoing credit risk assessment and predictions.
      Monitoring and Feedback Loop:
      Continuously monitor the model's performance and retrain it as new data becomes available.
      Regulatory Compliance:
      Ensure that your credit risk model complies with regulatory requirements and standards relevant to your industry.
      Documentation:
      Document the entire modeling process, including data sources, preprocessing steps, model selection, and evaluation metrics.
      Keep in mind that the unique challenges posed by the COVID-19 pandemic may require you to adapt your model and data sources to reflect changing economic conditions and government policies. Regularly update and refine your credit risk prediction model to account for these dynamics.
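
The "Historical Train-Test Split" step above can be sketched as a chronological cut, so that the model never trains on data from the period it is evaluated on. Everything in this example is an assumption made for illustration: the `default_rate` column, the monthly date range, and the March 2020 cutoff.

```python
import pandas as pd

# Hypothetical monthly data; the column name and dates are illustrative.
idx = pd.date_range("2018-01-01", "2021-12-01", freq="MS")
df = pd.DataFrame({"default_rate": range(len(idx))}, index=idx)

cutoff = pd.Timestamp("2020-03-01")   # assumed start of the lockdown period

train = df.loc[df.index < cutoff]     # everything before the cutoff
test = df.loc[df.index >= cutoff]     # the lockdown period and after

print(len(train), len(test))
```

A random shuffle split would leak future observations into training, which is why the cut is made on the date index.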

  • @user-gp8ww1xf3e 7 months ago

    Hi, i cannot find the data set, could you help me please! =D

    • @learnerea 7 months ago

      The dataset is part of the seaborn library. You can just run this code -
      import seaborn as sns
      df = sns.load_dataset('flights')
      You can also download the notebook from the GitHub link provided in the description.

  • @user-me1gh3ki4n 10 months ago +1

    What does diff(12) mean?

    • @learnerea 10 months ago

      diff computes the difference between each value and an earlier one. diff(12) subtracts the value 12 rows earlier from each value, which for monthly data is a seasonal (year-over-year) difference. If you can provide the timestamp here, we will be able to give you more specific guidance.
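
To make `diff(12)` concrete: in pandas, `Series.diff(12)` subtracts from each value the value 12 rows earlier, so with monthly data each month is compared with the same month of the previous year. The 24 values below are made up for illustration (one yearly pattern repeated with 10 added in the second year).

```python
import pandas as pd

# 24 hypothetical monthly values: a yearly pattern plus growth of 10 per year.
pattern = [0, 1, 3, 2, 4, 7, 9, 9, 5, 2, 0, 1]
values = pd.Series(pattern + [v + 10 for v in pattern])

seasonal_diff = values.diff(12)      # value[t] - value[t - 12]
manual = values - values.shift(12)   # the same thing written out

print(seasonal_diff.dropna().tolist())
```

The first 12 entries of `seasonal_diff` are NaN because they have no value 12 rows earlier. The remaining entries are all 10.0: the yearly pattern cancels and only the year-over-year growth is left.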

  • @siddharthakar9369 2 months ago

    Where is the dataset ?

  • @Devra380 1 year ago

    But sir the new statsmodels seems to have different functions

    • @learnerea 1 year ago

      You can mention the name of the statsmodels function that was used in the video but that you cannot find in the library now.
      We will try to find it and help you with the closest alternative function if it no longer exists.

    • @Devra380 1 year ago

      @learnerea Can you make a new video on implementing ARIMA on a share market dataset or a weather dataset?

  • @arnabmodak3377 5 months ago

    ARIMA Model Building starts here: 56:47

  • @saniyashahin-zp6oz 8 months ago

    Please share your Python notebook, sir @Learnerea

    • @learnerea 8 months ago

      Here you go -
      github.com/LEARNEREA/Data_Science/blob/main/Scripts/time_series_air_passengers.py

  • @user-mt2wx9ir8i 4 months ago

    My data is in the form of year and week.

  • @meronika1400 11 months ago

    Can you share this jupyter notebook with me?
    via mail

    • @learnerea 11 months ago +1

      Hi Meronika,
      you can find that using -
      file name - time_series_air_passengers.py
      url - github.com/LEARNEREA/Data_Science/tree/main/Scripts

  • @micahdelaurentis6551 2 months ago

    the D parameter is the number of differences you take on your data, which is not what you said. This is as basic as it gets man, come on