Build your first pipeline DAG | Apache Airflow for beginners

  • Published 1 Jun 2024
  • #apacheairflow #airflowforbeginners #datapipeline #etl #etlpipeline #airflow2
    Welcome to the Apache Airflow for beginners series. In this video, you'll learn how to build your first pipeline DAG in Apache Airflow. As a data engineer, you will be tasked with writing data aggregation and cleaning pipelines. We are going to build an ETL pipeline as part of an Airflow DAG, with a real-world example of hotel reservation records.
    Read at maxcotec.com/learning/apache-...
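
    For reference, a minimal sketch of the kind of two-task ETL DAG built in the video; the DAG id, task names, and function bodies here are illustrative placeholders, not the repo's exact code.

      from datetime import datetime

      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def transform_data():
          ...  # read the raw booking CSVs, clean and aggregate with pandas

      def load_data():
          ...  # write the cleaned records to the target database

      with DAG(
          dag_id="hotel_booking_ingestion",   # placeholder name
          start_date=datetime(2024, 1, 1),
          schedule_interval="@daily",
          catchup=False,
      ) as dag:
          transform = PythonOperator(task_id="transform_data", python_callable=transform_data)
          load = PythonOperator(task_id="load_data", python_callable=load_data)

          # run the load only after the transform succeeds
          transform >> load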
    ---------------------------------------------------
    Get the complete code here:
    github.com/maxcotec/Apache-Ai...
    Watch the next video:
    Airflow Macros
    • Use Apache Airflow Mac...
    Watch the previous video here:
    Run Airflow locally via Docker
    • Run Airflow 2.0 via Do...
    --------------------------------------------------------
    Timeline
    00:00 Intro
    00:13 Definition
    00:40 DAG building blocks
    01:39 Example - ingestion pipeline overview
    03:05 Let's code the DAG
    07:53 Let's deploy the DAG
    09:29 Examine DAG output
    10:25 Ingest more data
    11:17 Outro
    --------------------------------------------------------
    Learn Something New Below:
    maxcotec.com/blog/is-a-stunde...
    maxcotec.com/blog/pv-center-f...
    👍 If this video was helpful to you, please don't forget to give it a thumbs-up,
    ⌨️ leave a comment and
    🔗 share with your friends.
    ⭐ Support our work by hitting that subscribe button.
    🔔 hit the bell icon to stay alerted to more upcoming interesting stuff.
    Stay tuned !

Comments • 32

  • @MrMegabeat · 1 year ago +2

    My best 10-minute investment of the morning! 🎉

  • @demohub · 1 year ago +3

    This video has definitely given me a better understanding of Airflow, and I now have some ideas on how to use it more effectively for projects.

  • @muhammedalbayati · 2 years ago +1

    Thanks a lot. Very good tutorials. Thumbs-up and subscribe

  • @najmuddin7506 · 1 year ago +1

    Thanks for the tutorial! However, could you explain what the difference is between running this workflow as an Airflow DAG and simply running the program and calling both functions sequentially?

    • @maxcoteclearning · 1 year ago

      For a single, simple workflow like this one, it's really easy to manage, so you may not need Airflow. But things become really hard to manage and maintain once you have 100+ complex workflows. Watch the first video of this series, where I explain what problem Airflow solves (czcams.com/video/56GDKurqhCo/video.html).

  • @parikshitchavan2211 · 1 year ago

    Hello, thanks for such a great tutorial; you made everything smooth like butter. Just one question: whenever we make a new DAG, will we have to add docker-compose-CeleryExecutor, docker-compose-LocalExecutor, and the config for that particular DAG?

    • @maxcoteclearning · 1 year ago

      Thanks Parikshit. Only one executor can be used at a time. You can add multiple DAGs while keeping a single executor with the same config file.

  • @downeytang7006 · 2 years ago +2

    A quick question: if you have two different date formats from two CSV files, after performing the concat, is there a way to unify the date format? For example, '2021-01-13' and '01-20-2022'.

    • @maxcoteclearning · 2 years ago

      Thanks for watching. Have a look at dateutil.parser. Check out more answers here: stackoverflow.com/a/40800072/5167801
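
      For illustration, a minimal sketch using dateutil.parser to unify the two formats after the concat; the file names and the 'date' column name are assumptions.

        import pandas as pd
        from dateutil import parser

        df = pd.concat([pd.read_csv("a.csv"), pd.read_csv("b.csv")], ignore_index=True)

        # parser.parse infers the format of each value, so both
        # '2021-01-13' and '01-20-2022' normalise to ISO dates.
        df["date"] = df["date"].apply(lambda s: parser.parse(s).date().isoformat())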

  • @victoriwuoha3081 · 2 years ago +1

    @MaxcoTec Do you have any resource on how I can read data from an API, perform some similar processing, and finally write to a destination SQL Server? I'd be grateful if you could advise.

    • @maxcoteclearning · 2 years ago

      You can use the most popular Python library, requests (docs.python-requests.org/en/latest/), to fetch data from any API. They have good examples under the Quickstart section. Hope that helps :)
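
      As a rough sketch of the whole path (the URL, table name, and connection string are placeholders, and a driver such as pyodbc is assumed to be installed):

        import pandas as pd
        import requests
        from sqlalchemy import create_engine

        resp = requests.get("https://api.example.com/bookings")  # placeholder API
        resp.raise_for_status()

        # assumes the API returns a JSON array of records
        df = pd.DataFrame(resp.json())

        engine = create_engine(
            "mssql+pyodbc://user:password@host/dbname?driver=ODBC+Driver+17+for+SQL+Server"
        )
        df.to_sql("bookings", engine, if_exists="append", index=False)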

    • @victoriwuoha3081 · 2 years ago

      @@maxcoteclearning Thank you, I'll try that out.

  • @nghianguyen9439 · 2 years ago +1

    Thanks for the very good videos. Can you give me some instructions on an example data pipeline with MongoDB?

    • @maxcoteclearning · 2 years ago

      You're welcome. Sure, can you explain more about your pipeline? What's the data flow (source/destination)? Are you persisting data into MongoDB, or extracting out of it? Have you looked at this:
      github.com/airflow-plugins/mongo_plugin/blob/master/operators/s3_to_mongo_operator.py

    • @nghianguyen9439 · 2 years ago

      @@maxcoteclearning I am trying to do a data sync between 2 separate MongoDB instances, or simply read a CSV file and import it to MongoDB.
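
      For the CSV-to-MongoDB case, a minimal sketch with pandas and pymongo; the connection URI, database, collection, and file names are placeholders.

        import pandas as pd
        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")  # placeholder URI
        collection = client["hotel_db"]["bookings"]        # placeholder names

        # each CSV row becomes one MongoDB document
        df = pd.read_csv("booking.csv")
        collection.insert_many(df.to_dict(orient="records"))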

  • @FedericoLov · 1 year ago

    Good video, but it seems that the actual transformations are done in pandas, while Airflow only provides a layer of logging and task scheduling.

    • @maxcoteclearning · 1 year ago +1

      That's true. Airflow is a workflow management tool. I've just used a simple ETL operation to show how it can be deployed and managed using Airflow.

  • @ammadkhan4687 · 2 months ago

    How can I access the Airflow container and add more DAGs when I'm hosting it on another server?

    • @maxcoteclearning · 2 months ago

      Hi Ammad, could you explain what 'when I'm hosting on another server' means?

    • @ammadkhan4687 · 1 month ago +1

      @@maxcoteclearning Suppose I have a Docker hosting server and I am connecting to it remotely. How can we, as a team, create more DAGs to work on this hosted server? For example, hosting the Airflow Docker container in Azure cloud or on an on-premise Docker hosting server.

  • @riyasingh2515 · 2 years ago

    My tasks are failing in the Airflow UI. Can you tell me why this is happening? I copied all your code properly.

    • @maxcoteclearning · 2 years ago

      Hi Riya, may I know what exact errors you are getting?

    • @diptimanraichaudhuri6477 · 2 years ago +1

      I was also getting a DAG failure initially from the GitHub code sample. It turns out there is a variable "file_date_path" in the transform_data method, which gets constructed from the op_args passed to the DAG. So unless the file is kept in the same folder hierarchy, the booking read will fail. Please keep your booking .csv in the hierarchy "raw_data/<date>/<hour>/" and it will start working. Otherwise, you can modify the code where it reads from that folder and just plainly read and write without dates in the folder names.
      It is rare to find such a well-laid-out series. Kudos MaxcoTec!!
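
      A hypothetical reconstruction of the path logic being described; the names follow the comment above, not the repo verbatim.

        import os
        import pandas as pd

        def transform_data(*op_args):
            # op_args come from the DAG, e.g. ("2022-12-13", "5")
            file_date_path = os.path.join(*op_args)  # -> "2022-12-13/5"
            booking_file = os.path.join("raw_data", file_date_path, "booking.csv")
            return pd.read_csv(booking_file)  # fails unless the hierarchy exists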

  • @muhammedalbayati · 2 years ago

    Please, how can I save this CSV data to an MS SQL Server database?

    • @maxcoteclearning · 2 years ago +1

      It will be similar to the way we are loading data into the SQLite database: pandas_df.to_sql("table_name", engine)
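
      In sketch form, only the engine URL changes between SQLite and SQL Server; the file, table, and connection names below are placeholders, and pyodbc is assumed to be installed.

        import pandas as pd
        from sqlalchemy import create_engine

        df = pd.read_csv("booking.csv")

        # SQLite (as in the video):
        # engine = create_engine("sqlite:///bookings.db")

        # MS SQL Server:
        engine = create_engine(
            "mssql+pyodbc://user:password@host/dbname?driver=ODBC+Driver+17+for+SQL+Server"
        )
        df.to_sql("booking_record", engine, if_exists="replace", index=False)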

    • @muhammedalbayati · 2 years ago

      @@maxcoteclearning Thanks

  • @orick92 · 1 year ago +4

    You should delete "beginners" from the headline...

  • @TheVickramsharma · 1 year ago +1

    Hi @MaxcoTec, I tried running this example and am getting this error: FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/raw_data/2022-12-13/5/booking.csv'. Could you please help me with this?

    • @maxcoteclearning · 1 year ago

      Are you sure you're running the code from the right branch? This video's code is not in the main branch. Check this commit: github.com/maxcotec/Apache-Airflow/tree/1787097721a8cec8999bdaee4c04a9f4bc0e1f71/DAG_ingestion_pipeline.

    • @prafulsoni9378 · 1 year ago

      @@maxcoteclearning I'm also facing the same issue. I cloned the branch, and in `DAG_ingestion_pipeline` I run `docker-compose up`.

    • @prafulsoni9378 · 1 year ago

      @@maxcoteclearning I am using Windows!