Airflow DockerOperator: The Basics (and more 🤫)

  • Published 12 Oct 2021
  • Airflow DockerOperator: The Basics (and more 🤫)
    👍 Smash the like button to become an Airflow Super Hero!
    ❤️ Subscribe to my channel to become a master of Airflow
    🏆 BECOME A PRO: www.udemy.com/course/the-comp...
    🚨 My Patreon: / marclamberti
    The Airflow DockerOperator is a very powerful operator.
    It executes your task within a docker container. There are multiple advantages of using the DockerOperator such as:
    - Easier way to test your task
    - Control over the resources needed by your task
    - Avoid dependencies conflicts
    and more.
    Even if you shouldn't use ONLY the DockerOperator, knowing how it works and what you can do with it will truly help you build more reliable data pipelines.
    Ready?
    Let's go!
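    For reference, here is a minimal sketch of what a DockerOperator task can look like (the DAG name, image, and command below are placeholders, not taken from the video):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator

    with DAG(
        dag_id="docker_operator_demo",        # placeholder DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        run_in_container = DockerOperator(
            task_id="run_in_container",
            image="python:3.9-slim",          # any image the Docker daemon can access
            command="python -c 'print(\"hello from the container\")'",
            docker_url="unix://var/run/docker.sock",  # talk to the local Docker daemon
            network_mode="bridge",
            auto_remove=True,                 # remove the container once the task finishes
        )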

Comments • 40

  • @dr_flunks
    @dr_flunks 2 years ago

    it's actually super helpful that you display the folder/file structure of everything you're using as you go along. very well thought out!

  • @user-ep8sj9te3m
    @user-ep8sj9te3m 10 months ago +2

    The DockerOperator doesn't seem to work when I run Airflow in Docker containers (using docker-compose). How can I fix this?
    edit:
    the only solution I found was adding to volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    and setting
    user: root
    instead of
    user: "${AIRFLOW_UID:-50000}:0"
    but apparently this isn't the safest way?
    Anyone got a cleaner way to fix this?
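    A commonly suggested, safer alternative to mounting the raw socket and running as root is to expose the Docker API through a socket-proxy container (for example the tecnativa/docker-socket-proxy image) and point the operator at it over TCP. A sketch, assuming a hypothetical proxy service named docker-proxy reachable from the Airflow containers:

    from airflow.providers.docker.operators.docker import DockerOperator

    run_in_container = DockerOperator(
        task_id="run_in_container",
        image="python:3.9-slim",
        command="python --version",
        docker_url="tcp://docker-proxy:2375",  # proxied daemon instead of /var/run/docker.sock
        network_mode="bridge",
        auto_remove=True,
    )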

  • @saritkumarsi4166
    @saritkumarsi4166 2 years ago

    Thanks Marc for the video on one of the operators I use extensively :)

  • @anthonyloganhall
    @anthonyloganhall 2 years ago

    This is how we have our environment set up and it works very well.

  • @joshuabodyfelt1239
    @joshuabodyfelt1239 1 year ago

    Wonderful job Marc! If I could amend this - it would be awesome to have a follow-up video discussing the variety of different Docker registries and how to connect to them with the Docker Connection.

  • @abhishekacharya5069
    @abhishekacharya5069 2 years ago

    Hi Marc, thanks for the video, it really helped me understand Airflow.
    Actually I'm trying to pull a Docker image using the DockerOperator on Apache Airflow. But whenever I trigger the DAG again and again, it pulls the image from Docker Hub. Can I save the Docker image somewhere, so that whenever the DAG is triggered it doesn't pull from Docker Hub?
    It would be very helpful if you could help me with this. Thanks
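    For reference, the DockerOperator only pulls an image when it is missing from the Docker daemon it talks to, unless force_pull is enabled. A sketch with a hypothetical image tag that is already present locally:

    from airflow.providers.docker.operators.docker import DockerOperator

    use_local_image = DockerOperator(
        task_id="use_local_image",
        image="my-team/etl-job:1.0",   # hypothetical tag already present on the daemon
        command="python run.py",
        force_pull=False,              # default: pull only if the image is not found locally
        auto_remove=True,
    )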

  • @jagadishlucky1793
    @jagadishlucky1793 2 years ago

    Hi Marc, thanks for the videos, they really helped me understand Airflow effectively. Actually, I'm trying to create tasks dynamically. Based on the config parameter from the UI, the DAG has to run the tasks. For example: if the conf parameter has t1, t2 as true, there should be two tasks running. And if I increase the tasks in the config params (t1, t2, t3, t4) it should run that many tasks. I tried multiple approaches using operators, but it's not happening. Can you please suggest an approach? ---> Thank you
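    One possible approach (Airflow 2.3+ only) is dynamic task mapping over the trigger configuration, e.g. a run triggered with {"tasks": ["t1", "t2", "t3"]} produces three mapped task instances. A sketch with hypothetical names:

    from datetime import datetime

    from airflow.decorators import dag, task


    @dag(start_date=datetime(2022, 1, 1), schedule_interval=None, catchup=False)
    def dynamic_from_conf():

        @task
        def read_conf(dag_run=None):
            # Task names supplied at trigger time in dag_run.conf
            return dag_run.conf.get("tasks", [])

        @task
        def run_one(name: str):
            print(f"running {name}")

        # One mapped task instance per entry in the conf list
        run_one.expand(name=read_conf())


    dynamic_from_conf()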

  • @Ayush_1908
    @Ayush_1908 1 year ago

    Hi Marc, is it possible to use the DockerOperator for running Java code on Airflow? Or is there any better option?
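    For what it's worth, the DockerOperator does not care which language runs inside the image, so a Java job packaged as a Docker image is launched like any other task. A sketch with a hypothetical image and jar path:

    from airflow.providers.docker.operators.docker import DockerOperator

    run_java_job = DockerOperator(
        task_id="run_java_job",
        image="my-team/java-job:1.0",     # hypothetical image with the JRE and the jar baked in
        command="java -jar /app/app.jar",
        auto_remove=True,
    )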

  • @aerobot6571
    @aerobot6571 1 year ago

    Thanks Marc, this is the first video of yours I've watched: it's clear, useful, and complete. Well, I'm going to check out the others ;D

  • @lokeshkumar1365
    @lokeshkumar1365 2 years ago

    Could you make a video on best practices for the KubernetesExecutor on a k8s deployment, and how different tasks can run in parallel?

  • @data-freelancer
    @data-freelancer 2 months ago

    Hi sir, can this work in production, for example on Cloud Composer?

  • @kimted3272
    @kimted3272 2 years ago

    hello Marc, thinking of listening to your lectures on Udemy. Are there any lectures that cover the KubernetesPodOperator? I think the operator reference is the closest, but asking just in case you already have a video. thanks :)

  • @yuricastro522
    @yuricastro522 11 months ago

    If I'm using an Airflow container to call another container, how can I mount volumes generated inside the Airflow container into the other one? I'm getting errors trying this with the source parameter.

  • @user-ep8sj9te3m
    @user-ep8sj9te3m 10 months ago

    When running a DockerOperator with Airflow itself running in a Docker container, the mounts have to be between the DockerOperator's container and the actual host machine. Is there any way to avoid this? Can we create mounts between the Airflow container and the DockerOperator container?
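    For reference: because the DockerOperator asks the host's Docker daemon to start the container, mount sources are resolved on the host, not inside the Airflow container. The usual workaround is a host directory (or named volume) that is mounted into both containers. A sketch with hypothetical paths:

    from docker.types import Mount

    from airflow.providers.docker.operators.docker import DockerOperator

    run_with_shared_data = DockerOperator(
        task_id="run_with_shared_data",
        image="python:3.9-slim",
        command="ls /data",
        mounts=[
            Mount(source="/opt/shared-data",  # path on the HOST machine
                  target="/data",             # path inside the task's container
                  type="bind"),
        ],
        auto_remove=True,
    )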

  • @user-pr2kr1ts9i
    @user-pr2kr1ts9i 1 year ago +1

    Getting this error: requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

  • @mbkhan1000
    @mbkhan1000 1 year ago

    I understand we can use templating to pass variables/XComs/connections to env variables in the docker container, but is there any way to push values to XComs from within a docker container? I understand that the process running in the container is isolated from Airflow (unless it connects through the REST API?)

    • @gregh6586
      @gregh6586 1 year ago

      Why can't you use `retrieve_output_path`? What exactly are you trying to do?

    • @mbkhan1000
      @mbkhan1000 1 year ago

      @@gregh6586 trying to know if we can push to xcoms within a Docker operator task
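      For reference, a sketch of the two built-in options: pushing the container's last log line as an XCom, or having the operator read a pickled file written by the containerised process via retrieve_output / retrieve_output_path (the image and file path below are placeholders):

      from airflow.providers.docker.operators.docker import DockerOperator

      # Option 1: the last line printed by the container becomes the XCom value.
      push_last_log_line = DockerOperator(
          task_id="push_last_log_line",
          image="python:3.9-slim",
          command="python -c 'print(42)'",
          do_xcom_push=True,
          auto_remove=True,
      )

      # Option 2: the container writes a pickled object to a known path and the
      # operator retrieves that file and pushes its content as the XCom value.
      push_pickled_output = DockerOperator(
          task_id="push_pickled_output",
          image="python:3.9-slim",
          command="python -c \"import pickle; pickle.dump('done', open('/tmp/script.out', 'wb'))\"",
          retrieve_output=True,
          retrieve_output_path="/tmp/script.out",
          do_xcom_push=True,
          auto_remove=True,
      )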

  • @user-ow4dv3cp8p
    @user-ow4dv3cp8p 8 months ago +1

    I am trying to use the DockerOperator but get an error: PermissionError(13, 'Permission denied').
    I should set chmod 666 /var/run/docker.sock to avoid it.
    Can I use the DockerOperator with chmod 660 /var/run/docker.sock?

    • @user-ot8bh3xm9j
      @user-ot8bh3xm9j 2 months ago

      Please tell me, have you solved this problem?
      I have the same problem

  • @736939
    @736939 1 year ago

    How to send data via XCom from the DockerOperator? Is there any better way than just printing the values?
    And how to run a PythonOperator-like script from the DockerOperator? Let's say I want to run not the whole file but a function inside the file - how to do it via the DockerOperator?
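    For the second part, one possible pattern is to have the container's command import and call just that function (the module, function, and image names below are hypothetical):

    from airflow.providers.docker.operators.docker import DockerOperator

    run_single_function = DockerOperator(
        task_id="run_single_function",
        image="my-team/etl-job:1.0",   # hypothetical image that contains my_script.py
        command='python -c "from my_script import my_task; my_task()"',
        do_xcom_push=True,             # the last printed line is pushed as the XCom value
        auto_remove=True,
    )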

  • @emanuelgiannattasio3366

    Marc, in your opinion, in which cases would it be convenient to use DockerOperator over PythonVirtualenvOperator?

    • @MarcLamberti
      @MarcLamberti  1 year ago +1

      IMHO the DockerOperator is great as you run a Docker image, so you can encapsulate your task in it. That can help with testing and versioning.
      Otherwise, go with the PythonVirtualenvOperator

  • @vaib5917
    @vaib5917 1 year ago

    Hi, I really need to know: if we put the Python script into a container and run it using the DockerOperator, how can we pass the values of Variables from the Airflow Admin UI to the container? Please help.

    • @mbkhan1000
      @mbkhan1000 1 year ago

      Templating into docker container environment variables
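      For illustration, the environment parameter of the DockerOperator is templated, so an Airflow Variable can be rendered into a container environment variable (the Variable name my_setting below is hypothetical):

      from airflow.providers.docker.operators.docker import DockerOperator

      run_with_variable = DockerOperator(
          task_id="run_with_variable",
          image="python:3.9-slim",
          command="python -c 'import os; print(os.environ[\"MY_SETTING\"])'",
          environment={"MY_SETTING": "{{ var.value.my_setting }}"},  # rendered by Jinja at runtime
          auto_remove=True,
      )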

  • @eduardocarrerah3704
    @eduardocarrerah3704 2 years ago

    Is this a replacement for the k8s operator?

  • @ReenanOFC
    @ReenanOFC 2 years ago

    Is it possible to set different schedules based on tasks?

    • @MarcLamberti
      @MarcLamberti  2 years ago

      Nope

    • @danielpapukchiev3754
      @danielpapukchiev3754 2 years ago

      split in multiple DAGs

    • @trench6118
      @trench6118 2 years ago +1

      With a BranchPythonOperator you can - for example, I have some tasks which have changed from daily to hourly within a client DAG (all sources for that client are in the same DAG). What I did was add a function to check the execution_date.hour, and if it was a certain time, I would return all extract task IDs. Otherwise, I would return only the hourly task IDs. The result is that my daily tasks are skipped each hour unless it is, say, 13:00 UTC, while my hourly tasks run every hour. It makes the DAG a bit messy though, because of so many skipped tasks.
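      A sketch of the pattern described above, with hypothetical task IDs (the hour check uses the run's logical date):

      from airflow.operators.python import BranchPythonOperator


      def choose_extracts(logical_date=None, **_):
          # At 13:00 UTC run the daily extracts as well; otherwise only the hourly ones.
          if logical_date.hour == 13:
              return ["daily_extract", "hourly_extract"]
          return ["hourly_extract"]


      choose_tasks = BranchPythonOperator(
          task_id="choose_tasks",
          python_callable=choose_extracts,
      )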

  • @vladdank9158
    @vladdank9158 1 year ago

    Anyone figure out how to get this to work with Airflow itself running on Docker?
    Kind of lost. It's mentioned in the video around 8:03.
    I'm on Windows so it's kind of horrible LOL

    • @user-ep8sj9te3m
      @user-ep8sj9te3m 10 months ago +1

      the only solution I found was adding to volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      and setting
      user: root
      instead of
      user: "${AIRFLOW_UID:-50000}:0"
      but apparently this isn't the safest way?
      Anyone got a cleaner way to fix this?

  • @PrakashReddyK
    @PrakashReddyK 2 years ago

    Hi 👋

  • @Reidloveslions
    @Reidloveslions 2 years ago +1

    I know this is 6+ months after this was posted, but I think your teaching would be a bit more effective if you took more time with your handwriting. If I saw the diagram at 3:25 after listening to you talk about it, I'd have a hard time understanding what it means. Just wanted to provide a helpful tip!

  • @Klayhamn
    @Klayhamn 2 years ago

    space at the end of the string as actual functionality determiner?
    who the hell designed that bullshit?