Running Airflow 2.0 with Docker in 5 mins
- Added on 1 Feb 2021
- Airflow 2.0 is out! How does it work, what are the new features, and what can you do with your DAGs? To answer all those questions, you need to run Airflow 2.0.
What is the easiest and fastest way to do it?
By using Docker!
Let's discover how to run Apache Airflow 2.0 with the CeleryExecutor locally by using Docker!
Smash the like button to become an Airflow Super Hero!
Subscribe to my channel to become a master of Airflow
Take my course: www.udemy.com/course/the-ulti... to join the legends of Airflow
My Patreon: / marclamberti to support my work and be friends for life
The docker-compose file:
airflow.apache.org/docs/apach...
Wow, it's amazing how far the Airflow team has come with this. Thanks Marc!
Thanks Mike
Arrgh... spent about 3h trying to figure this out; basically all the online instructions missed one small bit or another. With your instructions, et voilà, it works straight away. Thanks a lot!
I struggled to get airflow running for a long time and this short video helped me SO MUCH, thank you!!
Happy to help!
I finally got it up and running! Thank you, Marc!
Thank you Marc. I was in a hurry to find out how to run Airflow and kept failing somehow.
However with your nice clear explanation, nothing is mysterious anymore~
The only one of many tutorials that actually helped! Thanks
Awesome! Thanks for the tutorial, Marc!
Fabulous! Thanks Marc! I have installed Airflow 2.0 successfully. The webserver failed to start on my Mac, but after I increased the memory to 4GB it works.
How did you increase the memory on your Mac?
Thanks Marc for sorting out the installation part of Airflow
Thank you so much, Marc. Great content!
TIL Using YAML aliases and
Love it too
This instruction really helps me thank you so much !
I always try to find a docker image to perform experiments.
Thanks for providing a reference that I can refer anytime in future.
Here it is
Thank you! very efficiently and clearly explained !
Hi Marc, great video. Just wondering if you could show us how to install a triggerer into your airflow stack using docker compose? Thanks!
Thank you VERY MUCH!! Marc. This video is very useful.
this is so much fun and informative, thanks.
Thanks Marc, It helped me a lot!
Awesome Marc, thanks for sharing
This is an awesome tutorial ! Thanks a lot~
super
amazing job
thank u!
Great video Marc. It's sort of crazy how easy that was (even on WSL2)... Thank you
Some things I'm still considering afterwards:
1) Is this enough for a production deployment of Airflow if the database was decoupled from the rest of the container? If the container crashed for whatever reason all of the connections would be lost, so separating is a good idea.
3) For local testing/debugging of an instance I'm going to try and mount DAGS that exist in another project folder instead of the one that we created.
3b) For local testing I might also try and store connection details in environment variables in the .env file rather than relying on the persistence of the database.
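On point 3b, a minimal sketch of what that could look like, assuming the `.env` file sits next to docker-compose.yaml. Airflow reads connections from environment variables named `AIRFLOW_CONN_<CONN_ID>`; the connection id `my_postgres` and the URI below are made-up placeholders, not something from the video.

```shell
# Append a hypothetical connection to the .env file next to docker-compose.yaml.
# AIRFLOW_CONN_<CONN_ID> is the naming scheme Airflow looks up at runtime;
# "my_postgres", the credentials and the host are placeholders.
cat >> .env <<'EOF'
AIRFLOW_CONN_MY_POSTGRES=postgres://user:password@db-host:5432/mydb
EOF

# Show what was written.
grep '^AIRFLOW_CONN_' .env
```

Keep in mind that Compose uses `.env` only for variable substitution inside the compose file. The containers will see the variable only if it is passed on to them, for example through an `env_file: .env` entry or the `environment:` block of the shared Airflow settings.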
Docker Compose is not enough for production, but you can take the same containerized components and use Kubernetes to go to production
Thanks Marc ! awesome tutorial.
Thank you
Marc, Big fan of your content. Can you make a video for deploying Airflow 2.0 (with Celery executor) on Azure Containers?
Great job boy. Keep it up.
Thanks Marc ! Great work
My pleasure!
excellent walkthrough,,, Thanks :)
I love this!!! thanks man!
Thanks, simple and clear ;)
This is super useful... Thank you. One question: can I use it on an AWS instance? How should I configure the security group and firewalls?
Hi Marc, thanks a lot for this!! :)
pleasure :)
super video Marc je m'abonne !
Thanks Marc for this video. Question: Do I need to run everytime I spin the containers?
Thanks for the video!
VERY EASY TO UNDERSTAND
Thaaaaanks, I have been having issues with running Airflow and now it worked!!! I'll now be able to automate tasks and be lazier lol
Let's gooooo
For anyone coming here from 2024 and beyond: on Linux, specifically Ubuntu, remember to use: `docker compose up airflow-init`
Amazing! Thank you very much!
Glad you like it!
thanks. you help me a lot.
Awesome man!
Great! but where can i locate the requirements.txt to add for example the apache-airflow-providers-snowflake?
Thanks for the wonderful tutorial. I understand that a DAG file stored in the DAG folder will be added to Airflow. But what happens if Airflow is running in a remote Docker host that I only have web access to? Can I upload a DAG from my local disk to the remote? Or is there another way to do it?
Hi Marc, thanks for sharing! I'd like to know how to install third-party modules. When I installed the yfinance module, there was a DAG import error: no module named yfinance.
Thanks a lot Marc.
Can you also please make a video for deploying Airflow 2 using helm chart? and go over the options on values.yaml file?
Thanks in advance
Coming but there are some issues right now with the Helm chart
@MarcLamberti Thanks!
First of all, thank you so much for this *awesome* video. It is really helpful. I followed this tutorial and was able to access AirFlow seamlessly. But I want to have apache-airflow-providers operators. So, I tried giving them in _PIP_ADDITIONAL_REQUIREMENTS and also building using Dockerfile. But nothing worked and I still see "error: command 'gcc' failed with exit status 1". I changed airflow image to 2.1.2-python3.7 as slim versions don't include extra libraries. But no luck. Could you help me resolve this issue?
Hello Marc thanks for the videos it's great, I have a question for you
how can we version the dag in production ?
Right now, the only way is to change the dag id with the version. For example, my_dag_v1.0.0, my_dag_v1.0.1 and so on. DAG versioning is coming soon but not yet available
Hello Marc, Your videos are always great and helpful and with your video I get Airflow running quite well. The only trouble is that I need to run java within docker and I have not found any good description of how to get this working. I am starting a shell script that starts a java runtime within the terminal. Could you give me some help on how to get this running? Thanks, Armin
Hi Marc, great tutorial. Airflow is running without problems. I tried to use VS Code with Airflow and found your new video "Configure VS Code to Develop Airflow DAGs with Docker at ease!" However, I don't understand where the Dockerfile comes into the picture. Can you please elaborate? ---> Reopen in Container looks totally different from your video. Thanks
Great tutorial. Old but relevant. Thanks! Marc, I am using Visual Studio Code and every time I want to save my DAG file, I need to click a button "Retry as Sudo". Can you tell me what to do here? It is quite annoying! Regards!
THANK YOU
Thank you so much!
It works on Windows and Mac, I tried both and it works, thanks (on Windows with some tricks)
Could you please share the tricks for Windows? I tried on Windows and it's not working for me. A reply would be very helpful.
@anjanashetty482 for Windows, use WSL to run the commands described in the video
@kikecastor Thanks for your response Armonia. I was able to install Airflow 2 with WSL, but when I create a DAG and try to debug it in VS I get the error: ModuleNotFoundError: No module named 'airflow'
@anjanashetty482 are you in the correct environment?
@kikecastor Yes I am. Do I have to explicitly do pip install apache-airflow?
Hello Marc, I recently installed Airflow 2 using the docker-compose file as suggested in this video. But when I extended the DAG with multiple connections, i.e. Gdrive->S3, S3->Snowflake, Snowflake->S3 operations using PySpark and SQL scripts, the webserver keeps restarting and at times shows as unhealthy. Can you please advise what could have gone wrong, or should I consider increasing Docker's memory?
neat & clean! thanks!!
Thank you
It was great video thanks.
How would i push my custom airflow python file into docker container?
Hello sir, how can we launch every task of etl in a different container as we do via k8s pod operator to launch every task of dag in a different pod?
Thank you for the awesome tutorial. I do have one question though: how can I install python packages with docker-compose when creating the containers? for example I would like to install Pymongo.
Hi, I would recommend using PythonVirtualenvOperator
@ramsescoraspe yes, but how do I install a Python library like PyMongo or OpenCV in the container? PythonVirtualenvOperator allows functions/methods to be created, including the module imports they need, and then they are destroyed, but I do not have those modules installed in the container. Until now, each time I installed Python modules in containers I did it with the help of a Dockerfile (e.g. inside the Dockerfile I enter "RUN pip install opencv-python"), but it is not clear to me how to do the same using a docker-compose.yaml file.
Edit: figured it out: I had to add a pointer to the Dockerfile in the docker-compose.yaml
@derzemel hi, how do you do this? (add a pointer)
@ in my case, the airflow webserver service is built like this (the Dockerfile is in the same dir as the compose file):
airflow-webserver:
  build:
    context: .
    dockerfile: Dockerfile
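For anyone wondering what the Dockerfile behind that `build:` entry might contain, here is a minimal sketch. The base image tag and the `pymongo` package are illustrative assumptions, not taken from the video; depending on the image version, `pip install` may also need the `--user` flag.

```shell
# Write a minimal Dockerfile next to docker-compose.yaml.
# Assumptions: the apache/airflow:2.0.1 base image and the pymongo package
# are examples only; swap in the tag and packages you actually use.
cat > Dockerfile <<'EOF'
FROM apache/airflow:2.0.1
RUN pip install --no-cache-dir pymongo
EOF

cat Dockerfile

# Rebuilding and restarting needs Docker, so these stay commented here:
# docker-compose build
# docker-compose up -d
```

With the CeleryExecutor, tasks run on the workers, so the worker and scheduler services most likely need the same custom image as the webserver, not only the service shown above.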
@derzemel do you know how I can add Airflow dependencies inside the docker-compose.yaml file? Also, is there a way to provide access to my AWS resources, such as S3, either in the YAML file or in the Airflow UI?
Kindly demonstrate on Teradata and keycloak containerisation
Hi Marc,
I have used docker compose to install airflow.
However, the sample DAGs do not seem to work for me and I found no logs.
Is there a docker-compose file available with MySQL?
Can you please share a link if you have one?
Question: what's the recommended way to increase the number of celery workers using docker compose? Say from 2 workers to 10 workers? Copy&paste worker keys in docker compose yaml files?
No, use docker-compose up --scale airflow-worker=10 :)
@MarcLamberti Thank you for your quick reply! TIL docker-compose up --scale!
Is this usable in production? Could you create a production setup?
Hi Marc
I installed Docker Desktop on Windows using Ubuntu on WSL. I changed the dags directory path in the .yaml file to a folder on my C:\ drive in Windows.
When I start the web UI, it doesn't pick up my dags.py file. What can be the issue?
Hi Marc. I bought your course, but I got an error trying to run the BashOperator that inserts data into the user table. I have tried the command alone in the console and it works, but when I use it inside my DAG in my BashOperator I get this error: bash command failed. The command returned a non-zero exit code. I have tried a lot but I still can't find a solution for this
For those that have a Mac and install Docker Desktop, you will not need to install Compose separately. It comes with Docker Desktop
While running Airflow 2 via Docker Compose(Just like the above video),
I am unable to successfully execute DockerOperator tasks.
Can you enlighten with a video reference or doc reference about how to properly configure Airflow Docker compose file or Docker Operator to run tasks
Hi Marc! Could I perform these steps without problems on a Raspberry Pi?
Great tutorial! Short and sweet. I followed the exact same steps and checked the containers status, redis and postgres were healthy but airflow-scheduler, flower, worker, webserver and triggerer were unhealthy then I deleted all the containers and repeated all the steps and now I'm getting error as "database "airflow" does not exist". Redis and postgres containers are running without any problem. I would appreciate if you can help me understand the error. Thanks.
hi, did you ever resolve the problem? I am having the same issues. Thanks
Hi guys, I got the same problem as you. I am on Windows 10. Instead of running the "echo -e....." command, I created a .env file in the same directory as the .yaml file, with AIRFLOW_UID=50000 in it. Problem solved!
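On Linux, macOS or WSL, the same `.env` file can be produced from the shell. A minimal sketch, assuming you run it in the folder that holds docker-compose.yaml and that the compose file mounts `./dags`, `./logs` and `./plugins`:

```shell
# Create the folders the compose file is assumed to mount.
mkdir -p ./dags ./logs ./plugins

# Write the current user's numeric id into .env so files created by the
# containers end up owned by you. On plain Windows `id -u` is not available,
# which is presumably why a fixed value was used in the comment above.
printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=0\n' "$(id -u)" > .env

cat .env
```

Note that this overwrites any existing `.env` in the folder, so back it up first if it already holds other settings.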
Any tips for this issue: AirflowException('Celery command failed on host:
I had to increase the amount of RAM available to Docker to 6GB for this to work on my Mac. I also had to set permissions on the folder I worked in with chmod.
Hi, could you please let me know how you set the permissions using chmod?
I keep getting the following error: "OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "version": executable file not found in $PATH: unknown"
How can I install required packages in Docker, or how can I list required packages in the .yaml file?
When I followed the steps and installed, I got the error "manifest file not found". I have seen this error reported by others as well. I changed the image to 2.0.1 and it worked, but later I got an error about an old version being used.
When I got to the docker-compose up airflow-init, I get the following error:
"Python-dotenv could not parse statement starting at line 1
Traceback (most recent call last):
File "docker\api\client.py", line 214, in _retrieve_server_version
File "docker\api\daemon.py", line 181, in version
File "docker\utils\decorators.py", line 46, in inner
"
A few dozen more error lines afterwards, but I can't make it work so far
How do we install providers after installing airflow on docker
where to place my custom Python file into the docker container?
Thanks BTW for good video
Localhost:8080 aint opening for me. How to check the logs for any issues?
How can we add and setup airflow.cfg file inside project folder?
I have many problems using PythonVirtualenvOperator or ExternalPythonOperator inside docker because you must include system site packages as True (it creates conflicts between venv and base python libraries) or otherwise you will get "ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv"
anyone know why I can't access the installed airflow docker-compose in ec2 instance via browser?
I have installed airflow using docker-compose in ec2 instance, all containers running, I have set inbound rules security group TCP 8080 port to be accessible. But when I open ec2dnsaddress:8080 on the browser, it shows This site can't be reached.
I have also checked docker-compose logs airflow-webserver; it doesn't capture any access from outside and only logs the healthcheck
Nice
can pleaseeeeeeeeeeeeeeeeeeeeeeeeeee post a video on how to install databricks connection type in airflow 2.0.1
For my Mac with M1 chip: I had to increase the amount of RAM available to Docker to 8GB, and swap to 2GB's.
I got this error "Error response from daemon: manifest for apache/airflow:2.6.0.dev0 not found: manifest unknown: manifest unknow" after I typed "docker-compose up airflow-init"
Why?
I don't know why but it is giving me some python error when i am executing docker compose up airflow-init.
Any suggestion ?
where do I need to write the command at 2:07?
I installed everything and could not open the localhost:8080. Tried many times and safari said that "safari cannot open the page. The server dropped the connection. This happens when the server is busy" why does that happen?
Can someone explain to me why we are running "docker-compose up airflow-init" and then "docker-compose up"?
Very cool! Unfortunately, I got an error saying 'port 5555 is already allocated..' but I am pretty sure there is nothing on there. So, not sure what's going on.
You should already have something running on that port. You can change the port in docker compose file for flower
Thank you! Marc, got that fixed!
When I try to run the official 2.0.1 docker-compose.yaml file at airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml on my Ubuntu 18.04 LTS I get the following error:
ERROR: The Compose file './docker-compose.yaml' is invalid because:
Invalid top-level property "x-airflow-common". Valid top-level sections for this Compose file are: services, version, networks, volumes, and extensions starting with "x-".
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see docs.docker.com/compose/compose-file/
services.airflow-init.depends_on contains an invalid type, it should be an array
services.airflow-scheduler.depends_on contains an invalid type, it should be an array
services.airflow-webserver.depends_on contains an invalid type, it should be an array
services.airflow-worker.depends_on contains an invalid type, it should be an array
services.flower.depends_on contains an invalid type, it should be an array
Changing the version to 3.4 removes the first error but I still get docker complaining about depends_on. How can I fix it? My docker-compose version is
docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
while docker version is
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Dec 18 12:21:44 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Thu Dec 10 13:23:49 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.2
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
Thanks in advance,
Flavio
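A possible reading of that error, offered as an assumption rather than a confirmed diagnosis: the `depends_on` entries in that file are maps (they carry a condition), and an old docker-compose such as 1.17.1 only accepts a plain list there for 3.x files. The sketch below just compares the installed version with a minimum; the `1.27.0` threshold is my assumption, so check the Airflow docs for the release you use.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="1.17.1"   # value reported in the comment above
required="1.27.0"    # assumed minimum, not confirmed by the video

if version_ge "$installed" "$required"; then
  echo "docker-compose $installed should be new enough"
else
  echo "docker-compose $installed is older than $required; consider upgrading"
fi
```

To check your own machine, replace the hard-coded value with the number printed by `docker-compose version --short`.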
Hi, I'm facing the issue PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler/20201-03-16' when running "docker-compose up airflow-init" .
Any idea? Thanks
You can run the command with sudo (sudo docker-compose up airflow-init), but I want to know how to do it without sudo
@DaniloPako Hi, I just solved it without using sudo. You just have to make sure that the folders you create (dags, logs, plugins) and the folder you're currently in are owned by the user and group you're logged in as.
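A minimal sketch of that ownership check, assuming the three folders live next to docker-compose.yaml:

```shell
# Make sure the folders exist before inspecting them.
mkdir -p ./dags ./logs ./plugins

# Show the numeric owner and group of each folder, to compare with your ids.
ls -ldn ./dags ./logs ./plugins
echo "current user: $(id -u), group: $(id -g)"

# Hand the folders to the current user. If they were created by root (for
# example by an earlier sudo run), this one step itself needs sudo.
chown -R "$(id -u):$(id -g)" ./dags ./logs ./plugins
```

After that, `docker-compose up airflow-init` should be able to write into `./logs` without sudo, provided the containers run with your user id.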
I followed the steps mentioned here, but getting no response from gunicorn master within 120 seconds and the webserver keeps getting restarted. Can anyone help with any lead here please?
Great video with details! I am following your steps but I constantly get WARNING - Exception when importing 'airflow.providers.microsoft.azure.hooks.wasb.WasbHook' from 'apache-airflow-providers-microsoft-azure' package: No module named 'azure.storage.blob'. when I did docker-compose up airflow-init.
I keep getting this as well. All the containers seem to run fine except the init container.
Anyone facing problem in getting the logs displayed in the UI? Clicking a task --> Log --> gives me a blank log frame. However, it allows me to download it to my machine.
getting error : Import "airflow" could not be resolved
while importing 'from airflow import DAG'
the only commands that worked for me was the mkdir one
everything else gave me an error
The only issue is that I can't import anything to my dag from other folders (not dag folder). I don't know why but I get a Import Error
Hi I'm a noob I'm using the same YAML file but after running the command "docker-compose up airflow-init" on my ubuntu machine I'm getting this error please help.
ERROR: The Compose file './docker-compose.yaml' is invalid because:
Invalid top-level property "x-airflow-common". Valid top-level sections for this Compose file are: services, version, networks, volumes, and extensions starting with "x-".
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see docs.docker.com/compose/compose-file/
services.airflow-init.depends_on contains an invalid type, it should be an array
services.airflow-scheduler.depends_on contains an invalid type, it should be an array
services.airflow-webserver.depends_on contains an invalid type, it should be an array
services.airflow-worker.depends_on contains an invalid type, it should be an array
services.flower.depends_on contains an invalid type, it should be an array
The container for the webserver seems to be restarting continuously every minute or so. Any idea why this may happen?
You must increase the RAM of docker and that is how it will work
The first time I run "docker-compose up airflow-init", everything is okay. But after I run "docker-compose down" and then run "docker-compose up airflow-init" once again, I get the message "container for service "postgres" is unhealthy" and the airflow-init container fails. I have to run "docker-compose up airflow-init" one more time to start the airflow-init container.
Does anyone get the same problem like me? Could you give me some advice to avoid this, please? Thanks all!
For me it does not work. Docker Compose is creating the path "./local" and I cannot access it. Airflow cannot read my DAGs. It is very frustrating. I have installed Airflow 8 times now and none of the attempts worked...
I am importing with "from airflow import DAG" in a file inside my directory, but VS Code is unable to recognise airflow
That's because Airflow runs in Docker. You need to connect your VS Code to Docker