Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project | Part 1

  • Published 11 Sep 2024
  • This is part 1 of this Zillow data analytics end-to-end data engineering project.
    In this data engineering project, we will learn how to build and automate a Python ETL process that extracts real estate property data from the Zillow Rapid API and loads it into an Amazon S3 bucket. That upload triggers a series of Lambda functions that transform the data, convert it into CSV format, and load it into another S3 bucket, all orchestrated with Apache Airflow. Airflow will use an S3KeySensor to check that the transformed data has been uploaded to the S3 bucket before attempting to load it into Amazon Redshift.
    After the data is loaded into Redshift, we will connect Amazon QuickSight to the Redshift cluster to visualize the Zillow (Rapid API) data.
    Apache Airflow is an open-source platform for orchestrating and scheduling workflows of tasks and data pipelines. This project will be carried out entirely on the AWS cloud platform.
    In this video I will show you how to install Apache Airflow from scratch and schedule your ETL pipeline. I will also show you how to use a sensor in your ETL pipeline. In addition, I will show you how to set up an AWS Lambda function from scratch, as well as Amazon Redshift and Amazon QuickSight.
    As this is a hands-on project, I highly encourage you to first watch the video in its entirety without typing along, so that you can better understand the concepts and the workflow. After that, either try to replicate the example without watching the video, consulting it only when you are stuck, or watch the video a second time in its entirety while typing along.
    Remember the best way to learn is by doing it yourself - Get your hands dirty!
    If you have any questions or comments, please leave them in the comment section below.
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
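    The transformation step described above (the Lambda function that flattens the API's JSON response into CSV) can be sketched in plain Python. The `results` key and the field names below are assumptions, since the exact response shape depends on the RapidAPI endpoint used in the video:

```python
import csv
import io
import json

def zillow_json_to_csv(raw_json: str,
                       fields=("zpid", "price", "bedrooms", "bathrooms", "address")) -> str:
    """Flatten a Zillow-style API response into CSV text.

    The "results" key and the field names are hypothetical; the actual
    keys depend on the RapidAPI endpoint used.
    """
    payload = json.loads(raw_json)
    results = payload.get("results", [])
    buf = io.StringIO()
    # extrasaction="ignore" drops any API fields we did not ask for
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for row in results:
        writer.writerow({k: row.get(k, "") for k in fields})
    return buf.getvalue()
```

    In the pipeline, a Lambda function would read the raw JSON object from the first S3 bucket, run a transform like this, and write the CSV into the transformed bucket that the sensor watches.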
    **************** Commands used in this video ****************
    sudo apt update
    sudo apt install python3-pip
    sudo apt install python3.10-venv
    python3 -m venv endtoendyoutube_venv
    source endtoendyoutube_venv/bin/activate
    pip install --upgrade awscli
    sudo pip install apache-airflow
    airflow standalone
    pip install apache-airflow-providers-amazon
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** USEFUL LINKS ****************
    How to remotely SSH (connect) Visual Studio Code to AWS EC2: • How to remotely SSH (c...
    Extract current weather data from Open Weather Map API using python on AWS EC2: • Extract current weathe...
    How to send out email alert ON RETRY and ON FAILURE in Apache airflow | Airflow Tutorial • How to send out email ...
    Monitor workflow with slack alert upon DAG failure | Airflow Tutorial • Monitor workflow with ...
    How to build and automate a python ETL pipeline and slack alert with airflow | Airflow Tutorial • How to build and autom...
    PostgreSQL Playlist: • Tutorial 1 - What is D...
    Rapid API: rapidapi.com/hub
    AWS Lambda function - Create your first Lambda Function | Lambda Function Tutorial for beginners • AWS Lambda function - ...
    Github Repo: github.com/Yem...
    airflow.apache...
    Part 2: • Zillow Data Analytics ...
    Part 3: • Zillow Data Analytics ...
    DISCLAIMER: This video and description have affiliate links. This means that when you buy through one of these links, we will receive a small commission, at no cost to you. This helps support us in continuing to make awesome and valuable content for you.
    #dataengineering #airflow

Comments • 82

  • @i_am_out_of_office_
    @i_am_out_of_office_ 7 months ago +5

    Awesome Tutorial
    Your dedication to teaching end-to-end data engineering pipelines is truly inspiring. Your guidance has not only deepened my understanding of complex concepts but also empowered me to navigate the intricacies of building robust data pipelines. Thank you for your unwavering support and commitment to fostering knowledge in this dynamic field. Love from India 🚩

  • @seth_king_codes
    @seth_king_codes 9 months ago +3

    BEST channel on YouTube for learning about data engineering... thank you, man.
    Your content inspires me.

    • @tuplespectra
      @tuplespectra  9 months ago

      Thanks so much for this comment. It really means a lot to me

  • @gyungyoonpark
    @gyungyoonpark 6 months ago +1

    Thank you for the awesome tutorial!!! Can't wait to start part 2.
    Just one correction: in the "commands used", please add "sudo apt install awscli" as well.

  • @nicholasmageto6110
    @nicholasmageto6110 3 months ago

    The best ETL video I have ever come across. Thank you sir ❤‍🔥❤‍🔥❤‍🔥💯💯

  • @tuananhdo6006
    @tuananhdo6006 3 months ago

    This is just what I have been searching for, thank you good sir, please kindly post more videos, you are awesome

  • @dudee420
    @dudee420 2 months ago +1

    Bro, your explanation is really amazing. Nobody explains at that level. If possible, can you start some videos on GCP cloud data engineering projects as well? Thank you for the great learning.

    • @avinash390
      @avinash390 2 months ago

      Hey bro... did you complete this project on AWS? How much was the total cost, or was it within the free tier limit?

  • @srinivasrepala1
    @srinivasrepala1 3 days ago

    ❤ good content

  • @nameisnani5573
    @nameisnani5573 9 months ago

    Awesome, Brother. This is the best channel I have ever seen on YouTube to learn something real. Great work. Nobody can explain like you did. Thank you so much, lots of love for you. Keep doing this. Thanks a lot again.

    • @tuplespectra
      @tuplespectra  9 months ago

      Thanks so much for your comment. I really appreciate it, and it means a lot to me and motivates me to do more.

  • @zuesbenz
    @zuesbenz 4 months ago

    another good video from you. keep it going, keep it simple to the point and let it flow together end to end. just as you have been doing.

  • @pareekshitgaddam9912
    @pareekshitgaddam9912 2 months ago

    Amazing content! Thank you brother. Please do upload more such videos!!

  • @R_SinghRajput
    @R_SinghRajput 4 months ago

    Since I’m a mech engineer coding is almost like mandarin to me but u sir the Great explanation 🙏🏻🔥🫡🫡 really loved it n totally understood ❤❤

  • @tolu_datacation
    @tolu_datacation a year ago +2

    Very explanatory!

  • @shivanshhedaoo7268
    @shivanshhedaoo7268 10 months ago +1

    Hi, after airflow standalone I am getting this error:
    ModuleNotFoundError: No module named 'connexion.decorators.validation'
    How do I fix this?

  • @Friendsforever-rg2bq
    @Friendsforever-rg2bq a month ago +1

    Amazing man..!

  • @kandoras.guzman6705
    @kandoras.guzman6705 10 months ago

    This was awesome! Thank you for this resource.

  • @rajkumardubey5486
    @rajkumardubey5486 a month ago

    We can also use a .env file to keep the API key out of the code and load it with an env loader.
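    As this comment suggests, the RapidAPI key can be kept out of the DAG file. A minimal sketch using environment variables follows; with python-dotenv (the "env loader" the comment refers to), calling load_dotenv() first would populate os.environ from a .env file kept out of version control. The variable name RAPIDAPI_KEY is an assumption:

```python
import os

def get_rapidapi_key() -> str:
    """Read the RapidAPI key from the environment instead of hard-coding it.

    RAPIDAPI_KEY is a hypothetical variable name; with python-dotenv you
    would call load_dotenv() first so a .env file populates os.environ.
    """
    key = os.getenv("RAPIDAPI_KEY")
    if not key:
        raise RuntimeError("RAPIDAPI_KEY is not set; add it to your .env or shell profile")
    return key
```
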

  • @joshuaroberts3987
    @joshuaroberts3987 10 months ago +1

    My IP address refuses to connect after I opened port 8080. It showed the Airflow login, and after I put in the credentials it showed a refused-to-connect screen.

  • @sibisuriyanarayantiruchira2302

    Very helpful! Thank you so much :)

    • @tuplespectra
      @tuplespectra  a year ago

      Thank you so much. I'm glad you find it helpful.

  • @akj3344
    @akj3344 a year ago

    At 19 seconds, already liked and subscribed.

    • @tuplespectra
      @tuplespectra  a year ago

      Awesome. Thanks so much. And thanks for finding our video valuable.

  • @nayanroy13
    @nayanroy13 a year ago +1

    Your content is very useful!

    • @tuplespectra
      @tuplespectra  a year ago

      Thanks so much. Your comment means a lot to us, and I'm glad that you find our content useful and valuable.

  • @assieneolivier5560
    @assieneolivier5560 8 months ago

    Great and explicative video guys!! Amazing!!!

    • @tuplespectra
      @tuplespectra  8 months ago

      Thanks so much! I'm glad you like it.

  • @sophialawal7306
    @sophialawal7306 3 months ago

    Which app did you use to create the data pipeline visualization?

  • @shumengshi5925
    @shumengshi5925 5 months ago

    Thank you for the wonderful tutorial! It's been incredibly helpful, and I've already subscribed to your YouTube channel!
    I have a question about the necessity of using EC2 in this project. Would it be possible to achieve the same results by simply installing Apache Airflow locally within a Python virtual environment? I followed your steps closely, but when I run a DAG with tasks to extract Zillow data via the Rapid API, the DAG seems to get stuck in the running state indefinitely without completing, and it doesn't generate any logs.
    Interestingly, when I test the Rapid API locally in a plain Python file, it works perfectly fine. Additionally, when I create a DAG without making requests to the API, it also works without any issues. The problem only arises when the DAG task attempts to access Zillow data via the Rapid API.
    I'm curious if this is why EC2 is used in the project. Any insights you could provide would be greatly appreciated! Thanks again for putting out great Data Engineering content!!

  • @salmanshikalgar4482
    @salmanshikalgar4482 a month ago

    The pip install --upgrade awscli command is not running in the virtual environment

  • @himanshupatil6661
    @himanshupatil6661 6 months ago

    I am getting an error while executing airflow standalone
    TypeError: SqlAlchemySessionInterface.__init__() missing 6 required positional arguments: 'sequence', 'schema', 'bind_key', 'use_signer', 'permanent', and 'sid_length'

  • @maxubani9219
    @maxubani9219 8 months ago

    GOD BLESS YOU!❤

  • @HarrisKeith-r5x
    @HarrisKeith-r5x 9 months ago

    Hey, let's say I do this end-to-end mapping myself: how much will it cost me to use their services? Can I do this in the free tier, plus any additional cost I may incur using an EC2 instance that is not free, like you mentioned?

  • @QuanNguyen-z2g
    @QuanNguyen-z2g 10 months ago

    I just wonder why this data pipeline uses Lambda for loading and transforming data instead of Glue Spark jobs?

  • @omkarmore2198
    @omkarmore2198 a year ago

    Excellent ...

  • @inadaldaldaldal8231
    @inadaldaldaldal8231 9 days ago

    Can you do one on the Azure platform?

  • @navaneethur5466
    @navaneethur5466 6 months ago

    Hi sir, the airflow option is not visible in the VS Code interface even after installing it in the Ubuntu instance

    • @nikhitabiradar2146
      @nikhitabiradar2146 a month ago

      Hi, I'm facing the same issue. Were you able to resolve it?

  • @AlDamara-x8j
    @AlDamara-x8j a year ago

    Thanks for this great tutorial! Question: Is it possible to use Cloud9 as our IDE and from there access our EC2, or vice versa?

    • @tuplespectra
      @tuplespectra  11 months ago

      I believe you should be able to use it, although I have not used it for my Airflow project before. You will have to provision a Cloud9 IDE and pay for it, unless there is a free tier that you can use.

  • @darshan9340
    @darshan9340 10 months ago +1

    Hi,
    The project is really good, got to learn so much.
    I have an error while I am trying to transfer my file from ec2 to s3 bucket.
    File "/usr/local/lib/python3.10/dist-packages/airflow/operators/bash.py", line 210, in execute
    raise AirflowException(
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 127.
    I have checked my bash code, and it is perfectly fine. My first DAG's Python operator task is running and creating the file, but when it comes to the bash operator task, it is failing.

    • @pranalidarekar_5852
      @pranalidarekar_5852 10 months ago

      This happened to me too... what is the solution?

    • @darshan9340
      @darshan9340 10 months ago

      ​@@pranalidarekar_5852 I had a spelling mistake in my code, that's the reason why it was not running.

    • @pranalidarekar_5852
      @pranalidarekar_5852 10 months ago

      where exactly did you make the mistake
      @@darshan9340

    • @gyungyoonpark
      @gyungyoonpark 6 months ago

      @@darshan9340 I have the same error. can you tell me where you were wrong?

  • @abduljaweed8131
    @abduljaweed8131 a year ago +2

    Make one ETL project with Apache airflow without using any cloud

    • @akj3344
      @akj3344 a year ago +2

      Why though? In your job, you'll be expected to work with the cloud.

    • @abduljaweed8131
      @abduljaweed8131 a year ago +1

      @@akj3344 Yes, but the cloud is expensive, so I think it's good to understand the technology on a local machine first, and then experiment with the cloud once you know the tech.

    • @Edbwalz
      @Edbwalz 11 months ago

      @@abduljaweed8131 Let me take you through an overview of a project that you can do without using the cloud:
      First, start by working with a CSV file. Upload that file to an S3 bucket, then load the data from the S3 bucket, transform it to the parquet data type, and write it to another S3 bucket. After that you can use Airflow to orchestrate the tasks.
      Now, instead of using S3, you can use MinIO. It's an open-source tool that works exactly like S3. In fact, the Airflow operators for S3 can be used on MinIO as well.
      You can use a pandas DataFrame to do the transformation to parquet and write the file to the MinIO bucket. If you want to get a bit more fancy, you can use Spark to do the same thing (it also leverages dataframes).
      After working with a file, you can easily change the data source to an API endpoint.
      I can help if you want. Just ask if you need more clarification. I just gave an overview basically.
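      The cloud-free variant described in this comment hinges on one detail: MinIO speaks the S3 API, so the standard client just needs an endpoint_url override. A minimal sketch follows; boto3 being installed, a MinIO server on its default local port, and its default dev credentials are all assumptions:

```python
def make_local_s3_client():
    """Create an S3-compatible client pointed at a local MinIO server.

    Assumes MinIO is running locally on its default port (9000) with its
    default dev credentials, and that boto3 is installed (pip install boto3).
    """
    import boto3  # imported here so the sketch can be read without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",  # the only change vs real AWS S3
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )

def parquet_key_for(csv_key: str) -> str:
    """Derive the destination object key for the transformed parquet file."""
    base = csv_key.rsplit(".", 1)[0]
    return f"{base}.parquet"
```

      The same Airflow S3 operators and hooks then work unchanged, since only the connection's endpoint differs.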

    • @cOnfidentialcOrp
      @cOnfidentialcOrp 10 months ago

      @@abduljaweed8131
      The main reason the cloud is used among big companies is that it's cheap compared to building your own data center.
      Also, AWS and Azure have free tier plans, enough for you to learn as well.

  • @pranalidarekar_5852
    @pranalidarekar_5852 10 months ago

    Thanks for the tutorial. I am trying to connect VS Code to the same EC2 instance we created in this project, but it shows that permission is denied due to the public key. I followed each step from your other video, 'How to remotely SSH (connect) Visual Studio Code to AWS EC2'. Please help me with this. I have tried everything but it shows me the same issue. I am using a MacBook. Thank you for your time!

    • @tuplespectra
      @tuplespectra  10 months ago

      Maybe you need to grant permission on the .pem file, for example by running "chmod 400 path/to/filename". Another issue might be the syntax in your config file. You need to make sure you write it the way it should be, with lower case where it is supposed to be, etc.

  • @lesa7p2lmansion
    @lesa7p2lmansion 10 months ago

    Guys, can anybody help with timestamps for the videos? It would be really helpful.
    I am doing the project and will put it on GitHub and LinkedIn when I finish... Thanks

  • @amanpirjade9
    @amanpirjade9 a year ago

    Make a video on an AWS data analytics services project

  • @Mehtre108
    @Mehtre108 6 months ago

    Domain name pls

  • @kanchandendge1517
    @kanchandendge1517 11 months ago +1

    The airflow standalone command is getting stuck and not creating the user and password. @tuplespectra, could you please help?

    • @tuplespectra
      @tuplespectra  11 months ago

      Can you kill the server (CTRL + C) and then restart it?

    • @kartikeymishra2673
      @kartikeymishra2673 10 months ago

      Hey, were you able to fix this error?
      I also faced the same issue!

    • @kartikeymishra2673
      @kartikeymishra2673 10 months ago

      @@tuplespectra Well, this really helped, thanks :)

    • @tuplespectra
      @tuplespectra  10 months ago

      @@kartikeymishra2673 You are welcome.

    • @nikkim94nikhil
      @nikkim94nikhil 7 months ago

      @@tuplespectra Hey, I'm getting a TypeError; it's not getting stuck, but it's not creating the user and password either! Can you help please?

  • @Nari_Nizar
    @Nari_Nizar a year ago

    Thank you so much for such an awesome tutorial. I wanted to run this code and I am getting this error:
    WARNING - Error when trying to pre-import module 'airflow.providers.amazon.aws.sensors.s3' found in /home/ubuntu/airflow/dags/zillowanalytics.py: No module named 'airflow.providers.amazon'
    Please help!

    • @Nari_Nizar
      @Nari_Nizar a year ago

      @tuplespectra could you please help?

    • @tuplespectra
      @tuplespectra  a year ago

      @@Nari_Nizar did you remember to do a "pip install apache-airflow-providers-amazon"?

    • @Nari_Nizar
      @Nari_Nizar a year ago

      @@tuplespectra it worked! Thank you very much, this is an excellent project!

    • @tuplespectra
      @tuplespectra  a year ago

      @@Nari_Nizar Thanks. I'm glad it worked and that you found the project valuable. Please help like our videos and share them with your friends, teammates, and colleagues so more people can benefit. Thanks so much.