Designing DataWarehouse from Scratch | End to End Data Engineering
- Uploaded: June 19, 2024
- FULL END TO END COURSE AVAILABLE AT www.datamasterylab.com/home/c...
Accelerate your Data Mastery by signing up on datamasterylab.com.
This video is divided into 5 parts:
1. Designing the logical view of the DW
2. Creating a Redshift DW cluster
3. Converting the logical view to physical view
4. Loading the DW with Data
5. Implementing the medallion architecture on DW
Timestamps:
0:00 Introduction
2:00 System Prerequisites
9:22 Steps Involved in Designing a Data Warehouse
21:00 The Business Use Case
22:42 Designing the Logical Architecture
56:10 Creating a VPC on AWS
58:21 Creating Redshift Data Warehouse Cluster
1:00:00 Creating Subnet Group on AWS
1:03:11 Creating Security Group and allowing external connections on AWS
1:05:22 Connecting to Redshift Cluster with DBeaver
1:05:40 Connecting to Redshift Cluster with Redshift Query Editor
1:09:20 Creating Dimensions and Fact data
1:16:57 Loading data into Data Warehouse
1:25:44 Creating AWS Data Catalog DB and Tables
1:32:05 Connecting Redshift to AWS Glue Data Catalog
1:36:30 Creating DBT project
1:41:30 Configuring Connections to Redshift from DBT
1:47:12 DBT Project configuration with Variables and Schema
1:49:05 Creating Silver Dimension models
2:05:34 Creating Silver Fact models
2:20:12 Creating Gold Dimension and Fact Models
2:38:17 Other course information
Like this video? Support us: / @codewithyu
👦🏻 My Linkedin: / yusuf-ganiyu-b90140107
🚀 X(Twitter): x.com/YusufOGaniyu
📝 Medium: / yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Video source code: www.buymeacoffee.com/yusuf.ga...
✅ S3 Documentation: docs.aws.amazon.com/s3/
✅ AWS IAM Documentation: docs.aws.amazon.com/IAM/lates...
#DataEngineering #DataWarehousing #DataEngineer #TechTutorial #DataScience #BigData #CloudComputing #AWSRedshift #DatabaseDesign #DataArchitecture #MedallionArchitecture #LearnCoding #TechEducation #DataAnalytics #DWDesign #PhysicalDataModel #LogicalDataModel #DataLoading #DataWarehouseProject #CodeWithYu
As someone who is looking to transition from a data analyst role into data engineering, your videos are really good. Keep up the good work.
I was able to complete the full project, which helped me get shortlisted for quite a few interviews. Truly amazing content, Yusuf.
It would be really great if you made a video on CI/CD, the tools required for a CI/CD pipeline, and the Git and shell-scripting operations that are important for a data engineer.
Great job! Well done Sam!
Great job breaking everything down! I have not seen a project video as detailed as this. This is how an end-to-end project video should be! I just started as the only data engineer in my company, and I definitely plan to implement all these steps in my current projects. Keep it up!
This is priceless thanks for this video.
Glad you enjoyed it!
I'm having a lot of problems at work: I don't get enough projects and I'm stuck at the transformation layer. You are more helpful than any of my seniors in the company.
I’m glad I’m able to help ❤️
Another very in-demand Data Engineering skill is Snowflake. Can you please come up with a comprehensive Snowflake course or project?
I have purchased the course on DataMastery Lab. It's got everything in it. I enjoyed this project and it's perfect for my fresher portfolio. Keep posting content like this... Love from India!!!!😍
Excellent!
Don’t forget to shout if you have questions/challenges.
Cheers 🥂
@@CodeWithYu Quick question: is there a way to connect with you personally through DataMastery Lab (which would be preferable) for clearing doubts, or should I reach out via LinkedIn or email?
Kindly advise.
perfect
Great content! In the project you ran dbt locally; can we deploy it in the AWS cloud? And can dbt be a full alternative to AWS Glue, giving us the flexibility to build ETL and schedule it for incremental or trigger-based flows?
Such a wonderful video! Where did you get the data from?
Keep up the good work, Yu
Thank you, I will
Please make a video 'designing data warehouse on Azure'
❤
Can you come up with an Azure data pipeline project next?
Thank you for the video. I'm stuck on connecting to Redshift: the connection attempt timed out. I have set the inbound and outbound rules and checked that it is the correct VPC security group. Could you please kindly help?
1. Checked the IAM role: amazonredshiftfullcommandfullaccess.
2. Checked that public access is enabled.
3. Checked the inbound/outbound rules to Redshift for IPv4.
4. Checked the username and password.
No idea why it's still not working.
I got it! According to the documentation, you need to set every route table in the VPC to route IPv4 traffic to the internet gateway (igw...). God bless.
I love this channel, I'm learning a lot. I'd love to see artificial intelligence videos on your channel again; it would be spectacular.
Hard disagree. There are more than enough other channels covering artificial intelligence, I would lose interest in this channel if it went this route as well
yes, this channel should be for hardcore data engineers :)
Hi Yusuf, amazing project! I have supported you on "Buy Me a Coffee", but I can't find how the data is generated or where the data comes from, and I don't know why the main.py file is missing from the project source code.
The main.py is supposed to be in the source code folder but it's not there. Other than that, everything else is working just fine.
Can you please include an architecture diagram and explain it for 1-2 minutes? That helps a lot in understanding the overview of the project.
True
First ❤
The D in (Red)shift is a little down today, trying to skip the hard work... lol
Haha lol 😂 I kept making that same mistake over and over lol
😃@@CodeWithYu
@CodeWithYu Where is the main file to create the data?
In the description
Please share it here as well.
I'm still struggling with how to load data into S3; I don't have sample data.
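For readers who can't find sample data: below is a minimal, hypothetical sketch of what a data-generation script like the missing main.py might look like. The table names, column names, and value ranges are all assumptions for illustration, not the course's actual schema; it just produces small dimension and fact CSV files you can upload to S3 yourself.

```python
# Hypothetical stand-in for the missing main.py: writes small
# dimension and fact CSVs suitable for a Redshift COPY exercise.
# Schema and column names are illustrative assumptions.
import csv
import random
from datetime import date, timedelta

def generate_dim_customers(path, n=100):
    """Write a tiny customer dimension with n rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "name", "country"])
        for i in range(1, n + 1):
            writer.writerow([i, f"customer_{i}", random.choice(["US", "UK", "IN"])])

def generate_fact_sales(path, n=500, n_customers=100):
    """Write a sales fact table whose customer_id values reference the dimension."""
    start = date(2024, 1, 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sale_id", "customer_id", "sale_date", "amount"])
        for i in range(1, n + 1):
            writer.writerow([
                i,
                random.randint(1, n_customers),          # FK into dim_customers
                (start + timedelta(days=random.randint(0, 180))).isoformat(),
                round(random.uniform(5.0, 500.0), 2),
            ])

if __name__ == "__main__":
    generate_dim_customers("dim_customers.csv")
    generate_fact_sales("fact_sales.csv")
```

Once generated, the files can be uploaded to an S3 bucket (e.g. with `aws s3 cp dim_customers.csv s3://your-bucket/`) and loaded into Redshift with the COPY command shown in the video.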
second :D
Disaster Recovery always causes fumbles in the system implementation... haha
Why have you not loaded the bronze schema from OLTP data, instead of creating fact and dimension CSV files with a Python script? It's so important to understand how fact and dimension tables are loaded from OLTP source data using SCD-2. It would really help if you created the models to load the bronze facts and dimensions from OLTP sources. The whole point of data modelling and designing a warehouse is converting the data in an OLTP database into an OLAP data warehouse, and that's missing. Important concepts like SCD implementation and how fact and dimension tables are loaded from OLTP have been totally skipped.
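For those asking about SCD-2: here is a minimal, illustrative Python sketch of Type-2 slowly-changing-dimension logic. This is not the course's implementation (which, per the thread, lives in the full course); the row structure and the tracked attribute are assumptions, chosen only to show the close-old-row/insert-new-row mechanic that dbt snapshots or a MERGE statement would perform in the warehouse.

```python
# Illustrative SCD Type 2: when a tracked attribute changes, close the
# current dimension row and insert a new current row, preserving history.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DimRow:
    customer_id: int                 # business key from the OLTP source
    country: str                     # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None means "current" version
    is_current: bool = True

def apply_scd2(dim: list, customer_id: int, country: str, load_date: date) -> None:
    """Apply one incoming OLTP record to the dimension, SCD-2 style."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.is_current), None
    )
    if current is None:
        dim.append(DimRow(customer_id, country, load_date))   # brand-new key
    elif current.country != country:
        current.valid_to = load_date                          # close old version
        current.is_current = False
        dim.append(DimRow(customer_id, country, load_date))   # open new version
    # if the attribute is unchanged, do nothing

dim = []
apply_scd2(dim, 1, "US", date(2024, 1, 1))
apply_scd2(dim, 1, "UK", date(2024, 3, 1))  # change: two rows, one current
```

In a real warehouse the same effect is typically achieved with a dbt snapshot (`check` or `timestamp` strategy) or a SQL MERGE against the dimension table.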
Also, what IDE/editor are you using for creating the DBT models?
Yes, I agree that's the critical missing link: rather than directly loading the fact and dimension tables, they should have been prepared from the OLTP database.
The video was getting too long… if you’re interested in the OLTP implementation, you can get the full course on datamasterylab.com
@@CodeWithYu I don't know which course on DataMastery covers that. That's the most crucial part of data warehousing: building facts and dimensions from an OLTP database. Also, we didn't get to see the contents of main.py in this video. We're fine watching 4 hours of video as long as it clears up all our concepts; I really like the way you teach, and I learn a lot from your videos. But I get frustrated when I can't learn something just because it's missing from the video. There's also no use case for SCD-2, so how would we know whether the fact and dimension tables handle referential integrity (PK, FK)? It would be really helpful if you kept your videos detailed instead of skipping some parts. Thank you.
@@yash-ri2lg Exactly... brother, did you implement this pipeline? What are the contents of that main.py?
I want a Python script for free