Designing DataWarehouse from Scratch | End to End Data Engineering
- Uploaded: June 19, 2024
- FULL END TO END COURSE AVAILABLE AT www.datamasterylab.com/home/c...
Accelerate your Data Mastery by signing up on datamasterylab.com.
This video is divided into 5 parts:
1. Designing the logical view of the DW
2. Creating a Redshift DW cluster
3. Converting the logical view to physical view
4. Loading the DW with Data
5. Implementing the medallion architecture on DW
Timestamps:
0:00 Introduction
2:00 System Prerequisites
9:22 Steps Involved in Designing a Data Warehouse
21:00 The Business Use Case
22:42 Designing the Logical Architecture
56:10 Creating a VPC on AWS
58:21 Creating Redshift Data Warehouse Cluster
1:00:00 Creating Subnet Group on AWS
1:03:11 Creating Security Group and allowing external connections on AWS
1:05:22 Connecting to Redshift Cluster with DBeaver
1:05:40 Connecting to Redshift Cluster with Redshift Query Editor
1:09:20 Creating Dimensions and Fact data
1:16:57 Loading data into Data Warehouse
1:25:44 Creating AWS Data Catalog DB and Tables
1:32:05 Connecting Redshift to AWS Glue Data Catalog
1:36:30 Creating DBT project
1:41:30 Configuring Connections to Redshift from DBT
1:47:12 DBT Project configuration with Variables and Schema
1:49:05 Creating Silver Dimension models
2:05:34 Creating Silver Fact models
2:20:12 Creating Gold Dimension and Fact Models
2:38:17 Other course information
Like this video? Support us: / @codewithyu
👦🏻 My Linkedin: / yusuf-ganiyu-b90140107
🚀 X(Twitter): x.com/YusufOGaniyu
📝 Medium: / yusuf.ganiyu
🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟
🔗 Useful Links and Resources:
✅ Video source code: www.buymeacoffee.com/yusuf.ga...
✅ S3 Documentation: docs.aws.amazon.com/s3/
✅ AWS IAM Documentation: docs.aws.amazon.com/IAM/lates...
#DataEngineering #DataWarehousing #DataEngineer #TechTutorial #DataScience #BigData #CloudComputing #AWSRedshift #DatabaseDesign #DataArchitecture #MedallionArchitecture #LearnCoding #TechEducation #DataAnalytics #DWDesign #PhysicalDataModel #LogicalDataModel #DataLoading #DataWarehouseProject #CodeWithYu
As someone who is looking to transition from a data analyst role into data engineering, your videos are really good. Keep up the good work.
I was able to complete the full project, which helped me get shortlisted for quite a few interviews. Truly amazing content, Yusuf.
It would be really great if you made a video on CI/CD, the tools required for a CI/CD pipeline, and the Git and shell-scripting operations that are important for a data engineer.
Great job! Well done Sam!
Great job breaking everything down! I have not seen a project video as detailed as this. This is how an end-to-end project video should be! I just started as the only data engineer in my company, and I definitely plan to implement all these steps in my current projects. Keep it up!
This is priceless thanks for this video.
Glad you enjoyed it!
I'm having a lot of problems at work: I don't get enough projects and I'm stuck at the transformation layer. You are more helpful than any of my seniors in the company.
I’m glad I’m able to help ❤️
Another very in-demand Data Engineering skill is Snowflake. Can you please come up with a comprehensive Snowflake course or project?
I have purchased the course on DataMastery Lab. It's got everything in it. I enjoyed this project and it's perfect for my fresher portfolio. Keep posting content like this... Love from India!!!!😍
Excellent!
Don’t forget to shout if you have questions/challenges.
Cheers 🥂
@@CodeWithYu Quick question: is there a way to connect with you personally through DataMastery Lab (which would be preferable) for clearing doubts, or should I reach out via LinkedIn or email?
Kindly advise.
perfect
Great content! In the project you ran dbt locally; can we deploy it in the AWS cloud? And can dbt be a full alternative to AWS Glue, giving us the flexibility to build ETL and schedule it for incremental or trigger-based flows?
Such a wonderful video! Where did you get the data from?
Keep up the good work, Yu
Thank you, I will
Please make a video 'designing data warehouse on Azure'
❤
Can you come up with an Azure data pipeline project next?
Thank you for the video. I'm stuck on connecting to Redshift: the connection attempt timed out. I have set the inbound and outbound rules and checked that it is the correct VPC security group. Could you please kindly help?
1. Checked the IAM role: amazonredshiftfullcommandfullaccess.
2. Checked that public access is enabled.
3. Checked the inbound/outbound rules to Redshift for IPv4.
4. Checked the username and password.
No idea why it's still not working.
I got it! According to the documentation, you need to set every route table in the VPC to route IPv4 traffic to the internet gateway (igw...). God bless.
I love this channel, I'm learning a lot. I'd love to see artificial intelligence videos on your channel again; it would be spectacular.
Hard disagree. There are more than enough other channels covering artificial intelligence, I would lose interest in this channel if it went this route as well
yes, this channel should be for hardcore data engineers :)
Hi Yusuf, amazing project! I have supported you on "Buy Me a Coffee", but I can't find how the data is generated or where the data comes from, and I don't know why the main.py file is missing from the project source code.
The main.py is supposed to be in the source code folder but it's not there. Other than that, everything else is working just fine.
Can you please include an architecture diagram and explain it for 1-2 minutes? That helps a lot in understanding the overview of the project.
True
First ❤
The D in (Red)shift is a little down today, trying to skip the hard work... lol
Haha lol 😂 I kept making that same mistake over and over lol
😃@@CodeWithYu
@CodeWithYu Where is the main file to create the data?
In the description
Please share it here as well.
I'm still struggling with how to load data into S3; I don't have sample data.
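For readers who can't find sample data: below is a minimal, hypothetical sketch of what a data-generation script like the missing main.py might look like. The table names, column names, and value ranges are all assumptions for illustration, not the course's actual schema; it just produces small dimension and fact CSV files you can upload to S3 yourself.

```python
# Hypothetical stand-in for the missing main.py: writes small
# dimension and fact CSVs suitable for a Redshift COPY exercise.
# Schema and column names are illustrative assumptions.
import csv
import random
from datetime import date, timedelta

def generate_dim_customers(path, n=100):
    """Write a tiny customer dimension with n rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "name", "country"])
        for i in range(1, n + 1):
            writer.writerow([i, f"customer_{i}", random.choice(["US", "UK", "IN"])])

def generate_fact_sales(path, n=500, n_customers=100):
    """Write a sales fact table whose customer_id values reference the dimension."""
    start = date(2024, 1, 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sale_id", "customer_id", "sale_date", "amount"])
        for i in range(1, n + 1):
            writer.writerow([
                i,
                random.randint(1, n_customers),          # FK into dim_customers
                (start + timedelta(days=random.randint(0, 180))).isoformat(),
                round(random.uniform(5.0, 500.0), 2),
            ])

if __name__ == "__main__":
    generate_dim_customers("dim_customers.csv")
    generate_fact_sales("fact_sales.csv")
```

Once generated, the files can be uploaded to an S3 bucket (e.g. with `aws s3 cp dim_customers.csv s3://your-bucket/`) and loaded into Redshift with the COPY command shown in the video.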
second :D
Disaster Recovery always causes fumbles in the system implementation... haha
Why have you not loaded the bronze schema from OLTP data, instead of creating fact and dimension CSV files with a Python script? It's so important to understand how fact and dimension tables are loaded from OLTP source data using SCD-2. It would really help if you created the models to load the bronze facts and dimensions from OLTP sources. The whole point of data modelling and designing a warehouse is converting the data in an OLTP database into an OLAP data warehouse, and that's missing. Important concepts like SCD implementation and how fact and dimension tables are loaded from OLTP have been totally skipped.
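For those asking about SCD-2: here is a minimal, illustrative Python sketch of Type-2 slowly-changing-dimension logic. This is not the course's implementation (which, per the thread, lives in the full course); the row structure and the tracked attribute are assumptions, chosen only to show the close-old-row/insert-new-row mechanic that dbt snapshots or a MERGE statement would perform in the warehouse.

```python
# Illustrative SCD Type 2: when a tracked attribute changes, close the
# current dimension row and insert a new current row, preserving history.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DimRow:
    customer_id: int                 # business key from the OLTP source
    country: str                     # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None means "current" version
    is_current: bool = True

def apply_scd2(dim: list, customer_id: int, country: str, load_date: date) -> None:
    """Apply one incoming OLTP record to the dimension, SCD-2 style."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.is_current), None
    )
    if current is None:
        dim.append(DimRow(customer_id, country, load_date))   # brand-new key
    elif current.country != country:
        current.valid_to = load_date                          # close old version
        current.is_current = False
        dim.append(DimRow(customer_id, country, load_date))   # open new version
    # if the attribute is unchanged, do nothing

dim = []
apply_scd2(dim, 1, "US", date(2024, 1, 1))
apply_scd2(dim, 1, "UK", date(2024, 3, 1))  # change: two rows, one current
```

In a real warehouse the same effect is typically achieved with a dbt snapshot (`check` or `timestamp` strategy) or a SQL MERGE against the dimension table.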
Also, what IDE/editor are you using for creating the DBT models?
Yes, I agree that's the critical missing link: rather than directly loading the fact and dimension tables, they should have been prepared from the OLTP database.
The video was getting too long… if you’re interested in the OLTP implementation, you can get the full course on datamasterylab.com
@@CodeWithYu I don't know which course on DataMastery covers that. That's the most crucial part of data warehousing: building facts and dimensions from an OLTP database. Also, we didn't get to see the contents of main.py in this video. We're fine watching 4 hours of video as long as it clears up all our concepts; I really like the way you teach, and I learn a lot from your videos. But I get frustrated when I can't learn something just because it's missing from the video. There's also no use case for SCD-2, so how would we know whether the fact and dimension tables handle referential integrity (PK, FK)? It would be really helpful if you kept your videos detailed instead of skipping some parts. Thank you.
@@yash-ri2lg Exactly... brother, did you implement this pipeline? What are the contents of that main.py?
I want a Python script for free