Big Data Engineering Mock Interview | Big Data Pipeline | AWS Cloud Services | Project Architecture
- Added on 24. 03. 2024
- To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
I have trained more than 20,000 professionals in the field of Data Engineering in the last 5 years.
Want to master SQL? Learn SQL the right way through the most sought-after Data Course - SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."
Here is how you can register for the Program -
Registration Link (Course Access from India): rzp.io/l/SQLINR
Registration Link (Course Access from outside India): rzp.io/l/SQLUSD
30 INTERVIEWS IN 30 DAYS- BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club aimed at aiding the community's growth and development
Our highly experienced guest interviewer, Satinder ( / satinder-singh-699aab2b ), shares invaluable insights and practical advice drawn from his extensive experience.
Our talented guest interviewee, Aditya Patil ( / ap-patil ), has an impressive, well-articulated approach to answering the interview questions.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/#testimonials
Discussed Questions : Timestamp
2:34 Brief overview of projects.
3:19 Describe your data pipeline flow and architecture.
5:10 What transformations do you use, and in which format do you write data to Redshift?
6:44 How do you handle null values?
9:03 Which file format do you use for end-user data?
9:50 Why is Parquet preferred over ORC?
11:10 What are the join types in Hive?
12:07 Which types of joins are used to avoid shuffling in Hive and PySpark? Do you know the specific term?
12:53 Explain how broadcast join avoids shuffling.
14:07 Which property controls broadcast join in Spark?
14:40 How do you start a Spark application in PySpark?
16:09 What does the builder do in Spark session creation?
17:43 What are the partitioning types in Hive?
18:36 Difference between managed and external tables in Hive.
19:16 Have you performed Spark performance tuning?
19:36 Difference between repartition and coalesce in Spark?
20:25 Have you used NoSQL databases?
21:02 SQL coding question
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
Parquet is a columnar storage format, so it is a very good file format for retrieving data through queries. It definitely reduces I/O reads and network bandwidth usage. Besides that, it has built-in support for compression (e.g. the Snappy codec), so it reduces space usage. Another point I can think of: a Parquet file comes with a structure of three components - header, body, and footer. The header is basically the file identification (part001, part002, ...). The body is the actual data content being stored, and the footer holds the metadata. This metadata includes the minimum and maximum values of the columns, so whenever we query data stored in Parquet format, this metadata enables data skipping, which in turn speeds up query execution. Hope it helps.
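The data-skipping idea from the comment above can be sketched in plain Python. This is a toy model, not the real Parquet format: each "row group" records min/max statistics (as a Parquet footer does per column chunk), and a query skips any group whose range proves it cannot match the predicate.

```python
# Toy illustration of Parquet-style row-group statistics and data skipping.
# NOT the real Parquet format - just the idea that footer min/max stats
# let a reader skip row groups that cannot possibly satisfy a predicate.

def build_row_groups(values, group_size):
    """Split values into row groups and record min/max stats (the 'footer')."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        groups.append({"rows": chunk, "min": min(chunk), "max": max(chunk)})
    return groups

def query_greater_than(groups, threshold):
    """Return rows matching `value > threshold`, counting skipped groups."""
    skipped, result = 0, []
    for g in groups:
        if g["max"] <= threshold:      # stats prove no row in this group matches
            skipped += 1
            continue
        result.extend(v for v in g["rows"] if v > threshold)
    return result, skipped

groups = build_row_groups(list(range(100)), group_size=25)  # 4 row groups
rows, skipped = query_greater_than(groups, 74)
print(len(rows), skipped)  # 25 matching rows; 3 of 4 groups skipped via stats
```

Real Parquet readers do the same pruning per row group (and engines like Spark push predicates down to it), which is why the footer metadata matters for query speed.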
Thank you!!
Informative and Excellent interview.
This interview is really great as Satinder explained some concepts like property for broadcast etc more clearly. Thanks Sumit Sir!! Expecting more videos like this..
satinder will be conducting more interviews
Satinder sir is awesome, always something to learn from his questions.
Best interview I have ever seen. Both of you are too good at your level.
yes this interview was next level
Really nice interview sir. ❤
The interview was more focused on PySpark and SQL; we expect the interviewer to ask more questions on AWS cloud as well, because in most of the interview videos posted, PySpark has been asked a lot. If questions on AWS had been asked, it would have been very helpful.
Hi Mohammed, will definitely have some interviews planned specifically for AWS in the upcoming days.
Thank you sir
I see mostly 70% asked on PySpark/SQL, the rest on cloud @@mohammedalikhan9819
This was a good interview and Satinder has good experience as an interviewer.
This was a good interview, different from the earlier ones. Satinder's questions and advice were very good.
this interview has really gone well
Best interview session so far.
This was a very good video
Very Informative one of the best mock interview with proper answering and details
Keep watching for more such insightful interviews
Hi Sumit Sir,
In the first sql problem where we are required to find subject wise toppers, one case where row_number() will fail is when we have two top-scorers with the same marks in a specific subject. Please check the example below:
student_name, subject, marks (-- derived column)
stud_1, maths, 90 -- 1
stud_2, maths, 90 -- 1
stud_1,economics, 95 --1
stud_2, economics, 90 -- 2
stud_3, economics, 88 -- 3
Instead of row_number(), we can choose either rank() or dense_rank(), as we just need the first rankers (based on the highest marks scored in each subject). My approach will be as follows:
WITH top_scorers AS
(
SELECT student_name,
subject,
marks,
DENSE_RANK() OVER(PARTITION BY subject ORDER BY marks DESC) AS rnk
FROM student_marks
)
SELECT student_name,
subject,
marks
FROM top_scorers
WHERE rnk = 1;
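The tie-handling difference the comment describes can be checked with a quick pure-Python simulation of the two window functions over the same sample rows (stdlib only; a simplified model of "rank = 1" selection, not a SQL engine):

```python
# Simulate ROW_NUMBER vs DENSE_RANK over (PARTITION BY subject ORDER BY marks DESC)
# to show why ROW_NUMBER drops one of two tied toppers while DENSE_RANK keeps both.
from itertools import groupby
from operator import itemgetter

rows = [  # (student_name, subject, marks) - sample data from the comment
    ("stud_1", "maths", 90),
    ("stud_2", "maths", 90),
    ("stud_1", "economics", 95),
    ("stud_2", "economics", 90),
    ("stud_3", "economics", 88),
]

def toppers(rows, use_dense_rank):
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))  # partition, marks desc
    for subject, part in groupby(ordered, key=itemgetter(1)):
        part = list(part)
        for i, r in enumerate(part):
            if use_dense_rank:
                # dense_rank: every row tied with the partition max gets rank 1
                rank = 1 if r[2] == part[0][2] else 2
            else:
                rank = i + 1  # row_number: ties still get distinct numbers
            if rank == 1:
                out.append((subject, r[0]))
    return sorted(out)

print(toppers(rows, use_dense_rank=False))  # only one maths topper survives
print(toppers(rows, use_dense_rank=True))   # both tied maths toppers kept
```

With ROW_NUMBER, stud_2 (tied at 90 in maths) is lost; with DENSE_RANK both tied toppers pass the `rnk = 1` filter, matching the CTE above.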
Interview was insightful. Learnt core concepts of spark from Satinder
glad that it helped you
It's really helpful sir. Thank you so much
Most welcome
Very informative video, liked the point of view by Satinder Sir.
satinder is a very knowledgeable person
Aditya - you need to be strong in the basics and always answer straightforwardly and crisply, on point. Don't beat around the bush.
Thanks for uploading such a great Interview video Sir!
Glad you found the interview informative!
What's the difference between parquet and delta format?
Sir, I personally want to see more of Satinder sir's interviews
yes definitely, he will be conducting more interviews
Sir, please continue the Python course along with this
yes, one video coming tomorrow at 7 pm
@@sumitmittal07 thank you so much sir, that's a relief to hear.
Excellent
Thanks
Very nice interview
glad that you liked it
Please upload a gcp data engineer interview video sir
very soon
Has anyone taken the course?
Please share your contact number if you would like to know more about the courses that I offer
My SQL would be:
SELECT student_id, max(marks)
FROM class
GROUP BY subject
Every non-aggregated column in your SELECT statement must be included in the GROUP BY clause. (Here student_id is a non-aggregated column, so it should be in your GROUP BY clause; the same applies to the subject column, which is in the GROUP BY but is not being selected.)
@@grim_rreaperr Oh yes, it's a typo.
It should be:
SELECT subject, max(marks)
FROM class
GROUP BY subject
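Note that even the corrected query only returns each subject and its top marks; recovering the student's name needs a window function (as in the earlier comment) or a join back against the aggregate. A stdlib Python sketch of what the grouped aggregation computes (illustrative sample data, not from the video):

```python
# Equivalent of: SELECT subject, MAX(marks) FROM class GROUP BY subject.
# The aggregation alone loses student_name, which is why selecting a
# non-grouped column like student_id alongside MAX(marks) is invalid SQL.
from collections import defaultdict

rows = [  # (student_name, subject, marks) - illustrative sample data
    ("stud_1", "maths", 90),
    ("stud_2", "maths", 85),
    ("stud_1", "economics", 95),
]

top_marks = defaultdict(int)
for name, subject, marks in rows:
    top_marks[subject] = max(top_marks[subject], marks)
print(dict(top_marks))  # {'maths': 90, 'economics': 95}

# To recover the names, "join" the original rows back against the aggregate:
toppers = sorted((s, n) for n, s, m in rows if m == top_marks[s])
print(toppers)  # [('economics', 'stud_1'), ('maths', 'stud_1')]
```

The second step mirrors the SQL self-join pattern `JOIN class ON marks = max_marks AND subject = subject`, the usual alternative to a window function here.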
what is NC SQL way?
ANSI
so ANSI SQL is the normal SQL syntax which we write, right? @@SB-ix7db
Why data engineer roles have very easy questions
we make it look easy, else it's complex.. haha
Bro is cheating on a mock interview with zero fundamental knowledge of Spark or Hadoop. At least the interviewer asked questions to get something out of this video.