Big Data Engineering Mock Interview | Big Data Pipeline | AWS Cloud Services | Project Architecture
- Added on 24. 03. 2024
- To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&su... for curated courses developed by me.
I have trained more than 20,000 professionals in the field of Data Engineering in the last 5 years.
Want to master SQL? Learn SQL the right way through the most sought-after Data Course - SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."
Here is how you can register for the Program -
Registration Link (Course Access from India): rzp.io/l/SQLINR
Registration Link (Course Access from outside India): rzp.io/l/SQLUSD
30 INTERVIEWS IN 30 DAYS- BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club aimed at aiding the community's growth and development
Our highly experienced guest interviewer, Satinder ( / satinder-singh-699aab2b ), shares invaluable insights and practical advice drawn from his extensive experience.
Our talented guest interviewee, Aditya Patil ( / ap-patil ), has an impressive, well-articulated approach to answering the interview questions.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/#testimonials
Discussed Questions : Timestamp
2:34 Brief overview of projects.
3:19 Describe your data pipeline flow and architecture.
5:10 What transformations do you use, and in which format do you write data to Redshift?
6:44 How do you handle null values?
9:03 Which file format do you use for end-user data?
9:50 Why is Parquet preferred over ORC?
11:10 What are the join types in Hive?
12:07 Which types of joins are used to avoid shuffling in Hive and PySpark? Do you know the specific term?
12:53 Explain how broadcast join avoids shuffling.
14:07 Which property controls broadcast join in Spark?
14:40 How do you start a Spark application in PySpark?
16:09 What does the builder do in Spark session creation?
17:43 What are the partitioning types in Hive?
18:36 Difference between managed and external tables in Hive.
19:16 Have you performed Spark performance tuning?
19:36 Difference between repartition and coalesce in Spark?
20:25 Have you used NoSQL databases?
21:02 SQL coding question
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
Parquet is a columnar storage format, so it is a very good file format for retrieving data through queries. It definitely reduces I/O reads and network bandwidth usage. Besides that, it has built-in support for compression (e.g. the Snappy codec), so it reduces space usage. Another point I can think of: a Parquet file comes with a structure of three components - header, body, and footer. The header is basically the file identification (part001, part002, ...). The body is the actual data content being stored, and the footer holds the metadata. This metadata includes the minimum and maximum values of the columns, so whenever we query data stored in Parquet format, this metadata enables data skipping, which in turn speeds up query execution. Hope it helps.
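The data-skipping idea from the comment above can be sketched in plain Python. This is a toy model, not the real Parquet format: each "row group" records min/max statistics (as a Parquet footer does per column chunk), and a query skips any group whose range proves it cannot match the predicate.

```python
# Toy illustration of Parquet-style row-group statistics and data skipping.
# NOT the real Parquet format - just the idea that footer min/max stats
# let a reader skip row groups that cannot possibly satisfy a predicate.

def build_row_groups(values, group_size):
    """Split values into row groups and record min/max stats (the 'footer')."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        groups.append({"rows": chunk, "min": min(chunk), "max": max(chunk)})
    return groups

def query_greater_than(groups, threshold):
    """Return rows matching `value > threshold`, counting skipped groups."""
    skipped, result = 0, []
    for g in groups:
        if g["max"] <= threshold:      # stats prove no row in this group matches
            skipped += 1
            continue
        result.extend(v for v in g["rows"] if v > threshold)
    return result, skipped

groups = build_row_groups(list(range(100)), group_size=25)  # 4 row groups
rows, skipped = query_greater_than(groups, 74)
print(len(rows), skipped)  # 25 matching rows; 3 of 4 groups skipped via stats
```

Real Parquet readers do the same pruning per row group (and engines like Spark push predicates down to it), which is why the footer metadata matters for query speed.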
Thank you!!
Informative and Excellent interview.
This interview is really great as Satinder explained some concepts like property for broadcast etc more clearly. Thanks Sumit Sir!! Expecting more videos like this..
satinder will be conducting more interviews
Satinder sir is awesome, always something to learn from his questions.
Best interview I have ever seen. Both of you are too good at your level.
yes this interview was next level
Really nice interview sir. ❤
The interview was more focused on PySpark and SQL; we expect the interviewer to ask more questions on AWS cloud as well, because in most of the interview videos posted, PySpark has been asked a lot. If questions on AWS had been asked, it would have been very helpful.
Hi Mohammed, will definitely have some interviews planned specifically for AWS in the upcoming days.
Thank you sir
I see mostly 70% asked on PySpark/SQL, the rest on cloud @@mohammedalikhan9819
This was a good interview and Satinder has good experience as an interviewer.
This was a good interview, different from the earlier ones. Satinder's questions and advice were very good.
this interview has really gone well
Best interview session so far.
This was a very good video
Very Informative one of the best mock interview with proper answering and details
Keep watching for more such insightful interviews
Hi Sumit Sir,
In the first sql problem where we are required to find subject wise toppers, one case where row_number() will fail is when we have two top-scorers with the same marks in a specific subject. Please check the example below:
student_name, subject, marks (-- derived column)
stud_1, maths, 90 -- 1
stud_2, maths, 90 -- 1
stud_1,economics, 95 --1
stud_2, economics, 90 -- 2
stud_3, economics, 88 -- 3
Instead of row_number(), we can choose either rank() or dense_rank(), as we just need the first rankers (based on the highest marks scored in each subject). My approach will be as follows:
WITH top_scorers AS
(
SELECT student_name,
subject,
marks,
DENSE_RANK() OVER(PARTITION BY subject ORDER BY marks DESC) AS rnk
FROM student_marks
)
SELECT student_name,
subject,
marks
FROM top_scorers
WHERE rnk = 1;
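The tie-handling difference the comment describes can be checked with a quick pure-Python simulation of the two window functions over the same sample rows (stdlib only; a simplified model of "rank = 1" selection, not a SQL engine):

```python
# Simulate ROW_NUMBER vs DENSE_RANK over (PARTITION BY subject ORDER BY marks DESC)
# to show why ROW_NUMBER drops one of two tied toppers while DENSE_RANK keeps both.
from itertools import groupby
from operator import itemgetter

rows = [  # (student_name, subject, marks) - sample data from the comment
    ("stud_1", "maths", 90),
    ("stud_2", "maths", 90),
    ("stud_1", "economics", 95),
    ("stud_2", "economics", 90),
    ("stud_3", "economics", 88),
]

def toppers(rows, use_dense_rank):
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))  # partition, marks desc
    for subject, part in groupby(ordered, key=itemgetter(1)):
        part = list(part)
        for i, r in enumerate(part):
            if use_dense_rank:
                # dense_rank: every row tied with the partition max gets rank 1
                rank = 1 if r[2] == part[0][2] else 2
            else:
                rank = i + 1  # row_number: ties still get distinct numbers
            if rank == 1:
                out.append((subject, r[0]))
    return sorted(out)

print(toppers(rows, use_dense_rank=False))  # only one maths topper survives
print(toppers(rows, use_dense_rank=True))   # both tied maths toppers kept
```

With ROW_NUMBER, stud_2 (tied at 90 in maths) is lost; with DENSE_RANK both tied toppers pass the `rnk = 1` filter, matching the CTE above.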
Interview was insightful. Learnt core concepts of spark from Satinder
glad that it helped you
It's really helpful sir. Thank you so much
Most welcome
Very informative video, liked the point of view by Satinder Sir.
satinder is a very knowledgeable person
Aditya - you need to be strong in the basics and always answer straightforwardly and crisply, on point. Don't beat around the bush.
Thanks for uploading such a great Interview video Sir!
Glad you found the interview informative!
What's the difference between parquet and delta format?
Sir, I personally want to see more of Satinder sir's interviews
yes definitely, he will be conducting more interviews
Sir, please continue the Python course along with this
yes, one video coming tomorrow at 7 pm
@@sumitmittal07 thank you so much sir, that's a relief to hear.
Excellent
Thanks
Very nice interview
glad that you liked it
Please upload a gcp data engineer interview video sir
very soon
Has anyone taken the course?
Please share your contact number if you would like to know more about the courses that I offer
My SQL would be:
SELECT student_id, max(marks)
FROM class
GROUP BY subject
Every non-aggregated column in your SELECT statement must be included in the GROUP BY clause. (Here student_id is a non-aggregated column, so it should be in your GROUP BY clause; the same applies to the subject column, which is in the GROUP BY but is not being selected.)
@@grim_rreaperr Oh yes, it's a typo.
It should be:
SELECT subject, max(marks)
FROM class
GROUP BY subject
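Note that even the corrected query only returns each subject and its top marks; recovering the student's name needs a window function (as in the earlier comment) or a join back against the aggregate. A stdlib Python sketch of what the grouped aggregation computes (illustrative sample data, not from the video):

```python
# Equivalent of: SELECT subject, MAX(marks) FROM class GROUP BY subject.
# The aggregation alone loses student_name, which is why selecting a
# non-grouped column like student_id alongside MAX(marks) is invalid SQL.
from collections import defaultdict

rows = [  # (student_name, subject, marks) - illustrative sample data
    ("stud_1", "maths", 90),
    ("stud_2", "maths", 85),
    ("stud_1", "economics", 95),
]

top_marks = defaultdict(int)
for name, subject, marks in rows:
    top_marks[subject] = max(top_marks[subject], marks)
print(dict(top_marks))  # {'maths': 90, 'economics': 95}

# To recover the names, "join" the original rows back against the aggregate:
toppers = sorted((s, n) for n, s, m in rows if m == top_marks[s])
print(toppers)  # [('economics', 'stud_1'), ('maths', 'stud_1')]
```

The second step mirrors the SQL self-join pattern `JOIN class ON marks = max_marks AND subject = subject`, the usual alternative to a window function here.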
what is NC SQL way?
ANSI
so ANSI SQL is the normal SQL syntax which we write, right? @@SB-ix7db
Why data engineer roles have very easy questions
we make it look easy, else it's complex.. haha
Bro is cheating on a mock interview with zero fundamental knowledge of Spark or Hadoop. At least the interviewer asked questions to get something out of this video.