Data Engineer Mock Interview | SQL | PySpark | Project & Scenario based Interview Questions

  • ฤas pล™idรกn 26. 08. 2024
  • ๐“๐จ ๐ž๐ง๐ก๐š๐ง๐œ๐ž ๐ฒ๐จ๐ฎ๐ซ ๐œ๐š๐ซ๐ž๐ž๐ซ ๐š๐ฌ ๐š ๐‚๐ฅ๐จ๐ฎ๐ ๐ƒ๐š๐ญ๐š ๐„๐ง๐ ๐ข๐ง๐ž๐ž๐ซ, ๐‚๐ก๐ž๐œ๐ค trendytech.in/... for curated courses developed by me.
    I have trained over 20,000+ professionals in the field of Data Engineering in the last 5 years.
    ๐–๐š๐ง๐ญ ๐ญ๐จ ๐Œ๐š๐ฌ๐ญ๐ž๐ซ ๐’๐๐‹? ๐‹๐ž๐š๐ซ๐ง ๐’๐๐‹ ๐ญ๐ก๐ž ๐ซ๐ข๐ ๐ก๐ญ ๐ฐ๐š๐ฒ ๐ญ๐ก๐ซ๐จ๐ฎ๐ ๐ก ๐ญ๐ก๐ž ๐ฆ๐จ๐ฌ๐ญ ๐ฌ๐จ๐ฎ๐ ๐ก๐ญ ๐š๐Ÿ๐ญ๐ž๐ซ ๐œ๐จ๐ฎ๐ซ๐ฌ๐ž - ๐’๐๐‹ ๐‚๐ก๐š๐ฆ๐ฉ๐ข๐จ๐ง๐ฌ ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ!
    "๐€ 8 ๐ฐ๐ž๐ž๐ค ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ ๐๐ž๐ฌ๐ข๐ ๐ง๐ž๐ ๐ญ๐จ ๐ก๐ž๐ฅ๐ฉ ๐ฒ๐จ๐ฎ ๐œ๐ซ๐š๐œ๐ค ๐ญ๐ก๐ž ๐ข๐ง๐ญ๐ž๐ซ๐ฏ๐ข๐ž๐ฐ๐ฌ ๐จ๐Ÿ ๐ญ๐จ๐ฉ ๐ฉ๐ซ๐จ๐๐ฎ๐œ๐ญ ๐›๐š๐ฌ๐ž๐ ๐œ๐จ๐ฆ๐ฉ๐š๐ง๐ข๐ž๐ฌ ๐›๐ฒ ๐๐ž๐ฏ๐ž๐ฅ๐จ๐ฉ๐ข๐ง๐  ๐š ๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ ๐ฉ๐ซ๐จ๐œ๐ž๐ฌ๐ฌ ๐š๐ง๐ ๐š๐ง ๐š๐ฉ๐ฉ๐ซ๐จ๐š๐œ๐ก ๐ญ๐จ ๐ฌ๐จ๐ฅ๐ฏ๐ž ๐š๐ง ๐ฎ๐ง๐ฌ๐ž๐ž๐ง ๐๐ซ๐จ๐›๐ฅ๐ž๐ฆ."
    ๐‡๐ž๐ซ๐ž ๐ข๐ฌ ๐ก๐จ๐ฐ ๐ฒ๐จ๐ฎ ๐œ๐š๐ง ๐ซ๐ž๐ ๐ข๐ฌ๐ญ๐ž๐ซ ๐Ÿ๐จ๐ซ ๐ญ๐ก๐ž ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ -
    ๐‘๐ž๐ ๐ข๐ฌ๐ญ๐ซ๐š๐ญ๐ข๐จ๐ง ๐‹๐ข๐ง๐ค (๐‚๐จ๐ฎ๐ซ๐ฌ๐ž ๐€๐œ๐œ๐ž๐ฌ๐ฌ ๐Ÿ๐ซ๐จ๐ฆ ๐ˆ๐ง๐๐ข๐š) : rzp.io/l/SQLINR
    ๐‘๐ž๐ ๐ข๐ฌ๐ญ๐ซ๐š๐ญ๐ข๐จ๐ง ๐‹๐ข๐ง๐ค (๐‚๐จ๐ฎ๐ซ๐ฌ๐ž ๐€๐œ๐œ๐ž๐ฌ๐ฌ ๐Ÿ๐ซ๐จ๐ฆ ๐จ๐ฎ๐ญ๐ฌ๐ข๐๐ž ๐ˆ๐ง๐๐ข๐š) : rzp.io/l/SQLUSD
    30 INTERVIEWS IN 30 DAYS - BIG DATA INTERVIEW SERIES
    This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
    Our highly experienced guest interviewer, Ankur Bhattacharya, / ankur-bhattacharya-100... shares invaluable insights and practical advice from his extensive experience, catering to aspiring data engineers and seasoned professionals alike.
    Our talented guest interviewee, Praroop Sacheti, / praroopsacheti answers the interview questions in a well-articulated manner.
    Links to the free SQL & Python series developed by me are given below:
    SQL Playlist: SQL tutorial for every...
    Python Playlist: Complete Python By Sum...
    Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
    Social Media Links :
    LinkedIn - / bigdatabysumit
    Twitter - / bigdatasumit
    Instagram - / bigdatabysumit
    Student Testimonials - trendytech.in/...
    Discussed questions (timestamps):
    1:30 Introduction
    3:29 When you are processing data with a Databricks PySpark job, what is the sink for your pipeline?
    4:58 Are you incorporating fact and dimension tables, or any schema, in your project's database design?
    5:50 What volume of data are you dealing with in your day-to-day pipeline?
    6:33 What are the different types of triggers in ADF?
    7:45 What is incremental load? How can you implement it through ADF?
    10:03 What is the difference between a data lake and a data warehouse?
    11:41 What is columnar storage in a data warehouse?
    13:38 What were some challenges encountered during your project, and how were they resolved? Describe the strategies you implemented to optimize your pipeline.
    16:18 Optimizations related to Databricks or PySpark?
    20:41 What is a broadcast join? What exactly happens when we broadcast the table?
    23:01 SQL coding question
    35:46 PySpark coding question
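    For the broadcast join question (20:41), a simplified plain-Python model of the mechanism may help. It is a sketch, not Spark's implementation: roughly, the small table is turned into a hash table once, a copy is handed to every partition of the large table, and each partition joins locally, so the large table is never shuffled. In PySpark you would request this with pyspark.sql.functions.broadcast(small_df) inside the join, and Spark can also choose it automatically when the smaller side's estimated size is below spark.sql.autoBroadcastJoinThreshold. The function name broadcast_hash_join and the sample tables are hypothetical, made up for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_table, key):
    """Inner-join every partition of a large table against one small table.

    large_partitions: list of partitions, each a list of dict rows.
    small_table: list of dict rows, small enough to copy to every partition.
    """
    # 1. Build a hash table from the small side once
    #    (in Spark, roughly what happens before the table is broadcast).
    lookup = defaultdict(list)
    for row in small_table:
        lookup[row[key]].append(row)

    # 2. Every partition receives the same lookup and joins locally,
    #    so rows of the large table never move between partitions (no shuffle).
    def join_partition(partition):
        return [{**row, **match}
                for row in partition
                for match in lookup.get(row[key], [])]

    return [join_partition(p) for p in large_partitions]


# Hypothetical data: a "large" orders table split into two partitions
orders = [
    [{"order_id": 1, "loc": "CHN"}, {"order_id": 2, "loc": "HYD"}],
    [{"order_id": 3, "loc": "AP"}, {"order_id": 4, "loc": "XYZ"}],
]
locations = [
    {"loc": "CHN", "city": "CHENNAI"},
    {"loc": "HYD", "city": "HYDERABAD"},
    {"loc": "AP", "city": "ANDHRA PRADESH"},
]
result = broadcast_hash_join(orders, locations, "loc")
# Order 4 has no matching location, so the inner join drops it
```

    The trade-off this models: broadcasting costs memory on every executor, which is why it only pays off when one side is small.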
    Tags
    #mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs

Comments • 43

  • @rajnarayanshriwas4653
    @rajnarayanshriwas4653 5 months ago +2

    For incremental load, why do we go for MERGE or UPSERT? We use MERGE or UPSERT to implement SCD types. For incremental load, what we want is to copy newly arrived data into ADLS. For that, we keep track of some reference key through which we can recognize the new data. For example, in an Order fact table, let's say it is Order_ID, which keeps increasing whenever we get a new order.
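    A minimal plain-Python sketch of the watermark approach described in this comment, assuming Order_ID only ever increases and existing orders are never updated (if they can be updated, that is where MERGE/UPSERT comes back in). The function incremental_load and the sample rows are hypothetical; in ADF the stored watermark would typically live in a control table or similar, rather than a Python variable.

```python
def incremental_load(source_rows, target_rows, last_watermark, key="Order_ID"):
    """Append only rows whose key is above the stored watermark; return the new watermark."""
    # Rows that arrived after the last successful run
    new_rows = [row for row in source_rows if row[key] > last_watermark]
    # Insert-only data: a plain append is enough, no MERGE/UPSERT needed
    target_rows.extend(new_rows)
    # Advance the watermark; keep the old one if nothing new arrived
    return max((row[key] for row in new_rows), default=last_watermark)


# Hypothetical sample data: order 101 was loaded by the previous run
source = [{"Order_ID": 101}, {"Order_ID": 102}, {"Order_ID": 103}]
target = [{"Order_ID": 101}]

watermark = incremental_load(source, target, last_watermark=101)
print(watermark, len(target))  # 103 3

# A rerun with no new data copies nothing and leaves the watermark unchanged
watermark = incremental_load(source, target, last_watermark=watermark)
print(watermark, len(target))  # 103 3
```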

  • @RahulSaini-ng6po
    @RahulSaini-ng6po 5 months ago +3

    Hi Folks, below is the solution to the PySpark problem written in >>SCALA

  • @ShubhamYadav-gq6fe
    @ShubhamYadav-gq6fe 5 months ago +5

    Please add a few minutes of interview feedback at the end; it would help even more.

  • @harshitgoel2985
    @harshitgoel2985 5 months ago +6

    Please attach a link (in view mode) to the list of questions asked in the mock interview in the description.

  • @Vlogs..573
    @Vlogs..573 5 months ago +4

    Great Initiative Sumit Sir !

    • @sumitmittal07
      @sumitmittal07 5 months ago +1

      Thank you. A big thanks to the people who are participating in this.

  • @PradyutJoshi
    @PradyutJoshi 5 months ago +5

    Good initiative. This is quite helpful for learning how to answer scenario-based questions, with an example. Thank you sir, Ankur and Praroop! 🙌

  • @3A3A11
    @3A3A11 5 months ago +6

    Sir, please make videos on topics like "someone who has worked in tech support for the past 5 years and is now moving to data engineering": what they should write in their resume (for example, in the experience section), and whether they should try applying as a fresher.

    • @ravichakraborty3878
      @ravichakraborty3878 5 months ago +1

      Sir, I also have the same question.

    • @Shivamyogi10
      @Shivamyogi10 5 months ago +1

      Yes, that would be very valuable. Many of us work in different roles, and being in support roles in the data field, we are interested in switching to data engineering.

    • @sumitmittal07
      @sumitmittal07 5 months ago +2

      Will surely release a video on this soon.

  • @WadieGamer
    @WadieGamer 5 months ago +1

    Great video for new data engineers like me.

  • @yifeichen5198
    @yifeichen5198 5 months ago +1

    Great content! Very insightful questions and answers!

  • @gopalgaihre9710
    @gopalgaihre9710 5 months ago +1

    Please make videos for freshers as well, because these days no one is looking for freshers for data engineering roles...

  • @BooksWala
    @BooksWala 5 months ago +1

    Please also make a video on the kinds of problems data engineers face in their day-to-day work.

    • @sumitmittal07
      @sumitmittal07 5 months ago

      Noted, will bring a video on this soon.

  • @karthikeyanr1171
    @karthikeyanr1171 5 months ago +2

    Solution for the PySpark problem:
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    def location_f(loc):
        # Map a location code to its full name; unknown codes pass through unchanged
        if loc == 'CHN':
            return 'CHENNAI'
        elif loc == 'AP':
            return 'ANDHRA PRADESH'
        elif loc == 'HYD':
            return 'HYDERABAD'
        else:
            return loc

    re_location = F.udf(location_f, StringType())
    # 'DIV-CHN_<id>' split on 'DIV-' or '_' gives ['', 'CHN', '<id>']
    df1 = df.withColumn('ref_id1', F.split('ref_id', 'DIV-|_')).drop('ref_id')
    df2 = (df1.withColumn('ref_id', F.col('ref_id1')[2])
              .withColumn('location', re_location(F.col('ref_id1')[1])))
    df3 = df2.select('name', 'ref_id', 'salary', 'location')
    df3.show()

    • @ArunKumar-mr7pc
      @ArunKumar-mr7pc 2 months ago

      from pyspark.sql.functions import col, when
      # Derive LOCATION from the prefix of REF-ID; rows matching no pattern get null
      df_employee.withColumn("LOCATION",
          when(col("REF-ID").like("DIV-CHN%"), "CHN-CHENNAI")
          .when(col("REF-ID").like("DIV-HYD%"), "HYD-HYDERABAD")
          .when(col("REF-ID").like("DIV-AP%"), "AP-ANDHRA PRADESH")
          .when(col("REF-ID").like("DIV-PUNE%"), "PUNE-PUNE")).show()
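    To sanity-check the split-and-lookup logic these solutions share without starting Spark, here is a plain-Python mirror of the per-row transformation. It assumes ref_id values shaped like DIV-<LOC>_<ID>, which is inferred from the commenters' code (the exact input table from the video is not reproduced on this page); the names transform_row and LOCATIONS and the sample rows are hypothetical. Python's re.split with the same pattern produces the same leading empty string as F.split, so the indexes line up.

```python
import re

# Hypothetical lookup table, matching the codes used in the UDF above
LOCATIONS = {"CHN": "CHENNAI", "HYD": "HYDERABAD", "AP": "ANDHRA PRADESH"}


def transform_row(name, ref_id, salary):
    """Plain-Python mirror of the Spark split + lookup for one row."""
    # Same regex as F.split('ref_id', 'DIV-|_'): 'DIV-CHN_7' -> ['', 'CHN', '7']
    parts = re.split(r"DIV-|_", ref_id)
    loc_code, new_ref_id = parts[1], parts[2]
    # Unknown codes fall through unchanged, like the UDF's `else: return loc`
    return (name, new_ref_id, salary, LOCATIONS.get(loc_code, loc_code))


# Hypothetical sample rows
rows = [("Asha", "DIV-CHN_7", 50000), ("Ravi", "DIV-PUNE_9", 40000)]
for row in rows:
    print(transform_row(*row))
# ('Asha', '7', 50000, 'CHENNAI')
# ('Ravi', '9', 40000, 'PUNE')
```

    On the design choice: the when() chain in the reply keeps the work inside Spark's JVM expressions, whereas a Python UDF ships each row to a Python worker, so the built-in version is usually the cheaper one on large data.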

  • @NabaKrPaul-ik2oy
    @NabaKrPaul-ik2oy 5 months ago

    Hi Sir, thanks for this series, very insightful. Just a query: do most interviews go as far as a coding round, or are they mostly theory? Or is it a mix?

  • @user-jg2tn1wb3d
    @user-jg2tn1wb3d 5 months ago

    Thank you so much, Sumit sir, it's really helpful.

    • @sumitmittal07
      @sumitmittal07 5 months ago

      Happy to share more such informative videos for the community!

  • @salonisacheti7350
    @salonisacheti7350 5 months ago

    Good work, Praroop ❤

  • @swapnildande4706
    @swapnildande4706 5 months ago

    Hi Sir, I request you to please upload more Data Engineer mock interview videos.

    • @sumitmittal07
      @sumitmittal07 5 months ago

      One video daily for the next 30 days.

  • @Raghavendraginka
    @Raghavendraginka 5 months ago

    Sir, please make a complete video on SQL, and mock interviews too.

    • @sumitmittal07
      @sumitmittal07 5 months ago

      Definitely, will be covered in the upcoming videos

  • @ashwinigadekar2956
    @ashwinigadekar2956 5 months ago

    Please make an interview session for freshers.

  • @saurabhgavande6728
    @saurabhgavande6728 5 months ago

    Can you make a video for AWS cloud, as you did for Azure?

  • @shivanisaini2076
    @shivanisaini2076 4 months ago

    I want to give mock interview.

  • @digantapurkait6231
    @digantapurkait6231 5 months ago

    wahh

  • @vishaldeshatwad8690
    @vishaldeshatwad8690 5 months ago

    from pyspark.sql.functions import col, split, when
    # 'DIV-CHN_<id>' -> split on '-' -> 'CHN_<id>' -> split on '_' -> 'CHN'
    # (an alias cannot be referenced inside the same select, so the splits are nested)
    df_new = df.select("name", "refid", "salary",
                       split(split(col("refid"), "-")[1], "_")[0].alias("loc"))
    final_result_df = df_new.withColumn("location",
        when(col("loc") == "CHN", "CHENNAI")
        .when(col("loc") == "HYD", "HYDERABAD")
        .when(col("loc") == "AP", "ANDHRA PRADESH")
        .when(col("loc") == "PUN", "PUNE")).drop("loc")