Big Data Engineer Mock Interview | AWS | Kafka Streaming | SQL | PySpark Optimization

  • Added 6 September 2024

Comments • 6

  • @arunsundar3739 5 months ago +3

    Very insightful on SQL, AWS, and data modeling concepts and how they are applied. It helps to recall and better understand the concepts learned in the big data master course and the SQL LeetCode playlist :)

  • @ankandatta4352 5 months ago +3

    When no primary key is available and we need to create one, we can pick any attribute and check whether it has a one-to-one relationship with the other composite values (in Excel, use a pivot table to check distinct values), then use SHA2 or MD5 in ADF to form the surrogate key. Correct me if I'm wrong.
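
    A rough sketch of the check and hashing step described above, in plain Python with hashlib standing in for the ADF hash functions. The sample rows, column names, separator, and both helper functions are assumptions made up for illustration, not anything shown in the interview.

```python
import hashlib

# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"order_ref": "A1", "customer": "alice", "order_date": "2024-01-05"},
    {"order_ref": "B2", "customer": "bob", "order_date": "2024-01-05"},
    {"order_ref": "C3", "customer": "alice", "order_date": "2024-01-06"},
]


def is_one_to_one(rows, candidate, others):
    """Return True if each candidate value pairs with exactly one
    combination of the `others` columns, and vice versa."""
    left = {r[candidate] for r in rows}
    right = {tuple(r[c] for c in others) for r in rows}
    pairs = {(r[candidate], tuple(r[c] for c in others)) for r in rows}
    # Any one-to-many mapping in either direction makes `pairs` larger
    # than one of the two sides.
    return len(left) == len(right) == len(pairs)


def surrogate_key(row, columns, sep="||"):
    """SHA-256 hex digest of the chosen columns joined with a separator."""
    raw = sep.join(str(row[c]) for c in columns)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


one_to_one = is_one_to_one(rows, "order_ref", ["customer", "order_date"])
keys = [surrogate_key(r, ["customer", "order_date"]) for r in rows]
```

    The separator keeps values such as ("ab", "c") and ("a", "bc") from hashing to the same input string. MD5 would work the same way through hashlib.md5, at a higher collision risk.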

  • @sonuparmar5836 5 months ago +1

    @sumitmittal07 The SQL aggregate question that asks for cumulative profit doesn't need ROWS BETWEEN, since that is meant for a rolling profit over a range of rows. It should simply be: CUMULATIVE_PROFIT = SUM(profit) OVER (ORDER BY transaction_id, transaction_date). Let me know whether I understood the question correctly.
    Also, in the partitioning and bucketing question, the interviewee explained the two the wrong way around.
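
    To make the cumulative versus rolling distinction concrete, here is a small sketch using Python's built-in sqlite3 module (it needs an SQLite build with window function support, 3.25 or later). The table name, columns, and sample values are assumptions for illustration. It also includes an explicit unbounded ROWS frame for comparison, which on this sample with unique ordering keys gives the same running total as the default frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "transaction_id INTEGER, transaction_date TEXT, profit INTEGER)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100),
        (2, "2024-01-02", -30),
        (3, "2024-01-03", 50),
        (4, "2024-01-04", 20),
    ],
)

query = """
SELECT transaction_id,
       -- Default frame with ORDER BY: everything up to the current row.
       SUM(profit) OVER (ORDER BY transaction_id, transaction_date)
           AS cumulative_profit,
       -- Explicit unbounded frame: same running total on this data.
       SUM(profit) OVER (ORDER BY transaction_id, transaction_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS explicit_cumulative,
       -- Bounded frame: rolling sum over the current and previous row.
       SUM(profit) OVER (ORDER BY transaction_id, transaction_date
                         ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
           AS rolling_2
FROM transactions
ORDER BY transaction_id
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
# (1, 100, 100, 100)
# (2, 70, 70, 70)
# (3, 120, 120, 20)
# (4, 140, 140, 70)
```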

    • @aniruths9900 3 months ago

      You are right - Buckets are stored as files. Partitions are stored as directories.
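
      A toy illustration of that layout, written in plain Python rather than Hive or Spark: each partition value becomes a directory, and each bucket becomes a file inside it. The column names, the bucket count, the file naming, and the use of crc32 as the bucket hash are all assumptions for illustration; real engines use their own hash functions and file names.

```python
import csv
import tempfile
import zlib
from pathlib import Path

NUM_BUCKETS = 2  # assumed bucket count for illustration

# Hypothetical sample rows.
rows = [
    {"country": "US", "user_id": "u1", "amount": 10},
    {"country": "US", "user_id": "u2", "amount": 20},
    {"country": "IN", "user_id": "u3", "amount": 30},
    {"country": "IN", "user_id": "u1", "amount": 40},
]


def bucket_for(value, num_buckets=NUM_BUCKETS):
    # crc32 is a deterministic stand-in, not the hash a real engine uses.
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def write_layout(rows, root):
    """Write rows to <root>/country=<value>/bucket_<n>.csv and
    return the sorted relative paths of the files created."""
    root = Path(root)
    for row in rows:
        # Partition column -> one directory per distinct value.
        part_dir = root / f"country={row['country']}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Bucket column -> hashed into a fixed number of files.
        path = part_dir / f"bucket_{bucket_for(row['user_id'])}.csv"
        with path.open("a", newline="") as f:
            csv.writer(f).writerow([row["user_id"], row["amount"]])
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    files = write_layout(rows, tmp)
    partitions = sorted(p.name for p in Path(tmp).iterdir() if p.is_dir())

print(partitions)  # directories, one per partition value
print(files)       # files, one per non-empty bucket within each partition
```

      The number of directories grows with the distinct partition values, while the number of files per directory is capped by the bucket count, which is why partitioning suits low-cardinality columns and bucketing suits high-cardinality ones.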