Salting in Apache Spark - Part I

  • Added 6 September 2024

Comments • 5

  • @TheBigDataShow · 2 months ago

    A practical demonstration will be released tomorrow. Please watch this video first to understand the theory in depth.

  • @mufaddalrampurawala247 · 2 months ago · +1

    Exploding the second dataset also increases its size. Is this still an optimization, given that much more data will be scanned and a lot of shuffling will be involved?

    • @nishabansal2978 · 2 months ago · +1

      While salting can increase data size and shuffle overhead in Spark, its benefits in mitigating data skew and spreading work evenly across tasks often outweigh these costs. The other important decision is which salting factor to choose for your workload: a higher factor spreads a hot key over more tasks, but it also replicates the other dataset that many more times, so it directly affects the overall distribution and cost.
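To make the trade-off in this thread concrete, here is a minimal plain-Python simulation of a salted join. It is not Spark code: the function name `salted_join`, the "hot"/"cold" sample data, and treating each (key, salt) pair as one task's worth of work are illustrative assumptions. In Spark itself the same idea is usually expressed with `rand()` to generate the salt on the skewed side and `explode()` over a range of salt values on the other side.

```python
import random
from collections import Counter, defaultdict


def salted_join(left, right, salt_factor, seed=0):
    """Simulate a salted inner equi-join on lists of (key, value) pairs.

    left:  the skewed side; each row gets a random salt in [0, salt_factor).
    right: the other side; each row is replicated once per salt value.
    Returns (joined_rows, exploded_right_size, rows_per_salted_key).
    """
    rng = random.Random(seed)

    # Salt the skewed side: (key, salt) becomes the composite join key.
    salted_left = [((k, rng.randrange(salt_factor)), v) for k, v in left]

    # Explode the other side: one copy per salt value, so it grows salt_factor times.
    exploded_right = [((k, s), v) for k, v in right for s in range(salt_factor)]

    # Build a lookup on the composite key (stands in for the shuffle + join).
    lookup = defaultdict(list)
    for salted_key, v in exploded_right:
        lookup[salted_key].append(v)

    # Rows per composite key approximate the work a single join task receives.
    rows_per_key = Counter(salted_key for salted_key, _ in salted_left)

    # Join on (key, salt), then drop the salt from the output rows.
    joined = [
        (salted_key[0], lv, rv)
        for salted_key, lv in salted_left
        for rv in lookup.get(salted_key, [])
    ]
    return joined, len(exploded_right), rows_per_key


left = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
right = [("hot", "H"), ("cold", "C")]
joined, exploded_size, loads = salted_join(left, right, salt_factor=8)
print(exploded_size, max(loads.values()), len(joined))
```

With a salt factor of 8, the smaller side grows from 2 to 16 rows (the extra scan and shuffle cost raised in the question), while the 1,000 "hot" rows are split across up to 8 composite keys instead of all landing on one; the joined output is the same as an unsalted join.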

  • @payalbhatia6927 · 2 months ago

    Which pen tablet/device do you use for the video? Could you please share?