Job, Stage and Task in Apache Spark | PySpark interview questions

  • Added 6 Sep 2024

Comments • 9

  • @Simrankotiya10
    @Simrankotiya10 13 days ago

    Great explanation

  • @ChetanSharma-oy4ge
    @ChetanSharma-oy4ge 3 months ago +1

    What if we use the count function together with a variable assignment and a transformation?

    • @TheBigDataShow
      @TheBigDataShow  3 months ago +1

      count is a tricky action, and most data engineers get confused by it. Ideally, count() is an action and should create a brand-new job. But Apache Spark is a very smart computing engine: using predicate pushdown and pruning against the source, if the source stores the row count in its metadata, Spark will fetch the value of count() directly from that metadata instead of creating a brand-new job.
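      A quick way to see this kind of metadata in action is with Parquet, one of the sources where Spark can answer count() without a scan. The sketch below uses pyarrow (not mentioned in the thread, chosen only for illustration) to show that a Parquet writer records the row count in the file footer, so an engine can read just the footer instead of scanning the data pages:

      ```python
      import os
      import tempfile

      import pyarrow as pa
      import pyarrow.parquet as pq

      # Write a small Parquet file; the writer records the total
      # row count in the file footer metadata.
      table = pa.table({"id": list(range(1000))})
      path = os.path.join(tempfile.mkdtemp(), "example.parquet")
      pq.write_table(table, path)

      # Reading only the footer returns the row count without
      # touching any data pages -- the same information a smart
      # engine like Spark can use to serve count() cheaply.
      meta = pq.read_metadata(path)
      print(meta.num_rows)  # 1000
      ```

      This is why a bare count() on a freshly loaded Parquet DataFrame can come back without the new job you might expect in the Spark UI.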

    • @ChetanSharma-oy4ge
      @ChetanSharma-oy4ge 3 months ago

      @@TheBigDataShow Great, thanks for answering. Do we have some other examples as well, or resources where I can learn these concepts?

  • @siddheshchavan2069
    @siddheshchavan2069 3 months ago +1

    Can you make end-to-end data engineering projects?

    • @TheBigDataShow
      @TheBigDataShow  3 months ago

      I have already created one. Please check the channel. There are no prerequisites for this 3-hour-long video and project; you just need to know the basics of PySpark. Please check the link.
      czcams.com/video/BlWS4foN9cY/video.htmlsi=qL0ZSXBELEEKe2L2

    • @siddheshchavan2069
      @siddheshchavan2069 3 months ago

      @@TheBigDataShow great, thanks!

  • @debabratabar2008
    @debabratabar2008 3 months ago

    Is the below correct?
    df_count = example_df.count() ----> transformation
    example_df.count() ---> job ?

    • @user-dj4ht7rg2f
      @user-dj4ht7rg2f 1 month ago

      No, count() itself is an action, so the first line itself will already create a job.
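
      A minimal PySpark sketch of this point (assuming a local SparkSession is available; the names are illustrative): a transformation such as filter() is lazy and triggers nothing, while count() is an action and triggers a job whether or not its result is assigned to a variable.

      ```python
      from pyspark.sql import SparkSession

      # Local session purely for illustration.
      spark = (SparkSession.builder
               .master("local[1]")
               .appName("count-demo")
               .getOrCreate())

      example_df = spark.range(100)  # a 100-row DataFrame with ids 0..99

      # filter() is a transformation: lazy, no job is triggered here.
      filtered = example_df.filter(example_df.id > 49)

      # count() is an action: this line triggers a job, even though the
      # result is assigned to a variable. ids 50..99 remain, i.e. 50 rows.
      df_count = filtered.count()
      print(df_count)  # 50

      spark.stop()
      ```

      So in the question above, both lines trigger a job; the assignment to df_count does not turn the action into a transformation.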