Great explanation
What if we use the count function along with a variable and a transformation?
count() is a tricky action, and most data engineers get confused by it. Ideally, count() is an action and should trigger a brand-new job. But Apache Spark is a smart computing engine: it takes advantage of its source through predicate pushdown and column pruning, and if the source stores the row count in its metadata (as Parquet does in its file footers), Spark can fetch the count from that metadata instead of scanning all the data to compute it.
@TheBigDataShow Great, thanks for answering. Do you have other examples as well, or resources where I can learn these concepts?
Can you make end-to-end data engineering projects?
I have already created one. Please check the channel. There is no prerequisite for this 3-hour-long video and project; you just need to know the basics of PySpark. Please check the link.
czcams.com/video/BlWS4foN9cY/video.htmlsi=qL0ZSXBELEEKe2L2
@TheBigDataShow great, thanks!
Is the below correct?
df_count = example_df.count()  ---> transformation?
example_df.count()  ---> job?
No, count() itself is an action. The first line itself will create a job, even though the result is assigned to a variable.
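
To make the point about variables and transformations concrete, here is a minimal sketch in plain Python. It is a toy model of lazy evaluation, not Spark's API: the `LazyFrame` class and its `jobs_run` counter are hypothetical names invented for illustration. It only shows the general idea that a transformation is recorded without running anything, while count() runs a job whether or not its result is assigned to a variable.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not part of PySpark)."""

    jobs_run = 0  # how many "jobs" have been triggered by actions so far

    def __init__(self, rows, steps=()):
        self._rows = list(rows)
        self._steps = tuple(steps)  # pending transformations, not yet executed

    def filter(self, predicate):
        # Transformation: only records the step and returns a new LazyFrame.
        return LazyFrame(self._rows, self._steps + (predicate,))

    def count(self):
        # Action: runs every pending step now and returns a plain int.
        LazyFrame.jobs_run += 1
        rows = self._rows
        for predicate in self._steps:
            rows = [r for r in rows if predicate(r)]
        return len(rows)


example_df = LazyFrame(range(10))

evens = example_df.filter(lambda x: x % 2 == 0)  # transformation: no job yet
jobs_after_filter = LazyFrame.jobs_run

df_count = example_df.count()  # action: runs a job even though it is assigned
evens_count = evens.count()    # action: runs the pending filter, then counts

print(jobs_after_filter, df_count, evens_count, LazyFrame.jobs_run)
```

In this toy model the filter alone triggers nothing, and each count() call triggers one job. In real PySpark you can check the same behavior by watching the Jobs tab of the Spark UI while running each line.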