08 Working with Strings, Dates and Null
- added 26. 07. 2024
- Video explains: How to use Case When in Spark? How to manipulate string data in Spark DataFrames? How to cast dates in Spark? How to extract date portions in Spark? How to work with NULL data in Spark?
Chapters
00:00 - Introduction
01:08 - How to use Case When in Spark?
04:30 - String Regex Replace
06:00 - How to convert string to date in Spark?
08:10 - How to add current date or timestamp in Spark?
10:07 - How to drop NULL records in Spark?
10:50 - How to transform NULL Columns in Spark?
12:18 - Fix DataFrame
14:00 - Bonus Tip
Local PySpark Jupyter Lab setup - 03 Data Lakehouse | Da...
Python Basics - www.learnpython.org/
GitHub URL for code - github.com/subhamkharwal/pysp...
Documentation Spark Functions - spark.apache.org/docs/latest/...
Documentation Date/Timestamp Patterns - spark.apache.org/docs/latest/...
The series provides a step-by-step guide to learning PySpark, a popular open-source distributed computing framework that is used for big data processing.
New video every 3 days ❤️
#spark #pyspark #python #dataengineering
Wonderful.. I have never seen this kind of teaching.. thank you bro!! Please add more videos.
Sure, I am working on it now.
great content, Please keep adding more videos, very helpful.
Thanks, will do!
You're a very awesome guy. Your explanation is straightforward to understand. I have a few clarifications. Why do we have to import the libraries for each function? Is there an option to import the main library once and achieve the same? For example, for the date conversion, you import date_format and to_date. I believe we can use import *
Hello, Thank you. Please share this with your network over LinkedIn ❤️
And for the second part, yes, you can import as per your choice. Importing only the required functions keeps the code neater and the namespace clean.
@easewithdata, definitely I will do that. Keep following this energetic training. You have a very bright future in the IT world.
Good content
Thanks 👍 Please make sure to share with your network 🛜
Need to understand one thing: why are yyyy and dd not in capital letters? Is there any reason for that?
Spark follows the datetime pattern format below (it mostly resembles Unix formats), and case matters: for example, MM means month while mm means minutes, and dd means day-of-month while DD means day-of-year.
spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Can we use na.fill to fill missing values, instead of coalesce?
coalesce is used for conditional handling of nulls (it takes the first non-null value). na.fill does a generic fill across the columns.
Thanks, this cleared my doubt 😀
Bro, what is the purpose of using coalesce here??
It is being used to transform null values. It works the same as NVL in SQL; SQL also has COALESCE.
I know you might be confusing it with the partitioning coalesce. But here it's a column transformation to fix null values. The partitioning one is applied at the DataFrame level.
@@easewithdata Thank you..