Understanding Data Cleaning | Google Data Analytics Certificate
- Added 23. 07. 2024
- Data cleaning is essential for successful analysis. If a piece of data is entered into a spreadsheet or database incorrectly, or if data formats are inconsistent, the result is dirty data. Let's go through why and how to clean data.
0:00 Getting Started with Data Cleansing
3:20 Why Data Cleaning is Important
9:05 Identify Dirty Data
14:31 Starting the Data Cleansing Process
20:51 Cleaning Data from Multiple Sources
26:37 Data Cleaning Features
34:43 Optimize the Data Cleaning Process
48:50 Data Perspectives
59:12 Even More Data Cleaning Techniques
This video is part of the Google Data Analytics Certificate which teaches learners how to prepare, process, analyze, share, and act on data.
The program, created by Google employees in the field, is designed to provide you with job-ready skills in about 6 months to start or advance your career in data analytics.
Take the Certificate HERE: goo.gle/3YZJx1Z
Subscribe HERE: bit.ly/SubscribeGCC
#GrowWithGoogle #GoogleCareerCertificate #DataAnalytics
Why earn a Google Career Certificate?
► No experience necessary: Learn job-ready skills, with no college degree required.
► Learn at your own pace: Complete the 100% online courses on your own terms.
► Stand out to employers: Make your resume competitive with a credential from Google.
► A path to in-demand jobs: Connect with top employers who are currently hiring. - Science & Technology
The concepts are explained clearly, eloquently, and in simple language. Much appreciated; anyone interested in data analytics and related disciplines could learn a lot.
* Starting the data cleansing process: 18:57
* Cleaning Data from multiple sources: 25:16, 25:48
* Data cleaning features: Conditional format rules, Remove duplicates, Split, Concatenate
* Optimize the Data Cleaning Process: 35:11, 38:27, 39:36, 40:00, 41:18
(Functions: COUNTIF, LEN, conditional formatting, LEFT, RIGHT, MID, TRIM, CONCATENATE, ...)
* Data perspectives: Sorting, Filtering, Pivot Table, VLOOKUP, Plotting
* Even more data cleaning techniques: Data mapping, Compatibility, Schema, Primary Key, Foreign Key
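For readers who want to try the listed text functions outside a spreadsheet, here is a minimal Python sketch of rough counterparts. The helper names and sample values are invented for illustration, and the behavior is only approximately that of the spreadsheet functions (for example, this `trim` collapses any whitespace, not just spaces).

```python
# Rough Python counterparts of TRIM, LEFT, RIGHT, MID, CONCATENATE and COUNTIF.
# Helper names and sample values are hypothetical, chosen for illustration.

def trim(text: str) -> str:
    """Like TRIM: strip the ends and collapse inner runs of whitespace."""
    return " ".join(text.split())

def left(text: str, n: int) -> str:
    """Like LEFT: the first n characters."""
    return text[:n]

def right(text: str, n: int) -> str:
    """Like RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""

def mid(text: str, start: int, length: int) -> str:
    """Like MID: `length` characters from a 1-based `start` position."""
    return text[start - 1:start - 1 + length]

def concatenate(*parts: str) -> str:
    """Like CONCATENATE: join the parts with no separator."""
    return "".join(parts)

def countif(values, target) -> int:
    """Like COUNTIF with an exact-match criterion."""
    return sum(1 for value in values if value == target)

name = "  Elaine   Smith "
print(trim(name), len(trim(name)))          # LEN is simply len()
print(left("AB-1234", 2), right("AB-1234", 4), mid("AB-1234", 4, 2))
print(countif(["CA", "NY", "CA"], "CA"))
```

A count above 1 from `countif` on an ID column is the same duplicate signal the COUNTIF approach gives in a sheet.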
Excellent summary of key functions on Spreadsheets for data cleaning!
When you are able to clean your data, your analysis begins; that is where it all starts. Cleaning data is critical to a good analysis.
So easy to understand. Thank you so much!
Nicely explained. Very clear vocabulary and examples. I've not really used Google Sheets but Excel is the King of tools I use. Some great tips were given, thank you!
I went through the entire course, completing it between July and November 2021 (this year), and earned the certificate. I would say the course is really awesome in many respects, one of which is searching for data cleaning jobs!
do you have an update? Did you get a job?
Thank you for your input! Are you currently employed?
I got employed
Thank you so much. I would love to see your videos more. I really needed this well explained content. Thank you. You are the best!
These videos are awesome!!!
Her section is the best of this course.
Interesting stuff. I don't use conditional formatting to find errors that much. I use filters and check the dropdown for anomalies. The dropdown lists the unique values anyway.
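The filter-dropdown check described above amounts to listing the distinct values in a column. A minimal Python sketch of the same idea, with invented sample data and a hypothetical helper name, sorts the values rarest first so one-off spellings stand out:

```python
from collections import Counter

def value_counts(values):
    """Return (value, count) pairs, rarest first, ties broken alphabetically."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# Hypothetical column of state codes with a casing slip and a blank.
states = ["CA", "NY", "CA", "Ca", "NY", "CA", ""]
for value, count in value_counts(states):
    print(repr(value), count)
```

Values that appear only once are not necessarily wrong; they are just the ones worth looking at first.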
Done watching. Thanks for freely sharing this Google :)
Keep in mind that data cleaning is the hardest and most time-consuming phase of data analysis.
Yep. Finding this out the hard way🤯
This session is easy and engaging. It could be more effective if the computer screen filled the whole video and the instructor took up a small space in the corner. That would solve the problem for those watching on mobile, where the text sometimes becomes hazy and hard to read.
Beauty with brains. Clear explanation!
Thank you so much!
Easy to understand even for a very beginner like me!
Thank you !!!!
Thank you so much, I got my certificate last month!!
This is good!
Thank you, Google.
Thank you
Please provide the datasets so I can access multiple data sources for data cleaning portfolio projects.
How can we get the data you are cleaning so we can follow along?
Can we have a video on cleaning a database with multiple column headers, like Subject, Term 1 & 2, and Unit Tests, for a school's yearly results?
Please share practice file..
Where are these Excel files? I can't find them anywhere...
any help is welcome
please provide this dataset
How do we deal with false positives?
Is this included in the Google Data Analytics Certificate offered by Coursera?
Yep! All of our lessons on YouTube are the same as the lessons offered on Coursera.
We have an Excel file with 10 columns and 500 rows. The last 4 columns hold years and sales. There are blanks in them, randomly distributed. We want to delete the rows in which all 4 of those cells are blank. How would you go about it?
Hi! Add filters to the columns using the filter button, then filter on blank cells on each of the 4 columns, you should find your empty lines in a second. Hope this helps!
@@MrEveloff How do I isolate, select en masse, and delete only the rows that have all 4 of those cells blank? Because across that whole range, a row can have anywhere from 0 to 4 blanks, randomly distributed.
@@keylanoslokj1806 Select the columns you want to filter, add a filter by clicking the filter button, then on each column select the blank cell to filter on empty cells - the order does not matter as you want to select only rows with all 4 cells empty.
There you have all the empty rows appearing.
-> If you're using Google Sheets: select the whole lines then right click and click "Delete the selected lines" which will delete only the lines you selected.
-> If you're using Excel, you will need to select the filtered lines, then press Alt+; to select only the visible lines, then right click and delete the lines.
This is a quick solution; there are other, more complex ones, but this one seems best suited to the situation you described ;)
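The same rule (drop a row only when all 4 cells are blank) can also be scripted, which helps when the file is large or the cleanup has to be repeated. This is a minimal sketch, assuming each row is a dict such as `csv.DictReader` would produce; the column names and sample rows are invented for illustration.

```python
def is_blank(value) -> bool:
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""

def drop_rows_all_blank(rows, columns):
    """Keep every row unless ALL of the given columns are blank in it."""
    return [
        row for row in rows
        if not all(is_blank(row.get(column)) for column in columns)
    ]

# Hypothetical names for the last 4 columns of the sheet.
YEAR_COLUMNS = ["2019", "2020", "2021", "2022"]

rows = [
    {"region": "North", "2019": "120", "2020": "", "2021": "95", "2022": ""},
    {"region": "South", "2019": "", "2020": " ", "2021": "", "2022": None},
    {"region": "East", "2019": "", "2020": "", "2021": "", "2022": "40"},
]

kept = drop_rows_all_blank(rows, YEAR_COLUMNS)
print([row["region"] for row in kept])
```

Rows with one to three blanks are kept, which matches the requirement that only fully blank rows should go.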
12:50 Rows #27 and #31 are not duplicates; they have different SO numbers and different dollar amounts transferred
Needs more upvotes!
If you paid more attention you would've heard that the data she was looking for at that point was "How many users you have", in that sense both accounts for Elaine, regardless of expenditure or any other variable, belong to the same user, therefore they are in fact duplicates.
@@pablovelasquez9806 As long as the column values in those 2 rows are not exactly the same, they are not duplicates. The 2 rows are unique, and you cannot simply delete one because you would lose information!!
@@pablovelasquez9806 As someone with an extremely generic name, just because the same ‘names’ occur in a dataset, doesn’t mean they’re actually the same people. Even if they share the same address! (Could be a son named after a father)
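The disagreement in this thread is between two checks: rows that are identical in every column, and rows that merely share a name. A minimal Python sketch of both follows, with invented rows and hypothetical column positions (name, SO number, amount); it is an illustration of the distinction, not the method used in the video.

```python
def drop_exact_duplicates(rows):
    """Remove rows identical in every column, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def repeated_keys(rows, key_index):
    """Group rows by one column and return only the keys that occur more than once."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}

# Hypothetical sample: (name, SO number, amount)
rows = [
    ("Elaine", "SO-1001", 250.0),
    ("Elaine", "SO-1002", 90.0),   # same name, different order
    ("Omar", "SO-1003", 40.0),
    ("Omar", "SO-1003", 40.0),     # identical in every column
]

print(len(drop_exact_duplicates(rows)))   # exact duplicates removed
print(sorted(repeated_keys(rows, 0)))     # names needing a human decision
```

Only the first check is safe to apply automatically. A repeated name is a prompt to check whether it is one user or two, as the replies above point out.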
That a DBA
She's too cute to sound this serious. 😶😅🥰
why is google saying to fix problems manually in a spreadsheet? such a bad practice....... give me the way to do it like a professional using 100k records
That's exactly what I'm saying. What a waste of time. Just do everything manually? Wow thanks for nothing
This course is on YouTube for free?? Why am I paying on Coursera then lol
Hi! On our YouTube channel, we offer the videos so you can get a preview of the content before enrolling, but it is only through Coursera that you will receive your certificate for completing the program. The full certificate program on Coursera also includes assessments, readings, and hands-on labs.
All the errors mentioned here can be avoided using a database!
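As a hedged illustration of this point: a database schema can reject some kinds of dirty data (duplicate keys, missing values, out-of-range numbers) at insert time, though a typo that still satisfies the constraints gets through. The sketch below uses Python's built-in `sqlite3`; the table, column names, and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        so_number TEXT PRIMARY KEY,                                  -- no duplicate order numbers
        customer  TEXT NOT NULL CHECK (length(trim(customer)) > 0),  -- no blank names
        amount    REAL NOT NULL CHECK (amount >= 0)                  -- no missing or negative amounts
    )
""")

def try_insert(row) -> bool:
    """Insert one (so_number, customer, amount) row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(("SO-1001", "Elaine", 250.0)))  # first use of this key
print(try_insert(("SO-1001", "Elaine", 250.0)))  # same primary key again
print(try_insert(("SO-1002", "   ", 10.0)))      # blank customer name
```

Constraints catch structural problems only, so a misspelled but non-empty name would still be accepted.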
0:50 Who is she?
Im ready for my bath 🛁
is she robot??
Better to say corrupt data.
Why she is not blinking her eyes..
1:30 she did 😂
She is a robot
She did. You didn't notice
Why are you noticing nonsense things.
Looks like u are blinking too much
is the instructor human??
Yeap
A very poor video on data cleaning by Google. Expected way more from them. Using a spreadsheet is just not acceptable.
dude just show, stop talking that much