Automated data profiling and quality scan via Dataplex
- Added 25 July 2024
- Data quality is a critical concern in a complex data environment, particularly when a large volume of data is spread across multiple locations. If you want to systematically identify and visualise potential issues, run periodic scans, and notify the relevant teams across a large organisation, where should you begin?
This is precisely where the automated data profiling and data quality scanning capabilities of Dataplex on Google Cloud can prove invaluable. They require no infrastructure setup and offer a straightforward way to define and apply profiling and quality rules, so they can serve as an excellent foundation for a large-scale data quality framework.
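To make the difference between profiling and quality checks concrete, here is a toy, purely local Python sketch. It is not how Dataplex works internally, and the names `ColumnProfile`, `profile_column` and `check_non_null` are hypothetical. A profile only describes the data (null ratio, distinct count, min/max), whereas a quality rule asserts an expectation and passes only if the ratio of passing rows reaches a threshold, which mirrors the `threshold` idea on Dataplex quality rules.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnProfile:
    null_ratio: float
    distinct_count: int
    min_value: Optional[float]
    max_value: Optional[float]


def profile_column(values: Sequence[Optional[float]]) -> ColumnProfile:
    """Describe a column without judging it (the 'profiling' side)."""
    non_null = [v for v in values if v is not None]
    total = len(values)
    return ColumnProfile(
        null_ratio=(total - len(non_null)) / total if total else 0.0,
        distinct_count=len(set(non_null)),
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
    )


def check_non_null(values: Sequence[Optional[float]], threshold: float) -> bool:
    """Pass if the ratio of non-null rows is at least `threshold` (the 'quality' side)."""
    if not values:
        # Treat an empty column as passing; an arbitrary choice for this sketch.
        return True
    passing = sum(v is not None for v in values)
    return passing / len(values) >= threshold


if __name__ == "__main__":
    ages = [34, None, 51, 34, 29]  # hypothetical sample column
    print(profile_column(ages))
    print(check_non_null(ages, threshold=0.9))
```

The same column can have a perfectly valid profile and still fail a rule: here the profile simply reports a 20% null ratio, while the non-null rule with a 0.9 threshold fails because only 80% of rows pass.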
01:16 - Data Profiling vs Data Quality Scan
02:37 - Dataplex auto profiling
08:15 - Dataplex auto data quality scan
10:47 - Profiling hinted quality rules & YAML via CLI
18:36 - Other options to create scans
21:08 - Sensitive data considerations
22:02 - Summary
Slide: drive.google.com/file/d/13khs...
Repo: github.com/rocketechgroup/dat...
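The "profiling hinted quality rules & YAML via CLI" idea can be sketched locally: take a column's profile statistics, turn them into candidate rules, save them as a spec file, and pass that file to `gcloud dataplex datascans create data-quality`. This is a sketch under stated assumptions, not the approach shown in the video or Dataplex's own recommendation logic. The heuristics in `hint_rules`, the scan ID `orders-dq`, the project/dataset/table names, the location and the file name are all hypothetical. The rule field names (`nonNullExpectation`, `rangeExpectation`, `uniquenessExpectation`, `dimension`, `threshold`) are assumed to follow the Dataplex `DataQualityRule` REST resource. The spec is written as JSON on the assumption that `--data-quality-spec-file` accepts JSON as well as YAML. Check both against the current gcloud and API references before relying on them.

```python
import json
import shlex
from typing import Optional


def hint_rules(column: str, null_ratio: float, distinct_count: int, row_count: int,
               min_value: Optional[float] = None,
               max_value: Optional[float] = None) -> list:
    """Turn one column's profile statistics into candidate rule dicts.

    The heuristics are hypothetical; Dataplex's own recommendations may differ.
    """
    rules = []
    # No nulls observed: suggest a completeness (non-null) rule.
    if null_ratio == 0.0:
        rules.append({"column": column, "dimension": "COMPLETENESS",
                      "nonNullExpectation": {}, "threshold": 1.0})
    # Observed numeric bounds: suggest a validity (range) rule.
    # Range bounds are passed as strings, as assumed for the REST resource.
    if min_value is not None and max_value is not None:
        rules.append({"column": column, "dimension": "VALIDITY",
                      "rangeExpectation": {"minValue": str(min_value),
                                           "maxValue": str(max_value)},
                      "threshold": 0.99})
    # Every non-empty value distinct: suggest a uniqueness rule.
    if row_count > 0 and distinct_count == row_count:
        rules.append({"column": column, "dimension": "UNIQUENESS",
                      "uniquenessExpectation": {}})
    return rules


def build_create_command(scan_id: str, location: str, table_resource: str,
                         spec_path: str) -> list:
    """Assemble (but do not run) a gcloud data-quality scan creation command."""
    return ["gcloud", "dataplex", "datascans", "create", "data-quality", scan_id,
            f"--location={location}",
            f"--data-source-resource={table_resource}",
            f"--data-quality-spec-file={spec_path}"]


if __name__ == "__main__":
    # Hypothetical profile results for an `order_id` column.
    spec = {"rules": hint_rules("order_id", null_ratio=0.0, distinct_count=1000,
                                row_count=1000, min_value=1, max_value=1000)}
    with open("dq_spec.json", "w") as fh:
        json.dump(spec, fh, indent=2)
    cmd = build_create_command(
        "orders-dq", "europe-west2",
        "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders",
        "dq_spec.json")
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to try without credentials; review the generated rules and thresholds before creating a real scan.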
Great video!!
Great video! It would be good if you did a basic tutorial on setting up a catalog from start to finish.
I appreciate this video, Sir 😍 (subscribed and liked). I will share it with my team too.
Thanks so much for your support ❤
@practicalgcp2780 Of course, man. Thank you for sharing this information in a simpler way.
Hi, after you have created the data profile scan and the data quality scan, is the Insights tab displayed? I don't see the Insights tab in your video. Could you please explain? Thanks!
Great video! Could you make a video on the Data Lineage API for cases where Dataplex can't be enabled directly, so that lineage data can be supplied manually through the API and reflected in Dataplex?
Thanks for the comment. Can you clarify what you mean by "Dataplex cannot be enabled directly"? I've not used the Lineage API yet, but my understanding is that lineage is generated automatically as long as you enable the Data Lineage API; BigQuery does this by parsing SQL from audit logs. I do believe there is an option to add your own lineage via the API for processes outside of BigQuery. Are you referring to that? I've not tried it yet, as I haven't had a use case that needs it.