Automated data profiling and quality scan via Dataplex

  • Added 25 July 2024
  • Data quality is a critical concern in any complex data environment, particularly when a large volume of data is distributed across multiple locations. Where should one begin in order to systematically identify and visualise potential issues, schedule periodic scans, and notify the relevant teams across an organisation at scale?
    This is precisely where the automated data profiling and data quality scanning capabilities of Dataplex on Google Cloud can prove invaluable. Requiring no infrastructure setup and offering a straightforward method for defining and implementing rules for data profiling and quality checks, it could serve as an excellent foundation for your large-scale data quality framework.
    01:16 - Data Profiling vs Data Quality Scan
    02:37 - Dataplex auto profiling
    08:15 - Dataplex auto data quality scan
    10:47 - Profiling hinted quality rules & YAML via CLI
    18:36 - Other options to create scans
    21:08 - Sensitive data considerations
    22:02 - Summary
    Slide: drive.google.com/file/d/13khs...
    Repo: github.com/rocketechgroup/dat...
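
As a rough illustration of the distinction covered at 01:16, here is a minimal sketch in plain Python. It is not taken from the video or the repo and it does not use the Dataplex API; every name in it (`profile_column`, `Rule`, `run_quality_scan`, the `amount` column) is hypothetical. The idea it assumes is that profiling summarises what a column looks like, whereas a quality scan checks rows against explicit rules with pass thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical row shape: column name -> numeric value or None (NULL).
Row = dict[str, Optional[float]]


def profile_column(rows: list[Row], column: str) -> dict:
    """Profiling: describe the data as it is, with no pass/fail judgement."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_ratio": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


@dataclass
class Rule:
    """Quality rule: a per-row check plus the fraction of rows that must pass."""
    name: str
    column: str
    passes: Callable[[Optional[float]], bool]
    threshold: float


def run_quality_scan(rows: list[Row], rules: list[Rule]) -> dict[str, bool]:
    """Quality scan: evaluate each rule and report whether it met its threshold."""
    results = {}
    for rule in rules:
        passed = sum(1 for r in rows if rule.passes(r.get(rule.column)))
        ratio = passed / len(rows) if rows else 1.0
        results[rule.name] = ratio >= rule.threshold
    return results


# Made-up sample data, for illustration only.
rows = [{"amount": 10.0}, {"amount": 25.0}, {"amount": None}, {"amount": -5.0}]

profile = profile_column(rows, "amount")

# The profile can hint at sensible rules, e.g. a null check and a range check.
rules = [
    Rule("amount_not_null", "amount", lambda v: v is not None, threshold=0.9),
    Rule("amount_in_range", "amount",
         lambda v: v is not None and 0 <= v <= 1000, threshold=0.5),
]
results = run_quality_scan(rows, rules)

print(profile)
print(results)
```

In this sketch the profile only reports statistics, while each rule yields a pass or fail verdict that could drive a notification. The thresholds are arbitrary values chosen for the example.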
  • Science & Technology

Comments • 8

  • @user-bl6kx6ld4y 8 months ago +1

    Great video!!

  • @rubelahmed-je6bo 8 months ago +1

    Great video. It would be good if you did a basic tutorial on how to set up a catalog from start to finish.

  • @DExpertz 2 months ago +1

    I appreciate this video, Sir 😍 (subscribed and liked). I will share it with my team too.

    • @practicalgcp2780 2 months ago +1

      Thanks so much for your support ❤

    • @DExpertz 2 months ago

      @practicalgcp2780 Of course, man. Thank you for sharing this information in a simpler way.

  • @QuynhNguyen-zy2rs 3 months ago

    Hi. After you have created a data profile scan and a data quality scan, is the Insights tab displayed? I don't see the Insights tab in your video. Could you please explain? Thanks!

  • @yogeshsahu4943 7 months ago +1

    Great video. Could you make a video on the data lineage API, for cases where Dataplex can't be enabled directly and lineage API data can be used manually to reflect lineage in Dataplex?

    • @practicalgcp2780 6 months ago

      Thanks for the comment. Can you clarify what you mean by Dataplex not being enabled directly? I've not used the lineage API yet, but my understanding is that lineage is generated automatically as long as you enable the Data Lineage API, and that BigQuery does it via SQL parsing through audit logs. I do believe there is an option to add your own lineage via the API for sources outside the context of BigQuery; are you referring to that one? I've not tried it yet, as I haven't had a use case that needs it.