What Is A Data Catalog And Why Do People Use Them?

  • Added on 20 Oct 2022
  • Special Thanks To Atlan For Partnering With Me On This Video. Learn more about them here: bit.ly/3VMCCXV
    What is a data catalog?
    iData was Facebook’s data discoverability tool. It provided a lot of functionality that I have started to miss, including the baseline functions you would expect: finding tables, tracing lineage, and tracking down the owners of those tables.
    But there were also other beneficial features like cost tracking, data quality assessments, and table certification. All of these features made it easy for a new data engineer to quickly orient themselves as they started on new projects.
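    To make those baseline functions concrete, here is a minimal sketch of a catalog that supports table search, owner lookup, and lineage tracing. It is not how iData worked internally; the class names, fields, and sample tables are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """One catalog record. All field names here are illustrative assumptions."""
    name: str
    owner: str
    description: str = ""
    upstream: list[str] = field(default_factory=list)  # tables this one is built from
    certified: bool = False


class Catalog:
    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Find tables whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            t.name for t in self._tables.values()
            if kw in t.name.lower() or kw in t.description.lower()
        )

    def owner_of(self, name: str) -> str:
        return self._tables[name].owner

    def lineage(self, name: str) -> list[str]:
        """Return every upstream table, nearest first, without repeats."""
        seen: list[str] = []
        queue = list(self._tables[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._tables:
                queue.extend(self._tables[current].upstream)
        return seen


# Hypothetical sample data.
catalog = Catalog()
catalog.register(TableEntry("raw_events", "ingest-team", "Raw click events"))
catalog.register(TableEntry("dim_users", "core-data", "One row per user", certified=True))
catalog.register(TableEntry("daily_active_users", "growth-analytics",
                            "Daily active users per country",
                            upstream=["raw_events", "dim_users"]))
catalog.register(TableEntry("weekly_retention", "growth-analytics",
                            "Weekly retention of active users",
                            upstream=["daily_active_users"]))
```

    With this toy data, `catalog.lineage("weekly_retention")` walks back through `daily_active_users` to the two source tables. Real catalogs usually populate such records by scanning warehouses rather than by hand.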
    My Favorite iData Feature
    My favorite feature was being able to see how other users were using the data at the query level. This provided a lot more context than commented fields alone. ERDs and data lineage are great, but seeing exactly how other users were using the data made it easy to understand (and those users were also great people to ping if you had questions).
    It was so easy to quickly understand how the data was already being used. This provided several benefits including:
    Reducing the duplication of work
    Providing context on how data could join together (even across multiple data sources)
    Showing who to ask about the data. The owner is a great place to start, but owners sometimes move away from datasets over time
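    As a rough illustration of the query-level idea, the sketch below mines a small query log to show which tables are queried together and who queries each table most. The log, the function names, and the regex-based table extraction are all assumptions made for this example; a real tool would use a proper SQL parser, and this is not a description of how iData did it.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical query log: (user, SQL text).
QUERY_LOG = [
    ("alice", "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id"),
    ("bob", "SELECT c.region, COUNT(*) FROM orders o JOIN customers c "
            "ON o.customer_id = c.id GROUP BY 1"),
    ("alice", "SELECT * FROM orders WHERE created_at > '2022-01-01'"),
    ("carol", "SELECT * FROM customers c JOIN support_tickets t ON t.customer_id = c.id"),
]

# Naive pattern: the identifier right after FROM or JOIN.
# It ignores subqueries, CTEs, and quoted names, so treat it as a sketch only.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql: str) -> list[str]:
    """Return the distinct table names referenced in one query, sorted."""
    return sorted({name.lower() for name in TABLE_PATTERN.findall(sql)})


def summarize(log):
    """Count per-table users and pairs of tables that appear in the same query."""
    users_by_table = defaultdict(Counter)
    join_pairs = Counter()
    for user, sql in log:
        tables = tables_in(sql)
        for table in tables:
            users_by_table[table][user] += 1
        for pair in combinations(tables, 2):
            join_pairs[pair] += 1
    return users_by_table, join_pairs


users_by_table, join_pairs = summarize(QUERY_LOG)
```

    Here `join_pairs` hints at how tables fit together, and `users_by_table["orders"].most_common(1)` suggests who to ping about `orders` even if the listed owner has moved on.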
    Upon leaving the company formerly known as Facebook, I felt like I kept stumbling on a new data catalog or discoverability tool every week. At this point, I am sure I have come across at least 3-5 dozen data discovery tools, all of which add their own flair to helping teams manage their metadata.
    If you enjoyed this video, check out some of my other top videos.
    Top Courses To Become A Data Engineer In 2022
    What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
    If you would like to learn more about data engineering, then check out Google's GCP certificate
    bit.ly/3NQVn7V
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy.substack.com/
    Or check out my blog
    www.theseattledataguy.com/
    And if you want to support the channel, then you can become a paid member of my newsletter
    seattledataguy.substack.com/s...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
  • Entertainment

Comments • 24

  • @dumisaralane 1 year ago +4

    Awesome video - thanks.
    I have started our organisation's data catalog. We are using Microsoft Purview.
    One thing I have already realised is that it takes time to document your enterprise data in a data catalog. You have to be patient and perhaps take it one business domain at a time, depending on your organisation's size.
    Happy cataloging everyone!

    • @SeattleDataGuy 1 year ago +1

      Yeah, modern ones try to automate the process but someone always has to put in the metadata

  • @juliustuckayo8973 1 year ago +3

    Another great nugget Ben.

  • @advaitchabukswar4163 6 months ago +2

    Really great videos. Learning a lot.

  • @ArtemioP 1 year ago +3

    What are your thoughts on Open Metadata? It's an interesting one to me because of the recent automatic Spark lineage (Spline) integration.

  • @ToToDarKDu59 1 year ago +3

    Interesting video Ben, thanks! I'm curious what your vision is for handling change management with the business side of the company (subject for another video?)

  • @lucashoww 1 year ago +2

    LOVE THY DATA!

  • @Neferfifi21 1 year ago +8

    Hey Ben, thanks for this video. I was wondering if you have any good data management book recommendations?

  • @rafaaferid1789 2 months ago

    I have a question 🙋
    Would it be helpful to implement a data catalog for application data (not analytics data)?

  • @kopiking352 1 year ago +1

    Is iData open source or proprietary to Facebook? If it is not open source, is there an open source data catalog you would recommend?

    • @SeattleDataGuy 1 year ago

      It is not open source. Most people use DataHub for the open source side of data catalogs.

  • @christinahumtsoe1262 9 months ago

    What about Microsoft Purview?

  • @picious 1 year ago +2

    Is MS Purview a data catalog tool?

    • @JLRocco43 1 year ago +2

      Purview works very similarly to Informatica EDC, where it "scans" locations and provides data lineage in the end, so it's in that realm.

    • @dumisaralane 1 year ago +2

      Yes. We use it in our organisation.

    • @SeattleDataGuy 1 year ago

      Yeah some people do use it for DC

  • @Buhlebendalo_Mavika 1 year ago

    It would help if I could see an actual catalogue. Can anybody help?

  • @Dave-nz5jf 10 months ago +1

    Ugggg, nothing is more impotent than a data catalog. Data engineers hate it because it's not needed for replication / DE transformation, and business hates it because it puts governance around what they're trying to do. And it's in the nature of analysts to hate all kinds of governance / enforcement. Yuck.