Kahan Data Solutions
Managing the "End" of a Data Pipeline
►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide
If there is such a thing as the "end" of a data pipeline, it's typically a report.
But if you've ever built one of these reports, you know that it's never just a simple handoff.
There's almost always some sort of back and forth.
Or future requests for adjustments.
So what I want to talk about in this video is where you decide to actually make those logic changes.
And the secondary impacts of that decision.
The two common schools of thought here are:
1 - Making changes directly inside the report
2 - Keeping changes in the database (transformation layer code)
In this video, I want to make a case for why I personally think you're better off going with option 2.
But whether or not you agree with me...
Hopefully it'll encourage you to consider what's best for your team (& company) going forward.
Enjoy!
Timestamps:
0:00 - Intro
0:44 - Visibility
3:45 - Consistency
5:15 - Control
Title & Tags:
Managing the "End" of a Data Pipeline
#kahandatasolutions #dataengineering #reportingtool
Views: 1,617

Video

The Missing Piece in Many Data Pipelines
3.6K views • 21 days ago
All data teams (large & small) have at least one thing in common. Source data. But not everyone handles it the same way in their pipelines. For some, they'll reference raw source tables directly in many queries. For others, they'll create ad-hoc custom tables to address s...
How to Pick Tools for a Data Stack (considerations)
1.5K views • 1 month ago
Every data team (at some point) needs to make a decision on tooling for their data stack. Nobody gets to skip this step. This is naturally most important at the very beginning. But can be a huge factor when considering a data migration as well. Unfortunately, this decisio...
The Power of Naming in Data Projects (especially dbt)
1.2K views • 1 month ago
For most data teams, one of the more challenging discussions you'll have is that of naming conventions. This is especially true with the popular data transformation tool, dbt. It's something where there's a lot of ego & strongly held opinions involved. But it's also somet...
Getting Buy-In for Data & Analytics Projects (what to focus on)
994 views • 1 month ago
As data professionals, we like to assume everyone values data & analytics. But unfortunately that's not always the case. Getting buy-in on new data initiatives from your organization or team can sometimes become a huge challenge. Especially when it's only viewed as a huge...
You Don't Need to Learn Every Data Tool & Skill
1.5K views • 2 months ago
Many data engineers feel overwhelmed by the need to know every tool and skill. I compare this to learning a new language: it's not about trying to learn every word or phrase. Instead, the real goal is being FLUENT enough to communicate effectively. This mindset has guided ...
How to Use Jinja w/ dbt Macros (3 Examples)
1.8K views • 2 months ago
If you're using dbt, at some point you'll need to create a Macro. If you're not familiar with Macros, essentially they are like "functions" in other programming languages. They're used to create re-usable bits of code and make your project much more dynamic. It's truly on...
4 Reasons Why You Should STILL Learn SQL
857 views • 2 months ago
SQL isn't new. But I still believe it's the most important skill data professionals (engineers, analysts, etc.) should know. In this video, I'll explain 4 reasons why I believe you absolutely need to still focus on this skill. Despite all of the other headline-grabbing to...
Data Warehouse Security w/ 4 Simple Roles
901 views • 2 months ago
Although security isn't the most glamorous part of being a data engineer, it's arguably one of the most important. While each team will have their own unique strategy & naming conventions, this week I want to share what's worked well for me. It's a simple approach based a...
How to Create a Data Modeling Pipeline (3 Layer Approach)
4.5K views • 2 months ago
A data warehouse acts as the main hub for most data teams, yet it often becomes a mess. While there are many different strategies to handle this, in this video I want to share the approach I follow. It's based around a simple 3-layered design to take raw source data into ...
What is Data Mesh?
6K views • 7 months ago
Over the last few years a new buzzword has entered the data world - Data Mesh. Just a simple Google search returns tons of articles on what it is, the key principles & how to use it. Tools are even incorporating this concept directly into the product. Yet despite the hype...
How to Build Incremental Models | dbt tutorial
9K views • 7 months ago
You don't need to process every record in a table, every time. Fortunately, dbt has a great solution for this scenario with their "incremental" materialization option. When set up properly, they can help you significantly cut costs & processing time. This is because increm...
Modern Data Engineering Workflows, Explained
5K views • 8 months ago
Modern data engineering isn't all about tools & technologies. One area that's often overlooked is the concept of "workflows". In particular, data team workflows for continuously building projects. This includes everything from environments, naming conventions, automation ...
Data Architecture 101: Kappa (Real-Time Data)
3.7K views • 8 months ago
All things being equal, I think we'd all want access to source data in real time. This is the holy grail of data engineering and removes most delays to insights. One architecture approach that makes this possible is known as the Kappa Architecture. It focuses on real-time...
How to Refresh Data Models Daily (for free)
2.2K views • 9 months ago
It's one thing to build a data architecture...it's another to keep it updated every day. Sure, there are many great tools out there that handle complex orchestrating, monitoring & scheduling. But sometimes, all you really need is just a simple daily job to keep your data ...
How to Sync PostgreSQL Data w/ Airbyte
6K views • 10 months ago
Data Architecture 101: The Lambda Strategy
7K views • 11 months ago
How to Run Queries from the Terminal (dbt Show Command)
3.4K views • 1 year ago
Common Data Team Structures (Engineer vs Analyst vs Scientist)
4K views • 1 year ago
Data Architecture 101: The Modern Data Warehouse
22K views • 1 year ago
Why Data Migrations Go Wrong (3 reasons)
3.3K views • 1 year ago
What are "intermediate" models in dbt?
4.2K views • 1 year ago
dbt Environments vs Targets | What's the Difference?
6K views • 1 year ago
Comparing 3 Types of Data Modeling (Normalized vs Star Schema vs Data Vault)
23K views • 1 year ago
Data Automation (CI/CD) with a Real Life Example
10K views • 1 year ago
3 Ways to Deploy Data Projects
4.7K views • 1 year ago
The Importance of Virtual Environments for Data Engineers
3K views • 1 year ago
How to Create a Virtual Machine (VM) on Google Cloud Platform (GCP)
45K views • 1 year ago
Data Engineer | Employee vs Consultant (4 differences)
2.6K views • 1 year ago
Modern vs Traditional Data Stacks (3 differences)
6K views • 1 year ago

Comments

  • @androkublashvili2738
    @androkublashvili2738 16 hours ago

    I am getting this error: "fatal: not a git repository (or any of the parent directories): .git". Why not mention "if you are getting errors, then do this..."? You should have thought about it!

  • @maxihui2896
    @maxihui2896 2 days ago

    Thanks for this video! Generally agree with all the points you mention, but what are your thoughts on allowing stakeholders to self-serve versus implementing controls such that most, if not all, data reporting should always be through the data team?

  • @andreranulfo-dev8607

    OMG! No cheap chat?! A very straightforward tutorial!

  • @dominhquanho9319
    @dominhquanho9319 8 days ago

    So what's the difference between an Inmon data warehouse and a traditional relational DB?

  • @sergioi92
    @sergioi92 8 days ago

    Very good and easy explanation. GJ

  • @SheranneTan-n1p
    @SheranneTan-n1p 9 days ago

    Great content! I had a question: why would companies choose to use standalone ELT/ETL providers (e.g. Stitch, Matillion) over the native AWS Glue / Azure Data Factory? Wouldn't it be easier to use the cloud providers, as they would be more integrated?

  • @senarl
    @senarl 10 days ago

    Totally agree! At my job we create all the business logic in the Gold layer. There we can keep it commented and track changes, and it's way easier to debug problems. I've spent the whole week checking some columns and updating them with the business team based on how they want them changed. Data analysts just need to create reports and understand the data to present it to stakeholders and other teams. Not that this is an easy task, since they are essentially our bridge to the business team.

  • @microscorpi0n
    @microscorpi0n 10 days ago

    This would work as a podcast instead of a video. No visuals, no examples.

  • @montheralkhudairy-977

    If I have a SQL Server database and I want to connect to it in Metabase and create a dashboard, and I want the dashboard to be live (the visuals refresh automatically when new data comes into the database), similar to DirectQuery in Power BI, is that possible in Metabase?

  • @qone2363
    @qone2363 11 days ago

    Hey Kahan! There were some changes to get the dbt env setup going... perhaps it's time for an update? Had to find workarounds... when I do it, the profiles.yml file doesn't exist or cannot be found. Anyway, thanks for making the tutorial.

  • @srini580
    @srini580 13 days ago

    thanks!

  • @ManuelDeStefano
    @ManuelDeStefano 14 days ago

    What about the ingestion part? I mean, in which environment do you develop and test the ingestion logic? Or is this just for production?

  • @williamchurch711
    @williamchurch711 15 days ago

    The staging layer would be equivalent to a landing zone?

    • @senarl
      @senarl 10 days ago

      Might be wrong, but I take it that the staging layer would be the bronze layer in the Medallion architecture. So we would have landing with raw data, bronze with cleaned raw data, silver with any new columns or other enhancements to the data, and gold with the joins and business logic. But that's just how I use it at work, and it can be changed to fit your needs.

  • @brianwoodruff1927
    @brianwoodruff1927 15 days ago

    Okay, I'm not crazy. The audio IS OUT OF SYNC for this particular video and it's making it nearly impossible to watch this video.

  • @SravaniK-ee8td
    @SravaniK-ee8td 16 days ago

    How to avoid hardcoding of username and password?

  • @TomGrubbe
    @TomGrubbe 16 days ago

    That's pretty cool. You could theoretically capture the output of a process on Linux on one job, then feed that output to another job that runs-on Windows.

  • @andresarmua
    @andresarmua 17 days ago

    Nice! I use a staging layer as a view and then 4 more layers for the pipeline until I get to the mart. I usually alternate between views and materialized tables, but I am not quite sure how to know the optimal way to decide between tables and views at each time. How do you compare performance, storage and other practical factors?

  • @Emmanuelz7
    @Emmanuelz7 18 days ago

    Why can't we upload a txt file? 😢 I can't upload my obfuscated script. I don't want it being stolen 😢

  • @Crow2525
    @Crow2525 18 days ago

    Can you do a follow-up on how you might put some of these metrics into a database? I was under the impression that a semantic layer doesn't live in the traditional database, and I'd love your guidance on where you might put it.

  • @morinho96
    @morinho96 19 days ago

    The only things that can't be pushed back to the database are the dynamic calculations like ratios that need to be recalculated in real time depending on the user's interaction with the report

  • @klttens
    @klttens 20 days ago

    actually no more START FREE and DEPLOY OPENSOURCE buttons there

  • @Tony-uv4pd
    @Tony-uv4pd 20 days ago

    Interestingly, users are always excited to do it themselves, and later on they outsource the change to IT/Dev because they get lost in the report transformations. 😂 And the database gives you change tracking through a code repository.

  • @issameddaou1493
    @issameddaou1493 20 days ago

    Great insights & very informative! I have a question: in my dbt project, I have the staging folder populated with functional models (transformations and aggregations as a function of the source data in the yml file), modeled with an entity relationship diagram. So what is the utility of the marts layer, essentially? Can I just add constraints/relationships… in this folder for specific models?

  • @Thakurravipundir
    @Thakurravipundir 21 days ago

    Thanks a lot Bro 👊🏻 really appreciate ❤

  • @creative.creation
    @creative.creation 23 days ago

    no configuration file found while cloning from github

  • @johnpower1458
    @johnpower1458 23 days ago

    Do you truncate the data on each batch pipeline run on staging and capture the cleaned data in snapshots? If not, how do you avoid duplicates downstream if you're using, say, SCD Type 2?

  • @darkerdelirium
    @darkerdelirium 24 days ago

    fking love you dude, been racking my brains on this for the last couple of hours!!

  • @StephenRayner
    @StephenRayner 24 days ago

    Brilliant

  • @StephenRayner
    @StephenRayner 24 days ago

    Please make videos on “Meltano” and how to use dbt within this? 🎉

  • @mastanraomuppaneni6716

    Can you provide a video on Snowflake Azure OAuth integration for API access?

  • @gauravsati1041
    @gauravsati1041 26 days ago

    Hi, how can we set the default NULL values of DBT_VALID_TO to something else, like a max value?

  • @bertjanvdberg
    @bertjanvdberg 27 days ago

    Nice! Question: Do you also use views in your warehouse and mart layers? I've been at companies where the marts were basically views based on views based on views times 10 which was terrible for the performance of getting the data.

    • @ramtadam1469
      @ramtadam1469 27 days ago

      We always use tables as marts and then sometimes on top build views that do things with the materialized marts data.

  • @thedavidabides
    @thedavidabides 27 days ago

    Nice work! Where should the staging layer come when using a bronze, silver, gold medallion structure ?

    • @muhammadbadar6089
      @muhammadbadar6089 27 days ago

      from my understanding you would use your bronze layer as a staging layer pulling from all source systems

    • @personalbranddata
      @personalbranddata 27 days ago

      It's the silver layer. Bronze = raw data in this video. Silver = "staging"/cleaned data in this video. Gold = Warehouse in this video. I don't like that he's using the term "staging" to refer to cleaned data because in traditional data warehousing a staging table typically refers to uncleaned data straight after you've loaded it from a source system and the cleaning happens later.

    • @ArmandsPutnis
      @ArmandsPutnis 27 days ago

      It does not really matter what you call them if you have agreed on the purpose. The bronze layer can be raw_source or it can be staging. Personally, I like to keep the source out of the way and use bronze for staging (cleaning/transforming), silver for joining multiple bronze tables that I know can be reused for multiple use cases in a gold layer, and gold for the final solution/consumption, joining some silver and bronze tables.

    • @gatorpika
      @gatorpika 23 days ago

      @ArmandsPutnis yeah, this. Bronze, silver and gold is an abstraction to help you think about your structure, not something with set rules you have to follow dogmatically. Figure out what layers you need to solve your problems and then just structure your layers appropriately. Staging serves a purpose to help you shift the transforms left so changes are easier down the road given they will propagate through all your downstream transforms. Then transform on top of that assuming the stage takes care of most of the cleaning/formatting for you. If your management makes you pick a metal, I suggest the titanium layer.

  • @Milhouse77BS
    @Milhouse77BS 27 days ago

    Stage All the Things

  • @MohamedMontaser91
    @MohamedMontaser91 29 days ago

    i don't understand why dbt created a view and not a table?

  • @MrZackmedia
    @MrZackmedia 1 month ago

    Watching these videos as a BE dev, what a waste of my time. Useless waste of my time.

  • @johnflanagan6367
    @johnflanagan6367 1 month ago

    I just discovered your videos. They are excellent. Clear, concise and to the point. Great content! Thanks so much!

  • @ligiaimusic
    @ligiaimusic 1 month ago

    Thank you so much for this video! Really helpful!

  • @Dabunni6398
    @Dabunni6398 1 month ago

    Thinking of pet projects, I've got a lot of workout data in Google Sheets I'd like to automate loading into a database (I've manually inserted data into a DB before, so I'm trying to broaden beyond that). What would you use/learn to automate loading data? Does Airbyte -> dbt -> Postgres make sense?

  • @LoganMajor-bd5ez
    @LoganMajor-bd5ez 1 month ago

    So there are absolutely no industry standards whatsoever, and we should just make up our own internal rules. I agree having internal rules is the most important, but teams aren't closed ecosystems; it's preferable to talk about industry standards. Action words should be all capitalized. Most of these have preferred implementations.

  • @Blahnik1182
    @Blahnik1182 1 month ago

    The content is fantastic, but your editing needs work. It's really hard to listen and watch when the video looks and sounds so glitchy.