Managing the "End" of a Data Pipeline

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide
If there is such a thing as the "end" of a data pipeline, it's typically a report.
But if you've ever built one of these reports, you know it's never just a simple handoff.
There's almost always some sort of back-and-forth.
Or future requests for adjustments.
So what I want to talk about in this video is where you decide to actually make those logic changes.
And the secondary impacts of that decision.
The two common schools of thought here are:
1 - Making changes directly inside the report
2 - Keeping changes in the database (transformation layer code)
In this video, I want to make a case for why I personally think you're better off going with option 2.
But whether or not you agree with me...
Hopefully it'll encourage you to consider what's best for your team (& company) going forward.
Enjoy!
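To make option 2 concrete, here is a minimal sketch that uses Python's standard-library `sqlite3` module. The `orders` table, the `fct_revenue` view and the two report functions are hypothetical stand-ins. The business rule lives once in a view (the transformation layer), and both "reports" only read from that view, so a logic change made in one place reaches every report. In a dbt project the view would more likely be a SQL model. Treat this as an illustration of the idea, not the workflow shown in the video.

```python
import sqlite3

# Hypothetical source table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, status TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'complete', 100.0),
        (2, 'cancelled', 50.0),
        (3, 'complete', 25.0);

    -- Transformation layer: the modeled view that every report reads from.
    CREATE VIEW fct_revenue AS
    SELECT id, amount FROM orders;
""")


def report_total_revenue(conn: sqlite3.Connection) -> float:
    # Report A holds no business logic; it only reads the view.
    return conn.execute("SELECT SUM(amount) FROM fct_revenue").fetchone()[0]


def report_order_count(conn: sqlite3.Connection) -> int:
    # Report B reads the same view, so it always agrees with Report A.
    return conn.execute("SELECT COUNT(*) FROM fct_revenue").fetchone()[0]


print(report_total_revenue(conn), report_order_count(conn))  # 175.0 3

# A stakeholder asks to exclude cancelled orders. The change is made once,
# in the view definition, instead of separately inside each report.
conn.executescript("""
    DROP VIEW fct_revenue;
    CREATE VIEW fct_revenue AS
    SELECT id, amount FROM orders WHERE status <> 'cancelled';
""")

print(report_total_revenue(conn), report_order_count(conn))  # 125.0 2
```

Suppose the filter had been added inside Report A only. Report B would then still count 3 orders, and the two reports would disagree. Keeping the rule in the transformation layer avoids that mismatch, and the view definition can also be tracked in a code repository.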
Timestamps:
0:00 - Intro
0:44 - Visibility
3:45 - Consistency
5:15 - Control
Title & Tags:
Managing the "End" of a Data Pipeline
#kahandatasolutions #dataengineering #reportingtool

1,617 views

Video

The Missing Piece in Many Data Pipelines

9:55

3.6K views · 21 days ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide All data teams (large & small) have at least one thing in common. Source data. But not everyone handles it the same way in their pipelines. For some, they'll reference raw source tables directly in many queries. For others, they'll create ad-hoc custom tables to address s...

How to Pick Tools for a Data Stack (considerations)

10:45

1.5K views · 1 month ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide Every data team (at some point) needs to make a decision on tooling for their data stack. Nobody gets to skip this step. This is naturally most important at the very beginning. But can be a huge factor when considering a data migration as well. Unfortunately, this decisio...

The Power of Naming in Data Projects (especially dbt)

8:58

1.2K views · 1 month ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide For most data teams, one of the more challenging discussions you'll have is that of naming conventions. This is especially true with the popular data transformation tool, dbt. It's something where there's a lot of ego & strongly held opinions involved. But it's also somet...

Getting Buy-In for Data & Analytics Projects (what to focus on)

12:00

994 views · 1 month ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide As data professionals, we like to assume everyone values data & analytics. But unfortunately that's not always the case. Getting buy-in on new data initiatives from your organization or team can sometimes become a huge challenge. Especially when it's only viewed as a huge...

You Don't Need to Learn Every Data Tool & Skill

6:46

1.5K views · 2 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide Many data engineers feel overwhelmed by the need to know every tool and skill. I compare this to learning a new language: it's not about trying to learn every word or phrase. Instead, the real goal is being FLUENT enough to communicate effectively. This mindset has guided ...

How to Use Jinja w/ dbt Macros (3 Examples)

11:34

1.8K views · 2 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide If you're using dbt, at some point you'll need to create a Macro. If you're not familiar with Macros, essentially they are like "functions" in other programming languages. They're used to create reusable bits of code and make your project much more dynamic. It's truly on...

4 Reasons Why You Should STILL Learn SQL

4:23

857 views · 2 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide SQL isn't new. But I still believe it's the most important skill data professionals (engineers, analysts, etc.) should know. In this video, I'll explain 4 reasons why I believe you absolutely need to still focus on this skill. Despite all of the other headline-grabbing to...

Data Warehouse Security w/ 4 Simple Roles

11:49

901 views · 2 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide Although security isn't the most glamorous part of being a data engineer, it's arguably one of the most important. While each team will have their own unique strategy & naming conventions, this week I want to share what's worked well for me. It's a simple approach based a...

How to Create a Data Modeling Pipeline (3 Layer Approach)

9:41

4.5K views · 2 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide A data warehouse acts as the main hub for most data teams, yet it often becomes a mess. While there are many different strategies to handle this, in this video I want to share the approach I follow. It's based around a simple 3-layered design to take raw source data into ...

What is Data Mesh?

4:28

6K views · 7 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide Over the last few years a new buzzword has entered the data world - Data Mesh. Just a simple Google search returns tons of articles on what it is, the key principles & how to use it. Tools are even incorporating this concept directly into the product. Yet despite the hype...

How to Build Incremental Models | dbt tutorial

10:51

9K views · 7 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide You don't need to process every record in a table, every time. Fortunately, dbt has a great solution for this scenario with their "incremental" materialization option. When set up properly, they can help you significantly cut costs & processing time. This is because increm...
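The core idea behind an incremental model can also be sketched outside dbt. Below is a minimal Python/`sqlite3` analogue. It assumes a hypothetical `raw_events` source whose `loaded_at` column holds ISO-8601 strings, which sort correctly as text, and a hypothetical `events_model` target. Each run inserts only the source rows newer than the latest `loaded_at` already in the target. In dbt itself, this filter is usually written inside an `{% if is_incremental() %}` block that compares against `{{ this }}`. The function below is a hand-rolled illustration, not dbt's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (id INTEGER PRIMARY KEY, loaded_at TEXT, value INTEGER);
    CREATE TABLE events_model (id INTEGER PRIMARY KEY, loaded_at TEXT, value INTEGER);
""")


def run_incremental(conn: sqlite3.Connection) -> int:
    """Copy only source rows newer than the target's high-water mark."""
    (watermark,) = conn.execute("SELECT MAX(loaded_at) FROM events_model").fetchone()
    if watermark is None:
        # Empty target: the first run processes everything (a full refresh).
        cur = conn.execute(
            "INSERT INTO events_model SELECT id, loaded_at, value FROM raw_events"
        )
    else:
        # Later runs: only rows that arrived after the last processed one.
        cur = conn.execute(
            "INSERT INTO events_model "
            "SELECT id, loaded_at, value FROM raw_events WHERE loaded_at > ?",
            (watermark,),
        )
    conn.commit()
    return cur.rowcount  # number of rows processed in this run


conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(1, "2024-01-01T00:00", 10), (2, "2024-01-02T00:00", 20)],
)
print(run_incremental(conn))  # 2 (first run: full load)

conn.execute("INSERT INTO raw_events VALUES (3, '2024-01-03T00:00', 30)")
print(run_incremental(conn))  # 1 (only the new row)
print(run_incremental(conn))  # 0 (nothing new)
```

This sketch only appends new rows. Rows that arrive late with an older timestamp, or updates to existing rows, would be missed. In dbt, the `unique_key` config is the usual way to handle updates.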

Modern Data Engineering Workflows, Explained

6:38

5K views · 8 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide Modern data engineering isn't all about tools & technologies. One area that's often overlooked is the concept of "workflows". In particular, data team workflows for continuously building projects. This includes everything from environments, naming conventions, automation ...

Data Architecture 101: Kappa (Real-Time Data)

4:36

3.7K views · 8 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide All things being equal, I think we'd all want access to source data in real time. This is the holy grail of data engineering and removes most delays to insights. One architecture approach that makes this possible is known as the Kappa Architecture. It focuses on real-time...
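A defining trait of the Kappa approach is that there is only one processing path. Everything is treated as a stream, so when the logic changes, you replay the retained event log through the new code rather than maintaining a separate batch layer. The sketch below shows that idea in plain Python. The list stands in for a durable, replayable log (for example, a Kafka topic), and the event shape and handler names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable


def process_stream(log: Iterable[dict], handler: Callable[[dict, dict], None]) -> dict:
    """Consume events in order and build a materialized view from them.

    The same function serves live processing and reprocessing: to reprocess,
    replay the retained log from the beginning through a new handler.
    """
    view: dict = defaultdict(int)
    for event in log:
        handler(view, event)
    return dict(view)


def count_orders(view: dict, event: dict) -> None:
    # Version 1 of the logic: number of orders per customer.
    if event["type"] == "order":
        view[event["customer"]] += 1


def sum_order_amounts(view: dict, event: dict) -> None:
    # Version 2 of the logic: total order value per customer.
    if event["type"] == "order":
        view[event["customer"]] += event["amount"]


# Hypothetical retained event log.
log = [
    {"type": "order", "customer": "a", "amount": 10},
    {"type": "click", "customer": "a"},
    {"type": "order", "customer": "b", "amount": 5},
    {"type": "order", "customer": "a", "amount": 7},
]

print(process_stream(log, count_orders))       # {'a': 2, 'b': 1}
# The logic changed: replay the same log through the new handler.
print(process_stream(log, sum_order_amounts))  # {'a': 17, 'b': 5}
```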

How to Refresh Data Models Daily (for free)

6:51

2.2K views · 9 months ago

►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide It's one thing to build a data architecture...it's another to keep it updated every day. Sure, there are many great tools out there that handle complex orchestration, monitoring & scheduling. But sometimes, all you really need is just a simple daily job to keep your data ...

How to Sync PostgreSQL Data w/ Airbyte

11:00

6K views · 10 months ago

Data Architecture 101: The Lambda Strategy

4:57

7K views · 11 months ago

How to Run Queries from the Terminal (dbt Show Command)

4:36

3.4K views · 1 year ago

Common Data Team Structures (Engineer vs Analyst vs Scientist)

5:17

4K views · 1 year ago

Data Architecture 101: The Modern Data Warehouse

5:48

22K views · 1 year ago

Why Data Migrations Go Wrong (3 reasons)

4:42

3.3K views · 1 year ago

What are "intermediate" models in dbt?

8:27

4.2K views · 1 year ago

dbt Environments vs Targets | What's the Difference?

10:21

6K views · 1 year ago

Comparing 3 Types of Data Modeling (Normalized vs Star Schema vs Data Vault)

3:51

23K views · 1 year ago

Data Automation (CI/CD) with a Real Life Example

5:23

10K views · 1 year ago

3 Ways to Deploy Data Projects

3:29

4.7K views · 1 year ago

The Importance of Virtual Environments for Data Engineers

7:56

3K views · 1 year ago

How to Create a Virtual Machine (VM) on Google Cloud Platform (GCP)

10:34

45K views · 1 year ago

Data Engineer | Employee vs Consultant (4 differences)

7:10

2.6K views · 1 year ago

Modern vs Traditional Data Stacks (3 differences)

4:38

6K views · 1 year ago

Comments

@androkublashvili2738 · 16 hours ago
fatal: not a git repository (or any of the parent directories): .git I am getting this error, why not mention " if you are getting errors, then do this..." ?!!!!!! you should have thought about it !
@maxihui2896 · 2 days ago
Thanks for this video! Generally agree with all the points you mention, but what are your thoughts on allowing stakeholders to self-serve versus implementing controls such that most, if not all, data reporting should always be through the data team?
@andreranulfo-dev8607 · 3 days ago
OMG! No Cheap chat?! A very straight forward tutorial!
@dominhquanho9319 · 8 days ago
So whats the difference between Inkon data warehouse vs traditional relational db?
@sergioi92 · 8 days ago
Very good and easy explanation. GJ
@SheranneTan-n1p · 9 days ago
Great content! I had a question - why would companies choose to use standalone ELT / ETL providers (e.g. Stitch, Matillion) over the native Amazon Glue / Azure data factory? Wouldn’t it be easier to use the cloud providers as it would be more integrated?
@senarl · 10 days ago
Totally agree! At my job we create all the business logic in the Gold Layer, there we can have it commented, keep track of changes, its way easier to debug problems, I've been the whole week checking some columns and updating them with the business team based on the way they want it changed. Data analysts just need to create reports and understand the data to present it to stakeholders/other teams, not that this is an easy task since they are essentially our bridge with the business team
@microscorpi0n · 10 days ago
This would work as a podcast instead of a video. No visual not examples
@montheralkhudairy-977 · 10 days ago
If I have a sql server database and I want to connect to it in Metabase and create a dashboard, and I want the dashboard to be live (the visuals refresh automatically when new data comes in the database), similar to Direct Query in Power BI, is that possible in Metabase?
@qone2363 · 11 days ago
Hey Kahan! There were some changes to get the dbt env setup going.. perhaps it's time for an update? Had to find work arounds.. when I do it, profiles.yml file doesn't exist or cannot be found.. anyways, thanks for making the tutorial
@srini580 · 13 days ago
thanks!
@ManuelDeStefano · 14 days ago
What about the ingestion part? I mean, in which environment do you develop and test the ingestion logic? Or is this just for production?
@williamchurch711 · 15 days ago
The staging layer would be equivalent to a landing zone?
@senarl · 10 days ago
Might be wrong but I take that the staging layer would be a bronze layer in the Medallion architecture, so we would have landing with raw data, bronze with cleaned raw data, silver with any new columns or any enhancement to the data and Gold with the joins and business logic. But thats just how I use at work and it can be changed to fit your needs
@brianwoodruff1927 · 15 days ago
Okay, I'm not crazy. The audio IS OUT OF SYNC for this particular video and it's making it nearly impossible to watch this video.
@SravaniK-ee8td · 16 days ago
How to avoid hardcoding of username and password?
@TomGrubbe · 16 days ago
That's pretty cool. You could theoretically capture the output of a process on Linux on one job, then feed that output to another job that runs-on Windows.
@andresarmua · 17 days ago
Nice! I use a staging layer as a view and then 4 more layers for the pipeline until I get to the mart. I usually alternate between views and materialized tables, but I am not quite sure how to know the optimal way to decide between tables and views at each time. How do you compare performance, storage and other practical factors?
@Emmanuelz7 · 18 days ago
why can't we Upload a txt file😢 i cant upload my Obfuscated script I don't want it being stolen😢
@Crow2525 · 18 days ago
Can you do a follow up on how you might put some of these metrics into a database? I was under the impression that a semantic layer doesnt live in the traditional database and id love your guidance on where you might put it?
@morinho96 · 19 days ago
The only things that can't be pushed back to the database are the dynamic calculations like ratios that need to be recalculated in real time depending on the user's interaction with the report
@klttens · 20 days ago
actually no more START FREE and DEPLOY OPENSOURCE buttons there
@Tony-uv4pd · 20 days ago
Interestingly, user are always excited to do it themselves and later on outsource to the IT/Dev to do the change because of getting loss into the report transformation. 😂 and in database give you track of change with code repository.
@issameddaou1493 · 20 days ago
Great insights & very informative ! I have a question : so in my dbt project, i got the staging folder populated with functional models (transformations and aggregations in function of source data in the yml file) and modelized with entity relationship diagram, so what is the utility of marts layer essentially ? Can I just add constraints/relationships… on this folder for specific models ?
@Thakurravipundir · 21 days ago
Thanks a lot Bro 👊🏻 really appreciate ❤
@KahanDataSolutions · 21 days ago
►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) -> www.kahandatasolutions.com/guide
@creative.creation · 23 days ago
no configuration file found while cloning from github
@johnpower1458 · 23 days ago
Do you truncate the data each batch pipeline run on staging and capture the cleaned data in snapshots? If not, how do you avoid duplicates down stream if you’re using say SCD Type 2?
@darkerdelirium · 24 days ago
fking love you dude, been racking my brains on this for the last couple of hours!!
@StephenRayner · 24 days ago
Brilliant
@StephenRayner · 24 days ago
Please make videos on “Meltano” and how to use dbt within this? 🎉
@mastanraomuppaneni6716 · 24 days ago
Can you provide one video on snowflake azure oath integration for API access
@gauravsati1041 · 26 days ago
Hi , How we can set DBT_VALID_TO default Null values to something other max number
@bertjanvdberg · 27 days ago
Nice! Question: Do you also use views in your warehouse and mart layers? I've been at companies where the marts were basically views based on views based on views times 10 which was terrible for the performance of getting the data.
@ramtadam1469 · 27 days ago
We always use tables as marts and then sometimes on top build views that do things with the materialized marts data.
@thedavidabides · 27 days ago
Nice work! Where should the staging layer come when using a bronze, silver, gold medallion structure ?
@muhammadbadar6089 · 27 days ago
from my understanding you would use your bronze layer as a staging layer pulling from all source systems
@personalbranddata · 27 days ago
It's the silver layer. Bronze = raw data in this video. Silver = "staging"/cleaned data in this video. Gold = Warehouse in this video. I don't like that he's using the term "staging" to refer to cleaned data because in traditional data warehousing a staging table typically refers to uncleaned data straight after you've loaded it from a source system and the cleaning happens later.
@ArmandsPutnis · 27 days ago
it does not really matter how you call them if you have agreed on the purpose. Bronze layer can be raw_source or it can be staging. personally i like to keep the source out of the way and use bronze for staging - cleaning/transforming. silver for joining multiple bronze tables, what i know can be reused for multiple use cases in a gold layer. gold layer for the final solution/consumption joining some silver and bronze tables.
@gatorpika · 23 days ago
@ArmandsPutnis yeah, this. Bronze, silver and gold is an abstraction to help you think about your structure, not something with set rules you have to follow dogmatically. Figure out what layers you need to solve your problems and then just structure your layers appropriately. Staging serves a purpose to help you shift the transforms left so changes are easier down the road given they will propagate through all your downstream transforms. Then transform on top of that assuming the stage takes care of most of the cleaning/formatting for you. If your management makes you pick a metal, I suggest the titanium layer.
@Milhouse77BS · 27 days ago
Stage All the Things
@KahanDataSolutions · 27 days ago
►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide
@MohamedMontaser91 · 29 days ago
i don't understand why dbt created a view and not a table?
@MrZackmedia · 1 month ago
watching this videos as a BE dev, what a waste of my time. uselessness waste of my time
@johnflanagan6367 · 1 month ago
I just discovered your videos. They are excellent. Clear, concise and to the point. Great content! Thanks so much!
@KahanDataSolutions · 27 days ago
Glad you like them!
@ligiaimusic · 1 month ago
Thank you so much for this video! Really helpful!
@KahanDataSolutions · 27 days ago
Glad it was helpful!
@Dabunni6398 · 1 month ago
Thinking of pet projects, I've got a lot of workout data in google sheets i'd like to automate loading into a database (I've manually inserted data into db before so trying to broaden more than that). What would you use/learn to automate loading data. Does airbyte -> dbt -> postgress make sense?
@LoganMajor-bd5ez · 1 month ago
so there are absolutely no industry standards what so ever, and we should just make up our own internal rules. I agree having internal rules is the most important, but teams aren't closed ecosystems, it's preferable to talk about industry standards. action words should be all capitalized. most of these have preferred implementations.
@Blahnik1182 · 1 month ago
The content is fantastic, but your editing needs work. It's really hard to listen and watch when the video looks and sounds so glitchy.

Kahan Data Solutions