Understand Slowly Changing Dimensions

  • Uploaded Sep 5, 2024

Comments • 58

  • @sheldonwayne1390
    @sheldonwayne1390 2 years ago +3

    I'm from China and it's frustrating that you can't find any video that does a good job explaining DWH concepts, so I'm glad I found this on CZcams. Cheers!

  • @italomacelloneable
    @italomacelloneable 3 years ago +5

    Excellent content and very well explained. Thank you

  • @cluebcke
    @cluebcke 3 years ago +6

    This is really awesome, thank you so much. Just to add, one use case for Type 5 (which I had no idea was called Type 5 before watching this) is that if you have a lot of data in a record that doesn't change much, and only a few fields that change a lot, partitioning them into different dimensions can substantially reduce table bloat (as you're not creating many copies of unchanged data) and improve query performance. There are tradeoffs of course and you really have to evaluate (and test empirically) for your use case.
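
    Roughly what I mean, with made-up table and column names just to illustrate the split (not the exact design from the video): the volatile attributes move into their own small dimension and the fact table carries both surrogate keys, so frequent changes don't force copies of the big customer record.

    CREATE TABLE DimCustomer (                    -- stable attributes; Type 2 rows stay rare
        CustomerSK    INT IDENTITY(1,1) PRIMARY KEY,
        CustomerID    INT           NOT NULL,     -- source system key
        CustomerName  VARCHAR(100)  NOT NULL,
        BirthDate     DATE          NULL
    );

    CREATE TABLE DimCustomerProfile (             -- volatile attributes that change often
        ProfileSK        INT IDENTITY(1,1) PRIMARY KEY,
        CreditScoreBand  VARCHAR(20) NOT NULL,
        IncomeBand       VARCHAR(20) NOT NULL
    );

    -- A change in the volatile attributes adds (or reuses) a small
    -- DimCustomerProfile row instead of duplicating the whole customer record.
    CREATE TABLE FactOrder (
        OrderDateKey  INT NOT NULL,
        CustomerSK    INT NOT NULL REFERENCES DimCustomer (CustomerSK),
        ProfileSK     INT NOT NULL REFERENCES DimCustomerProfile (ProfileSK),
        OrderAmount   DECIMAL(12,2) NOT NULL
    );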

  • @sandrafield9813
    @sandrafield9813 2 years ago +1

    Thanks for this description! It's better than all the other ones I found. It's clearer, with extra detail, without being convoluted and messy.

  • @MuhammadMustafa-rd6vg
    @MuhammadMustafa-rd6vg 2 years ago

    This is a good tutorial. I watched the dimensional modelling & SCD tutorials; you were pretty clear and explained things in detail without making it boring!

  • @thoroth_
    @thoroth_ a month ago

    This is the best explanation of this concept that I’ve seen, thank you so much

  • @bestherk4522
    @bestherk4522 3 years ago +1

    Good video, clear explanations. Thank you.

  • @maisamabbas6872
    @maisamabbas6872 6 months ago

    This is the best explanation of slowly changing dimensions that I've come across on CZcams. You definitely deserve my like and subscribe. Thanks for such an amazing video. I hope you make a lot more videos on Data Engineering topics.

  • @houstonfirefox
    @houstonfirefox a year ago

    Great explanation Bryan w/real-world examples and analogies. New sub!

  • @Pasdpawn
    @Pasdpawn a year ago

    Man, I spent a fortune to get an IT degree, and here I am, learning it from a YouTube tutorial that is far better and easier to understand than my database lectures. Thanks a lot, sir, for your effort.

  • @sauravmisra6061
    @sauravmisra6061 2 years ago

    Thank you, sir, for explaining the core EDW concepts in both videos - Dimensional Modeling and SCD - so well and precisely. I also followed your Databricks and Spark videos, and these are the best I have found so far on these topics. Thank you, sir; I really appreciate your effort.

    • @BryanCafferky
      @BryanCafferky  2 years ago

      Glad it is helpful. Thanks for your kind words.

  • @I677000
    @I677000 3 years ago +1

    Awesome :) Hope to get this question asked in my interview :) Now I know how to answer it :)

  • @batmanbuddha
    @batmanbuddha 3 years ago +1

    Awesome explanation, sharing it with my friend too.

    • @BryanCafferky
      @BryanCafferky  3 years ago +1

      Thanks. Yeah. The more that can be helped the better.

  • @marcosoliveira8731
    @marcosoliveira8731 3 years ago

    Topics like this are still so relevant.
    I hope you have more to share about DW architecture.

  • @helovesdata8483
    @helovesdata8483 2 years ago

    You've helped me to understand data warehouses more.

  • @shivam6565
    @shivam6565 2 years ago

    Thank you Bryan for this wonderful tutorial. Concept lucidly explained and I like your teaching style. Liked and subscribed.

  • @guillermopalaciosgaray

    Thanks for your explanation! This is the best video I've found :)

  • @Jel.Awesh.M
    @Jel.Awesh.M 2 years ago

    Thank you for sharing! A very good resource for SCDs!

  • @fatallny
    @fatallny a year ago +1

    Awesome video, now everything is clear

  • @AC-hh2cb
    @AC-hh2cb 3 years ago +1

    Great explanation, will help me at work :)

  • @anirudhvyas6069
    @anirudhvyas6069 10 months ago

    Nicely presented, thank you!

  • @zerovine8025
    @zerovine8025 a year ago

    Thanks for the easy to understand explanation, man. (btw nice mustache)

  • @a29miller
    @a29miller 2 years ago

    Well done. I really appreciate the detail

  • @marcosoliveira8731
    @marcosoliveira8731 3 years ago +1

    THANK YOU! Real good explanation!
    btw: Nice R mug :)

  • @dataofbitcoin6685
    @dataofbitcoin6685 2 years ago

    Amazing video, again! The best in biz.

  • @rickstratton1714
    @rickstratton1714 3 years ago +1

    Good video, clear explanations and visual examples. Have to wonder what kind of cocktail parties you attend! :)

    • @BryanCafferky
      @BryanCafferky  3 years ago

      Hmmm... Come to think of it, I was not invited back. :-) Thanks

  • @JimRohn-u8c
    @JimRohn-u8c 2 years ago +2

    Thank you so much for this! I’d love to see more videos on this.
    Btw, how does this work for Snowflake Schemas? Are those schemas only used in OLTP Databases?

    • @BryanCafferky
      @BryanCafferky  2 years ago +1

      Snowflake schemas are to be avoided if at all possible. I think the product Snowflake is causing confusion about this due to its name.

    • @SwapperTheFirst
      @SwapperTheFirst 2 years ago +1

      @@BryanCafferky I guess these are to be avoided only if (1) compute and storage are expensive, (2) compute and storage are slow, or (3) you don't have MPP or vectorized calculations. In the current environment, compute and storage cost much less than data engineer and data architect time, so data modeling approaches have changed.
      What was true for Teradata or Netezza in 1999 is not true for Snowflake or BigQuery in 2022.
      On the topic of SCD, you can find a video by Airflow's original creator where he asserts that all these SCD approaches are a thing of the past and that we should now create daily or weekly full snapshots of dimension tables. I don't fully understand his approach (Kimball Type 2, for example, sounds very intuitive to me), and I don't fully grasp how to solve the SCD reporting problem if you have, say, daily snapshots of all your dimension tables.

    • @BryanCafferky
      @BryanCafferky  2 years ago +1

      @@SwapperTheFirst The SCD concepts I explained are based on Kimball. It's more about the functionality required than the technology used. Airflow has nothing to do with this subject, so I'm not sure how the Airflow creator's comments apply. You should choose the best approach for your needs. Thanks

  • @elinadiary9357
    @elinadiary9357 2 months ago

    Thank you sir

  • @JimRohn-u8c
    @JimRohn-u8c 2 years ago +2

    Also, why would a NULL be better than '9999-12-31', etc.?

    • @BryanCafferky
      @BryanCafferky  2 years ago +2

      A NULL means unknown, whereas '9999-12-31' is a value. If you filter WHERE date > '2022-01-01', the latter would return TRUE, but a NULL comparison usually would not return TRUE. Forcing values is an option but technically inaccurate, with repercussions. Bear in mind that 9999-12-31 would appear in groupings on reports, etc. A text value of 'NULL' is just a glitch in the extract from a database and not a true NULL.
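
      For example, a quick sketch with a made-up DimCustomer table (illustrative names only), showing how the two conventions behave in a filter:

      CREATE TABLE DimCustomer (
          CustomerKey   INT          PRIMARY KEY,
          CustomerName  VARCHAR(50)  NOT NULL,
          StartDate     DATE         NOT NULL,
          EndDate       DATE         NULL      -- NULL = current row, end not known
      );

      INSERT INTO DimCustomer VALUES
          (1, 'Acme Corp', '2020-01-01', '2021-06-30'),  -- expired row
          (2, 'Acme Corp', '2021-07-01', NULL);          -- current row

      -- With a NULL end date, the current row is NOT returned by a plain comparison
      -- (NULL > '2022-01-01' evaluates to UNKNOWN, not TRUE)...
      SELECT * FROM DimCustomer WHERE EndDate > '2022-01-01';   -- returns 0 rows

      -- ...so current rows have to be asked for explicitly:
      SELECT * FROM DimCustomer WHERE EndDate IS NULL OR EndDate > '2022-01-01';

      -- With a '9999-12-31' sentinel instead of NULL, the first query would return
      -- the current row, but the fake date then shows up in groupings, date math,
      -- and reports.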

  • @gokukanishka
    @gokukanishka 2 years ago +1

    Hey Bryan, can you please make videos on fast-changing dimensions and how to handle them?

    • @houstonfirefox
      @houstonfirefox a year ago

      Apparently Gender went from a static dimension to a fast changing dimension 🤣

  • @aaronvr_
    @aaronvr_ 3 years ago +2

    Great video, very well explained! Is there an efficient way to actually identify/locate/spot such changes in large data sets? Slowly changing dimensions help when dealing with pre-identified changes, but as to how one could locate them in the first place... I'm stuck!

    • @BryanCafferky
      @BryanCafferky  3 years ago +2

      Hi. SQL Server has a feature called Change Data Capture. docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15
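
      A rough sketch of enabling it (the table name is just an example):

      -- Enable CDC on the source database, then on each table you want to track.
      EXEC sys.sp_cdc_enable_db;

      EXEC sys.sp_cdc_enable_table
          @source_schema = N'dbo',
          @source_name   = N'Customer',   -- example source table
          @role_name     = NULL;          -- no gating role

      -- SQL Server then records inserts/updates/deletes in a change table the ETL
      -- can read, e.g. via cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all').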

    • @BryanCafferky
      @BryanCafferky  3 years ago +2

      Or you can use a merge statement that compares the data. Ideally, the source tables have a modified date/time that can be used to get only changes since the last ETL load. This is easier than using Change Data Capture, i.e. you just keep track of the last date/time you did a load and only pull data that comes after that.
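
      A minimal sketch of that merge approach (hypothetical staging and dim tables; a Type 1 overwrite is shown, and NULL handling is omitted for brevity - a Type 2 dim would instead expire the matched row and insert a new one):

      MERGE DimCustomer AS tgt
      USING StageCustomer AS src
          ON tgt.CustomerID = src.CustomerID        -- source (natural) key
      WHEN MATCHED AND (tgt.City <> src.City OR tgt.Segment <> src.Segment) THEN
          UPDATE SET tgt.City    = src.City,
                     tgt.Segment = src.Segment
      WHEN NOT MATCHED BY TARGET THEN
          INSERT (CustomerID, City, Segment)
          VALUES (src.CustomerID, src.City, src.Segment);

      -- Incremental pull using a source modified timestamp, as described above:
      -- SELECT * FROM SourceCustomer WHERE ModifiedDate > @LastLoadDateTime;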

  • @javedabdool1187
    @javedabdool1187 a year ago +1

    Great video, boss. I have 2 questions, though - maybe a little stupid :P if ever I missed something:
    Should we put the source key in the fact table? If not, then why bring in the source key at all?
    Does the fact table need a fact ID?

    • @BryanCafferky
      @BryanCafferky  a year ago +1

      Thanks. The source dimension key, like ProductKey, should be stored in the dim table, and the dim surrogate key (created by the BI ETL) is the PK of the dim table and the FK in the fact table. You don't need a fact table key per se, but SQL tables perform much better with a primary key (PK), so an identity column suffices. See my video on dim modeling czcams.com/video/lWPiSZf7-uQ/video.html
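
      A small sketch of that key layout (made-up names, not from the video):

      CREATE TABLE DimProduct (
          ProductSK    INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key created by the BI ETL
          ProductKey   INT           NOT NULL,         -- source system key, kept in the dim
          ProductName  VARCHAR(100)  NOT NULL,
          StartDate    DATE          NOT NULL,
          EndDate      DATE          NULL
      );

      CREATE TABLE FactSales (
          SalesID    INT IDENTITY(1,1) PRIMARY KEY,    -- not required, but helps SQL Server
          ProductSK  INT NOT NULL REFERENCES DimProduct (ProductSK),  -- FK is the surrogate key
          SaleDate   DATE NOT NULL,
          Amount     DECIMAL(12,2) NOT NULL
      );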

    • @javedabdool1187
      @javedabdool1187 a year ago

      @@BryanCafferky cools thnx!!

  • @nameunknown007
    @nameunknown007 2 years ago +2

    Writes a book about Microsoft and sells it on Amazon like a legend 👍

  • @usman_oz
    @usman_oz 3 years ago

    Thanks for the explanation. I just wanted to check whether the "Dim customer" table at around 13:20 in the video shows the correct Start & End Dates. Shouldn't the "Start Date" be +1 day after the previous record's End Date?

  • @canyurt1
    @canyurt1 2 years ago +1

    Hey Bryan - I think Type 4 resembles Data Vault initiatives. Would you not agree?

    • @BryanCafferky
      @BryanCafferky  2 years ago

      I have not studied Data Vault modeling in depth, but from the wiki on the subject, I would be careful about equating concepts of the two methods, as Data Vault modeling does not cleanse and optimize for querying, per this link: en.wikipedia.org/wiki/Data_vault_modeling

  • @LAPOVETS
    @LAPOVETS 2 years ago

    Thank you!

  • @marcjkeppler3590
    @marcjkeppler3590 20 days ago

    How does this video have fewer than 1k likes? 😅