- 459
- 3,739,932
CMU Database Group
United States
Joined May 15, 2016
Carnegie Mellon University Database Group
S2024 #22 - Amazon Redshift Data Warehouse System (CMU Advanced Database Systems)
Andy Pavlo (www.cs.cmu.edu/~pavlo/)
Slides: 15721.courses.cs.cmu.edu/spring2024/slides/22-redshift.pdf
Notes: 15721.courses.cs.cmu.edu/spring2024/notes/22-redshift.pdf
15-721 Advanced Database Systems (Spring 2024)
Carnegie Mellon University
15721.courses.cs.cmu.edu/spring2024/
3,685 views
Videos
S2024 #21 - Yellowbrick Data Warehouse System (CMU Advanced Database Systems)
2.4K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/21-yellowbrick.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/21-yellowbrick.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #20 - DuckDB Embedded Database System (CMU Advanced Database Systems)
5K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/20-duckdb.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/20-duckdb.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #19 - Snowflake Data Warehouse Internals (CMU Advanced Database Systems)
4.9K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/19-snowflake.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/19-snowflake.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #18 - Databricks Photon / Spark SQL (CMU Advanced Database Systems)
3.3K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/18-databricks.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/18-databricks.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #17 - Google BigQuery / Dremel (CMU Advanced Database Systems)
3.1K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/17-bigquery.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/17-bigquery.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #15 - Query Optimizer Implementation 3 (CMU Advanced Database Systems)
1.3K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/15-optimizer3.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/15-optimizer3.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #14 - Query Optimizer Implementation 2 (CMU Advanced Database Systems)
1.5K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/14-optimizer2.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/14-optimizer2.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #13 - Query Optimizer Implementation 1 (CMU Advanced Database Systems)
2.6K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/13-optimizer1.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/13-optimizer1.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #12 - Database Networking Protocols (CMU Advanced Database Systems)
2.2K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/12-networking.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/12-networking.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #11 - User-Defined Function Optimizations (CMU Advanced Database Systems)
1.4K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/11-udfs.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/11-udfs.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #10 - Multi-Way Join Algorithms / Worst-Case Optimal Joins (CMU Advanced Database Systems)
1.9K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/10-multiwayjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/10-multiwayjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #09 - Parallel Hash Join Algorithms (CMU Advanced Database Systems)
2.3K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/09-hashjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/09-hashjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #08 - Query Scheduling & Coordination (CMU Advanced Database Systems)
2.1K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/08-scheduling.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/08-scheduling.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #07 - JIT Query Compilation & Code Generation (CMU Advanced Database Systems)
2.6K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/07-compilation.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/07-compilation.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #06 - Vectorized Query Execution Using SIMD (CMU Advanced Database Systems)
3K views · 5 months ago
S2024 #05 - Query Execution & Processing Part 2 (CMU Advanced Database Systems)
2.8K views · 5 months ago
S2024 #04 - Query Execution & Processing Part 1 (CMU Advanced Database Systems)
4.1K views · 6 months ago
S2024 #03 - Data Formats & Encoding Part 2 (CMU Advanced Database Systems)
4.2K views · 6 months ago
S2024 #02 - Data Formats & Encoding Part 1 (CMU Advanced Database Systems)
7K views · 6 months ago
S2024 #01 - Modern OLAP Database Systems (CMU Advanced Database Systems)
12K views · 6 months ago
S2024 #00 - Course Overview & Logistics (CMU Advanced Database Systems)
13K views · 6 months ago
F2023 #25 - Potpourri: Redis, CockroachDB, Snowflake, MangoDB, TabDB (CMU Intro to Database Systems)
6K views · 8 months ago
F2023 #24 - SingleStore Database Overview (CMU Intro to Database Systems)
3.1K views · 8 months ago
F2023 #23 - Distributed Data Warehouse OLAP Databases (CMU Intro to Database Systems)
4.4K views · 8 months ago
Chroma Vector Database: Retrieval for LLMs (Hammad Bashir + Liquan Pei)
2.8K views · 8 months ago
F2023 #22 - Distributed Transaction Processing Databases (CMU Intro to Database Systems)
3.9K views · 8 months ago
pgvector: Stylish Hierarchical Navigable Small World Indexes (Jonathan Katz)
3.5K views · 8 months ago
F2023 #21 - Intro to Distributed Databases (CMU Intro to Database Systems)
6K views · 8 months ago
F2023 #20 - Database Recovery (CMU Intro to Database Systems)
3.1K views · 8 months ago
The audio is shit. DJ for shit?
Is there a discord channel for CMU 15-721?
Thank you for the lecture! In the degenerate worst case where every tuple in relations R and S has the same value for the join key, Sort-Merge Join's merge cost is M × N, where M and N are the number of pages in relations R and S respectively. Since this looks like a Block Nested-Loop Join, why can't we optimize this to M + ⌈M / (B - 2)⌉ × N, where B is the number of available pages in the buffer pool?
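To compare the two cost formulas in this question numerically, here is a small Python calculator. It assumes the standard worst-case merge cost of M × N page reads (sort cost excluded) and the usual block nested-loop join cost M + ⌈M / (B - 2)⌉ × N, where B - 2 pages buffer the outer relation, one page scans the inner relation, and one page holds output. The function names are made up for this illustration.

```python
import math


def smj_worst_case_merge_cost(m_pages: int, n_pages: int) -> int:
    # Degenerate case: every tuple has the same join key, so for each
    # page of R the merge cursor has to rescan all N pages of S.
    return m_pages * n_pages


def bnlj_cost(m_pages: int, n_pages: int, buffer_pages: int) -> int:
    # Block nested-loop join: read R once, in chunks of (B - 2) pages,
    # and scan all of S once per chunk.
    if buffer_pages < 3:
        raise ValueError("block nested-loop join needs at least 3 buffer pages")
    chunks = math.ceil(m_pages / (buffer_pages - 2))
    return m_pages + chunks * n_pages
```

For example, with M = 1000, N = 500, and B = 102, `smj_worst_case_merge_cost(1000, 500)` gives 500,000 page reads, while `bnlj_cost(1000, 500, 102)` gives 1000 + 10 × 500 = 6,000.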
Isn't graph DBMS research what keeps this topic hot? While I agree that Neo4j is not good, there are multiple graph DBMSs integrating these techniques. In particular, the DuckDB PGQ paper uses exactly the same techniques Kuzu is using.
Well explained, thanks
Legend 🤣🤣🤣🤣
Indian English is hard to listen to 😂
toilet paperz?
Great lecture! Is there any advantage of having a sorted dict over an unsorted one in Dictionary compression?
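One commonly cited advantage of a sorted dictionary is that codes preserve the order of the original values, so range predicates can be evaluated directly on the integer codes. The Python sketch below illustrates that idea; `build_sorted_dictionary` and `range_scan_less_than` are hypothetical helpers, and the trade-off (not shown) is that inserting a new value into the middle of a sorted dictionary changes existing codes.

```python
import bisect


def build_sorted_dictionary(values):
    # Sorted, de-duplicated dictionary: code i maps to the i-th smallest
    # value, so comparing codes gives the same order as comparing values.
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return dictionary, codes


def range_scan_less_than(dictionary, codes, bound):
    # Translate "value < bound" into "code < cutoff" with one binary search,
    # then filter on integer codes without decoding any values.
    cutoff = bisect.bisect_left(dictionary, bound)
    return [row for row, code in enumerate(codes) if code < cutoff]
```

With the column `["pear", "apple", "fig", "apple", "kiwi"]`, the dictionary is `["apple", "fig", "kiwi", "pear"]`, the codes are `[3, 0, 1, 0, 2]`, and `value < "kiwi"` becomes `code < 2`, matching rows 1, 2, and 3. The bound does not need to be in the dictionary: `"grape"` also maps to cutoff 2.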
AMD doesn't downclock on AVX-512. They took their time to support it but did it right (at first it was a double-pumped design using 256-bit datapaths, and now it's a full 512-bit one). I think Intel doesn't downclock now, or at least not as badly (it was really bad; read the Cloudflare blog post about using it for TLS termination), but their support for it is somewhat worse than AMD's. Also, AVX-512 is super fragmented :(
So why does the same query run faster in DuckDB than in SQLite? Is it any of these answers given by ChatGPT?
- Columnar storage format: DuckDB uses a columnar storage format, which is more efficient for analytical queries that scan large amounts of data. This format allows for better data compression and faster data retrieval, especially for operations like aggregations and joins. SQLite uses a row-oriented storage format, which can be less efficient for these types of queries because it retrieves entire rows even if only a few columns are needed.
- Vectorized execution: DuckDB employs a vectorized execution engine, which processes data in chunks (vectors) rather than row by row. This approach takes advantage of modern CPU architectures and allows for better CPU cache utilization and SIMD (Single Instruction, Multiple Data) optimizations. SQLite processes data row by row, which can be slower for large datasets.
- Parallel processing: DuckDB supports parallel query execution, allowing it to use multiple CPU cores to perform operations concurrently. SQLite is designed for simplicity and portability, and while it can handle concurrent reads, its support for parallel query execution is limited.
- Optimized query planning: DuckDB includes advanced query optimization techniques that can generate more efficient execution plans for complex queries. SQLite has a simpler query optimizer, which might not produce equally efficient plans for certain types of queries.
- Built-in indexing and compression: DuckDB automatically applies various indexing and compression techniques to improve query performance without requiring explicit indexing from the user. SQLite requires manual indexing, and its compression techniques are not as advanced as those in DuckDB.
Never knew Alex Honnold taught computer science.
Great course! Thanks for making it open source.
👍👍👍
It seems like Postgres is due for a big architectural overhaul: 1. moving away from the process-per-connection architecture, and 2. using direct I/O instead of the OS page cache.
YouTube University lol
22:25 "We don't need to have a latch for the whole page table. Assuming it's fixed size, we can have latches for individual pages/locations of page table" Okay, if the latter is possible, why concern ourselves with multiple buffer pools? If these fine-grained latches for individual pages are still a bottleneck, then no matter how many buffer pools we segment our memory into, we will still suffer from contention...
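The fixed-size page table with one latch per slot that this comment describes can be sketched as follows. This is a hypothetical Python illustration (`StripedPageTable` and its methods are invented names) that uses `threading.Lock` as the latch and chains entries inside each slot; threads contend only when their page IDs hash to the same slot.

```python
import threading


class StripedPageTable:
    """Fixed-size page table (page_id -> frame_id) with one latch per slot.

    Hypothetical sketch: lookups on different slots proceed in parallel,
    while threads hashing to the same slot serialize on that slot's latch.
    """

    def __init__(self, num_slots: int = 64):
        # Each slot holds a small chain of page_id -> frame_id entries.
        self._slots = [dict() for _ in range(num_slots)]
        self._latches = [threading.Lock() for _ in range(num_slots)]

    def _slot(self, page_id: int) -> int:
        return hash(page_id) % len(self._slots)

    def insert(self, page_id: int, frame_id: int) -> None:
        i = self._slot(page_id)
        with self._latches[i]:  # latch only this slot, not the whole table
            self._slots[i][page_id] = frame_id

    def lookup(self, page_id: int):
        i = self._slot(page_id)
        with self._latches[i]:
            return self._slots[i].get(page_id)
```

The sketch only partitions the page table. A buffer pool typically also has other shared state, such as the replacement policy's bookkeeping and the list of free frames, which per-slot page-table latches do not split up.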
Being from India, hearing about a professor who carries a knife on the bus while traveling is a fascinating story.
Thank you for making this awesome course public 🫡
28:30 Why would there be contention over data structures which are internal to the OS?
@49:12 I think a set MUST have unique values, as stated in the video ("CANNOT have duplicate values"); however, I have seen many tables with duplicate data.
This rap is nice.
At 25:55, it should be "WAL: before a page is written, pageLSN <= flushLSN".
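The rule in this correction can be sketched in Python as a check the buffer pool runs before writing a dirty page: if the page's LSN is ahead of what the log has made durable, the log is forced first. `Page`, `LogManager`, `can_flush_page`, and `write_dirty_page` are hypothetical names, and the dict standing in for disk and the `flush` method standing in for an fsync of the log are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    page_lsn: int  # LSN of the newest log record that modified this page


class LogManager:
    def __init__(self):
        self.records = []    # in-memory log; list index i holds LSN i
        self.flush_lsn = -1  # highest LSN known to be durable

    def append(self, payload) -> int:
        self.records.append(payload)
        return len(self.records) - 1

    def flush(self, up_to_lsn: int) -> None:
        # Stand-in for writing and fsync()ing the log up to up_to_lsn.
        self.flush_lsn = max(self.flush_lsn, up_to_lsn)


def can_flush_page(page: Page, flush_lsn: int) -> bool:
    # WAL rule: a dirty page may be written only after every log record
    # that modified it is durable, i.e. pageLSN <= flushLSN.
    return page.page_lsn <= flush_lsn


def write_dirty_page(page: Page, log: LogManager, disk: dict) -> None:
    if not can_flush_page(page, log.flush_lsn):
        log.flush(page.page_lsn)  # force the log before the page
    disk[page.page_id] = page.page_lsn  # stand-in for the page write
```

For a fresh log, appending one record returns LSN 0, so a page with `page_lsn=0` cannot be written until `write_dirty_page` has flushed the log through LSN 0.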
Could not find a way to get a tuple's record ID (page ID and slot number) in MySQL.
B+ tree animation link for the curious www.cs.usfca.edu/~galles/visualization/BPlusTree.html
I am going to go with bushy tree lol.
I thought my earphones or network were broken before reading the comments LOL
what is it that they are smashing at the end?
Thanks for this! I've always found the semistructured stuff hard to understand. I just want to point out, though, that the example in the referenced paper for shredding has different values in the columnar decomposition. In particular, for value 'en' in Name.Language.Code, the repetition level is 2, because it is a repetition of the 2nd repeated field (according to the paper).
best db courses ever
1:00:36
thanks a lot
great course
Motherfuckers. So calculus ended up in SQL anyway, eh? See 1:00:00. And those SQL guys said that Codd's ALPHA was too complicated??? "Subqueries are powerful" my ass. Only because they couldn't be arsed to implement something better (which they did anyway, but their version ended up being corrupt anyway).
Hah, the query planner turned it into a regular join, as it should.
To answer the RANK query question: you can't do a GROUP BY instead, because SQL is inconsistent doodoo.
Oh man, when I thought I couldn't dislike the inconsistencies about SQL any more, I find a new example why it's a shitty language.
Oh gods, SQL's natural join compares the NAMES of the columns? That's awful and another point of evidence why SQL <> Relational Model. It's why Codd hammered on the idea of using shared Domains to join on, not shared column names. SQL, what a joke! 😂
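To make the point about column names concrete, here is a hypothetical Python model of SQL's NATURAL JOIN over lists of dicts: it matches on every column whose name appears in both inputs, regardless of what the columns mean. It assumes every row within an input has the same keys. When the inputs share no column names, the empty match condition is true for every pair and the result is a cross product, which is also how SQL's NATURAL JOIN behaves.

```python
def natural_join(left, right):
    """Join two lists of row dicts on every column name they share."""
    if not left or not right:
        return []
    # Shared column NAMES decide the join condition, not shared meaning.
    common = sorted(set(left[0]) & set(right[0]))
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in common):
                result.append({**l_row, **r_row})
    return result
```

For example, joining `[{"id": 1, "name": "Ann", "dept_id": 10}]` with `[{"dept_id": 10, "name": "Eng"}]` returns an empty list, because the employee's `name` and the department's `name` are unrelated but still get compared.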
Yes, in the Relational Model there are no duplicates within any single relation, and if you join two relations the result is a new relation which as any other relation does not contain duplicate rows. That's why there is no popular RDBMS in existence, since Postgres, DB2, Oracle, etc all allow duplicate rows and thus are not truly relational.
Fun fact: E. F. "Ted" Codd, aka the Coddfather, invented the Relational Model (relations, tuples, domains; primary key, foreign key), but also the first query language for his model (ALPHA), the term "data model", and the term OLAP. He was also highly critical of SQL (calling it Fatally Flawed back in 1985) because it broke consistency in a bunch of ways that STILL haven't really been fixed (like allowing duplicate rows, and returning anything that's not a relation, such as a single row, a column, or a single scalar/cell value). I've read all the publicly available letters he wrote, BTW. Good stuff. Even his criticisms of the Entity-Relationship Model by Peter Chen from 1976 (?), IIRC.
Concurrency control, fk yeah! Lol
Either that blockchain guy is a troll or that was one of the most entitled douches I've ever seen at a lecture.
superb bro. Loved ur lecture
For log-structured storage, if we still need an index for lookups, how is that index stored? How does updating the index not end up causing random I/O?
really well explained! thank you!
what is the outro song??
Why is the "lost updates" anomaly missing in the discussion of isolation levels?
this is f*cking awesome
BEEP
Great lecture, but for this specific one the camera moves too much and the video quality is not HD.
insightful lecture
At 1:20:32, Oracle/MySQL and Postgres don't use memory as their primary storage, do they? And even so, Oracle/MySQL still beat most in-memory DBMSs? Is it because their WAL was disabled for this benchmark?