CMU Database Group
  • 459
  • 3 739 932
S2024 #22 - Amazon Redshift Data Warehouse System (CMU Advanced Database Systems)
Andy Pavlo (www.cs.cmu.edu/~pavlo/)
Slides: 15721.courses.cs.cmu.edu/spring2024/slides/22-redshift.pdf
Notes: 15721.courses.cs.cmu.edu/spring2024/notes/22-redshift.pdf
15-721 Advanced Database Systems (Spring 2024)
Carnegie Mellon University
15721.courses.cs.cmu.edu/spring2024/
Views: 3,685

Video

S2024 #21 - Yellowbrick Data Warehouse System (CMU Advanced Database Systems)
2.4K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/21-yellowbrick.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/21-yellowbrick.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #20 - DuckDB Embedded Database System (CMU Advanced Database Systems)
5K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/20-duckdb.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/20-duckdb.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #19 - Snowflake Data Warehouse Internals (CMU Advanced Database Systems)
4.9K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/19-snowflake.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/19-snowflake.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #18 - Databricks Photon / Spark SQL (CMU Advanced Database Systems)
3.3K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/18-databricks.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/18-databricks.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #17 - Google BigQuery / Dremel (CMU Advanced Database Systems)
3.1K views · 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/17-bigquery.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/17-bigquery.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #15 - Query Optimizer Implementation 3 (CMU Advanced Database Systems)
1.3K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/15-optimizer3.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/15-optimizer3.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #14 - Query Optimizer Implementation 2 (CMU Advanced Database Systems)
1.5K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/14-optimizer2.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/14-optimizer2.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #13 - Query Optimizer Implementation 1 (CMU Advanced Database Systems)
2.6K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/13-optimizer1.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/13-optimizer1.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #12 - Database Networking Protocols (CMU Advanced Database Systems)
2.2K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/12-networking.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/12-networking.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #11 - User-Defined Function Optimizations (CMU Advanced Database Systems)
1.4K views · 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/11-udfs.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/11-udfs.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #10 - Multi-Way Join Algorithms / Worst-Case Optimal Joins (CMU Advanced Database Systems)
1.9K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/10-multiwayjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/10-multiwayjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #09 - Parallel Hash Join Algorithms (CMU Advanced Database Systems)
2.3K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/09-hashjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/09-hashjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #08 - Query Scheduling & Coordination (CMU Advanced Database Systems)
2.1K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/08-scheduling.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/08-scheduling.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #07 - JIT Query Compilation & Code Generation (CMU Advanced Database Systems)
2.6K views · 5 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/07-compilation.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/07-compilation.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #06 - Vectorized Query Execution Using SIMD (CMU Advanced Database Systems)
3K views · 5 months ago
S2024 #05 - Query Execution & Processing Part 2 (CMU Advanced Database Systems)
2.8K views · 5 months ago
S2024 #04 - Query Execution & Processing Part 1 (CMU Advanced Database Systems)
4.1K views · 6 months ago
S2024 #03 - Data Formats & Encoding Part 2 (CMU Advanced Database Systems)
4.2K views · 6 months ago
S2024 #02 - Data Formats & Encoding Part 1 (CMU Advanced Database Systems)
7K views · 6 months ago
S2024 #01 - Modern OLAP Database Systems (CMU Advanced Database Systems)
12K views · 6 months ago
S2024 #00 - Course Overview & Logistics (CMU Advanced Database Systems)
13K views · 6 months ago
F2023 #25 - Potpourri: Redis, CockroachDB, Snowflake, MangoDB, TabDB (CMU Intro to Database Systems)
6K views · 8 months ago
F2023 #24 - SingleStore Database Overview (CMU Intro to Database Systems)
3.1K views · 8 months ago
F2023 #23 - Distributed Data Warehouse OLAP Databases (CMU Intro to Database Systems)
4.4K views · 8 months ago
Chroma Vector Database: Retrieval for LLMs (Hammad Bashir + Liquan Pei)
2.8K views · 8 months ago
F2023 #22 - Distributed Transaction Processing Databases (CMU Intro to Database Systems)
3.9K views · 8 months ago
pgvector: Stylish Hierarchical Navigable Small World Indexes (Jonathan Katz)
3.5K views · 8 months ago
F2023 #21 - Intro to Distributed Databases (CMU Intro to Database Systems)
6K views · 8 months ago
F2023 #20 - Database Recovery (CMU Intro to Database Systems)
3.1K views · 8 months ago

Comments

  • @greielts75331 (19 hours ago)

    The audio is shit. DJ for shit?

  • @ibrahimrabbani94 (1 day ago)

    Is there a discord channel for CMU 15-721?

  • @ibrahimrabbani94 (1 day ago)

    Thank you for the lecture! In the degenerate worst case where every tuple in relations R and S has the same value for the join key, Sort-Merge Join's merge cost is M × N, where M and N are the number of pages in relations R and S respectively. Since this looks like a Block Nested-Loop Join, why can't we optimize this to M + CEIL(M / (B - 2)) × N, where B is the number of available pages in the buffer pool?
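The two costs in this question can be compared with a small calculation. This is only a sketch: it assumes the textbook page-I/O model in which the degenerate merge rescans all N inner pages for each of the M outer pages, and the function names and page counts are hypothetical, not taken from the lecture.

```python
import math


def smj_degenerate_merge_cost(m_pages: int, n_pages: int) -> int:
    """Merge-phase page reads when every tuple shares one join key (assumed M * N)."""
    return m_pages * n_pages


def bnlj_cost(m_pages: int, n_pages: int, buffer_pages: int) -> int:
    """Block nested-loop join page reads: M + ceil(M / (B - 2)) * N."""
    if buffer_pages < 3:
        raise ValueError("need at least 3 buffer pages (outer block, inner page, output)")
    # B - 2 pages hold a block of the outer relation; one page each is
    # reserved for the inner relation and for the output.
    outer_blocks = math.ceil(m_pages / (buffer_pages - 2))
    return m_pages + outer_blocks * n_pages


# Hypothetical sizes: M = 1000 pages, N = 500 pages, B = 102 buffer pages.
naive = smj_degenerate_merge_cost(1000, 500)
blocked = bnlj_cost(1000, 500, 102)
print(naive, blocked)
```

With these made-up sizes the blocked variant reads far fewer pages, which is the intuition behind the question. Whether a real system applies this inside its merge phase is not something the comment establishes.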

  • @himurakno (2 days ago)

    Isn't graph DBMS research what keeps this topic hot? While I agree that Neo4j is not good, there are multiple graph DBMSs integrating these techniques. In particular, the DuckDB PGQ paper uses exactly the same techniques Kuzu is using.

  • @rongtang4385 (4 days ago)

    Well explained, thanks

  • @rachelryan5231 (5 days ago)

    Legend 🤣🤣🤣🤣

  • @guilaidai7596 (5 days ago)

    Indian English is hard to listen😂

  • @vaibhaves2111 (8 days ago)

    toilet paperz?

  • @AbhishekRaj-do8kk (8 days ago)

    Great lecture! Is there any advantage of having a sorted dict over an unsorted one in Dictionary compression?
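One commonly cited advantage of a sorted (order-preserving) dictionary is that range predicates can be evaluated on the integer codes without decoding the values. The sketch below illustrates that idea only; the helper names and sample data are hypothetical and it is not the encoding of any particular system.

```python
from bisect import bisect_left, bisect_right


def build_sorted_dictionary(values):
    """Encode values with a sorted dictionary so that code order matches value order."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def range_filter(dictionary, codes, low, high):
    """Return row positions whose value lies in [low, high], comparing codes only."""
    # Binary-search the dictionary once to translate the value range into a code range.
    lo_code = bisect_left(dictionary, low)
    hi_code = bisect_right(dictionary, high) - 1
    return [row for row, code in enumerate(codes) if lo_code <= code <= hi_code]


dictionary, codes = build_sorted_dictionary(["pear", "apple", "fig", "apple", "kiwi"])
matches = range_filter(dictionary, codes, "b", "l")
print(dictionary, codes, matches)
```

With an unsorted dictionary the same predicate would need each code decoded (or every dictionary entry checked) before comparison, since code order would carry no meaning.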

  • @LtdJorge (10 days ago)

    AMD doesn’t downclock on AVX-512. They took their time to support it but did it right (it was a double-pump design at first with dual 256-bit registers, and a 512-bit one now). I think Intel doesn’t downclock now, or at least not as badly (it was really bad; read the Cloudflare blog post about using it for terminating TLS), but their support for it is somewhat worse than AMD’s. Also, AVX-512 is super fragmented :(

  • @hugolatendresse7617 (12 days ago)

    So why does the same query run faster in DuckDB than in SQLite? Is it any of these answers given by ChatGPT?

    Columnar Storage Format: DuckDB uses a columnar storage format, which is more efficient for analytical queries that require scanning large amounts of data. This format allows for better data compression and faster data retrieval, especially for operations like aggregations and joins. SQLite uses a row-oriented storage format, which can be less efficient for these types of queries as it retrieves entire rows even if only a few columns are needed.

    Vectorized Execution: DuckDB employs a vectorized execution engine, which processes data in chunks (vectors) rather than row-by-row. This approach takes advantage of modern CPU architectures and allows for better CPU cache utilization and SIMD (Single Instruction, Multiple Data) optimizations. SQLite processes data row-by-row, which can be slower for large datasets.

    Parallel Processing: DuckDB supports parallel query execution, allowing it to utilize multiple CPU cores to perform operations concurrently. SQLite is designed for simplicity and portability, and while it can handle concurrent reads, its support for parallel query execution is limited.

    Optimized Query Planning: DuckDB includes advanced query optimization techniques that can generate more efficient execution plans for complex queries. SQLite has a simpler query optimizer, which might not produce as efficient plans for certain types of queries.

    Built-In Indexing and Compression: DuckDB automatically applies various indexing and compression techniques to improve query performance without requiring explicit indexing from the user. SQLite requires manual indexing, and its compression techniques are not as advanced as those in DuckDB.

  • @sehajpreetsingh6266 (14 days ago)

    Never knew Alex Honnold taught computer science.

  • @LetianRuan (15 days ago)

    Great course! Thanks for your open source.

  • @quang.luu.179 (16 days ago)

    👍👍👍

  • @indavarapuaneesh2871 (20 days ago)

    It seems like Postgres is due for a big architectural overhaul: 1. Moving away from the process-per-connection architecture. 2. Using direct I/O instead of the OS page cache.

  • @lesleydowney6688 (20 days ago)

    YouTube University lol

  • @mystmuffin3600 (22 days ago)

    22:25 "We don't need to have a latch for the whole page table. Assuming it's fixed size, we can have latches for individual pages/locations of page table" Okay, if the latter is possible, why concern ourselves with multiple buffer pools? If these fine-grained latches for individual pages are still a bottleneck, then no matter how many buffer pools we segment our memory into, we will still suffer from contention...

  • @Avinashk-gq3pl (23 days ago)

    Being from India, hearing about a professor who carries a knife on the bus while travelling is a fascinating story.

  • @rayudua.l.p1905 (26 days ago)

    Thank you for making this awesome course public 🫡

  • @mystmuffin3600 (26 days ago)

    28:30 Why would there be contention over data structures which are internal to the OS?

  • @JohnSundberg (26 days ago)

    @49:12 I think a set MUST have unique values, as stated in the video "CANNOT have duplicate values", however - I have seen many tables of data with duplicate data.

  • @fakh99 (27 days ago)

    This rap is nice

  • @jauhararifin10 (28 days ago)

    At 25:55, it should be "WAL: before a page is written, pageLSN <= flushLSN"
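The rule quoted in this comment can be written as a small check. This is a minimal sketch under assumptions: `LogManager`, `can_flush_page`, and `write_page` are hypothetical names for illustration, and the LSN field follows the comment's own `flushLSN` wording rather than any specific system.

```python
class LogManager:
    """Tracks the highest log sequence number known to be durable on disk."""

    def __init__(self) -> None:
        self.flushed_lsn = 0

    def flush_up_to(self, lsn: int) -> None:
        # Pretend to force the log to disk through `lsn`.
        self.flushed_lsn = max(self.flushed_lsn, lsn)


def can_flush_page(page_lsn: int, flushed_lsn: int) -> bool:
    """WAL rule: a dirty page may be written only if pageLSN <= flushLSN."""
    return page_lsn <= flushed_lsn


def write_page(page_lsn: int, log: LogManager) -> None:
    """Force the log first if needed, then the page write is allowed."""
    if not can_flush_page(page_lsn, log.flushed_lsn):
        log.flush_up_to(page_lsn)
    assert can_flush_page(page_lsn, log.flushed_lsn)
    # The actual page write to disk would happen here.


log = LogManager()
print(can_flush_page(7, log.flushed_lsn))  # log records up to LSN 7 not yet durable
write_page(7, log)
print(log.flushed_lsn)
```

The point of the ordering is that every log record describing a change to the page is durable before the page itself reaches disk.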

  • @aakarshanraj1176 (1 month ago)

    could not find a way to get record id of page and slot id of tuple in mysql.

  • @yashthakkar4499 (1 month ago)

    B+ tree animation link for the curious www.cs.usfca.edu/~galles/visualization/BPlusTree.html

  • @yashthakkar4499 (1 month ago)

    i am going to go with bushy tree lol.

  • @break1145 (1 month ago)

    I thought my earphone or network was broken before viewing comments LOL

  • @user-vg7os9hf6u (1 month ago)

    what is it that they are smashing at the end?

  • @kevinkristensen8939 (1 month ago)

    Thanks for this! I've always found the semistructured stuff hard to understand. I just want to point out, though, that the example in the referenced paper for shredding has different values in the columnar decomposition. In particular, for value 'en' in Name.Language.Code, the repetition level is 2, because it is a repetition of the 2nd repeated field (according to the paper).

  • @energy-tunes (1 month ago)

    best db courses ever

  • @user-lv2ht3qv2l (1 month ago)

    1:00:36

  • @user-lv2ht3qv2l (1 month ago)

    thanks a lot

  • @llight1635 (1 month ago)

    great course

  • @NostraDavid2 (1 month ago)

    Motherfuckers. So calculus ended up in SQL anyway, eh? See 1:00:00. And those SQL guys said that Codd's ALPHA was too complicated??? "Subqueries are powerful" my ass. Only because they couldn't be arsed to implement something better (which they did anyway, but their version ended up being corrupt anyway).

    • @NostraDavid2 (1 month ago)

      Hah, the query planner turned it into a regular join, as it should.

  • @NostraDavid2 (1 month ago)

    To answer the RANK query question: you can't do a GROUP BY instead, because SQL is inconsistent doodoo.

  • @NostraDavid2 (1 month ago)

    Oh man, when I thought I couldn't dislike the inconsistencies about SQL any more, I find a new example why it's a shitty language.

  • @NostraDavid2 (1 month ago)

    Oh gods, SQL's natural join compares the NAMES of the columns? That's awful and another point of evidence why SQL <> Relational Model. It's why Codd hammered on the idea of using shared Domains to join on, not shared column names. SQL, what a joke! 😂

  • @NostraDavid2 (1 month ago)

    Yes, in the Relational Model there are no duplicates within any single relation, and if you join two relations the result is a new relation which as any other relation does not contain duplicate rows. That's why there is no popular RDBMS in existence, since Postgres, DB2, Oracle, etc all allow duplicate rows and thus are not truly relational.

  • @NostraDavid2 (1 month ago)

    Fun fact: E. F. "Ted" Codd, aka the Coddfather, invented the Relational Model (relations, tuples, domains; primary key, foreign key), but also the first query language for his model (ALPHA), the term "data model" and the term OLAP. He was also highly critical of SQL (calling it Fatally Flawed back in 1985) because it broke a bunch of consistency, which STILL hasn't really been fixed (like allowing duplicate rows, and returning anything that's not a relation, like a single row, a column or a single scalar/cell value). I've read all the publicly available letters he wrote BTW. Good stuff. Even his criticisms of the Entity-Relationship Model from 1976 (?) by Peter Chen, IIRC.

  • @7th_CAV_Trooper (1 month ago)

    Concurrency control, fk yeah! Lol

  • @digitulized459 (1 month ago)

    Either that blockchain guy is a troll or that was one of the most entitled douches I've ever seen at a lecture.

  • @m.imranzaheer1368 (1 month ago)

    superb bro. Loved ur lecture

  • @chenqiang19860101 (1 month ago)

    For log-structured storage, if we still need an index for lookups, how is the index stored? How does updating that index not end up causing random I/O?

  • @aliasonline1493 (1 month ago)

    really well explained! thank you!

  • @akashkulkarni832 (1 month ago)

    what is the outro song??

  • @njgarg (1 month ago)

    Why is the "lost updates" anomaly missing in the discussion of isolation levels?

  • @tylerrongione6696 (1 month ago)

    this is f*cking awesome

  • @njgarg (1 month ago)

    Great lecture, but for this specific lecture the camera is moving too much and the quality is not HD.

  • @indavarapuaneesh2871 (2 months ago)

    insightful lecture

  • @jauhararifin10 (2 months ago)

    At 1:20:32, Oracle/MySQL and Postgres don't use memory as the primary storage, do they? And even so, Oracle/MySQL still beat most in-memory DBMSs? Is it because their WAL was disabled for this benchmark?