Are SQL joins bad for performance?

  • Added 8 Jul 2024
  • 📝 Get my free SQL Cheat Sheets: www.databasestar.com/get-sql-...
    🎓 Learn how to improve the performance of your SQL: databasestar.mykajabi.com/get...
    Have you heard that SQL joins are bad and can slow down your queries? I've heard this too.
    In this video, I'll demonstrate a query that joins three tables together along with the execution plan.
    I'll then run the same query against a single table that holds all of the data, and show its execution plan.
    We'll then add some indexes and see what the execution plan looks like.
    ⏱ TIMESTAMPS:
    00:00 - Our test
    01:18 - Query 1
    03:27 - Query 2
    04:28 - Indexes
    06:00 - Check both queries
    LINKS:
    The sample database (olympics) is available on GitHub here: github.com/bbrumm/databasesta...
    The script for the queries used in this video: github.com/bbrumm/databasesta...
  • Science & Technology
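
The test described above can be sketched in a self-contained way. This is only an illustration under loud assumptions: it uses SQLite through Python's `sqlite3` module instead of Postgres and pgAdmin, and the table names, column names, index names, and data are hypothetical, not the real olympics sample database. SQLite's `EXPLAIN QUERY PLAN` also reports plan steps rather than the numeric cost that Postgres shows.

```python
import sqlite3

# Hypothetical, simplified schema and data; the real olympics sample
# database on GitHub has different tables and far more rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE noc_region (id INTEGER PRIMARY KEY, noc TEXT, region_name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE person_region (person_id INTEGER, region_id INTEGER);
    CREATE TABLE person_flat (person_id INTEGER, full_name TEXT, noc TEXT, region_name TEXT);
""")

regions = [(1, "AUS", "Australia"), (2, "CZE", "Czech Republic"), (3, "NZL", "New Zealand")]
cur.executemany("INSERT INTO noc_region VALUES (?, ?, ?)", regions)
for i in range(1, 301):
    region_id, noc, region_name = regions[i % 3]
    name = f"Person {i}"
    cur.execute("INSERT INTO person VALUES (?, ?)", (i, name))
    cur.execute("INSERT INTO person_region VALUES (?, ?)", (i, region_id))
    cur.execute("INSERT INTO person_flat VALUES (?, ?, ?, ?)", (i, name, noc, region_name))

# Query 1: three normalised tables joined together.
JOIN_QUERY = """
    SELECT p.full_name, r.region_name
    FROM person p
    JOIN person_region pr ON pr.person_id = p.id
    JOIN noc_region r ON r.id = pr.region_id
    WHERE r.noc = 'AUS'
"""
# Query 2: the same data read from one denormalised table.
FLAT_QUERY = "SELECT full_name, region_name FROM person_flat WHERE noc = 'AUS'"


def plan(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql).fetchall()]


join_rows = sorted(cur.execute(JOIN_QUERY).fetchall())
flat_rows = sorted(cur.execute(FLAT_QUERY).fetchall())

join_plan_before = plan(JOIN_QUERY)
flat_plan_before = plan(FLAT_QUERY)

# Add indexes on the columns used for filtering and joining.
cur.executescript("""
    CREATE INDEX idx_region_noc ON noc_region (noc);
    CREATE INDEX idx_pr_region ON person_region (region_id);
    CREATE INDEX idx_flat_noc ON person_flat (noc);
""")

join_plan_after = plan(JOIN_QUERY)
flat_plan_after = plan(FLAT_QUERY)

for label, steps in [
    ("join, no indexes", join_plan_before),
    ("flat, no indexes", flat_plan_before),
    ("join, indexed", join_plan_after),
    ("flat, indexed", flat_plan_after),
]:
    print(label, steps)
```

Both queries return the same rows, so the comparison is between plans only. On Postgres the equivalent check would be to run `EXPLAIN` (or `EXPLAIN ANALYZE`) on each query before and after creating the indexes.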

Comments • 33

  • @sdmagic 10 days ago +1

    Very well done, sir. I do get asked about the effort made to normalize a database. The insight offered by your very clear explanation will go a long way in helping to answer those queries.
    Your videos and website are a great resource for the database developer community.

    • @DatabaseStar 8 days ago +1

      Thanks for the kind words! I'm glad you like the video and my channel.

  • @dav.R7 7 days ago +1

    I've always had this question: Does the number of joins affect performance or not? This video answered all my questions.

  • @alifawzi8197 9 months ago +2

    For the last two days I have been watching your content.
    I genuinely appreciate the time and effort that you put into these videos. Your content is amazing and well explained.
    Thank you so much for sharing such content.

    • @DatabaseStar 9 months ago +1

      Thanks for the comment! I'm glad you like the videos!

  • @higiniofuentes2551 7 months ago

    Thank you for this very useful video!

  • @TheCodeConnoisseur 9 months ago +1

    Excellent

  • @CanRau 7 months ago

    Very insightful, thanks a lot ❤ What does NOC stand for?

    • @DatabaseStar 7 months ago

      Thanks! NOC stands for National Olympic Committee.

  • @NotMeEitherOfficial 3 months ago

    In your case the separate tables win against the single table with no indexes. In my case the separate tables cost more, since each table has about 40 columns, while the single table provides only the columns the user needs: say 40x4 is 160 columns, but the single table combines only approximately 20 columns from each. I will try to implement these indexes in my company's databases, as it seems we need them. I'm working on old databases that have been lying around for decades on the MyISAM engine, with millions of rows, and trying to improve performance as it gets slower every single month. Thanks for the video, it's really helpful. I will also consider asking management to migrate to another database engine or even another DBMS like PostgreSQL; using a cache eats too much memory given our company's budget, which runs all the apps on one server.

    • @DatabaseStar 3 months ago

      Good point, it also depends on how many columns you need to return. Just because the separate tables have 40 columns, doesn't mean you necessarily need to select all 40 columns. But if you need all these columns, then a single table may make more sense like you are using.

  • @Sdirimohamedsalah 2 months ago

    Thank you very much for this constructive demo.
    I have a question: when you indexed the columns, it reduced the total cost but not the total execution time. Why do you prefer reducing the total cost over the execution time, which is crucial for applications in production?

    • @DatabaseStar 2 months ago

      Thanks! Good question. I believe it's because the data set was so small and the execution time was small that it didn't really impact the time. On a larger data set you may see a bigger difference in execution time.

    • @Sdirimohamedsalah 2 months ago

      @DatabaseStar Thank you for your response.

  • @tanzimibthesam5861 7 months ago

    Do you think joins can have an impact on scalability? Thanks

    • @DatabaseStar 7 months ago

      No I don't think so. However, once your database gets pretty large, you'll be looking into all kinds of techniques to improve performance, and one of them may involve caching or creating summary tables which means fewer joins - but it has tradeoffs.
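
The summary-table idea and its tradeoff can be illustrated with a small sketch. It assumes SQLite through Python's `sqlite3` module and hypothetical table names (`noc_region`, `person_region`, `region_person_count`); it is not taken from the video's scripts.

```python
import sqlite3

# Hypothetical tables and data, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE noc_region (id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE person_region (person_id INTEGER, region_id INTEGER);
""")
cur.executemany("INSERT INTO noc_region VALUES (?, ?)",
                [(1, "Australia"), (2, "Czech Republic")])
cur.executemany("INSERT INTO person_region VALUES (?, ?)",
                [(1, 1), (2, 1), (3, 2), (4, 1)])

LIVE_QUERY = """
    SELECT r.region_name, COUNT(*) AS person_count
    FROM person_region pr
    JOIN noc_region r ON r.id = pr.region_id
    GROUP BY r.region_name
"""
SUMMARY_QUERY = "SELECT region_name, person_count FROM region_person_count"

# Precompute the join and aggregate once; later reads need no join.
cur.execute("CREATE TABLE region_person_count AS " + LIVE_QUERY)
summary = dict(cur.execute(SUMMARY_QUERY).fetchall())

# The tradeoff: new base rows are not reflected until the summary is rebuilt.
cur.execute("INSERT INTO person_region VALUES (5, 2)")
stale = dict(cur.execute(SUMMARY_QUERY).fetchall())
live = dict(cur.execute(LIVE_QUERY).fetchall())

print("summary:", summary)
print("stale summary after insert:", stale)
print("live join after insert:", live)
```

Reads from the summary table avoid the join, but the table has to be refreshed (on a schedule, or on write) to stay correct, which is the tradeoff mentioned in the reply.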

  • @milenfrom 9 months ago +1

    Hi,
    Thank you for the great insight.
    What is the software you use to run the queries and the explain feature?

    • @DatabaseStar 9 months ago +2

      Thanks! I'm using a tool called pgAdmin, which is a common SQL editor for Postgres databases.

  • @ftet1 8 days ago

    Hi, can you offer us the DDL scripts that you used to set up your example database? Then we can recreate it directly. That would be great. Thanks and regards \sdohn

    • @DatabaseStar 8 days ago

      Good idea! The sample database (olympics) is available on GitHub here: github.com/bbrumm/databasestar/tree/main/sample_databases/sample_db_olympics
      The script for the queries used in this video is now on GitHub as well: github.com/bbrumm/databasestar/tree/main/videos/100_joins

  • @maxyudin 9 months ago

    👍

  • @ChinweOge 9 months ago

    🤔

  • @romanbunshaft8412 5 months ago

    Now try ordering by something :)

    • @DatabaseStar 5 months ago

      We can add an Order By but the point still stands.

    • @romanbunshaft8412 5 months ago

      @DatabaseStar It VERY much depends on the query itself. Joins are not always faster and better.
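
How an `ORDER BY` interacts with indexes can be sketched as follows. The assumptions are the same as elsewhere: SQLite through Python's `sqlite3` module stands in for Postgres, and the table, index, and data are hypothetical. The plan text shown is SQLite-specific; in Postgres the analogous thing to look for in `EXPLAIN` output is a Sort node.

```python
import sqlite3

# Hypothetical single table, used only to show the effect of ORDER BY.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person_flat (person_id INTEGER, full_name TEXT, noc TEXT)")
cur.executemany(
    "INSERT INTO person_flat VALUES (?, ?, ?)",
    [(i, f"Person {i:03d}", "AUS" if i % 2 else "CZE") for i in range(1, 201)],
)

QUERY = "SELECT full_name FROM person_flat WHERE noc = 'AUS' ORDER BY full_name"


def plan(sql):
    """Return the plan steps SQLite reports for a query."""
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql).fetchall()]


# Without an index, SQLite scans the table and sorts in a temporary B-tree.
plan_before = plan(QUERY)

# An index on (noc, full_name) already stores matching rows in sorted order,
# so the separate sort step is no longer needed.
cur.execute("CREATE INDEX idx_flat_noc_name ON person_flat (noc, full_name)")
plan_after = plan(QUERY)

names = [row[0] for row in cur.execute(QUERY).fetchall()]

print("before:", plan_before)
print("after:", plan_after)
```

Whether a sort can be avoided depends on the query and the available indexes, for joined and single-table designs alike.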

  • @luco-games 7 months ago

    This is completely wrong.
    0) Operating in "less than a second" units for 100-200k row tables is like saying this car was not expensive, it was less than a million dollars. You need to show execution times in milliseconds from the query plan (EXPLAIN ANALYZE).
    1) Your execution time seemed to be 5 times faster for the denormalized query.
    2) Joining will ALWAYS be slower than one table if you join big tables (sorry, but in 2023 a few hundred k rows is nothing, even on a local machine).
    3) You didn't explain how JOIN actually works behind the scenes, but I get it, because it would ruin the whole video.

    • @DatabaseStar 7 months ago +2

      Thanks for the feedback. I wouldn't say it's "completely wrong" because the video demonstrates the concept step-by-step and shows numbers.
      I can create another video that demonstrates this with larger tables, as I think it would be more beneficial.
      0) That's a good point, which is why I didn't refer to the time taken when talking about the query, I referred to the cost from the execution plan. For larger tables & longer queries I would have also used the time taken.
      1) I don't think it was 5 times faster, the cost comparison (after indexes) was 822 vs 1,239, so it's a little faster.
      2) I don't think joining will always be slower than one table if working with big tables. That's the point of this video - joining is not always slower if you have a normalised design and indexes, BUT it depends on the query.
      3) I don't need to explain how join works behind the scenes for this video to be useful.

    • @NotMeEitherOfficial 3 months ago

      0) He does address that with "cost". As you typed, "few k rows is nothing even on local machine", so the execution time will be very small and can't be used as an actual benchmark, since other apps probably account for the gap, and computers nowadays also have multithreaded processors.
      1) It does; I guess the first answer covers the second point too.
      2) "Always" is not the right word, because in some cases joins actually work, as the costs he provided show; unindexed tables are one of the problems.
      3) He explained it using the flow that pgAdmin provides. When people try to explain something and it doesn't satisfy you, do you just criticize them?
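
Since point 3 of this thread raises how a join works behind the scenes, here is a deliberately simplified sketch of two common join strategies, a nested loop join and a hash join. It is plain Python over lists of dictionaries with hypothetical column names; it is not Postgres's implementation, which chooses between nested loop, hash, and merge joins based on its cost estimates.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    out = []
    for l_row in left:
        for r_row in right:
            if l_row[left_key] == r_row[right_key]:
                out.append({**l_row, **r_row})
    return out


def hash_join(left, right, left_key, right_key):
    """Build a lookup table on one side, then probe it once per row of the other."""
    buckets = {}
    for r_row in right:
        buckets.setdefault(r_row[right_key], []).append(r_row)
    out = []
    for l_row in left:
        for r_row in buckets.get(l_row[left_key], []):
            out.append({**l_row, **r_row})
    return out


# Hypothetical rows, for illustration only.
people = [
    {"person_id": 1, "region_id": 1},
    {"person_id": 2, "region_id": 2},
    {"person_id": 3, "region_id": 1},
]
regions = [
    {"id": 1, "region_name": "Australia"},
    {"id": 2, "region_name": "Czech Republic"},
]

nested_result = nested_loop_join(people, regions, "region_id", "id")
hash_result = hash_join(people, regions, "region_id", "id")
print(hash_result)
```

Both functions return the same rows; they differ in how much work they do as the tables grow, which is one reason why plan cost depends on table sizes and on whether a suitable index exists.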