From 2.5 million row reads to 1 (optimizing my database performance)

Web Dev Cody

zhlédnutí 34 392

Přidat do
- Můj playlist
- Přehrát později
Sdílet

Sdílet

Vložit

Velikost videa:

Zobrazit ovladače přehrávání

Automatické přehrávání

Přehrát

čas přidán 12. 08. 2023
📘 T3 Stack Tutorial: 1017897100294.gumroad.com/l/j...
🤖 SaaS I'm Building: www.icongeneratorai.com/
✂️ Background Cutter: www.backgroundcutter.com/
💬 Discord: / discord
🔔 Newsletter: newsletter.webdevcody.com/
📁 GitHub: github.com/webdevcody
📺 Twitch: / webdevcody
🤖 Website: webdevcody.com
🐦 Twitter: / webdevcody

Komentáře • 128

@Botshxlo Před 10 měsíci ⁺⁹⁴
Please do more performance content, this was really helpful!
@parlor3115 Před 10 měsíci
Second
@cas818028 Před 10 měsíci
Third
@aymaneu3723 Před 10 měsíci
forth
@idk23535 Před 10 měsíci
fifth
@behindthescenex Před 9 měsíci
sixth
@willmakk Před 10 měsíci ⁺⁴²
You've become my favorite web dev content creator in no time, no fancy intros and straight to the point with key subjects.
@k4f Před 10 měsíci
Same, I love this style of content. Thanks, Cody
@idk23535 Před 10 měsíci
For Real!
@avtandilkheladze521 Před 10 měsíci ⁺⁷
When you index a column in a table (doesn't matter if its thru prisma) it creates a b-tree on a disk (balanced binary search tree), rule of thumb is to not index a lot of your columns in the table and try to know your access patterns on data beforehand (or just access data using primary keys which are already indexed by default).
If you start stacking up indexes on everything, your write performance will be terrible overtime and you will lock out tables, because every index has to overwrite several pages in the multiple trees.
If you still plan to run analytical queries on your OLTP database (you shouldn't) at least create read replica and index columns that you want to query there, by doing that you will get eventually consistent storage (considering replication lag).
@salman0ansari Před 10 měsíci
this should be pinned
@amorto3420 Před 10 měsíci
I knew about all of these but this was a good watch anyway. Your videos are very informative and not very long so people can actually learn something quickly.
@notfadeaway6617 Před 4 měsíci
performance tuning content is really helpful. thank you!
@DomskiPlays Před 10 měsíci ⁺¹
At my job I had to create a data warehouse that stores so much data and gets frequent queries so I had to learn all about clustered and non-clustered indices. Good stuff!
@shadyworld1 Před 10 měsíci
I’d love to see a series of performance optimizations for SaaS ❤
Amazing one bro 🤜🏽🤛🏻
@antoninpire8451 Před 10 měsíci
lots of great advices throughout the video, awesome vid !
@mahdiboughanmi6781 Před 10 měsíci ⁺¹
You are awesome tbh you deserver millions of subscribers !!
@antoninoadornetto6958 Před 10 měsíci
Great video! I was just thinking about this topic yesterday. I’m using planet scale as well so the video was very helpful. Also, I found icon gen quite awesome. I purchased credits and was able to gen some kick ass icons for my app. It definitely boosted the user experience and gave it the missing flavor I needed.
@WebDevCody Před 10 měsíci
Nice glad to hear that! What’s your app?
@last.journey Před 10 měsíci
I'm happy to see how much people vists your website
@chrishabgood8900 Před 10 měsíci
For indexes, Not just the where but also the selects. Good video
@navonyt Před 10 měsíci ⁺¹⁹
We need more performance optimization videos!!
@IvanRandomDude Před 10 měsíci ⁺³
The best way to optimize is to create bad and very inefficient solution in the beginning. Then any improvement you do will be "optimization"
@navonyt Před 10 měsíci
@@IvanRandomDude need to keep that in my mind before I work on new functionalities at my work🤓
@parlor3115 Před 10 měsíci ⁺¹
@@navonyt Don't over-optimize, though. Otherwise, they'll kick you out because everything works perfectly.
@navonyt Před 10 měsíci
@@parlor3115 Time to introduce new bugs in the system ;)
@IvanRandomDude Před 10 měsíci ⁺²
@@navonyt Write such a terrible code that only you can maintain and fix it. Job security
@hyperprotagonist Před 10 měsíci ⁺⁸
I think your audience will grow when you touch on nuance subjects such as this. At least, it drew my attention.
@SyntaxLexx Před 10 měsíci
Direct to the point! 👍
@SteakFisher Před 10 měsíci ⁺²
planetscale also has built in caching (currently in beta I think) think having the database provider handle caching is a great step considering they can simplify the process down considerably
@tomyamado Před 10 měsíci
If you download the Prisma plugin, I think it suggests in your schema that you are missing an index on the model. Specially if you have relations. Nice video by the way!
@illiachalyk195 Před 10 měsíci
Great video, thank you for doing content that is educative and entertaining at the same time!
And I just wanted to add that the downside of creating an index is not that it takes up more space, the space overhead is minor, but the actual downside is that writes will take more time since it has to update multiple indexes instead of just one.
@haziqmasud532 Před 10 měsíci
You are right, but that usually depends on the situation or use case. If your database is facing heavy reads then you should definitely add index and your services are facing latency , then adding an index is the right option.
@illiachalyk195 Před 10 měsíci
Sure, I mean this is not the reason why you shouldn't use indexes. This is the thing that you have to consider before using them.
@WebDevCody Před 10 měsíci
Great point!
@klapaucius515 Před 10 měsíci ⁺¹
Good info!
@basedad Před 10 měsíci ⁺⁸
You have found the secret sauce to high view videos. Showing people stuff like this A) to the point B) well explained C) not the usual everyday content
@haziqmasud532 Před 10 měsíci ⁺¹
The second thing that Cody mentioned regarding counting the number of rows in a table, I am not sure what type of database he is using MySQL or Postgres.But there is schema in the databases by the name of information_schema and that stores the information regarding the whole database, like when you execute SHOW DATABASES or SHOW TABLES; It fetch that information from information_schema. And one of the things that schema stores is information about tables and you can find out the number of records from the information_schema.TABLE table instead of using the COUNT(*) function. That number might not be the latest but can save you lot of execution time. Hope it helps. ☺
@WebDevCody Před 10 měsíci ⁺¹
Good advice! It’s mysql
@martiananomaly Před 10 měsíci
Great video.
@huzaifamalik2346 Před 10 měsíci ⁺¹
sir need more performance content
@ThomazMartinez Před 10 měsíci
I wish supabase had same reporting UI as supabase damn that looks really nice
@dandogamer Před 10 měsíci ⁺¹
Whilst next does offer a way to do caching I think it's important to keep in mind that its building upon basic http or perhaps external caching principles which are applicable to any server application
@WebDevCody Před 10 měsíci
Right, but next caches at the server level which means all users will get the same data when hitting the api, the browser cache is mainly useful for users who return often to you app
@StephenRayner Před 10 měsíci
Excellent
@sheldonfourie5959 Před 10 měsíci
Really helpful
@cameronysidron2662 Před 10 měsíci ⁺¹
Hey Cody, cool video. Could you talk about in what scenarios someone should reach for prisma over something like a supabase postgresql database? Love your videos!
@WebDevCody Před 10 měsíci
Prisma is an orm, so do you mean planetscale vs supabase? Planetscale provides scaling for very large data out of the box from what I understand, and their dashboard provides good insights (as seen in this video) that could help you performance tune like i showed
@cameronysidron2662 Před 10 měsíci
@@WebDevCody oh my bad, I guess my real question then is why should I use an orm 😂 - but yeah what you showed in this video looks pretty powerful! I love the insights on the query level that planetscale provides
@handikaharianto Před 10 měsíci
Useful video as usual! Could you also cover how to fetch data from database through REST api for admin dashboard that basically contains some graphs and statistics? For context, i have admin dashboard for my FYP project which is built using the MERN stack. The problem is, I dont think i did it properly, especially on the api endpoints and the way i fetch the data from the database using MongoDB aggregation
@WebDevCody Před 10 měsíci
It depends on what data you’re trying to display. Your api would just need to return useful data you could plot in a time series or bar chart, etc. either you fetch all records from your db an aggregate in your code, or you find ways to write aggregate with groupBy statements by time windows probably. It might be easier to just do it at the api layer
@CanRau Před 10 měsíci
PlanetScale insights is fantastic just recently did the same after realizing I missed 1 important index leading to hundreds of thousands of routed read in every request 😅 note down to 1 row read as well 🎉
@SeanCassiere Před 10 měsíci ⁺⁴
Proper schema design (with indexes) and dropping Prisma for Drizzle made a big difference for me. For me since I had a bunch relationship based queries, which Prisma does a horrible job at, is probably what made the difference.
@WebDevCody Před 10 měsíci
I need to try it out, but drizzle looks like I'm just writing sql using a builder pattern which isn't that interesting to me
@SeanCassiere Před 10 měsíci ⁺¹
@@WebDevCody That's a pretty good summary of it. The only thing I'd add in is the type safety brings to your project.
Plus it always runs a single query to your db instance whilst Prisma does a bunch of Rust-Graphql-Db emulation for relationships.
Your Planetscale instance should be feeling this hit since Prisma's engine makes multiple calls in a transaction to fulfill any requests with relations.
@SeanCassiere Před 10 měsíci
Then again you need to consider the cost of your time, and until you are properly hitting scale you'll probably not need to worry about it.
@zivtamary Před 10 měsíci
@@WebDevCody that was my initial thought too, but I did eventually make the switch to first get rid of the ridiculous heavy prisma client, and second maybe understand SQL deeper instead of prisma black magic and whatever it does behind the scenes and the spaghetti sql code it spits. I think you should really consider it. I'm currently using Kysely as a query builder and prisma just for the db schema with prisma-kysely, and the performance is incomparable!
@sincethatmoment Před 10 měsíci ⁺¹
doesn't getStaticProps run at build time? How does it know to revalidate data if it's already deployed? Or does it get built every hour?
@WebDevCody Před 10 měsíci ⁺²
When you revalidate it sets up ISG for incremental static generation
@erade4051 Před 10 měsíci
Hey Cody I've been thinking of getting back to doing frontend and was wondering what you would recommend in terms of libraries etc for building like a facebook or twitter clone in 2023.
@WebDevCody Před 10 měsíci
since I use react for everything, my biased opinion would be to use next with pages router. Remix seems nice as well. If you don't like react, checkout sveltekit.
@monstajoedev Před 10 měsíci ⁺¹
What about adding an in memory cache like redis
@septa6459 Před 10 měsíci
Is @@index can be applied to any data type?
@hamm8934 Před 10 měsíci
Hate to be that guy, but which theme are you using?
Been looking for a new one and I like this one
@WebDevCody Před 10 měsíci
Bearded theme stained blue
@HexapoDD Před 10 měsíci
hi, I agree, prisma is great and table scans are to be avoided. much love
@fadilnatakusumah195 Před 10 měsíci
Do you prefer PlanetScale or Supabase for production app?
@WebDevCody Před 10 měsíci ⁺¹
Probably planetscale at this point
@gadgetboyplaysmc Před 2 měsíci
This might be a stupid question since I lack fundamentals on SQL. But what specifically does an index even do the under the hood that it's able to optimize this?
It's pretty cool that literally the solution is just @@index(column) and it magically fixes it.
@real23lions Před 10 měsíci
Cody, I’ve wondered if there is much difference moving from using ORM to just SQL. what are the pros vs cons in terms of speed of requests and DX?
@WebDevCody Před 10 měsíci ⁺¹
if you write raw sql, you're basically coupling all your code to whatever sql engine you use. an ORM helps abstract away the implementation (postgres, mysql, etc) which means less coupling. IMO, Raw sql gives you the power to write optimized queries at the cost of development speed. I haven't written a raw query in years, and I really don't care to do so again unless I really need it.
@real23lions Před 10 měsíci
@@WebDevCody thanks for the feedback. useful to note if i decide to look further into it. thanks!
@jishnudesarkar Před 10 měsíci ⁺³
Here's an idea regarding showing user and icon count in your homepage. You can use redis to maintain the counts and every time a users gets added you just do an INCR to that key in redis. This will be way faster the counting all the rows in the DB and also would reduce a DB call.
@RemotHuman Před 10 měsíci
indexes will also slightly reduce write times so that's another downside, but still worth it usually to increase read times by not reading as many rows
@WebDevCody Před 10 měsíci
Good points!
@jacksonbourne Před 10 měsíci ⁺¹
Since (I'm assuming) the email field is unique, I think it would improve performance even more by using a @@unique(email) index, or at that point just using an @unique attribute directly on the email field. If the identifier is strictly internal, using an `Int` with autoincrement would be even better (obviously don't use it if it's revealed to the user as it would give them an estimate of your user count at the very least, and open you up to scraping if you can view user profiles)
Also, instead of using String and @default(cuid()), you'd save a lot of space with a `String @db.Uuid` and using a @default(uuid()) instead if you don't care between uuid/cuid. A uuid is just 128 bits to store, versus storing a cuid in a stringified form which 32 bits per character (at least 768 bits according to the spec's minimum length).
@WebDevCody Před 10 měsíci ⁺¹
Yeah a unique might be a better option
@jacksonbourne Před 10 měsíci
@@WebDevCody Regardless, lots of good info and a great video overall, well done :)
@bhavyajain638 Před 10 měsíci
I'm wondering how would one perform the user and icon count caching in Reactjs? Maybe storing the counts in database?
@WebDevCody Před 10 měsíci ⁺¹
I think sql is pretty fast at counting rows, but yeah you could store a computer count and store it in a metadata field somewhere
@bhavyajain638 Před 10 měsíci
@@WebDevCody Yes. Thankyou so much for the reply. Didn't expect response this quick... I'm delighted.
@RegalWK Před 10 měsíci
ID by default is index column so you could also look by index if logic in your code allow for this
@WebDevCody Před 10 měsíci
I don't think I know the ID which is why I need to lookup by email. I refactored the column to only hold a list of emails
@RegalWK Před 10 měsíci
@@WebDevCody good for you but remember indexes have pros and cons like everything, they are not free
@yousafwazir3167 Před 10 měsíci ⁺²
The main reason is because prisma reads more than it needs, pure SQL would be to at better and put indexes on your data based on how you query
@parlor3115 Před 10 měsíci
What is that Prisma can do that would over-fetch data?
@yousafwazir3167 Před 10 měsíci ⁺¹
@@parlor3115 ?
@JS_Jordan Před 10 měsíci
@@parlor3115returning data, upserts and some relational queries will stitch together multiple queries. They try turn mysql to postgresql
@jesse9999999 Před 10 měsíci
@@parlor3115 it does relations manually, so rather than doing proper joins it will do discrete fetches for each table and join the data on the way back to you
@bk1507 Před 10 měsíci
It is easier to delete indexes than add new ones especially if the table is very large. To make a new index the index tree has to be constructed based on the table, the bigger the table the slower this step takes. Removing an index is just getting rid of the tree much less work. So I would say it is better to be generous when selecting indexes on a table to be safe.
@WebDevCody Před 10 měsíci ⁺¹
Sure, maybe if your dataset is huge, mine isn’t
@kyngcytro Před 10 měsíci
I have a feeling you'll get the same performance boost. If you marked the field as unique in your schema.
@WebDevCody Před 10 měsíci ⁺¹
Yeah probably
@darylphuah Před 10 měsíci
You have another query that has potential for problems.
SELECT count(*) from (subquery that returns rows)
The DB Engine is likely optimizing this for you now, but if its get more complicated and it actually has to return rows before counting you will face problems with high row counts. I just don't see why such a simple count would need be done with a subquery to begin with.
@WebDevCody Před 10 měsíci
I’ll look into that one
@bigblambino Před 10 měsíci
When a column is being referenced using a where clauses in SQL queries there are very few reasons not to have an index on that column, so I think your recommendation that you should wait until your database is in production and doing full table scans to add indexes is generally bad advice.
@vanvanni_nl Před 10 měsíci ⁺¹
You should definitely not index everything that often is queried unless it is a small app. Bigger applications indexing all where clauses will dramatically increase diskspace, and if you do not partition that you can end up in nasty situations
@WebDevCody Před 10 měsíci
Yeah don’t index everything, which is why I said don’t premature optimize
@DunckingTest Před 10 měsíci
planet scale requires a cc. any alternatives?
@WebDevCody Před 10 měsíci
Supabase has a free tier db. Although, I didn’t add a cc when signing up for planetscale… maybe they changed their policies
@jackybanh8105 Před 10 měsíci
you forgot that the heap needs to be rebuilt when new emails are inserted
@WebDevCody Před 10 měsíci
correct, adding more indexes causes slower writes, but I'd be curious to see at what point this actually becomes a problem. With my small side project, it'll probably never be an issue.
@YeetYeetYe Před 10 měsíci
Yo what theme is that
@WebDevCody Před 10 měsíci
Bearded theme
@EvertJunior Před 10 měsíci
I didn’t understand why you dislike app router in this case. You could set it up to revalidate only the counter component instead of the entire page.
@WebDevCody Před 10 měsíci
I have no issues with using the app router to revalidate after some period, my issues with the app router are there are no ways to set it to 0 when navigating between routes
@JS_Jordan Před 10 měsíci
Switch from prisma to drizzle and it comes down an insane amount
@VincenzoCassaro Před 10 měsíci
isn't another viable solution to just store the count in another table and update it at every new user/icon and just read that one number each time instead of count * ?
@WebDevCody Před 10 měsíci ⁺²
Yeah, that works as well, but if a user deletes their account or icon, then you need to make sure you update those counts in all the various business logic locations. It’s much easier just to dynamically count it once in a while before exploding code complexity
@chawza8402 Před 10 měsíci
well nextjs has the feature and it only takes few lines so I guess this way is much way better than managing a table
@IvanRandomDude Před 10 měsíci ⁺²
That will slow down your writes. Plus, you will have more writes which costs more. Cache is the best solution if count is not of critical business importance like number of users. Of course, you would not use cached value to show users their money balance for example. But things like numbers of users, numbers of likes etc. does not have to show the latest value.
@VincenzoCassaro Před 10 měsíci
I almost agree with everything said but this would be solved with vercel key/value db so nothing convoluted just a simple value to update and read but I get the point
@chawza8402 Před 10 měsíci
@@VincenzoCassaro redis also kv db pair. Or a regular objeck (hash) key value pair also sufficient.
@buddy.abc123 Před 10 měsíci
Good clip for beginners. This is why using ORMs without prior sql knowledge is harmful.
@xpmon Před 10 měsíci
I wouldn't even revalidate every hour maybe once a day is enough
@user-tb4ig7qh9b Před 10 měsíci
The first thing you need to do just learn sql before try doing any backend stuff
@brokula1312 Před 10 měsíci
Why having client side caching is better then having the same thing dictated by the server using simple http caching mechanizm?
No need to update the client if caching policy changes and 0 code impact.
@WebDevCody Před 10 měsíci
Well for example on the home page, all new users would require a fresh load to the db, vs ISG would give everyone the same static content from a cache (much faster).
@reikooters Před 10 měsíci
I don't know your app so don't mind my ignorance, but it's not clear to me why you even need that Trial table at all. The purpose of a primary key column is to uniquely identify a row - there's no need have a column named "id" in every table just for the sake of it. If it is the case that you always query this table by the email, then you don't need an id column because the email column is what already uniquely identifies the row and that should be the primary key. Or, if you're looking up the trial id to then search a TrialId column in a different table, why not just have the email column on the other table and drop the Trial table altogether? The id column is a string anyway, so it's not like its using a smaller data type for performance benefit or something. That way you can search the other table directly and still add your index on the email column, then you can retrieve the data in 1 query instead of having to do 2 separate queries. I understand that multiple trials could share the same email, but then I again still don't get the point of the table since you could just have the email column in the other place the trial is used. Or if you just need a table to store a list of all the email addresses in your system, that's ok -- for example a list of registered users, but seems like no since you showed the 2nd table with a userid, so I guess you also have a users table -- but if yes then just make the email the only column and the primary key - then you also don't need the index, and as referred to by someone else in the comments, inserting to this table then requires only 1 write instead of 2. Anyway, for the other table which had more columns where you added the index on userid, then yeah that one makes perfect sense and you should do that.
@WebDevCody Před 10 měsíci ⁺¹
I have a trial table to track who has already signed up and received my initial trial credits. Since I allow users to delete their accounts and all data associated with the account, I can’t just check the user table for an email, so I have a separate table to track so that if it’s a new user I’ve never seen before I give them 1 free credit.
@reikooters Před 10 měsíci
@@WebDevCody That makes sense - in that case, it's as I said in my comment i.e. the part I mentioned about the email uniquely identifying the row - the only column you need is the email address which should be your primary key. You don't need the id column as it does nothing and therefore you don't need the index either. Your 2nd table example was a good one though.
Insert to your Trial table to not insert duplicate emails will look something along the lines of (MSSQL - may differ in the db you're using)
insert into Trial
(email)
select @email
where not exists
(
select *
from Trial with (updlock, rowlock)
where email = @email
)
@WebDevCody Před 10 měsíci ⁺¹
@@reikooters yeah that’s correct I can just use a unique keyword on that email column of the table and that’s good enough
@user-tb4ig7qh9b Před 10 měsíci
You need to optimize your skills with sql not your query man it is a b c sql databases use relations and joins
@user-tb4ig7qh9b Před 10 měsíci
Learn sql
Use explain keyword
Play with sql
And then say i am working as senior full stack

Další v pořadí

Automatické přehrávání