Why do Databases fail? AntiPatterns to avoid!

Gaurav Sen

111,985 views

Published on 18 May 2018
Databases are often used to store various types of information, but one case where this becomes a problem is when the database is used as a message broker.
A database is rarely designed to handle messaging features, and hence is a poor substitute for a specialized message queue. When designing a system, this pattern is considered an anti-pattern.
Here are possible drawbacks:
1) Polling intervals have to be set correctly. Too long an interval makes the system inefficient. Too short an interval puts the database under heavy read load.
2) The database becomes both read heavy and write heavy. Usually, a database is good at one of the two.
3) Manual delete procedures have to be written to remove messages that have been read.
4) Scaling is difficult, both conceptually and physically.
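The drawbacks above can be seen in a minimal sketch of a table used as a queue. This is an illustration only, not code from the video: it assumes SQLite via Python's standard `sqlite3` module, and the table name `messages`, the `processed` flag, and every function name are hypothetical.

```python
import sqlite3
import time


def create_queue(conn):
    # Hypothetical schema: one row per message plus a "processed" flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )


def publish(conn, body):
    # Every publish is a write on the same table the consumers read from.
    conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
    conn.commit()


def poll_once(conn, batch_size=10):
    # Every poll is a read, even when there is nothing new (drawbacks 1 and 2).
    rows = conn.execute(
        "SELECT id, body FROM messages WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for msg_id, _ in rows:
        conn.execute("UPDATE messages SET processed = 1 WHERE id = ?", (msg_id,))
    conn.commit()
    return [body for _, body in rows]


def purge_processed(conn):
    # Read messages pile up until a cleanup job deletes them (drawback 3).
    cur = conn.execute("DELETE FROM messages WHERE processed = 1")
    conn.commit()
    return cur.rowcount


def consume(conn, handler, poll_interval=1.0, max_polls=3):
    # poll_interval is the knob from drawback 1: too long adds latency,
    # too short adds read load on the database.
    for _ in range(max_polls):
        for body in poll_once(conn):
            handler(body)
        time.sleep(poll_interval)
```

This single-consumer sketch does not address drawback 4: with several consumers, two of them could select the same unprocessed row before either marks it, so some form of locking or claiming would be needed.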
Disadvantages of a Message Queue:
1) Adds more moving parts to the system.
2) Cost of setting up the MQ along with training is large.
3) May be overkill for a small service.
In a system design interview, it is important to be able to reason why or why not a system needs a message queue. These reasons allow us to argue on the merits and demerits of the two approaches.
However, there are blogs on why Databases are perfectly fine as message queues too. A deep understanding of the pros and cons helps evaluate how effective they would be for a given scenario.
In general, for a small application, databases are fine as they bring no additional moving part to the system. For complex message sending requirements, it is useful to have an abstraction such as a message queue handle message delivery for us.
Links:
blog.codepath.com/2012/11/15/a...
softwareengineering.stackexch...
mikehadlow.blogspot.com/2012/0...
www.cloudamqp.com/blog/2015-1...

Comments • 123

@pythoniccypress6715 Před 6 lety ⁺¹⁹
Interesting talk. There are only 2 ways to my knowledge to persist data. One way is through file system, another is through database. Correct me if I'm wrong.
So if you are not using database because you think it does not scale, what alternative do you suggest?
@gkcs Před 6 lety ⁺²⁷
Hey Jasper,
Lol, I hear you. :)
I meant that we should use abstractions over databases. Invariably, databases lead to writing complex triggers and stored procedures to boost performance.
Message Queues handle the message pushing, persistence and other internal properties while we can focus on the business logic. They have a database internally, but we can use the whole queue as a black box.
As for the title, anything less concrete than my stance sounded a bit pansy. Like "Persistence for message passing should be abstracted using Message Queues". More accurate but less eye ball grabbing.
Thanks for the feedback 😋
@pythoniccypress6715 Před 6 lety
Hahaha, that's why! That explains a lot. Thanks!
@pythoniccypress6715 Před 6 lety ⁺⁵
I hear you! Probably the title as "Message Queue is easy. But persistent Message Queue is not easy" is more reasonable to me though.
@pha1994 Před 4 lety ⁺³
Isn't a database really a file system?
@stoneshou Před 4 lety
You’re missing the almighty tapes
@ChinmayKalegaonkar Před 5 lety ⁺²
Your content is amazing , please keep up the good work !
@Prashant110683 Před 5 lety ⁺³
You are doing good work, excellently! Thanks for keeping the explanation simple.
@gkcs Před 5 lety
Thanks!
@AshutoshKumar-io6gi Před 6 lety ⁺²
Thanks Gaurav for a nice video!
@RICHUNCLEPENNYBAGS77 Před 4 lety ⁺²
We used to have this architecture at a place I worked (not my fault! it was there already) and it was the number-one reason for after-hours production issues, so there's another reason to avoid it :)
@amarsrivastava7768 Před 2 lety
@Gaurav Sen, hi. When you say "anti-pattern" here, do you mean that a database is generally not recommended over a dedicated message queue such as Kafka or RabbitMQ? Or is it good practice to use a database as a message queue until you encounter a need for a dedicated one?
@harisridhar1668 Před 3 lety
If we have to use a database for cross server communication, in a small-scale system, is there a means to figure out the optimal polling time for each server?
Also - the one benefit of MQ is that we can set up two separate MQ queues per each server : one to read from, and one to write to. Doesn't this help optimize the system, with decoupling and SRP of each MQ queue?
@ankitpareek00 Před 3 lety ⁺²
I have a query. You mentioned that one drawback of a database is that consumers have to poll, unlike with a message queue. But as per my understanding, many highly scalable message queues are pull based, for example Kafka and SQS.
So do you still hold that being pull based is a drawback for databases?
@praveenak Před 6 lety ⁺⁴
Hey Gaurav, keep up the good work. One suggestion - you should describe the characteristics of components you use. I saw in 2-3 videos you used message bus but I am not sure what it can support and what it can't.
1. Can multiple consumers read one message, or can only one consumer read each message?
2. Does it guarantee FIFO? There are a few systems that require this guarantee.
3. Does it deliver "at least once", "at most once" or "exactly once"?
If you clarified this already in another video, share here.
All the best!
@gkcs Před 6 lety ⁺¹
Thanks Praveen, I'll try and clarify this in future videos 😁
@mayuresh247 Před 6 lety ⁺⁴
Hi Gaurav, your system design videos are really helpful !!
Can you make a video explaining features of various databases like Oracle, Cassandra, MongoDB etc and how to choose among these DBs while designing a system?
@gkcs Před 6 lety ⁺²
Thanks Mayuresh! I'll look into this 😊
@shanaya_saanvi Před 3 lety
Deletion- Removing nodes from height balanced trees for indexing which could trigger shuffling of remaining nodes from one place.
@alexkubica Před rokem
What about databases such as PostgreSQL that support pub/sub functionality and don't require you to poll? Would they be better than message queues in small to mid-size systems?
@girishanker3796 Před 3 měsíci ⁺¹
In RabbitMQ, the queue pushes messages to the consumer(s).
In Kafka, the subscribers pull records from the topic.
So there is a difference between pulling and pushing messages in these different types of EDA models.
@gulshankumar-yq4uw Před 5 lety ⁺⁵
Hey Gaurav, great video. One major thing that I would want to add.
DB as a queue doesn't fit well for distributed systems. Consider an example where you have to update the state of the message once it is processed successfully (otherwise that message keeps getting polled). In distributed scenarios, all servers must acknowledge that the message has been successfully processed by them, and only then can it be marked as Processed. This is difficult to achieve (it can be done via SELECT ... FOR UPDATE, which is costly), and it might be inefficient to wait for all servers to ack. The other type of problem arises when you want to hash your messages to be read by defined servers; in those cases you might end up creating different columns for different types of messages, which is again an anti-pattern.
@gkcs Před 5 lety ⁺²
Interesting point, thanks!
@indiansoftwareengineer4899 Před 3 lety ⁺¹
@@gkcs you are working so hard for this YouTube community...
replying to each comment and doubt...
ohhh wow...
@gkcs Před 3 lety
@@indiansoftwareengineer4899 Habit 😛
@anastasianaumko923 Před rokem
Thank you so much for your work 😌
@ShoaibMohammed16940 Před 6 lety ⁺¹¹
Greetings. In the video you mentioned that a messaging queue always pushes the data, which to the extent of my knowledge is not true. As an example, Kafka is a messaging queue where the consumers need to pull the data from the queue, rather than the queue pushing the data to consumers.
@gkcs Před 6 lety ⁺³
I stand corrected, thanks :)
@pha1994 Před 4 lety ⁺²
@@gkcs How did you stand corrected? The above argument is contrary to what has been explained in the video. Do the queues push messages to consumers, or do the consumers pull the messages, which would imply polling?
@gkcs Před 4 lety ⁺⁵
@@pha1994 Kafka subscribers pull messages from the queue. Everything else mentioned in the video (queues are optimised for writes and provide abstractions over their persistence) still holds though.
@chandan9990 Před rokem
to solve the problem of load on db/ long intervals we can have a common cache b/w all the servers and db, will that be able to help?
@suyogkatekar8548 Před 6 lety ⁺²
Would like to know your thoughts if hbase can be used as message queue. As it is scalable also.
@gkcs Před 6 lety ⁺¹
Redis and HBase do seem to have reasonable persistence and performance. As a messaging queue, they should do fine :)
@rickyross8883 Před 2 lety
What about a database message queue like SQL Server Service Broker? Is that just as bad as naively using a table as a queue?
@Karmihir Před 5 lety
Could you please share the blog for Oracle AQ/DQ operations & pros and cons???
@singsarav Před 4 lety
We do not know how to use DB as Message Queue. Any example ?
@gopinathrajamanickam9475 Před 5 lety
How is Create a Read operation? It is also a Write. DDS (Data Distribution Service)
@stephyjacob1256 Před 5 lety ⁺¹
Hi Gaurav, can you talk about - Design of distributed locking.
@supriyantapoddar6129 Před 6 lety ⁺¹
Thanks
@artasheskhachatryan4804 Před 2 lety
What about Kafka? It uses a pull model and provides great performance compared to RabbitMQ, which uses push.
@romannagel2414 Před 5 lety
Awesome!
@kirtanpatel797 Před 4 lety
Hey Gaurav! Just sharing a thought that came to my mind. Can we use something like exponential backoff for the polling interval, to make it adapt to the workload? Like polling frequently when a server is getting many messages, and increasing the polling interval when there are no messages for it?
@gkcs Před 4 lety ⁺¹
You could, if you want to rate limit the events.
@kirtanpatel797 Před 4 lety
@@gkcs Thanks for the response. Yes, If we want to still use database like you said in case of less servers, we can combine it with this idea to manage polling interval because it's very tried and tested method whether we talk about Aloha methods for satellites or Ethernet protocols.
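A minimal sketch of the adaptive polling idea from this thread, under stated assumptions: the function name `next_poll_interval` and its default bounds are hypothetical, and the policy shown (reset to the minimum when messages arrive, double up to a cap when a poll comes back empty) is one possible reading of the suggestion, not something specified in the video.

```python
def next_poll_interval(current, got_messages,
                       min_interval=0.1, max_interval=30.0, factor=2.0):
    """Return the delay (in seconds) to wait before the next poll.

    Busy queue: poll again quickly.
    Idle queue: back off exponentially, capped at max_interval.
    """
    if got_messages:
        return min_interval
    return min(current * factor, max_interval)


def simulate(poll_results, start=1.0):
    # poll_results is a list of booleans: did each poll return messages?
    # Returns the interval chosen after each poll.
    intervals = []
    current = start
    for got_messages in poll_results:
        current = next_poll_interval(current, got_messages)
        intervals.append(current)
    return intervals
```

For example, `simulate([False, False, True, False])` starting from 1.0 second backs off to 2.0 and 4.0, drops to 0.1 when a message arrives, then starts doubling again from there.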
@randomanon1275 Před 6 lety
Do you have any plans of starting a series for Design Patterns and their implementations?
@gkcs Před 6 lety
Not yet... I'll take things one at a time 🙂
@sankalparora9374 Před rokem
Thanks - very interesting.
@gkcs Před rokem
Thank you 😁
@modernsanskari4398 Před 6 lety ⁺²
Awesome video. Can you please upload videos on general questions asked in system design interviews (design Google Docs, Uber, Twitter, FB or WhatsApp, TinyURL, etc.)?
@gkcs Před 6 lety ⁺³
All coming up! :)
@uploder247 Před 6 lety
Your content is good. You can write on quora too to get good traffic.
By the way, are you okay with revealing where you work?
@gkcs Před 6 lety ⁺¹
I do write on Quora. Just search for Gaurav Sen there 😋
I work at Directi as a Platform Engineer. It's mentioned on the quora profile 😛
@spicytuna08 Před 5 lety
thanks for the video. i am confused. in your previous video, you have mentioned that DB is a part of MQ. but here, you are saying that DB shouldn't be used in MQ. isn't this a contradiction? please help me to understand.
@gkcs Před 5 lety
Have a look at the comments here. We discussed this in detail 😊
@sambitbharimalla3069 Před 6 lety ⁺¹
I know eBay has been using Oracle DB as the storage system for its home-grown messaging system, and it's been in place for more than a decade.
@gkcs Před 6 lety
Oh God, that is surprising to know. Oracle DB is tremendous when coming to performance. Postgres also has some notification capabilities, so it can be used as a queue.
I also read a blog defending DBs as message queues. So it is a little subjective...but most people agree that it is still undesirable as a design :)
@praveenak Před 6 lety ⁺¹
Either their messaging scale is low, or they carry a heavy operational load keeping those DBs up and running. SQS or RabbitMQ are best for asynchronous processing.
If you need messages to be received in FIFO order, you can choose SQS FIFO, Kinesis streams or Kafka. Of course, Kinesis and Kafka provide much more than just storing the messages. They also act as a streaming map for your streaming MapReduce.
@karapelerin61 Před 5 lety
Hi, You said that (at 4.40) updating is expensive and deleting is a problem. Why?
@gkcs Před 5 lety ⁺¹
If the tables are read optimised, they are not expected to take too many writes. A very fuzzy and general statement, but true most of the time. 🙂
@deathstrokebrucewayne Před 3 lety
In case of polling from the message queue, there is still a read load on the queue right?
@gkcs Před 3 lety
Yes there will be
@chandrasekharpatra2416 Před 6 lety ⁺⁵
Kafka can be used as a message queue right ??
@gkcs Před 6 lety ⁺¹
That's correct!
@pythoniccypress6715 Před 6 lety ⁺¹
I think you need to ask whether or not kafka is also internally using a DB to persist messages.
@ShoaibMohammed16940 Před 6 lety
Kafka uses files, log files to be precise to store the messages
@ShoaibMohammed16940 Před 6 lety ⁺³
But if you have binary data as part of the message then you will be needing a database set up for kafka.
@aakashjolly2579 Před 5 lety
You said that databases are not optimized for both reads and writes, but what about NoSQL databases like Cassandra, which use majority quorums to be fast while accepting eventual consistency?
@gkcs Před 5 lety
What about them? Do you think their reads are fast?
@aakashjolly2579 Před 5 lety
@@gkcs I think Cassandra can be configured to be optimized for both reads and writes, because it relies on eventual consistency and other architecture decisions. If there are N replicas of a node/db (the nodes are timestamped replicas of the db, kept for fault tolerance), then successfully reading from or writing to more than N/2 nodes is enough. The system is not strongly consistent, but it has very high throughput.
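The majority arithmetic in this comment can be worked through in a short sketch. The function names are hypothetical; the only facts assumed are that a majority of N replicas is floor(N/2) + 1, and that a read set and a write set are guaranteed to share at least one replica whenever R + W > N.

```python
def majority(n_replicas):
    # Smallest number of replicas that is strictly more than N/2.
    return n_replicas // 2 + 1


def quorums_overlap(n_replicas, read_acks, write_acks):
    # If R + W > N, every read set intersects every write set,
    # so a read touches at least one replica holding the latest acknowledged write.
    return read_acks + write_acks > n_replicas


if __name__ == "__main__":
    n = 3
    q = majority(n)
    print(f"N={n}: majority quorum is {q}")
    print("majority reads + majority writes overlap:", quorums_overlap(n, q, q))
    print("single-replica reads and writes overlap:", quorums_overlap(n, 1, 1))
```

With N = 3, majority reads and majority writes need 2 acknowledgements each and always overlap, while reading and writing a single replica is faster but gives no such guarantee. That is the trade-off between throughput and consistency the comment describes.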
@rajumondal4283 Před 6 lety
If someone is designing a WhatsApp like app how should they design their architecture for high loads?
@gkcs Před 6 lety ⁺¹
I'll talk about this soon :)
@Karmihir Před 5 lety ⁺¹
I think Rabbit MQ also a good option for EBS application!!
@gkcs Před 5 lety ⁺²
What's EBS?
@pha1994 Před 4 lety ⁺¹
@@gkcs EBS is what AWS calls a volume.
@pieceoffake7443 Před 4 lety ⁺¹
Ironically, this is the exact pattern used under the hood by several well-known AWS services.
@gkcs Před 4 lety ⁺¹
The database pattern?
Let me know which ones. I'd love to read about them :D
@pieceoffake7443 Před 4 lety ⁺¹
@@gkcs I don't think there's any public documentation about this, and I probably can't give too many details. But basically for a lot of services that involve provisioning/de-provisioning instances, it makes more sense to persist the events into a database in an append-only fashion and then have consumers poll.
You might have one service that's polling to find instances that need to be provisioned, another service that's polling for instances that should be de-provisioned, etc. And then to calculate billing, you have a nice immutable record to go back to.
This is better than a message bus because it gives you stronger durability guarantees and you don't really need the near-real-time capabilities of the push model. For example, it's ok to have a 5-10 second delay before an instance actually gets de-provisioned, and since the row in the DB contains the time stamp of the de-provision request you can still bill based on the time the request was made rather than the time it was executed.
@kamesh231 Před 6 lety ⁺¹
One question. How can we optimize database for read operation?
@gkcs Před 6 lety ⁺⁸
The simplest way is to buy a database that is optimized for read operations. Eg: MySQL.
Conversely, databases like Cassandra are optimized for write operations.
You can use caches to make read faster, but that complicates writes.
Adding indexes is also helpful for fast reads, but write operations become slower.
@kamesh231 Před 6 lety ⁺¹
Thanks Gaurav Sen, this is helpful
@gptankit Před 6 lety ⁺¹
If you have far more read operations than writes, you can denormalize your dataset, which means keeping redundant copies of a field in multiple tables (same fields generally queried together).
@futurezing Před 6 lety ⁺¹
It is not just cache but the data structure used behind it. For instance, Cassandra uses LSM Tree in which writes are cheap.
@pha1994 Před 4 lety
Read replicas. Write only to the master. Read only from the slaves.
@dapdizzy Před 5 lety ⁺²
If timing is not that big of a concern for you (poll once every minute, ten minutes, hour, etc) then there is no problem in polling.
The statement should go like this: if you need frequent polling, then Database is not a good solution.
A database is usually both read a lot and written a lot. Most of the databases are a combination of OLTP and OLAP. Optimal is a theoretical term, production design considers trade offs.
If you use optimistic locking for marking the records as “in processing” then the polling and marking the records for processing should be pretty performant.
You talk a lot about “how consumer knows whether there are new entries for processing”, but this does not prevent this design for solving its problem, then why put a bold tag like
“A bad design”, “antipattern”? It’s just a casual judgement unless you know all the trade offs of the particular solution and the problem it solves.
Keep up the good stuff nonetheless.
@dapdizzy Před 5 lety
Dig into the details, don’t judge a book (solution in this case) by its cover (technologies chosen). All the best to you.
@gkcs Před 5 lety ⁺¹
I agree with your points, and there has been a lot of discussion on this in the comments below. To be honest, that was the purpose of this video, to call out this pattern and then see if it is as bad/good as it sounds.
@dapdizzy Před 5 lety ⁺²
Gaurav Sen thanks for your reply, I highly appreciate it. From my side, I want to apologize for being too assertive in my judgements (that was partly emotional part of it). I liked your points and generally definitely agree with the “anti pattern” classification. Just wanted to point out some spots you didn’t cover and outline situations when the judgement could be controversial. You make great videos and I appreciate your openness and broad view on the topic of software design. Keep up with the good stuff! I will be eager to discuss it and share the view from my experience. All the best to you, Gaurav!
@indiansoftwareengineer4899 Před 3 lety
"Optimistic lock" is an oxymoron, isn't it? The two words have opposite goals...
Please tell me if I am wrong.
@dapdizzy Před 3 lety
@@indiansoftwareengineer4899 when it comes to real update/insert the lock happens anyway, though it is short lived.
@kireeti93 Před 6 lety ⁺¹
Hey buddy, what do you do? where do you work?
@gkcs Před 6 lety
I work as a platform engineer at Directi 😋
Designing systems and problem solving is a part of my job
@MrAmitkarak2010 Před 6 lety ⁺¹
See Oracle Advanced Queuing, where polling is not needed. It's a publish-subscribe model with highly scalable reads and writes. What is explained in the video is writing to a database and reading from a table, and that is not a queue. Please correct me if I am thinking in another direction and you are explaining the publish-subscribe concept. I have used Oracle AQ bound to a JMS queue on a 3-node Oracle RAC server to write 2 million messages in a sec, and the same for reads, so it's highly scalable.
@gkcs Před 6 lety ⁺¹
That's correct Amit! Postgres also has some notification capabilities which are useful when storing messages in them. In fact, some people argue that databases are perfectly fine as a message queue.
I personally think that the db is okay as a message queue until you hit a large interactivity/scalability requirement. Most applications can work around the db design issues, but it is recommended to try out dedicated message queues before you make that decision.
Good point though :)
@MrAmitkarak2010 Před 6 lety ⁺¹
Gaurav Sen, well, here is my decision point. I basically work with Oracle and have architected products for offline mode of data in the hospitality domain, which are supposed to work on cruises, in hotels, arenas, etc. The question I look at is whether it is bulk update and event handling of data, or transactional handling of data. If it is bulk handling of data, replication should be used to transfer the data, and if it is transactional, it should be queue based. The problem is that in the older days, with an on-premise solution, you were allowed to call a service from PL/SQL via an HTTP request to submit transactional data, but once you go to the cloud, outbound calls from the database layer are not going to be allowed. So you need an event mechanism to consume transactional data, and you are left with two options, polling or messaging, and believe me, Oracle handles it best.
To be frank, it depends on the constraints you are working under, and when you draw a context map you know the boundary that you have. Design should be context driven; there is no rule of thumb. I agree with the video that polling on a DB is never a good solution, but that is not the publish-subscribe mode, where sockets are opened to consume messages. Oracle supports that mode, and I heard that MySQL is going to support it in the next release.
@gkcs Před 6 lety
Interesting stuff. I'll look into it when it's out :)
@rogernevez5187 Před 5 lety
And how to use a message queue to submit mysql jobs?
@gkcs Před 5 lety
MySQL must be having a cron. Send messages to it 🙂
@techfornoobs4241 Před 6 lety
thanks, awesome video!
@gkcs Před 6 lety
Thanks!
@kirtanpatel797 Před 6 lety
Does whatsapp use the same tech in backend?
@gkcs Před 6 lety
WhatsApp does need to replay messages in case they aren't received by the client. I don't know if they use an abstraction over the database, but they need some sort of message queue for that.
@kirtanpatel797 Před 6 lety
I don't know much about this stuff. But you taught virtualization of servers for load balancing. What would the idea of having a virtual database for each pair of communicating users look like?
@tdsora Před 5 lety
Is this something common in the wild? I've never heard of anyone ever doing this
@gkcs Před 5 lety
It is, and it's got its merits. You could have a look at the description for links to these cases.
@JM_utube Před 4 lety
absolutely done in the wild, especially when you need a front end internal tool to monitor the status of your "jobs"
@rajanchoudhary7432 Před 6 lety
Can you suggest a book for system design?
@gkcs Před 6 lety ⁺¹
I haven't found a specific one yet. Will post in case I do :)
@blasttrash Před 6 lety ⁺¹
Hey Gaurav, where did you learn all this stuff? From your college? Or from any other books or resources?
@gkcs Před 6 lety ⁺¹
Hey Trash blaster :D
I condense the info I read from the internet for these videos. Common sources are the highscalability blog and Tech conference videos.
@blasttrash Před 6 lety
Thanks. :P :)
@himanshukandwal5373 Před 6 lety ⁺²
I think databases like Dynamo DB form the basis of data streaming systems like AWS kinesis.
@gkcs Před 6 lety ⁺¹
Hey Himanshu! Somewhere down the line, the message queue will need persistence, routing, etc...
It is going to be using a database internally, but the important thing when designing it is to think of that functionality as abstracted. Then we can focus on the task of sending messages instead of implementing persistence for it.
@himanshukandwal5373 Před 6 lety ⁺³
Yes, I totally agree that developers while developing applications, should think (Message/Task) Queues as an Infrastructure component rather than managing the herculean task of statefulness and consistency of the queue by themselves.
However, In my view, what I think of a good and valid System Design question could be 'how would you design a distributed queue like Rabbit MQ or SQS' just to see how a developer can envision and address scalability, TTL, acks/nacks, and exactly-once/atleast-once delivery scenarios, if keeping aside ACID compliance.
@gkcs Před 6 lety ⁺²
That's a good point. In the context of this video, I was talking about message queues as abstractions to internal details :)
@himanshukandwal5373 Před 6 lety ⁺¹
Yes Gaurav. I really liked the video. Great going, dude !! :)
@futurezing Před 6 lety
If one persists something, does that make it a database? The answer is no. If you store text in files, is that a database? No. Kafka uses polling on the consumer side, and it is a messaging system. Just a few thoughts. While we are at it, this is a good article on the same topic: www.cloudamqp.com/blog/2015-11-23-why-is-a-database-not-the-right-tool-for-a-queue-based-system.html
@yansruan1395 Před 4 lety
Avoid using a database as a message queue
@gkcs Před 4 lety ⁺¹
I agree!
@vishweshsingh1548 Před 6 lety
How can I contact you brother ? Any email Id where I can reach you ?
