Distributed Scheduling with Spring Boot: the challenges & pitfalls of implementing a background job
- Added on 6 June 2024
- Spring I/O 2024 - 30-31 May, Barcelona
Speaker: Rafael Ponte
Slides: speakerdeck.com/rponte/distri...
Sooner or later, a developer will implement his/her first background job using Java and Spring Boot, and what is usually a simple task for the majority of systems might become a nightmare in scenarios that need to deal with high performance, parallelism, distributed systems and a large volume of data. Scenarios like those hide several issues that many developers are not used to, such as network failures, data inconsistency, out-of-memory errors and even taking the whole system down.
Although it may seem controversial, dealing with many of these problems does not require hyped technologies or services, but solid distributed systems fundamentals. This talk will present how an experienced developer implements a background job with Java and Spring Boot, taking into consideration the main challenges and pitfalls it brings along, and how he/she designs a solution for high performance, resilience and horizontal scalability while taking advantage of many modules of Spring Boot, Hibernate and the relational database.
If you still believe that a background job is a simple task, then this talk is for you! - Science & Technology
I had already seen Rafael Ponte give this presentation in Portuguese on the Zup channel, and I was able to implement something similar at my job. Great work, bro! Thank you so much!
Hi Felipe,
Thanks for this comment and for having watched both versions of the talk. ❤
Congratulations on your presentation! You absolutely nailed it. Your thorough research and confident delivery captivated everyone in the room. Your ability to explain complex ideas so clearly is truly impressive. Keep up the fantastic work!
Thanks for the kind words, Eduardo! ❤
For me, this is the best presentation. Great job!
What a comment! Thanks for that! ❤️
Great presentation, great work. Thanks a lot for sharing this knowledge with us!
Thanks so much! I am glad you liked it 🥳
Congratulations, Rafael! It was a pleasure to watch your presentation in person!
Thank you so much, Rapha! ❤ You're the best!
Thanks, Rafael! Especially for the SKIP LOCKED feature; that was new knowledge for me.
Thank you so much! I am glad the talk was helpful for you! 🥰
And yeah, SKIP LOCKED is fantastic!! 💪🏻
I really like the way you explained short-running transactions. Nice addition to the jobs! Congratulations on the excellent presentation! It is very useful!
Thanks so much! I am glad you liked it 🥰
Thank you for the clear and well-structured presentation. It's very useful and important information, even for people with many years of experience. I wish every developer would watch this video every time they put @Transactional on a method.
Thanks for the kind words! I am glad you enjoyed the talk! ☺
Congratulations, really great!
Nicely done @RafaelPonte.
Thank you! ☺
Congratulations, my brother, you put on a show with that presentation, flawless! Top-notch!
Thank you, my brother!
Excellent topic! I have some background jobs running here and there, and I'm definitely going to check them again.
Nice! I am glad this talk was helpful to you! 👊🏻
Beautiful presentation, thank you
Thank you so much! That's very nice you liked it! 🥰
Great talk!! So many learnings, and it addressed real-life problems I faced while writing background scheduled jobs... By the way, we used the ShedLock library, but this is really good insight.
Thanks! Nice you liked it!! 😊
By the way, ShedLock is a very cool library! 👊🏻
You are an amazing presenter. Thank you so much, I learned a lot.
Thank you so much!!! I am happy this talk was helpful for you 🥳
Great talk! There are a few Java libraries that already solve these challenges (db-scheduler, JobRunr or Quartz). At JobRunr we'd love to share your talk as it explains JobRunr's architecture well and can help our users understand the challenges of distributed scheduling even better!
Thanks for your comment! I'm glad you liked it! ☺
Please, I would appreciate it if you shared it! By the way, I received great feedback from Ronald, the creator of JobRunr; he watched my talk! He is a fantastic guy! ❤
@RafaelPonte You're too kind 🤩!
What is your opinion on the nFlow Java library? Thank you for the video.
@marshall143 Thanks for the comment! 😊 I didn't know nFlow, but I understand that if your context allows your team or project to adopt a task scheduler or workflow engine, you should go with it.
Usually, those libs and frameworks make the developer's life easier because they address all the issues discussed in the talk very well.
Congrats on your amazing presentation, Rafa!
Thanks, Jess! ❤
Amazing! Congrats Rafa!
Thanks, I'm glad you liked it ☺
Congrats, bro! It turned out great! Wishing you success.
Thank you! Happy you enjoyed it ❤
Congrats, Rafael! Congratulations, Rafa!
Thanks so much!!! 🥰
Rafael is awesome!! Great presentation
Thanks a lot!! ☺
Nicely done, congratulations!
Thank you, Diego ❤️
thank you
you're welcome! ☺
excellent lecture 💚
Thanks, my friend!
Great job Rafael!
Thank you ☺
Very good!
So cool that you liked it 😊
Great content!
Thanks 🙏🏻
Great talk! Did not catch all the red flags in this :)
Thanks! I am glad you liked it!! ❤
Nice!!
Thanks! ❤
Congratulations, maharaja! 😉
Thanks a lot ☺️☺️
What a prince 💛🔥
Thanks, Luis ❤
Way too good. Congratulations, prince of the ocean, haha 👏👏👏
Thanks a lot, Junior! 👊🏻
@RafaelPonte Congratulations, Rafael! Sharing it with everyone on my team! Cheers.
@benicioavila Thank you ☺️ And thanks for sharing!! ❤️
Great talk, and a lot of cool new (for me) information about Spring/JPA semantics! But not much of this is specific to background jobs, and there is not much in the talk about generic background job processing. So I'd say the title is a bit misleading.
Thanks for the comment 😊 I am glad the content was helpful for you!
Out of curiosity, what do you understand as background jobs and job processing, and what do you expect from a talk about these subjects?
He's the man! Let's gooo!
Thanks, Mustafa 👊🏻
👏👏👏
thanks!!!
Nice one, Ponte!!!!!
Thanks, Flávio 😊
Really great talk!
But I am curious: if the two save statements are already wrapped in one small transaction, how can they be combined by Hibernate batching with the save statements of another transaction?
Thanks for the comment and feedback 😊
I am not sure if I understood your question correctly. Could you elaborate a little bit more on it?
Hi Rafael,
In the scenario of this video, we are using short transactions to save data to the database, so I think each transaction should be isolated and its statements can't be wrapped in one batch like in your example, INSERT INTO ... VALUES (A),(B).
@mindcontrolkmc.3286 Yeah, the idea is precisely that! For each batch (chunk) of 50 rows, Hibernate will group (and reorder if needed) the INSERTs and UPDATEs issued inside that short-running transaction and convert them into only two statements right at commit time.
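The chunking idea described in this reply can be sketched in plain JDBC, without Hibernate. This is only an illustration under stated assumptions: the `processed_item` table, the class name and the helper methods are hypothetical and do not come from the talk. With Hibernate, the equivalent grouping is usually enabled through settings such as `hibernate.jdbc.batch_size`, `hibernate.order_inserts` and `hibernate.order_updates`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ChunkedBatchWriter {

    // Hypothetical table and column, used only for illustration.
    static final String INSERT_SQL = "INSERT INTO processed_item (name) VALUES (?)";

    /** Splits the input into consecutive chunks of at most `size` elements. */
    static <T> List<List<T>> chunks(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            result.add(new ArrayList<>(items.subList(from, to)));
        }
        return result;
    }

    /** Writes one chunk in its own short transaction, sent as a single JDBC batch. */
    static void saveChunk(Connection conn, List<String> names) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start a transaction that covers only this chunk
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch(); // queue the row instead of sending it immediately
            }
            ps.executeBatch(); // send all queued rows together
            conn.commit();     // the transaction ends here, so locks are held briefly
        } catch (SQLException e) {
            conn.rollback();   // only this chunk is lost, earlier chunks stay committed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    /** Processes the whole input as a sequence of independent short transactions. */
    static void saveAll(Connection conn, List<String> names, int chunkSize) throws SQLException {
        for (List<String> chunk : chunks(names, chunkSize)) {
            saveChunk(conn, chunk);
        }
    }
}
```

Each transaction stays isolated from the others; the batching only groups the statements that belong to the same chunk.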
Nice explanation! But it did not cover a very important case: when your app has more than one job marked with the @Scheduled annotation. That can be crucial for performance. Maybe it will be covered in a future talk.
Thanks for the comment 😊 Nice you liked it!
I am not sure if I understood what you mean. Usually, a single application has multiple @Scheduled jobs running concurrently and doing different things (sometimes at different times).
Could you give more details?
@RafaelPonte If you do not explicitly specify the scheduler's thread pool size in application.yml, all jobs will be run by a single thread.
@YuliSlabko Thanks for the explanation. Now I got your point! ☺
You're right. If your application runs multiple jobs close together or jobs that take too long to finish, tuning the Scheduler's thread pool size is essential. 👊🏻
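The effect discussed in this thread can be reproduced with the JDK's own `ScheduledExecutorService`, without Spring. This is a minimal sketch, assuming two jobs that become due at the same moment; the class and method names are hypothetical. In Spring Boot, the corresponding setting is the `spring.task.scheduling.pool.size` property, which defaults to 1.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SchedulerPoolDemo {

    /** Runs two jobs due at the same time and returns how many threads executed them. */
    static int distinctThreads(int poolSize) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch bothStarted = new CountDownLatch(2);

        Runnable job = () -> {
            threadNames.add(Thread.currentThread().getName());
            bothStarted.countDown();
            try {
                // Simulates a slow job: waits up to 500 ms for the other job to start.
                bothStarted.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        scheduler.schedule(job, 0, TimeUnit.MILLISECONDS);
        scheduler.schedule(job, 0, TimeUnit.MILLISECONDS);
        scheduler.shutdown(); // already scheduled jobs still run after shutdown()
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("pool size 1 -> threads used: " + distinctThreads(1));
        System.out.println("pool size 2 -> threads used: " + distinctThreads(2));
    }
}
```

With a pool of one thread, the second job cannot start until the first one finishes, which is exactly how a slow @Scheduled job can delay every other job in the same application.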
Does a single @Transactional annotation on the @Scheduled method (when using JPA) fix the original code right away?
Thanks for the comment.
It depends on which problem you're talking about.
In the talk’s context, it solves only part of the problem: it makes the whole operation atomic and recoverable but causes a few side effects.
What's the difference between reading and writing with RabbitMQ or Kafka and reading and writing with a database?
I usually use Redis to solve the same problem, because it is much faster than a regular relational database.
Thanks for the comment.
I will ignore the trade-offs of having a new component in the infrastructure now and focus only on the developer's perspective.
There are differences, but how they can impact your solution depends on your context. I mean, using Kafka or RabbitMQ may make little difference to the code of the job shown in the talk, but from the perspective of the application that produces events to the queue, we may have to deal with a dual write issue.
The same is true for Redis: it depends on how you're using it, such as a distributed lock provider or a message queue.
Ummm...
Distribution topic starts after 27 min.
Using DB locks is tricky, and they work differently in different databases, e.g. lock escalation. It is better to use app-level locking.
All of that did not really have much to do with jobs. It was just about long-running tasks in a distributed system.
Do you have a resource recommendation on app-level locking? I'm studying the topic and it would be awesome to see it in more detail. Thanks
Thanks for your comment ☺
Distributed systems are tricky, and database locks have worked well for over 30 years. Although some databases might differ, an exclusive row-level lock works similarly. By the way, a few RDBMS suffer from lock escalation, but not PostgreSQL (which was used in the talk's context); in addition to that, we used many approaches in the talk that mitigate the chances of lock escalation 💪🏻
Regarding application-level locking, PostgreSQL offers Advisory Locks as an excellent alternative to row-level locks. They're very lightweight and are managed on the application side.
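As a rough illustration of the advisory locks mentioned in this reply, the sketch below asks PostgreSQL for a transaction-scoped advisory lock through plain JDBC. `pg_try_advisory_xact_lock` is a PostgreSQL function; everything else (the class name, deriving the lock key from a job name with CRC32) is an assumption made for the example and not something shown in the talk.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.zip.CRC32;

class AdvisoryLockSketch {

    // Returns true if the lock was acquired; the lock is released automatically
    // when the surrounding transaction commits or rolls back.
    static final String TRY_LOCK_SQL = "SELECT pg_try_advisory_xact_lock(?)";

    /** Hypothetical convention: derive a stable numeric lock key from a job name. */
    static long lockKey(String jobName) {
        CRC32 crc = new CRC32();
        crc.update(jobName.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /**
     * Tries to take the lock without blocking. The caller is assumed to have
     * disabled auto-commit; otherwise the lock would be released as soon as
     * this statement finishes.
     */
    static boolean tryLock(Connection conn, String jobName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(TRY_LOCK_SQL)) {
            ps.setLong(1, lockKey(jobName));
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getBoolean(1);
            }
        }
    }
}
```

An instance that gets `false` simply skips this run, because another instance already holds the lock for that job.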
In my understanding, `select ... limit 50 for update` would directly lock these 50 rows, instead of locking one row and processing one row at a time. But in the video, it seems to be the latter approach. Why is that?
He just presents it like that for presentation purposes. Of course it will lock all 50 rows (as long as they meet the select criteria and are not locked already). Overall this is a very basic presentation, not sure what the point of it was.
@wukash999 I think the point is to introduce less experienced people to the possible problems one might encounter, so you can study them further (at least for me it worked, since I had never thought about or known of these problems), not to provide a thorough implementation guide.
Thanks for your comment ☺
As @wukash999 commented, the idea was to make it as didactic and accessible as possible so that junior and inexperienced developers could understand it.
Do you think it came across as confusing?
@wukash999 Thanks for your comment and for helping them understand my intention ☺
Do you think this was an introductory and basic talk? I'm afraid I have to disagree. The talk was designed to simplify the subject and make it accessible for everyone, but it's still a complex, tricky, and detailed theme.
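To make the locking behaviour discussed in this thread concrete, here is a minimal JDBC sketch of a query that claims a batch of rows with `FOR UPDATE SKIP LOCKED`. The `job_item` table, its columns and the class name are hypothetical and used only for illustration; the SQL follows PostgreSQL syntax, the database used in the talk's context.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SkipLockedClaim {

    // Hypothetical table and columns. All returned rows are locked at once;
    // rows already locked by another transaction are skipped instead of waited on.
    static final String CLAIM_SQL =
            "SELECT id FROM job_item WHERE status = 'PENDING' "
          + "ORDER BY id LIMIT ? FOR UPDATE SKIP LOCKED";

    /**
     * Returns the ids of up to batchSize pending rows, locked for this transaction.
     * The caller is assumed to have disabled auto-commit, since the row locks
     * last only until the transaction commits or rolls back.
     */
    static List<Long> claimBatch(Connection conn, int batchSize) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL)) {
            ps.setInt(1, batchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

Two application instances running this query at the same time should receive disjoint sets of rows, so neither blocks the other.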
Congratulations, Rafael, you beat the game of Java.
Hahaha, thanks, Bruno!!!
Will @Transactional work if you have to call a MongoRepository and a KafkaTemplate?
Is it all or nothing?
If the Kafka call fails,
is the Mongo call rolled back as well?
Thanks for your comment ☺
Although MongoDB and Kafka support some level of transactions, I don't know how @Transactional annotation would work with MongoRepositories or KafkaTemplates. It's worth reading the Spring Data docs.
But it's important to be aware that you do NOT have an atomic operation (all or nothing) when your code mixes different external service calls, like PostgreSQL, Mongo, and Kafka. When you do that, you hit a common issue in distributed systems called "dual write".
@RafaelPonte I have the same use case, where I need to write to Mongo, Kafka and also to a Google Cloud Storage bucket within the same transaction. Do you by any chance know how to solve this problem so that I get all or nothing? Or, if that is not possible, how would we solve this problem then?
@RafaelPonte Thank you :)
For Mongo, you can also start a new session with a transaction manually. For Kafka, however, if the produced records are idempotent, you can use the Mongo transaction support above to achieve the same.
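The "dual write" issue mentioned in this thread can be shown with a tiny in-memory simulation. Nothing here talks to a real database or broker: the two lists merely stand in for them, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DualWriteDemo {

    // In-memory stand-ins for two independent systems (e.g. a database and a broker).
    final List<String> databaseRows = new ArrayList<>();
    final List<String> brokerMessages = new ArrayList<>();
    boolean brokerDown = false;

    void saveToDatabase(String order) {
        databaseRows.add(order);
    }

    void publishToBroker(String order) {
        if (brokerDown) {
            throw new IllegalStateException("broker unavailable");
        }
        brokerMessages.add(order);
    }

    /** Two independent writes: nothing ties the outcome of the second to the first. */
    void placeOrder(String order) {
        saveToDatabase(order);  // write 1 succeeds and stays in place
        publishToBroker(order); // write 2 may fail after write 1 is already done
    }

    public static void main(String[] args) {
        DualWriteDemo demo = new DualWriteDemo();
        demo.brokerDown = true;
        try {
            demo.placeOrder("order-1");
        } catch (IllegalStateException e) {
            System.out.println("publish failed: " + e.getMessage());
        }
        System.out.println("rows in database: " + demo.databaseRows.size());
        System.out.println("messages in broker: " + demo.brokerMessages.size());
    }
}
```

After the failure, the two systems disagree about whether the order exists, which is the inconsistency that a single local transaction cannot prevent across separate services.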
I loved the talk, but I'm not sure whether you wanted to talk about Spring Boot or run for office, hahaha... just kidding!
Hahaha, thanks! 😊
Rafa is humble, a freak, and beautiful
Hehe, you're very kind, my friend! ❤
Is he describing Spark 😆?
Thanks for the comment 😊
Do you mean Apache Spark? hehe
Almost made me want to work with boring techs again ;)
I’m moving back to Java/JVM after 15 years in Node/JS/Python
Boring techs are amazing! 🙌🏻
Congrats, nice job!
Thanks, Barroso! 👊🏻