Workflow Orchestration for Building Resilient Software Systems

CodeOpinion

17,374 views

Added 27 July 2024
Building resilient software systems can be difficult, especially when they are distributed. Executing a long-running business process or workflow that spans multiple service boundaries requires a lot of resiliency. Everything needs to be available and functioning, because if something fails midway through you can be left in an inconsistent state. So what's the solution? Removing direct service-to-service communication and temporal coupling.
🔗 Solace
solace.com/codeopinion
🔔 Subscribe: / @codeopinion
💥 Join this channel to get access to source code & demos!
/ @codeopinion
🔥 Don't have the JOIN button? Support me on Patreon!
/ codeopinion
📝 Blog: codeopinion.com
👋 Twitter: / codeopinion
✨ LinkedIn: / dcomartin
📧 Weekly Updates: mailchi.mp/63c7a0b3ff38/codeo...
0:00 Intro
1:03 Distributed Monolith
3:53 Temporal Coupling
5:50 Orchestration
#softwarearchitecture #softwaredesign #codeopinion
Science & Technology

Comments • 85

@Pretence01 2 years ago ⁺⁹
Increased resilience is one of the added benefits of using messaging. Another important one is that it effectively lets you emulate a distributed transaction in a microservice environment by using an outbox on the sender side, which defers publishing the outgoing message until the sender's own state has been successfully persisted.
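The outbox idea described in this comment can be sketched roughly as follows. This is a minimal illustration, not the approach of any particular library: the table layout and the names `create_schema`, `place_order`, and `relay_outbox` are hypothetical, and SQLite stands in for the service's own database.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical tables: business state plus an outbox in the SAME database.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER)"
    )


def place_order(conn, order_id, total):
    # The state change and the outgoing message commit in one local transaction,
    # so either both are stored or neither is.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (payload, published) VALUES (?, 0)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )


def relay_outbox(conn, publish):
    # Runs separately from request handling: publish pending rows, then mark them.
    # A crash between publish and mark causes a re-send, so delivery is at-least-once.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the relay can publish the same row twice after a crash, consumers on the other side would still need to tolerate duplicates.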
@charlesopuoro5295 1 year ago
Thank you very much for this hands-on, pragmatic approach to understanding Workflow Orchestration.
@shashanksingh9238 2 years ago
Thank you so much for explaining this. My company is using Netflix Conductor, and the documentation for it is very complex. Thank my Tech Gods that I landed here :-)
@JosueIbarraNinja 4 months ago
Great explanation of workflow orchestration! I’m part of the Cadence team at Uber
@CodeOpinion 4 months ago
Nice! Thanks for the comment and some validation 😀
@avineshwar 2 years ago
I usually think about services that are being built not as just one service but as a pair of services:
- main service ("the very core" (perhaps, an even refined version) of business logic code)
- main's supporter service (for development simplification, e.g. steer back a database to a consistent state)
- any other infra piece needed to support this (e.g. object store)
Once we could model and generalize, we should have a working model for a broad class of issues.
I was hoping you'd think about it.
@alexsiuwh 2 years ago ⁺¹
I have been working with WFL orchestration for 20 years. 100% with you technically, but what makes projects fail is mainly human factors and work politics in the user environment.
@CodeOpinion 2 years ago
Ya, pretty much the case with everything.
@loren-sr 2 years ago ⁺⁷
Great video, thanks for making! I love Temporal, and work there, if you have any questions! Main differentiator is orchestrating via code instead of via JSON/YAML, and we have SDKs for Go, Java, PHP, TS/JS. Python/Ruby/.NET/Rust SDKs are in development.
@CodeOpinion 2 years ago ⁺¹
Was just recently looking at the GitHub repo for the .NET SDK for point of reference to see what it looked like.
@avineshwar 2 years ago
So, just to be sure how Temporal works, if an "n" step process (Workflow) is executed and if somewhere a step (Activity) "x" is considered (through business logic code) to be failing, then we c/would re-attempt our workflow from the step "x" / "x-1" (assuming idempotency)?
If yes, does that mean Temporal is trying to influence 2 (arguably huge) scenarios:
- simplify failure management pattern (reduce code/implementation branching depth on left or right)
- simplify developer lives by letting them not worry about simple enough things
Those are big things, but, maybe someday we will get:
- detect an underlying reason for a certain observation (e.g. exception due to socket close) and deal in some standard way
@loren-sr 2 years ago ⁺¹
@avineshwar yes and yes. For the last, you can certainly handle in code different errors in different ways, and can do so across all Activities and Workflows. It would be nice someday to have some automated ML-based error handling, and a system like Temporal is a necessary base for that, since we not only have information on all failures, but are also the orchestrator deciding what to do next.
@avineshwar 2 years ago
@loren-sr I see (when I think about "fast af" operations and AI, seems like apple/oranges by today's standard). Thanks. All good information.
@morespinach9832 5 months ago
Temporal doesn’t have a visual flow charter like Camunda. Correct?
@buildingphase9712 1 year ago ⁺¹
I get the point; however, payments are probably going to happen client side with a redirect to the payment gateway and a success message call. But point taken in terms of async messages.
@sathyajithps013 2 years ago ⁺³
Cool vid. My first exposure to something like this was in MassTransit Sagas. I think this example can be done using MassTransit Sagas + Courier. Perfect for these kinds of scenarios.
@TheRak00792 2 years ago ⁺¹
A routing slip can definitely work for the provided example. I'd prefer coupling it with a state machine for complex scenarios, though.
@morespinach9832 5 months ago
As in choreography?
@rafaspimenta 2 years ago
Hi Derek, thank you for the great content as usual. Could you tell us about the tools that you use to draw architectural diagrams and your thinking process for building one?
@CodeOpinion 2 years ago ⁺⁴
I just use PowerPoint, nothing special. In terms of how I make them, specifically for a video, I've been thinking about making a video about that
@softcoda 1 year ago
How do you know if the service is down, if you would like to take an action subsequently?
@morgadoapi4431 2 years ago ⁺¹
Thanks for the video
@CodeOpinion 2 years ago
You bet
@morgadoapi4431 2 years ago
Kafka provides transactions but in a more technical sense. These transactions are to guarantee that either a collection of messages are written to many topics or not at all.
@nav201182 3 months ago
If we want to buy a tool from the market to perform workflow orchestration, which tool would you recommend?
@saharis811 1 year ago
Great video as usual Derek!!
I have one question. Say one of the services in the workflow is unavailable and has become a bottleneck. It's not a failure at this point in time, but we don't really know the outcome until the service is available again, processes the request, and publishes success or failure. That can happen instantly or after days; we don't really know.
We need to respond to the client instantly. How can we handle this from client side?
@CodeOpinion 1 year ago
Well the key is you don't/shouldn't need to respond to the client instantly. The simplest example is placing an order. You don't need the entire workflow to succeed/fail to respond to the client. You accept the request to place the order, you do so and return to the client. The payment processing and everything else involved can be async/out of process. If the payment service is unavailable or the payment fails, you'd email the customer to notify them. I talk about this a bit more in this video: czcams.com/video/wEUTMuRSZT0/video.html
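The "accept the request, return, and do the rest out of process" flow from this reply might look roughly like the sketch below. Everything here is an assumption for illustration: `place_order`, `payment_worker`, and the `charge` / `notify_customer` callbacks are hypothetical names, and an in-memory `queue.Queue` stands in for a durable broker queue.

```python
import queue


def place_order(orders, work_queue, order_id):
    # Record the order and enqueue the follow-up work, then return right away.
    # The client gets an answer without waiting for payment or shipping.
    orders[order_id] = "Placed"
    work_queue.put({"type": "ProcessPayment", "order_id": order_id})
    return {"order_id": order_id, "status": "Placed"}


def payment_worker(orders, work_queue, charge, notify_customer):
    # Runs out of process (here: simply later). It may run seconds or days
    # after place_order returned, whenever the payment service is available.
    while not work_queue.empty():
        command = work_queue.get()
        order_id = command["order_id"]
        if charge(order_id):
            orders[order_id] = "Paid"
        else:
            orders[order_id] = "PaymentFailed"
            notify_customer(order_id)  # e.g. send the customer an email
```

The client only ever sees the immediate "Placed" response; later outcomes reach the customer through a separate channel such as email.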
@dmsanz_youtube 1 year ago
To keep that workflow "state" is it necessary to use some kind of saga or similar? Or is it enough with having service Ordering state (i.e: aggregate) capturing all the "distributed state" in order to react with compensating actions, etc? Or would that be too aware of other domains and a violation of separation of concerns?
I suppose sagas (with MassTransit, NServiceBus) help a lot with these things if we want to have this external orchestration. But what if we don't use a message broker and we simply have event streams the services are subscribed to?
@CodeOpinion 1 year ago
What you're referring to is more event choreography, where you don't have a centralized orchestrator, but each boundary is consuming events and publishing events. Check out: czcams.com/video/TA12e2ZJcGg/video.html
@rcts3761 2 years ago
Do you know some good strategies to make sure that services which publish commands reliably process all possible "response" events? For example, a developer might add a new response type to the responding service and forget to update the event handler in the commanding service.
@b1ueocean 2 years ago ⁺¹
responding service doesn’t care about the commanding service (or any other) in your scenario above - responding service simply lands a response event in the broker.
commanding service emits messages without knowing the who and the how regarding response events.
if a specific event handler is missing from the commanding service how has it been released while falling short of the Definition of Done? 👈
Even if you rely on a list of supported events in the commanding service’s configuration and hand roll verification to ensure supporting handlers are available/registered, such configuration needs to be kept up-to-date.
easiest strategy is good testing practices 😋
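One way to turn "good testing practices" into something concrete for the question above is a test that fails when a reply type has no handler. The sketch below assumes the reply types are available to the commanding service as a shared contract (here the hypothetical `PaymentReply` enum); `HANDLERS` and `missing_handlers` are likewise made-up names.

```python
from enum import Enum


class PaymentReply(Enum):
    # Hypothetical contract published by the responding service.
    COMPLETED = "PaymentCompleted"
    FAILED = "PaymentFailed"
    REQUIRES_REVIEW = "PaymentRequiresReview"  # newly added reply type


# Handlers registered in the commanding service (illustrative only).
HANDLERS = {
    PaymentReply.COMPLETED: lambda order_id: "ship",
    PaymentReply.FAILED: lambda order_id: "cancel",
}


def missing_handlers(reply_types, handlers):
    # Names of every reply type that has no registered handler.
    return sorted(reply.value for reply in reply_types if reply not in handlers)
```

A unit test asserting that `missing_handlers(PaymentReply, HANDLERS)` is empty would fail here and point at `PaymentRequiresReview`, catching the forgotten handler before release.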
@maxkomarow 2 years ago ⁺¹
Thanks for the video. Isn't it a saga orchestration pattern that you described? I also wonder how the orchestrator can be implemented without frameworks. It seems like it has to have its own tables in a service database to check completed events. If so, we have to have something to get and save the orchestrator. A repository? But the orchestrator doesn't seem like a part of the domain model, more like a part of an application service. Would be glad to hear your thoughts on this.
@CodeOpinion 2 years ago ⁺⁴
It fits more on top of messaging infrastructure. There can be state involved which is needed if you want to keep track of which events have occurred or something in their contents to then send a command. If state is involved, you'll likely use a library to handle this. The alternative is event choreography, which I will cover independently in a separate video.
@juliancyanid 2 years ago
@Petar Vucetin aka the "Routing Slip" pattern. My first go-to when the flow is linear, especially if the team/codebase has no existing orchestration (it's not only "stateless", it's also dead simple).
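For readers unfamiliar with the routing slip idea mentioned here, a rough sketch: the itinerary travels with the message, each service performs its step and forwards the slip, and no central component stores workflow state. The message shape and the `process` function are assumptions for illustration, and a `deque` stands in for the broker.

```python
from collections import deque


def process(message, services):
    # Simulates forwarding through a broker: each service handles the first
    # step on the itinerary, then passes the remaining slip along.
    inbox = deque([message])
    while inbox:
        current = inbox.popleft()
        if not current["itinerary"]:
            return current  # itinerary exhausted, the flow is done
        step, *remaining = current["itinerary"]
        payload = services[step](current["payload"])
        inbox.append(
            {
                "itinerary": remaining,
                "payload": payload,
                "completed": current["completed"] + [step],
            }
        )
```

This stays simple only while the flow is linear; branching or waiting on several replies is where a stateful orchestrator tends to fit better.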
@jalalalmutawa4889 1 year ago
Hi Derek, how should we handle the failure of a service that has updated its database but failed before sending an event/reply?
@CodeOpinion 1 year ago
One option is the outbox pattern: czcams.com/video/u8fOnxAxKHk/video.html
@BonnakChea 2 years ago ⁺²
Thanks for the video. It helps me a lot. However, I couldn't find one that supports orchestration for Node.js. I'd really appreciate it if I could get a stable one.
@loren-sr 2 years ago ⁺¹
Temporal's Node/TypeScript SDK is pretty stable, and hitting v1 soon (which won't have any major breaking changes).
@BonnakChea 2 years ago
@loren-sr Thanks a lot.
@rockmanjacky 2 years ago
That's a very good video to explain the message queue system design, but how can we prevent the single point of failure if the message queue is down?
@CodeOpinion 2 years ago ⁺¹
Good question! It's core infrastructure, no different than a database. What do you do to prevent your database from being a single point of failure? Generally, a cluster for high availability and also use the outbox pattern: czcams.com/video/u8fOnxAxKHk/video.html
@AntonioRonde 2 years ago ⁺¹
Why do we place the workflow inside a service like ordering instead of in the broker? This way the start of a workflow would go the same route as any ongoing workflow. Don't know if this is possible / advisable, just wondering about any tradeoffs.
@CodeOpinion 2 years ago
Not entirely sure what you mean by "in the broker". Ultimately you're using a broker in more of a "hub and spoke" way to broker messages between services. How you interact with and handle those messages will be something you need to write in code. Can't say I'm a fan of using anything that's based more on markup/metadata like JSON to define the workflow.
@juliancyanid 2 years ago ⁺¹
My 2 cents:
In this example, the _business_ of 'ordering' probably depends on the (business) capabilities of payment and warehouse. If your business had a "sales", maybe _that_ would talk to payment. Business workflows belong to some business context, and are bound to change with the way the business works. That's also why you don't want the flow logic (orchestration) in some generic component.
@AntonioRonde 2 years ago
@CodeOpinion Yes, we use the broker as a "hub and spoke" for messages. Except for the invocation of the workflow at 6:15, there the client directly invokes the workflow at a service and doesn't follow the route via the broker. My question is why not use the broker here? Would it make sense to also relay this first message/call that invokes the workflow via the broker?
@ItzukiTheDemon 2 years ago
How would you notify the client about the success or failure? Is your depiction here a fire and forget from the “client” to “ordering”?
@CodeOpinion 2 years ago ⁺²
It depends on what the workflow is and if the client is "aware" of what's actually happening. In the case of an Order, you simply tell them the order was placed initially. If there is an issue with their credit card or payment, you can email them etc. It really depends on the exact use case. Long running process means it can take milliseconds or it can take days or weeks even. It all depends on what the workflow is. Check out the video I did on using WebSockets as a means to push down to the client. czcams.com/video/Tu1GEIhkIqU/video.html
@ItzukiTheDemon 2 years ago
@CodeOpinion I’ll definitely check out the video. Thank you!
@selvakumars6487 2 years ago ⁺¹
Hi Derek, is this not choreography, as there is no master process coordinating the flow like in orchestration?
@CodeOpinion 2 years ago
This is orchestration as there is something controlling the workflow. I'm using a combination of sending commands and consuming events (or replies) in the orchestration. Choreography would be each service consuming events and publishing events without any central knowledge of the workflow.
@chengchen9032 2 years ago
@CodeOpinion I have the same question about the video, because I was assuming there would be a master process with central knowledge of the workflow, which would use that knowledge to pick up events from the MQ, convert those events to specific commands, and then put them back on the MQ. So the chain would go like:
service A -> Event -> MQ -> Orchestrator -> Command -> MQ -> service B.
@CodeOpinion 2 years ago ⁺¹
@chengchen9032 Exactly. There are different tools/libraries/frameworks that can help you orchestrate. In a bunch of my videos I use NServiceBus, which is a .NET library, but you can also use other tooling such as Temporal (temporal.io) that has SDKs for a bunch of languages/platforms.
@selvakumars6487 2 years ago
I got it: the ordering service (the one with the workflow label) owns the workflow and orchestrates it. I assume that if there is any failure, the same flow is expected to issue compensations to all the involved parties.
@CodeOpinion 2 years ago
Correct
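The thread above (events in, commands out, compensation on failure) can be sketched as a small state machine. This is only an illustration under assumed names: `OrderWorkflow`, the event names, and the command names are hypothetical, and a real setup would typically lean on a messaging library or workflow engine to persist this state and deliver the messages.

```python
class OrderWorkflow:
    """Consumes events and returns the commands to send next. No I/O here."""

    def __init__(self, order_id):
        self.order_id = order_id
        self.state = "Started"

    def start(self):
        self.state = "AwaitingPayment"
        return [("ProcessPayment", self.order_id)]

    def handle(self, event):
        if self.state == "AwaitingPayment" and event == "PaymentCompleted":
            self.state = "AwaitingReservation"
            return [("ReserveStock", self.order_id)]
        if self.state == "AwaitingPayment" and event == "PaymentFailed":
            self.state = "Cancelled"
            return [("CancelOrder", self.order_id)]
        if self.state == "AwaitingReservation" and event == "StockReserved":
            self.state = "Completed"
            return []
        if self.state == "AwaitingReservation" and event == "StockUnavailable":
            self.state = "Cancelled"
            # Compensate the step that already succeeded, then cancel.
            return [("RefundPayment", self.order_id), ("CancelOrder", self.order_id)]
        return []  # duplicate or out-of-order events are ignored
```

Keeping the workflow free of I/O means the surrounding infrastructure decides how commands are sent and how state is stored, which also makes the flow easy to unit test.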
@vicradon 2 years ago
So an example of such a workflow orchestration tool would be Airflow, right?
@CodeOpinion 2 years ago
Yes as well as other tools like Temporal or messaging libraries that provide a stateful saga.
@maartenlouage 2 years ago
How about using Azure Durable Functions for orchestration?
@katjejoek 1 year ago
That's what I was thinking as well. I guess it depends on how the asynchronous communication is done between the parts. What happens if something is not available? Will it be picked up once it is back or will it fail immediately?
@andreashe36 2 years ago
Can you make a video on how to aggregate when models from a different domain are needed? E.g. if I need a summary of a product's sales as a projection. But product and order may be processed in different domains and different event stores. How does the order get info about product details?
@CodeOpinion 2 years ago ⁺¹
Ya good suggestion.
@mrxscheia 2 years ago
There’s a great video on this topic from Mauro Servienti czcams.com/video/hev65ozmYPI/video.html
@pelaoinfo 1 year ago
I'm studying this in depth since I'm modeling a whole system, but I still can't tell whether orchestration is async (message queues) or sync (req/res approach); most of the info I've found says orchestration is sync req/res.
@CodeOpinion 1 year ago
I'd say more often than not it's async because of resilience and failures. Check out czcams.com/video/LMKVzguhFw4/video.html
@lilivier 10 months ago
But what if, after creating your order in the database, you never receive a message back from the next service? You have a 'ghost' order still committed in the database. How do you handle that?
@JosueIbarraNinja 4 months ago
You generally have a timeout and retry limits set as parameters when starting your workflow or calling your next activity.
In Cadence, the Cadence service (broker) makes sure to follow up on the pending workflow. Even if the codebase changes, you can replay the workflow in its entirety.
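As a generic illustration of "a timeout and retry limits set as parameters", here is a small sketch in plain Python. It is not the Cadence or Temporal API; `run_activity` and its parameters are hypothetical, and a workflow engine would track attempts durably rather than in a local loop.

```python
import time


def run_activity(activity, max_attempts=3, timeout_seconds=30.0, backoff_seconds=0.0):
    """Call `activity` until it succeeds, attempts run out, or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except Exception as error:  # a real system would retry only transient errors
            last_error = error
        if attempt == max_attempts or time.monotonic() >= deadline:
            break
        time.sleep(backoff_seconds * attempt)  # simple linear backoff
    # Limits exhausted: surface the failure so the workflow can escalate or compensate.
    raise RuntimeError("activity did not complete within its limits") from last_error
```

Once the limits are hit, the "ghost" order from the question is no longer silent: the workflow gets an explicit failure it can react to, for example by cancelling the order.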
@sangmin7648 2 years ago
The comparison always seems to be between synchronous API calls and asynchronous messaging. But how about asynchronous API calls? Could there be any advantage to using async API calls over messaging (other than ease of infra management, maybe)?
@CodeOpinion 2 years ago ⁺¹
Meaning something like gRPC's non-blocking async? It's still service to service direct calls. If you call a service and it's unavailable, or you call a service async non-blocking, and your own service fails and doesn't get the reply. Adding messaging that's durable in the mix removes services needing to be online.
@sangmin7648 2 years ago
More like a REST API call on a separate thread. I was thinking about something like an outbox pattern, but instead of sending an event to the queue, use a REST API call on a separate thread. With this approach, other services don’t have to be online.
@CodeOpinion 2 years ago ⁺¹
Yes to a degree. The problem with service to service is if you do it, it can become viral. A calls B that calls C that calls D. At some point that becomes unreliable. Rather deliver a message to a broker and then you're done. I have a video coming out in a few weeks that illustrates where I think rpc and service to service is viable.
@sangmin7648 2 years ago
Thanks for the reply. I’ll think about how separate thread rest api call can(or cannot) work in the situation you mentioned. I was wondering about this because my company’s going the separate thread route. Having watched your videos, it felt awkward, but as a clueless junior couldn’t make an argument against it. Thanks again for the great video as always
@evilroxxx 1 year ago
My application currently works exactly as you described. One API asynchronously awaits another API's response and, once received, continues the rest of its own execution. However, this is called temporal coupling and must be avoided. Your first API cannot proceed with the success flow or error flow until it receives a response, and it throws an exception if it times out. So it’s not exactly an asynchronous execution if it’s waiting for something to finish. Hence this complex event-driven architecture is favorable, because not only can you control the execution flow, but you can also pick up where you left off if the consumer of a message is momentarily unavailable and then comes back alive after a short period.
@ThugLifeModafocah 2 years ago
But what happens when the broker is broken?
@CodeOpinion 2 years ago
Use an outbox: czcams.com/video/u8fOnxAxKHk/video.html
@thedacian123 1 year ago
As far as I understood, the orchestrator is a physical app within a given boundary that is dumb and cannot contain business logic. Aren't you going to run into a single point of failure if this app fails? Thank you!
@CodeOpinion 1 year ago
Generally the logic is around consuming events and sending commands. Generally no I/O or db access. If the service owning the orchestration is down, then yes it won't execute until it's back up
@marcelbricman 16 days ago
Your line of argument is that it's impossible to roll back in a distributed way, and then magically with messages it works. Calling BS: the problem is still exactly the same. Don't get me wrong, temporal decoupling is good, and the centralised workflow engine can be a great benefit, but your argument does not illustrate the point of all that. If every service can perform rollbacks asynchronously, there are plenty of other ways to resolve this problem.
@evilroxxx 1 year ago
What is the definition of a service here? Ordering, payment, warehouse are all single API endpoints? Are they console applications running processes in the background?
If they’re APIs, then how do they pick up the generated events and process them?
If they’re background processes how’s the client going to communicate with them?
Let’s say Ordering service is going to communicate with products and/or pricing and/or coupons or inventory services. Is it going to send messages on the broker and wait for a response or an event from them and will that event contain the required data or will it need to do something else to get the data to prepare the order?
How’s all this going to work to handle scale of say millions of orders daily? Won’t that overwhelm the broker? What if the queues get so full that they cannot accept any more commands? What’s going to happen to the data then?
Can you please shed some light on these points?
As always, I’m a super fan of your videos, Derek. Thanks so much for enhancing the community’s knowledge🎉
@CodeOpinion 1 year ago
My definition of a service is "the authority of a set of business capabilities." They could be a combination of HTTP APIs as well as separate processes that are consuming messages off a queue/broker. A lot of your questions I actually cover in a ton of videos. You're about to go down a rabbit hole :) Here are some that might be helpful.
re: scaling message processing - czcams.com/video/xv6Ljbq6me8/video.html
re: communicating to client from background processing: czcams.com/video/Tu1GEIhkIqU/video.html
1 year ago ⁺¹
You forgot idempotence.
@CodeOpinion 1 year ago ⁺²
Messaging is a broad topic. I've got many videos that focus specifically on various aspects, one of them being idempotence.
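Since idempotence comes up here, a minimal sketch of an idempotent consumer: remember which message IDs have been handled and skip redeliveries. The `make_idempotent` wrapper and the `id` field on messages are assumptions for illustration; in practice the processed IDs would be stored durably, ideally in the same transaction as the handler's own state change.

```python
def make_idempotent(handler, processed_ids):
    # Wrap a message handler so that redelivered messages are skipped.
    def handle(message):
        if message["id"] in processed_ids:
            return False  # duplicate delivery, nothing to do
        handler(message)
        processed_ids.add(message["id"])  # record only after the handler succeeded
        return True

    return handle
```

With at-least-once delivery from a broker, a wrapper like this keeps a retried `ProcessPayment` command from charging a customer twice.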
@haskell3702 1 year ago
In this case the Ordering will be the orchestrator? Does this mean that Ordering has to know Payment and Warehouse (which does not look good)? Or the orchestrator should be an independent microservice of these 3?
@maximfateev2369 1 year ago
It depends on the orchestrator's implementation. It can be separate, or each of the services can own its own orchestration and execute its own operations and compensations.
