The four main challenges that come with an EDA are:
Handling Out-of-Order Events
When developing, we tend to have a happy-path scenario in mind: if I apply a change to Service A, it publishes an event that is consumed by every service that cares about it.
In reality, depending on the infrastructure used, you may receive messages in a different order than they were sent. For example, a standard SQS queue does not guarantee ordering, so newer events can arrive before older ones.
Figure 8. Potential out-of-order events being consumed.
Or, if you use an orchestration approach, such as AWS Step Functions, you may end up with the following issue.
Figure 9. Out-of-order even if the events are in the right order.
There is no one-size-fits-all solution. You have to assess whether, in your context, you can simply disregard old messages, add a buffering window before consuming so the events can be re-ordered, or refuse to process the message and raise an error.
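As a rough illustration of two of these options (disregarding old messages, or failing with an error), here is a minimal in-memory sketch. It assumes each event carries a per-entity `version` number set by the producer; that field, along with the `Event`, `LatestWinsConsumer`, and `StaleEventError` names, is hypothetical and not part of any specific messaging product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity_id: str  # hypothetical: the order/customer the event refers to
    version: int    # hypothetical: incremented by the producer on every change
    payload: dict


class StaleEventError(Exception):
    """Raised when the 'fail and trigger an error' policy is selected."""


class LatestWinsConsumer:
    def __init__(self, fail_on_stale: bool = False):
        self.fail_on_stale = fail_on_stale
        self.last_version = {}  # entity_id -> highest version applied so far
        self.state = {}         # entity_id -> payload of the latest event

    def handle(self, event: Event) -> bool:
        """Apply the event if it is newer than what we have; return True if applied."""
        seen = self.last_version.get(event.entity_id, -1)
        if event.version <= seen:
            # The event is older than (or the same as) the state we already hold.
            if self.fail_on_stale:
                raise StaleEventError(
                    f"{event.entity_id}: got version {event.version}, already at {seen}"
                )
            return False  # policy: silently disregard the old message
        self.last_version[event.entity_id] = event.version
        self.state[event.entity_id] = event.payload
        return True
```

With the default policy, a late "paid" event (version 1) arriving after "shipped" (version 2) is ignored and the state stays at "shipped". A buffering window would need a timer and a sort step on top of this, which the sketch leaves out.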
Duplicated Messages
It is common to talk about delivery guarantees with any messaging infrastructure, which is the delivery mechanism of your EDA. Many solutions offer “at-least-once” delivery, which means there is a chance you will receive the same message more than once.
Figure 10. Receiving the same message twice with a potentially negative outcome.
Here, the solution is to make your consumer idempotent, which means that processing the same message a second time will not change the state of your service.
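One common way to get there is to remember which message IDs have already been handled and skip repeats. The sketch below assumes every message carries a unique `message_id`; the `PaymentConsumer` class and its fields are hypothetical, and the in-memory set stands in for what would normally be a durable store.

```python
class PaymentConsumer:
    def __init__(self):
        self.processed_ids = set()  # IDs of messages already applied
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the payment once; return False if the message is a duplicate."""
        if message_id in self.processed_ids:
            return False  # duplicate delivery: state is left untouched
        self.balance += amount
        self.processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
consumer.handle("msg-1", 50)
consumer.handle("msg-1", 50)  # redelivery of the same message
print(consumer.balance)       # 50, not 100
```

In a real service, recording the ID and changing the state would ideally happen in the same transaction, so that a crash between the two steps cannot cause the message to be applied twice or lost.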
Debugging is Harder
The default mode when using events is fire-and-forget: your service publishes an event and does not know whether any consumer processed it successfully.
If there is an oversight in the process, you may notice that something is not right, but unlike a direct call, where the failure surfaces immediately, here you have to go through the logs to see what went wrong.
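To make the fire-and-forget behavior concrete, here is a deliberately simplified, in-process sketch. `FireAndForgetBus` is a hypothetical stand-in for a real broker, where the publisher and consumers would run in separate processes. The point it shows is that `publish` returns nothing to the caller, and a consumer failure only appears in the logs.

```python
import logging

logger = logging.getLogger("event_bus")


class FireAndForgetBus:
    def __init__(self):
        self.subscribers = []  # handler callables taking one event dict

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event: dict) -> None:
        """Deliver the event to every subscriber; the publisher learns nothing."""
        for handler in self.subscribers:
            try:
                handler(event)
            except Exception:
                # The failure is recorded in the log, not returned to the publisher.
                logger.exception(
                    "consumer %s failed for event %s", handler.__name__, event
                )


def send_invoice(event: dict) -> None:
    raise ValueError("missing customer address")  # simulated consumer bug


bus = FireAndForgetBus()
bus.subscribe(send_invoice)
bus.publish({"type": "OrderPlaced", "order_id": "order-1"})  # no error raised here
```

From the publisher's point of view the call succeeded; the only trace of the problem is the logged exception.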
Understanding Impact is Harder
A common selling point of EDA is that you can expand the system’s behavior without making changes to the originating service.
Figure 11. When it is time to make changes, Order Service has no idea of the impact/affected services.
This flexibility is amazing, but the flip side is that if you decide to change an event (or add a new one) to capture new business requirements, you have no clear view of which other services may be impacted.
One recommended mitigation strategy is to combine some sort of registry, which catalogs every service that subscribes to a given event, with process-based documentation that records which services are choreographed together to deliver a successful outcome.
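The registry part of that strategy can be as simple as a lookup from event type to subscribing services. The sketch below is an in-memory illustration only; `SubscriptionRegistry`, its method names, and the service names are hypothetical, and a real catalog would live somewhere shared and persistent.

```python
from collections import defaultdict


class SubscriptionRegistry:
    def __init__(self):
        self._subscribers = defaultdict(set)  # event type -> service names

    def register(self, event_type: str, service: str) -> None:
        """Record that a service consumes the given event type."""
        self._subscribers[event_type].add(service)

    def impacted_by(self, event_type: str) -> list:
        """List the services to review before changing this event."""
        return sorted(self._subscribers.get(event_type, set()))


registry = SubscriptionRegistry()
registry.register("OrderPlaced", "shipping")
registry.register("OrderPlaced", "billing")
print(registry.impacted_by("OrderPlaced"))  # ['billing', 'shipping']
```

Before changing `OrderPlaced`, the owning team can query the registry and get the list of consumers to check with, instead of finding out after a deployment.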