Published on March 3, 2019
The fault tolerance requirement was known long before Microservices were defined. It has resurfaced because, when working with relatively small applications or components such as Microservices, fault tolerance (or its absence) is much easier to observe, and the need for it is replicated at a much larger scale than with monolithic applications. Microservices themselves do not add anything special to this requirement.
Well-known means of improving fault tolerance include timeouts, retries, retry policies, and circuit breakers. The FallBack method holds a special position among them because it reaches beyond the boundaries of the faulty Microservice.
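For contrast with FallBack, here is a minimal Python sketch of two of the means listed above: a retry policy and a circuit breaker. Both keep calling the same Microservice. The names `CircuitBreaker`, `retry`, and their parameters are illustrative assumptions, not any particular library's API, and timeouts are assumed to be enforced by the transport and to surface here as exceptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures,
    rejects calls for `reset_after` seconds, then lets one trial call through."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def retry(fn: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Hypothetical retry policy: up to `attempts` tries with a fixed delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying is pointless while the circuit is open
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

Note that when every retry fails and the breaker opens, the consumer still ends up with an exception; this is the gap that FallBack is meant to close.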
Some authors propose implementing FallBack strictly within the faulty Microservice. For example, if a Microservice manages a collection of items and can return either one item or several (all) items, those authors suggest that a failure to return one item may be compensated by returning all items. In other words, if the code returning one item does not work, the requested item is returned among all the others. As a consumer, I am not a fan of getting what I did not ask for: I am not prepared for such a response and do not know what to do with it. I do not consider such a solution fault tolerant.
The FallBack method promotes the idea that the consumer has to step (fall) back and consider another solution for the given task: a Microservice may fail, but the task may not, i.e. simply throwing an exception is not an option. This article describes two options for realising FallBack via the Sib pattern.
The Sib pattern states that when the Microservice the consumer uses is not working or is unavailable, the consumer should be able to invoke another Microservice that performs very similar functionality and returns practically the same result: a ‘sibling’ Microservice. The availability of such a Microservice in the solution/application is the fundamental attribute of the solution’s resilience.
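A minimal Python sketch of the consumer side of the pattern, assuming each Microservice is reached through a plain callable. The function `call_with_siblings` and the exception `AllSiblingsFailed` are hypothetical names. The consumer tries the Microservice it normally uses and, if that call raises, falls back to a sibling that returns practically the same result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllSiblingsFailed(Exception):
    """Raised only when the primary Microservice and every sibling failed."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"{len(errors)} Microservice(s) failed")
        self.errors = errors


def call_with_siblings(calls: Sequence[Callable[[], T]]) -> T:
    """Invoke the primary Microservice first, then each sibling in order,
    returning the first successful result."""
    errors: List[Exception] = []
    for call in calls:
        try:
            return call()
        except Exception as exc:  # failing or unavailable Microservice
            errors.append(exc)
    raise AllSiblingsFailed(errors)


# Usage with two hypothetical Microservice clients
def primary() -> dict:
    raise ConnectionError("primary Microservice unavailable")


def sibling() -> dict:
    return {"status": "ok", "source": "sibling"}


result = call_with_siblings([primary, sibling])  # the sibling's result
```

The exception surfaces only when every candidate has failed, so the task fails only if the whole family of siblings does.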
This pattern has a few preconditions; in particular, the following principles should be preserved:
- A Microservice does not own any business data; otherwise, a failure of the Microservice makes the business data inaccessible to other Microservices or other computational resources in the enterprise. A Microservice may, and should, own only its state data.
- Any temporary composition of Microservices must have an orchestrating manager, which may, and should, own/keep its state data and a snapshot of the business data that must remain intact while the composition works.
- When we use Microservice compositions such as sessions or transactions, which include multiple Microservices, we solve a task that no single involved Microservice can solve on its own. The decision whether to use synchronous or asynchronous interactions between the Session or Transaction Manager and a particular Microservice requires special consideration in each case. The logic of the session or transaction, as well as that of a compensation (rollback) transaction, may or may not require a certain order of activities, i.e. asynchronous interactions may or may not be suitable.
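A minimal Python sketch of an orchestrating manager for the case where order matters, assuming synchronous calls: steps run strictly in sequence and, if one fails, the steps already completed are compensated in reverse order. `Step` and `TransactionManager` are hypothetical names; a real manager would also persist its state data and hold the business-data snapshot.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Step:
    name: str
    action: Callable[[State], None]      # forward call to a Microservice
    compensate: Callable[[State], None]  # rollback call for the same step


@dataclass
class TransactionManager:
    """Hypothetical orchestrator of a temporary Microservice composition."""
    steps: List[Step]
    state: State = field(default_factory=dict)  # manager-owned state data

    def run(self) -> bool:
        done: List[Step] = []
        for step in self.steps:  # synchronous, strictly ordered steps
            try:
                step.action(self.state)
            except Exception:
                # Compensate completed steps in reverse order
                for finished in reversed(done):
                    finished.compensate(self.state)
                return False
            done.append(step)
        return True
```

If the compensation logic did not need a particular order, the steps could be dispatched asynchronously instead; this sketch deliberately covers only the ordered case.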
Within the Sib pattern, we have (at least) two implementation options: the Microservice-based solution/application has to have either
1) a totally different ‘sibling’ Microservice, or
2) a ‘twin’ Microservice.
Case 1) can be illustrated with a Microservice that represents a “shopping basket”. If a call to the Microservice “ShoppingBasket” fails and no other means of fault tolerance help to obtain the content of the basket, another Microservice such as “BasketItems” may provide a “getAllBasketItems” API and return the basket content to the consumer. Thus, the consumer’s needs are satisfied.
Since collecting items in the shopping basket is a long-running consumer session, it can have its own data store where the state of this session, i.e. the content of the basket, is persisted for the duration of the session, independently of the external data stores where the shop’s items (chosen by the consumer) are kept for other consumers. The Session Manager shares this session data store among the Microservices it engages in the session. The Session Manager (like any composition manager) should be accompanied by a Data Access Service (DAS) dedicated to the shared data store; this DAS serves all engaged Microservices. You can use the CQRS pattern for this data store, but that is a different topic. When the session is over, the Session Manager releases the data store for garbage collection (GC), and all involved Microservice instances vanish.
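The shopping-basket case can be sketched in Python as follows, assuming in-process objects stand in for Microservices. `DataAccessService`, `SessionManager`, and the snake_case method `get_all_basket_items` (standing in for the “getAllBasketItems” API) are hypothetical, and the `healthy` flag exists only to simulate a failure. Both siblings read the same session data through one DAS, and the Session Manager releases the store when the session ends.

```python
from typing import Dict, List, Optional


class DataAccessService:
    """Hypothetical DAS: the single gateway to the session's shared data store."""

    def __init__(self) -> None:
        self._store: Optional[Dict[str, List[str]]] = {"basket": []}

    def _data(self) -> Dict[str, List[str]]:
        if self._store is None:
            raise RuntimeError("session data store has been released")
        return self._store

    def add_item(self, item: str) -> None:
        self._data()["basket"].append(item)

    def items(self) -> List[str]:
        return list(self._data()["basket"])

    def release(self) -> None:
        self._store = None  # drop the only reference so GC can reclaim it


class ShoppingBasket:
    """Primary Microservice; `healthy` only simulates a code failure."""

    def __init__(self, das: DataAccessService) -> None:
        self.das = das
        self.healthy = True

    def get_basket(self) -> List[str]:
        if not self.healthy:
            raise RuntimeError("ShoppingBasket failed")
        return self.das.items()


class BasketItems:
    """Sibling Microservice: a different API over the same session data."""

    def __init__(self, das: DataAccessService) -> None:
        self.das = das

    def get_all_basket_items(self) -> List[str]:
        return self.das.items()


class SessionManager:
    def __init__(self) -> None:
        self.das = DataAccessService()  # one DAS for all engaged Microservices
        self.basket = ShoppingBasket(self.das)
        self.basket_items = BasketItems(self.das)

    def basket_content(self) -> List[str]:
        try:
            return self.basket.get_basket()
        except Exception:
            return self.basket_items.get_all_basket_items()  # FallBack to sibling

    def close(self) -> None:
        self.das.release()
```

Because both siblings depend on the same DAS, a failure of the data store itself defeats the FallBack, which is exactly the limitation discussed next.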
This solution does not guarantee success for the consumer, because the problem might lie outside the Microservice code, e.g. in the network or in the data store where the basket is persisted. Nonetheless, if the problem is in the Microservice itself, invoking different code can solve the consumer’s task.
Case 2) may appear a bit simpler. For example, a Microservice “ShoppingBasket” may be accompanied (by design) by a Microservice “ShoppingBasketSib” that does the same as “ShoppingBasket” but is designed and written by a different person in the development team. When used in the same composition, “ShoppingBasket” and “ShoppingBasketSib” share the data store in the same way as described in 1). With this implementation, every consumer knows that each Microservice XYZ always has a twin, a Microservice XYZSib, which can be used when needed. Although this option doubles the development work, it might be easier to comprehend and to use: the sibling/twin is guaranteed at development time, and there is no need to look for a different, though similar, Microservice that might never have been created. Ultimately, the choice of option is up to the development team.
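The twin convention makes the fallback lookup mechanical. A minimal Python sketch, assuming Microservices are resolved from a name-to-callable registry (the registry and `call_with_twin` are hypothetical): the consumer derives the twin’s name by appending “Sib” and never has to search for a different, similar Microservice.

```python
from typing import Any, Callable, Dict

Handler = Callable[..., Any]


def call_with_twin(registry: Dict[str, Handler], name: str,
                   *args: Any, **kwargs: Any) -> Any:
    """Call Microservice `name`; if it fails, call its twin `name + "Sib"`.

    Relies on the team convention that every Microservice XYZ has a twin XYZSib.
    """
    twin = registry.get(name + "Sib")
    try:
        return registry[name](*args, **kwargs)  # KeyError if primary is missing
    except Exception:
        if twin is None:
            raise  # convention broken: no twin was deployed
        return twin(*args, **kwargs)
```

For example, with a registry holding `"ShoppingBasket"` and `"ShoppingBasketSib"`, a bug in the first implementation is absorbed by the independently written second one.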
It is worth noting that both options relate to the principle of composability. This principle says that every independent Microservice may compose other Microservices into as many different compositions, or participate in as many different compositions, as it needs, sometimes without even being aware of its participation.
Some Microservice developers try to establish a rule along these lines: ‘Microservices should not explicitly invoke each other, but have to “communicate” via events or broadcast messages’. Although we have touched on this topic already, it may be useful to outline how relative the decision on Microservice interactions is. Indeed, Microservices should not know the exact network locations/IPs of other Microservices; that is the job of a service or event bus, and exact knowledge of network locations couples Microservices.

At the same time, the notion of business trust requires that the consumer knows its provider in order to trust it. For example, a Transaction or Session Manager may know which Microservices can satisfy the needs of a particular step and will name these pre-known Microservices in its calls when needed. The bus has to find the requested Microservice in the network and deliver the call.

If a Microservice needs something to be done in the application and fires an unaddressed event or broadcasts a message, there is a risk that no one is listening to that event or message. Yes, asynchronous communication is very appealing and fits the idea of Microservice isolation, but it does not guarantee that the needed effect takes place in the application. Another Microservice, developed by another team, may or may not listen to unaddressed events or to the broadcast; i.e. in order to guarantee such communication, we have to couple Microservices by design, even across teams. If a Microservice fires an addressed event, it is practically the same as calling another Microservice via a bus. The required type of response, synchronous or asynchronous, should be included in the call. The only “coupling” in this case is a design policy regarding the required response type that all Microservices in the solution/application should follow.
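A minimal Python sketch of an addressed call over a bus, under stated assumptions. The consumer names its provider logically (e.g. “BasketItems”), never by network address. The in-process `Bus` class stands in for a real service or event bus and resolves that name. The required response type travels with the call. `Bus`, `Message`, and `ResponseType` are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class ResponseType(Enum):
    SYNC = "sync"
    ASYNC = "async"


@dataclass
class Message:
    target: str             # logical Microservice name, never an IP address
    payload: Any
    response: ResponseType  # required response type is part of the call


class Bus:
    """Hypothetical bus: resolves logical names and delivers addressed calls."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}
        self.replies: "queue.Queue[Any]" = queue.Queue()

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._services[name] = handler

    def send(self, msg: Message) -> Optional[Any]:
        handler = self._services.get(msg.target)
        if handler is None:
            # Unlike an unaddressed event, a missing recipient is detected at once
            raise LookupError(f"no Microservice named {msg.target!r} on the bus")
        if msg.response is ResponseType.SYNC:
            return handler(msg.payload)
        # Asynchronous: deliver on another thread; the reply lands on a queue
        # (error handling for failed async handlers is omitted in this sketch)
        worker = threading.Thread(target=lambda: self.replies.put(handler(msg.payload)))
        worker.start()
        return None
```

The design choice illustrated here is that the sender knows *whom* it calls but not *where* that Microservice runs; the only shared policy is how the response type is expressed.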
Finally, the Sib pattern for Microservices described here addresses the task of Fault Tolerance. It is based on using a different Microservice at runtime if the initially chosen one fails. The initial Microservice is ‘shadowed’ by its sibling or twin Microservice, creating a composition that should be managed either by the consumer or by a specially developed ‘manager’. This pattern can eliminate failures caused by the Microservice code, but it has only a limited chance of working around network or resource failures.