Microservice Polling: Tandem Pattern for Reliability

Published on April 30, 2019

Intent

This pattern increases the durability of data polling performed by a Microservice (MS) and, as a result, increases the robustness of the overall application or system.

Unlike an SOA Service, a Microservice can initiate its work on a schedule without any other element of the Application (App) being aware of it. If a polling MS fails, the only way to discover this and eventually fix it is to analyse the logged records. However, in some cases, the failing MS might be unable to write an appropriate log entry. A polling MS can fail for at least three reasons:

1) A failure of the resource/queue that the MS polls

2) A failure of the network that the MS uses to access the resource/queue to be polled

3) A failure of the MS itself.

Although polling is a synchronous operation, an MS can detect failures 1) and 2) by, e.g., using a time-out to break a hung connection. In case 3), we are dealing with a “dead end” that cannot be resolved without engaging other means. Depending on the nature of the business task to which the polling MS contributes (for example, whether Eventual or Strong Consistency of the data and its processing is required), a failure of the polling MS may either be tolerated or require a fail-over mechanism, respectively.
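Failures 1) and 2) can be caught on the caller's side with a time-out. The following is a minimal Python sketch, not the author's code: `poll_with_timeout` is a hypothetical helper, and `fetch` is a hypothetical blocking callable standing in for the real EQ client call. Most real clients expose their own time-out setting, which should be preferred where available. Note that nothing in this sketch can detect failure 3): if the MS itself is dead, no time-out inside it ever fires.

```python
import concurrent.futures
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Shared worker pool. A hung call keeps occupying its worker thread,
# because Python threads cannot be forcibly cancelled.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def poll_with_timeout(fetch: Callable[[], T], timeout_s: float) -> Optional[T]:
    """Run a blocking poll call, giving up after timeout_s seconds.

    Returns the fetched value, or None if the call hung (failures 1 or 2)
    or raised an error; the caller can log this and retry next round.
    """
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        print(f"poll timed out after {timeout_s}s")  # stand-in for real logging
        return None
    except Exception as exc:  # e.g. connection refused by the resource
        print(f"poll failed: {exc!r}")
        return None


print(poll_with_timeout(lambda: "event-1", timeout_s=1.0))         # event-1
print(poll_with_timeout(lambda: time.sleep(0.5), timeout_s=0.05))  # time-out message, then None
```

Because a hung call still holds its worker thread until it returns, a real service would also want to bound how many such calls can pile up.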

Thus, if a fail-over mechanism is a must-have, the App needs another MS that can substitute for the failed polling MS at runtime.

Based on

The Tween Pattern for Microservices

Motivation

If a fail-over mechanism is a must-have, another MS must be involved to substitute for the failed polling MS at runtime.

Solution

Each polling Microservice is accompanied by its Microservice-Tween; the two work in tandem, utilising shared temporary data storage.

Participants

1. A resource or data to be polled

2. A Microservice that is supposed to poll and receive the data (MSR)

3. A Microservice-Tween to the MSR (MSRT)

4. External Resource/Queue (EQ) that the Microservice polls

5. Internal Queue (IQ) where the Microservice persists previously polled data

Assumptions

  1. Both MSR and MSRT are designed and developed by the same person or team
  2. Both MSR and MSRT are always deployed as a pair
  3. The rate at which new data appears in the external resource/queue should not be significantly higher than the frequency at which it is polled
  4. The capacity of the resource/queue should be sufficient for the number of entities in the external resource or the number of new entries in the external queue.

Implementation and Execution

1. The MSR and MSRT solve exactly the same task with the same data and have the same implementation, but slightly different configurations. The MSRT's schedule must start with a delay of approximately half of the MSR's scheduling period.

2. Each of these Microservices:

   a. Accesses the same EQ when its own scheduler triggers

   b. If the MS does not obtain any data, it waits for the next polling round

   c. If the MS obtains data:

      i. The MS polls the latest entry from the shared IQ.

      ii. The MS compares the data obtained from the EQ with the data obtained from the IQ (e.g., an identifier or a time-stamp).

      iii. If the data is the same, the MS stops processing.

      iv. If the data is different, the MS:

         1) Places the data obtained from the EQ into the shared IQ as the new latest record

         2) Continues its further processing as designed.
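The steps above can be sketched as a small single-process simulation. This is a minimal Python sketch, not the author's implementation: `Entry`, `InternalQueue`, `poll_round` and `simulate` are hypothetical names, the IQ is an in-memory list, and the EQ is modelled as an irrevocable log whose latest entry each round can see. The lock around compare-and-append is an added assumption: the steps do not say how concurrent access to the IQ is coordinated, and without atomicity both MSs could record the same entry if their rounds overlapped.

```python
import threading
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    entry_id: str  # identifier or time-stamp compared in step ii
    payload: str


class InternalQueue:
    """Hypothetical in-memory stand-in for the shared IQ."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []
        # Assumption: compare-and-append is atomic, so MSR and MSRT
        # cannot both record the same entry.
        self._lock = threading.Lock()

    def record_if_new(self, entry: Entry) -> bool:
        """Steps i to iv.1: compare with the latest IQ record, append if different."""
        with self._lock:
            latest = self._entries[-1] if self._entries else None
            if latest is not None and latest.entry_id == entry.entry_id:
                return False  # step iii: same data, stop processing
            self._entries.append(entry)  # step iv.1: new latest record
            return True


def poll_round(name: str, eq_latest: Optional[Entry], iq: InternalQueue,
               processed: List[str]) -> None:
    """One scheduled round of MSR or MSRT (steps a to c)."""
    if eq_latest is None:
        return  # step b: no data, wait for the next round
    if iq.record_if_new(eq_latest):
        processed.append(f"{name}:{eq_latest.entry_id}")  # step iv.2


def simulate(period: float, arrivals: Dict[float, str], until: float,
             failed: FrozenSet[str] = frozenset()) -> List[str]:
    """Run MSR at 0, T, 2T, ... and MSRT offset by T/2 (step 1).

    arrivals maps arrival time -> entry id in an irrevocable EQ; each
    round sees only the latest entry that has arrived so far.
    """
    iq = InternalQueue()
    processed: List[str] = []
    triggers = []
    k = 0
    while k * period <= until:
        triggers.append((k * period, "MSR"))
        triggers.append((k * period + period / 2, "MSRT"))
        k += 1
    for t, name in sorted(triggers):
        if t > until or name in failed:
            continue  # round beyond the horizon, or this MS is dead
        visible = [(at, eid) for at, eid in arrivals.items() if at <= t]
        latest = None
        if visible:
            eid = max(visible)[1]  # entry with the latest arrival time
            latest = Entry(eid, f"data for {eid}")
        poll_round(name, latest, iq, processed)
    return processed


print(simulate(1.0, {0.2: "e1", 1.7: "e2"}, until=3.0))
# ['MSRT:e1', 'MSR:e2']
print(simulate(1.0, {0.2: "e1", 1.7: "e2"}, until=3.0, failed=frozenset({"MSR"})))
# ['MSRT:e1', 'MSRT:e2']
```

Each entry is processed exactly once, by whichever member polls first, and with the MSR marked as failed the MSRT alone still picks up both entries at half the polling frequency. Because each round in this sketch looks only at the latest EQ entry, two entries arriving within one combined polling interval would cause the earlier one to be missed, which is what Assumption 3 guards against.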

Consequences

1. Together, the MSR and MSRT double the frequency of polling from the external resource/queue.

2. If either the MSR or the MSRT fails fatally, the other continues the work with the same data, while the polling frequency drops by half.

3. If polling happens more frequently than new data appears in the EQ, duplicate processing of the same data is prevented.

4. A fatally failed MS can easily be identified by the drop in the frequency of log entries from the MS tandem.

5. The App continues to perform all of its designed internal operations.
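Consequence 4 can be checked mechanically. A minimal Python sketch, assuming that every log record is tagged with the name of the MS instance that wrote it and that each member logs roughly once per polling round; `silent_members` and its `min_entries` threshold are hypothetical. The combined log rate of the tandem only shows that one member has stopped; the per-instance tag is what identifies which one.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def silent_members(records: Iterable[Tuple[float, str]],
                   members: Iterable[str],
                   window_start: float, window_end: float,
                   min_entries: int) -> List[str]:
    """Return tandem members that logged fewer than min_entries in the window.

    records: (timestamp, member name) pairs; assumes every log record is
    tagged with the name of the MS instance that wrote it.
    """
    counts = Counter(name for ts, name in records
                     if window_start <= ts < window_end)
    return sorted(m for m in members if counts[m] < min_entries)


# MSR logs at 0 and 1, then stops; MSRT keeps logging at its T/2 offsets.
logs = [(0.0, "MSR"), (1.0, "MSR"),
        (0.5, "MSRT"), (1.5, "MSRT"), (2.5, "MSRT"), (3.5, "MSRT")]
print(silent_members(logs, ["MSR", "MSRT"], 0.0, 4.0, min_entries=3))  # ['MSR']
```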

Implementation

The IQ and EQ may be implemented with the same technology. For example, both EQ and IQ can be separate topics in the same Kafka cluster (Kafka sets retention per topic, not per partition). It is recommended that the IQ's retention period be configured in line with the polling frequency: the IQ is temporary storage meant to hold entries for just a few polling cycles.
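The retention recommendation can be turned into concrete topic settings. A minimal Python sketch: `iq_topic_config` is a hypothetical helper, and keeping entries for three polling cycles is only an illustrative default. `retention.ms` and `segment.ms` are standard Kafka topic configurations; the returned dict is meant to be supplied as topic-level configuration when the IQ topic is created, for example through an admin client.

```python
def iq_topic_config(poll_period_ms: int, cycles_to_keep: int = 3) -> dict:
    """Build topic-level settings for a hypothetical IQ topic in Kafka.

    Keeping entries for 3 polling cycles is only an illustrative default.
    """
    if poll_period_ms <= 0 or cycles_to_keep <= 0:
        raise ValueError("poll_period_ms and cycles_to_keep must be positive")
    return {
        # Delete entries older than a few polling cycles.
        "retention.ms": str(poll_period_ms * cycles_to_keep),
        # Roll a new segment every cycle: Kafka deletes whole segments,
        # so a large segment would keep old entries past retention.ms.
        "segment.ms": str(poll_period_ms),
    }


print(iq_topic_config(10_000))  # {'retention.ms': '30000', 'segment.ms': '10000'}
```

Even with these settings, the broker removes expired segments on a periodic check, so IQ entries can outlive `retention.ms` somewhat. That is harmless here, because each MS compares only against the latest IQ record. Very small `segment.ms` values also produce many small segment files, so the polling period should not be tiny.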

Applicability

This pattern can be applied to at least two types of polled resources:

1) Revocable data: when a polling entity accesses the data in the resource/queue, the data is released to the requester and removed from the resource/queue entirely. Examples include messaging or event-store systems of the Queue type, where the data goes to whichever entity requests it first. In this case, there is no need to engage the IQ.

2) Irrevocable data: one or more polling entities can access the data in any order and obtain copies of it many times, until the resource deletes the data after a certain (configurable) period of time. Example: the Kafka messaging intermediary/store, which is sender-centric and relies on the data consumer to keep track of what it has already retrieved (its offsets).
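The difference between the two resource types is what decides whether the IQ is needed. A minimal in-memory Python sketch with hypothetical class names: `RevocableQueue` mirrors a classic work queue, while `IrrevocableLog` loosely mirrors Kafka consumers that belong to different consumer groups and therefore each read every record.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class RevocableQueue:
    """Queue-type resource: reading an item removes it for everyone."""

    def __init__(self, items: List[str]) -> None:
        self._items: Deque[str] = deque(items)

    def poll(self, consumer: str) -> Optional[str]:
        # consumer is ignored: whoever asks first gets the item.
        return self._items.popleft() if self._items else None


class IrrevocableLog:
    """Log-type resource: reads leave data in place; each consumer
    keeps its own read position (offset)."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)
        self._offsets: Dict[str, int] = {}

    def poll(self, consumer: str) -> Optional[str]:
        pos = self._offsets.get(consumer, 0)
        if pos >= len(self._items):
            return None
        self._offsets[consumer] = pos + 1
        return self._items[pos]


q = RevocableQueue(["e1"])
print(q.poll("MSR"), q.poll("MSRT"))      # e1 None: no duplicate, IQ not needed
log = IrrevocableLog(["e1"])
print(log.poll("MSR"), log.poll("MSRT"))  # e1 e1: both get e1, IQ needed
```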

The data that an MS polls may be of any type, for example:

1) Message/note

2) Message/object

3) Document

4) Event notification.

An example scenario: an MS sender (MSS) publishes an event it experiences to an Event Broker/Bus (EB) of the “irrevocable data” type. An MSR, which knows the EB end-point from which the event can be retrieved, is designed to execute a polling call to obtain the event notification. Unfortunately, the MSR fails without logging its failure, and the process that the MSR should have triggered never starts.

The Tandem Pattern resolves this situation because the MSR-Tween polls the same resource/queue for the same data independently of the failed MSR. Provided their failures are largely independent, the probability that both the MSR and the MSR-Tween fail at the same time is much lower than the probability that a single MSR fails.
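The probability claim can be made concrete under a simplifying assumption that the original does not state: the MSR and MSRT fail independently of each other (for example, they do not share a host or process), and each fails during a given period with probability p.

```latex
P(\text{MSR fails} \wedge \text{MSRT fails}) = p \cdot p = p^{2},
\qquad p = 0.01 \;\Rightarrow\; p^{2} = 0.0001
```

Failures that hit both members at once, such as an outage of the EQ itself or of the shared network (failures 1) and 2) above), are not independent, so the tandem does not reduce them; it targets failure 3).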
