#Implementation-agnostic #Logging for #Microservices

Published on May 10, 2019

After several years of using Microservices, people still debate what the best, or minimally sufficient, logging mechanism is for independently deployed, loosely-interacting, autonomous Microservices. Now that DevOps teams are responsible not only for fast development and deployment but also for supporting the Microservices they produce, the perception of quick and simple development is shifting toward more manageable and maintainable design and implementation, and logs are among the most important instruments for this.

In this article, I share some practical considerations about logging for Microservices. The Microservice case is much simpler than the one described back in 2008 in “Logging for SOA”, because #SOA Services do not form any applications and may have several internal components, each with its own log. Regular developers might not fully appreciate the importance of logging, while Technical Leads, Product Owners and Architects are well familiar with topics such as auditing and solution/product reliability, which are fundamentally based on logs.

A Context of Microservice Execution

In many publications about Microservice logging, I’ve read a recommendation to create a unique ID for the end-user and propagate it through the chain of Microservices involved in preparing the response to the user’s request. However, requests triggered by a user interface occupy only a small niche of Microservices. Considering the Microservice eventing model and, especially, the polling mechanism, many Microservices simply have no such user and no related unique ID. Thus, the user ID is a special case with limited scope (in contrast to SOA Services, which have a consumer/requester in all cases).

At the same time, Microservices always work in certain execution contexts. This context is formed by a chain of invocations. Whether Microservices explicitly call each other (coupling by invocation) or are implicitly invoked by listening for or polling event notifications/messages (coupling by design), they create such chains at runtime. The recommendation is this: if an initiator has an identity A, its request or fired event notification should include its A-ID; the invoked Microservice with B-ID, when it invokes another Microservice with C-ID, passes a chain of identifiers like {A-ID, B-ID}. The last Microservice passes {A-ID, B-ID, C-ID}, and so forth. Obviously, if the initiator is a polling Microservice, its ID should occupy the first position in the chain. Additionally, it makes sense to distinguish between several chains rooted in the same Microservice. For this purpose, the initial Microservice ID can be appended with a local timestamp.
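The chain-building rule above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation; the function name, the `@timestamp` suffix format, and the example IDs are my own assumptions:

```python
import time

def extend_chain(incoming_chain, my_id):
    """Return the identifier chain to pass downstream.

    If there is no incoming chain, this service is the initiator
    (e.g. a polling Microservice): it starts a new chain and appends
    a local timestamp to its own ID, so that several chains rooted
    in the same Microservice can be told apart.
    """
    if incoming_chain is None:
        return [f"{my_id}@{int(time.time() * 1000)}"]  # assumed suffix format
    return incoming_chain + [my_id]

chain = extend_chain(None, "A-ID")    # initiator starts the chain
chain = extend_chain(chain, "B-ID")   # first invoked Microservice
chain = extend_chain(chain, "C-ID")   # next Microservice in the chain
```

Each Microservice would attach the resulting chain to every log message it emits, which is what makes the grouping described next possible.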

The log messages, therefore, can easily be grouped by the origin of the invocation chain and ordered by invocation paths. This mechanism can be used for both audit logs and error/operational logs.

Monitoring versus Logging

My recommendation to developers is to never mix general/audit logging with monitoring reports. In some cases, it can be very useful to record and analyse how long a Microservice runs, or how long it takes to perform a particular operation in its ‘body’. The related timestamps, with short comments, can also be stored or reported to the monitor. It is tempting to simply log this information. Yes, it is possible, but usually there has to be a separate collector or data store for the monitoring data.
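One simple way to keep the two channels apart is to route them through differently named loggers bound to different collectors. The sketch below uses Python’s standard `logging` module only as an illustration of the separation; the logger names and the stand-in workload are assumptions:

```python
import logging
import time

audit_log = logging.getLogger("audit")       # general/audit log channel
monitor = logging.getLogger("monitoring")    # separate monitoring channel;
                                             # each can be wired to its own collector

def timed_operation():
    start = time.perf_counter()
    result = sum(range(1000))                # stands in for the real work
    elapsed_ms = (time.perf_counter() - start) * 1000
    monitor.info("operation took %.2f ms", elapsed_ms)  # timing goes to the monitor
    audit_log.info("operation completed")               # the fact goes to the log
    return result
```

In a real deployment, each logger would carry its own handlers pointing at the respective data store, so the timing data never pollutes the audit trail.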

Handling Failures

In the real world, it is not always possible to catch a failure and log it. Even if a Microservice works with its own data store, the communication with it can occasionally hang. The same applies to the explicit invocation of Microservices, especially when an external #API – from another application or from an external provider’s application – is invoked. The first rule of managing logs is to make sure that the Microservice always sets time-outs for all actions conducted toward external resources. This allows the Microservice to avoid potential hanging, perform several attempts to resolve the action and, if it finally fails, log the failure.
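The time-out-and-retry rule can be sketched as follows. This is a hypothetical example using a raw socket connection to stand in for any external resource; the function name, attempt count and time-out values are assumptions:

```python
import logging
import socket

log = logging.getLogger("service")

def call_external(host, port, attempts=3, timeout=2.0):
    """Call an external resource with a time-out and bounded retries.

    The time-out guarantees the Microservice never hangs; the error is
    logged only after the final attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                return conn.recv(1024)
        except OSError as exc:
            if attempt == attempts:
                log.error("external call to %s:%s failed after %d attempts: %s",
                          host, port, attempts, exc)
                raise
```

The same shape applies to HTTP clients or database drivers: every blocking call carries an explicit time-out, and logging happens once, after the retries are exhausted.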

My personal preference, backed by wide industry experience, is that it is better to have one log per application than a separate log per Microservice. The latter is hard to manage and difficult to analyse. However, a single log channel and data store (collector) constitute a single point of failure. What if a Microservice is able to catch an exception but cannot log it because the centralised log collector is inaccessible?

Known recommendations here point to the FallBack pattern: in such a situation, a Microservice stops its attempts to log, waits for a while and tries again. But how long can this take, and does the application have that time? The Microservice is effectively stuck while the application, unaware of the problem, continues to count on this non-working Microservice.

The solution for problems of this type has been known for decades, and it is not the FallBack pattern. From the time of deploying application components on Application Servers, my peers and I used asynchronous logging via Log4J and similar tools. That is, the log request was submitted to the messaging system and travelled to the logging receiver independently from the main process. For Microservices, the #Anti-#Corruption #Layer (ACL) pattern, described by Eric Evans in Domain-Driven Design, does the same thing, though articulated differently.

Logging Companion – an implementation of the ACL pattern

I recommend using the ACL pattern in all Microservices that are designed to log data. One possible implementation is the following: each Microservice does not log directly but instead sends a message, or fires a notification event, that carries the log content, and continues its work. It is assumed that each Microservice has its Logging Companion – a Microservice dedicated to listening for such messages/events from one or even several Microservices. This makes logging asynchronous – the log transmission does not impact the main process of the Microservice sending the logs. Eventually, the log will be delivered to the log data collector even if, at some moment, the collector is not accessible.
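The essence of the companion arrangement can be sketched with an in-process queue and a worker thread standing in for the messaging channel and the companion Microservice. This is a simplification for illustration only; in reality the queue would be a message broker and the companion a separately deployed service:

```python
import queue
import threading

log_queue = queue.Queue()   # stands in for the messaging channel
collector = []              # stands in for the log data collector

def companion():
    """The Logging Companion: consumes log messages and delivers them.

    This is where buffering and retries toward an unavailable
    collector would live, off the main service's critical path.
    """
    while True:
        message = log_queue.get()
        if message is None:          # shutdown signal for this sketch
            break
        collector.append(message)    # deliver to the collector

worker = threading.Thread(target=companion, daemon=True)
worker.start()

log_queue.put("service B started")   # the service fires and continues
log_queue.put("service B finished")  # it never waits for delivery
log_queue.put(None)
worker.join()
```

Note that the producing service only ever touches the queue: whether the collector is up or down is entirely the companion’s problem.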

Apparently, the introduction of a Logging Companion fits well with the concept of a single function per Microservice. Indeed, logging is a second and separate ‘function’, and it is better delegated to another, dedicated Microservice. The #Logging #Companion is, in essence, a wrapper shielding the function-realising Microservice from the logging infrastructure, which can be offered by a separate system or a runtime platform, i.e. entities existing and operating outside the Microservice’s ownership boundaries. For instance, if the logging infrastructure becomes inaccessible for a while, the Logging Companion can and should accumulate the submitted log messages/event notifications until the problem is resolved, and then pass the logs to their collector asynchronously and at its own pace.

The final note

Finally, I’d like to point out that operational and technology logging have different requirements than audit logging. The specific requirement of audit logging is that no data may be lost, regardless of any failures of the application or system. In the Application Server days, audit logging used a pattern called Reliable Messaging. This pattern required persisting the log data on the sender’s side before the log was sent to the log collector, as well as persisting it at each transition step/node and on the collector’s side before the data became available for processing. Thus, the log data could never be lost.
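The sender’s side of Reliable Messaging – persist first, send later, discard only after acknowledged delivery – can be sketched with a simple file-backed outbox. This is a hypothetical illustration; the file format, function names and example record are assumptions, and a production system would use the durable store and broker already in its stack:

```python
import json
import os
import tempfile

# Durable local outbox for audit records (JSON Lines, assumed format).
outbox = os.path.join(tempfile.mkdtemp(), "audit-outbox.jsonl")

def persist(record):
    """Write the audit record to durable storage BEFORE any send attempt."""
    with open(outbox, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())   # survive a process crash

def flush_outbox(send):
    """Re-send every persisted record; clear the outbox only on full success."""
    with open(outbox) as f:
        for line in f:
            send(json.loads(line))   # raises if the collector fails -> records kept
    open(outbox, "w").close()        # all acknowledged: safe to truncate

persist({"event": "payment approved"})
delivered = []
flush_outbox(delivered.append)       # delivered.append stands in for the real send
```

Because the record reaches disk before the send is even attempted, a crash or an unreachable collector delays delivery but never loses the audit entry – which is exactly the Strong Consistency requirement discussed below.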

While ‘Microservice Architecture’ permits a Microservice failure, that failure must nevertheless not result in losing log data. In other words, if regular logging can be implemented based on the concept of #Eventual Consistency of data, audit logging requires #Strong #Consistency of data. This means that all Microservices participating in the transition of an audit log to its final collector must implement a runtime fail-over mechanism. The #FallBack and #Circuit Breaker patterns alone are not enough – the postponed attempts to engage the needed Microservice may never succeed, and there may be no alternative Microservice for the broken one in the application. This is why, for audit logs, I recommend using the #Sib and #Tandem patterns, which make a Microservice-based application more robust.
