J2EE Journal, 2006
Reliable messaging via Web services and JMS
This article describes two techniques that may be used for assured delivery of important data, specifically audit data, in distributed systems. We will review a design that leads from assured to guaranteed delivery. This task is becoming more important in light of modern global operational risk regulations and related application risk management.
Business Task and Functional Requirements
Relatively recent operational risk management regulations like Sarbanes-Oxley (SOX), and in some cases Basel II, require collection of “material evidence” of user activities that can affect the financial reporting of the company. This includes user activities in software applications, especially in the financial industry. In many cases an activity is interpreted not only as the fact of application access, but also as access to a particular application function and even data. The user activities in the application are supposed to be stored in persistent storage for subsequent audit (in this article we will use a relational database for simplicity). Such databases are usually centralized and serve multiple applications; therefore, we are dealing with distributed systems. The aforementioned regulations assume that audit data may not be lost on its way to the database. In automated systems, this means assured delivery of data.
Assured delivery of data is not a new thing in the application landscape. For many years MOM (Message-Oriented Middleware) and, more recently, ESB (Enterprise Service Bus) technologies have provided such functionality. The “cons” here are the high product costs and expensive maintenance. In addition, they do not guarantee that sent data is stored in the targeted persistent storage – they only assure that data is reliably transmitted from the sender to the receiver components or rolled back. This is the basic difference between assured and guaranteed delivery.
In this article we will discuss assured delivery and design a guaranteed delivery feature utilizing widely available J2EE technologies that may be suitable for small companies or for departments of large corporations. When talking about delivering data with assurance in a distributed system, the first thing that comes to mind is reliable messaging (RM).
If an item of audit data is interpreted as a message, we can concentrate on the delivery mechanisms – in particular, on Web services and Java Message Service (JMS). It is interesting to note that if the delivery task is slightly extended to include the reuse of audit data for integration with other systems, e.g., security systems, the Web services-based design has to be reconstructed to become scalable, while the JMS-based design requires just an extension for reliability. Details of these designs will be discussed in the following sections.
Web Services-Based Solution
Reliable messaging implemented as Web services is based on several standards such as SOAP, Web Services Reliable Messaging (WS-RM), WS-Acknowledgement, WS-MessageData, WS-Callback, SOAP-Conversation, and others. WebLogic platform version 8.1 provides a SOAP Reliable Messaging solution, while version 9.0 offers WS-RM.
In both cases, the concept of RM may be demonstrated as shown in Figure 1. An audit message is created in the Worker Component or Business Application and sent to the Sender Run-Time Procedure. Before the message is sent further, it is persisted locally. This protects the message from being lost if the receiver side is unavailable at the moment. Then the message is sent to the Receiver Run-Time Procedure, where it is first persisted. Since message transmission is performed in a transaction, the latter can be rolled back in case of any problems on the network or receiver side. If the transaction is rolled back, the sender is notified that the message was not delivered. Depending on configuration, the message may be re-sent by the Sender Run-Time Procedure or by the sender.
The Receiver Run-Time Procedure invokes a business method in the Audit Service Provider before the acknowledgement of delivery is sent to the sender. If the receiver – the Audit Service Provider – operates in the Receiver Transaction Context, it is able to perform its own operations in the same transaction. For example, the receiver can store the message in the database. If storing fails and rolls back, the Receiver Transaction Context rolls back and, in turn, the process does not remove the message from the persistent store of the Sender Run-Time Procedure. Thus, the audit data is not lost. The only problem with this mechanism is that the Receiver Transaction Context does not automatically roll back if the receiver throws an application-specific exception, i.e., the Audit Service Provider has to take care of such exceptions and explicitly roll back the Receiver Transaction Context if needed.
As we can see now, if data is persisted using the same transaction as the Receiver Transaction Context, we get a reliable solution for transmitting audit data into persistent storage. The RM in the WebLogic 8.1 implementation has one major limitation – it works on the WebLogic platform only.
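The store-and-forward behavior described above can be illustrated with a small, self-contained Java sketch. It is not the WebLogic or WS-RM API: the names AuditReceiver, StoreAndForwardSender, and AuditServiceProvider are hypothetical, the "persistent store" and the "database" are in-memory collections, and a thrown exception stands in for a rolled-back Receiver Transaction Context. The sketch only shows the rule that the sender forgets a message after the receiver has stored it, and not before.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical receiver contract: returns normally only if the message was stored. */
interface AuditReceiver {
    void receive(String message) throws Exception;
}

/** Sender-side run-time: persists first, removes a message only after acknowledgement. */
class StoreAndForwardSender {
    // In-memory stand-in for the sender's local persistent store (assumption).
    private final Deque<String> localStore = new ArrayDeque<>();
    private final AuditReceiver receiver;

    StoreAndForwardSender(AuditReceiver receiver) {
        this.receiver = receiver;
    }

    /** Persists the message locally before any delivery attempt. */
    void submit(String message) {
        localStore.addLast(message);
    }

    /** Tries to deliver the stored messages in order; returns the number acknowledged. */
    int flush() {
        int delivered = 0;
        int attempts = localStore.size();
        for (int i = 0; i < attempts; i++) {
            String message = localStore.peekFirst();
            try {
                receiver.receive(message); // receiver stores the message or throws
                localStore.removeFirst();  // acknowledged: safe to forget
                delivered++;
            } catch (Exception e) {
                break;                     // rolled back: keep the message, retry later
            }
        }
        return delivered;
    }

    int pending() {
        return localStore.size();
    }
}

/** Receiver that stores into a list standing in for the audit database. */
class AuditServiceProvider implements AuditReceiver {
    final List<String> database = new ArrayList<>();
    boolean databaseAvailable = true;

    @Override
    public void receive(String message) throws Exception {
        if (!databaseAvailable) {
            throw new Exception("database unavailable, receiver transaction rolled back");
        }
        database.add(message);
    }
}

class StoreAndForwardDemo {
    public static void main(String[] args) {
        AuditServiceProvider provider = new AuditServiceProvider();
        StoreAndForwardSender sender = new StoreAndForwardSender(provider);

        provider.databaseAvailable = false;
        sender.submit("user=alice action=login");
        System.out.println("delivered=" + sender.flush() + " pending=" + sender.pending());

        provider.databaseAvailable = true;
        System.out.println("delivered=" + sender.flush() + " pending=" + sender.pending());
    }
}
```

In the first attempt nothing is delivered and the message remains pending; once the database is available again, the retry delivers it and the local store is emptied.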
WebLogic 9.0 overcomes that limitation via support of WS-RM, which works across all platforms that support the same standard.
However, if a message has to be sent to multiple storages, or the transmitted data should be used for integration with other systems (for example, integration between authorization systems built into ALES, Documentum, and Business Objects products), the described Web services-based solution is limited in “vertical” business scalability. In particular, every time a new audit data consumer (or destination endpoint) has to be added to the system, a new sender (or source endpoint) has to be implemented and deployed. To improve “vertical” scalability, we probably need to change the design and set up the Worker Component as a Web service while setting up the Audit Provider and other integrated data consumers as service clients. An alternative design might include an intermediary service that is situated between the client (again, the Worker Component) and the integrated services. The intermediary service distributes audit data to all interested services. In both cases, the original Web services-based design requires significant modification.
JMS-Based Solution
While JMS is designed for assured delivery from the beginning, we have to use it in a special way to achieve guaranteed delivery. Moreover, since we are discussing practical solutions, we have to address security in the design, despite the fact that it does not contribute to the guaranteed delivery process itself. Information security has not been discussed with regard to Web services because it is a well-known issue and has attracted a lot of attention already. At the same time, messaging is traditionally considered internal infrastructure and therefore secured, while actually it is not (the majority of recent research points out that 75-80 percent of security violations happen inside the company). Therefore, we will examine the JMS-based design while keeping in mind “vertical” scalability, security, and reliability.
Design for Security and Scalability
Let’s assume that we deal with two Audit Service Providers. Each one collects only audit data of a certain type. Instead of an Audit Service Provider, it may be another system that integrates with the Worker Component via data exchange. If we use just two JMS Queues for message receivers, we will need to modify the sender code when we add more receivers; this solution is not scalable. Therefore, we need to broadcast the message to all interested parties/receivers via, for example, a JMS Topic. Since audit data is sensitive, we cannot just put a message into a JMS Topic and rely on message filtering on the receiver side to select only appropriate messages; instead, we have to direct the message to the approved receivers only.
We can observe several security models for JMS. As we know, JMS Connection Factory access may be protected by user name and password (UN/PW). We believe that this protection is not enough, especially if the Topic is used by multiple receivers for integration purposes. It is common practice that, for example, an operation team discloses the user name and password to those projects that need integration urgently without notifying the information owner (sender) and without checking security compliance. In a WebLogic Trusted Domain configuration (trust between WebLogic Server domains), the password is not required at all. Sensitive data may also be encrypted; however, this requires dealing with an additional encryption service and/or encryption key management infrastructure. Either of them may be unavailable or too expensive.
We propose an inexpensive combination of UN/PW with operational security design. Figure 2 displays one of the possible solutions where messages that are sent to the JMS Topic are directed to particular JMS Queues for approved receivers, e.g., Audit Service Providers.
The Message Bridges shown in the diagram have been available since the WebLogic 7.0 release and play the role of an intermediary. They subscribe to the JMS Topic and transmit messages in a transaction to the JMS Queues. Both the Topic and the Queues are configured with assured delivery, and the Message Bridges have a durable subscription to the Topic. Each Audit Service Provider uses Message-Driven Beans (MDB) for message retrieval from the JMS Queue. The MDB starts a transaction and manages the message transfer into the database in the scope of the same transaction.
If a Message Bridge is down, the message stays with the JMS Topic. If a JMS Queue is down, the Message Bridge rolls back its transaction and the message still stays with the JMS Topic. If the Audit Service Provider or the database experiences problems, the MDB transaction rolls back and the message stays with the Queue. When the problems are fixed, it is guaranteed that the message is processed and stored in the database.
The trick here is not in the technical solution, but in the operational data processing. The point is that the information owner controls the Message Bridges and approves receivers. Upon approval, the receiver is granted a Message Bridge and a JMS Queue, and the operation team configures them (in addition to UN/PW). Thus, the solution becomes relatively secure and is still flexible and scalable.
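To make the scalability argument concrete, here is a minimal Java sketch of the Topic, Bridge, and Queue arrangement. It is an in-memory model, not JMS or WebLogic code: AuditTopic, ReceiverBridge, and approveReceiver are hypothetical names, and a per-receiver backlog stands in for a durable subscription. The point it illustrates is that the sender publishes once and never changes, while each approved receiver gets its own bridge and queue.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Topic that retains a copy of each message for every approved (durable) subscriber. */
class AuditTopic {
    // One backlog per approved bridge; stands in for durable subscriptions (assumption).
    private final Map<String, Queue<String>> backlogs = new LinkedHashMap<>();

    /** Invoked on behalf of the information owner when a receiver is approved. */
    Queue<String> approveReceiver(String receiverName) {
        return backlogs.computeIfAbsent(receiverName, name -> new ArrayDeque<>());
    }

    /** Sender side: one publish call, regardless of how many receivers exist. */
    void publish(String message) {
        for (Queue<String> backlog : backlogs.values()) {
            backlog.add(message);
        }
    }
}

/** Bridge that moves messages from its topic backlog into one receiver queue. */
class ReceiverBridge {
    private final Queue<String> backlog;
    private final Queue<String> receiverQueue;

    ReceiverBridge(Queue<String> backlog, Queue<String> receiverQueue) {
        this.backlog = backlog;
        this.receiverQueue = receiverQueue;
    }

    /** Forwards everything currently waiting; returns the number of messages moved. */
    int forwardAll() {
        int moved = 0;
        while (!backlog.isEmpty()) {
            receiverQueue.add(backlog.remove());
            moved++;
        }
        return moved;
    }
}

class ApprovedReceiversDemo {
    public static void main(String[] args) {
        AuditTopic topic = new AuditTopic();

        Queue<String> auditQueue = new ArrayDeque<>();
        ReceiverBridge auditBridge =
                new ReceiverBridge(topic.approveReceiver("audit"), auditQueue);
        topic.publish("m1");

        // A second receiver is approved later; the publishing code is untouched.
        Queue<String> securityQueue = new ArrayDeque<>();
        ReceiverBridge securityBridge =
                new ReceiverBridge(topic.approveReceiver("security"), securityQueue);
        topic.publish("m2");

        auditBridge.forwardAll();
        securityBridge.forwardAll();
        System.out.println("audit=" + auditQueue + " security=" + securityQueue);
    }
}
```

As with a durable subscription created after a message was published, the receiver approved later sees only the messages published after its approval.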
Design for Reliability
There are several steps to be performed to achieve reliability and guaranteed delivery via the JMS model described above. The overall system is shown in Figure 3.
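Step 1: All JMS destinations used in the system have to be configured with assured delivery. Durable subscriptions apply to Topic subscribers, so every Message Bridge that reads from the JMS Topic has to use one; the Queue listeners (the remaining Bridges and the MDBs) rely on the persistent storage of their Queues.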
Step 2: It is clear that the JMS Topic is the heart of the system. That is why we have to minimize the risk of its failure. We place the JMS Topic on a physically separate server with an automated failover feature. If the working instance of the Topic goes down, the failover instance immediately starts to work. If the sender’s or receiver’s application server goes down, the Topic is still capable of operating.
Step 3: Since the JMS Topic becomes a remote destination, the Worker Component runs the risk of losing audit data if the JMS Topic is not reachable. To protect audit data at this point, we add a distributed (clustered) JMS Queue, which is situated on the same cluster as the Worker Component.
Step 4: We add Message Bridges to connect each of the instances of the distributed Queue with the JMS Topic. All Message Bridges are situated in the same cluster as the sender’s distributed Queue.
Step 5: For every receiver, create a distributed Queue that serves the receiver (e.g., Audit Service Provider). A receiver’s Queue may be isolated from or collocated with the sender component.
Step 6: Connect the JMS Topic with the receiver’s distributed Queue by a Message Bridge. The latter transmits messages in the transaction to the receiver’s distributed Queue. This Message Bridge is collocated with the receiver’s distributed Queue.
Step 7: Receivers, such as Audit Service Providers, use MDBs to listen on the receiver’s distributed Queue. The MDBs operate in the same way as described above.
Now, let’s look at how it all works together. The sender forms an audit message and sends it to the sender’s distributed Queue, where it is persisted. The Message Bridge, which connects a particular JMS Queue instance with the JMS Topic, starts a transaction, retrieves the message from the Queue, and sends it to the Topic. If the Message Bridge is down or the JMS Topic is unavailable, the message stays with the sender’s distributed Queue until the next attempt. If the message is delivered to the JMS Topic successfully, the acknowledgement is sent to the sender’s distributed Queue and the message is removed from the persistent storage of the sender’s distributed Queue. In Figure 3, four images of the Message Bridge represent the case of a cluster with four nodes where the distributed Queue is situated.
In the JMS Topic, the message is persisted again and broadcast to durable subscribers. We have only one subscriber in the system – the Message Bridge. The latter starts a transaction, retrieves the message from the JMS Topic, and sends it to the receiver’s distributed Queue. If the Message Bridge is down or the receiver’s distributed Queue is unavailable, the message stays with the JMS Topic until the next attempt. When the message is successfully delivered to the receiver’s distributed Queue, the acknowledgement is sent and the message is removed from the JMS Topic persistent storage.
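The two bridge hops just described follow the same pattern: read inside a transaction, send, and remove the message from the source only after the send has succeeded. The Java sketch below models that pattern with in-memory objects. PersistentDestination and TransactionalBridge are hypothetical names, the Topic is simplified to a single backlog because the system has one durable subscriber, and a real bridge would rely on the transaction manager to make the send and the removal atomic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory stand-in for a persistent JMS destination that can be switched off. */
class PersistentDestination {
    private final Deque<String> stored = new ArrayDeque<>();
    private final String name;
    boolean available = true;

    PersistentDestination(String name) {
        this.name = name;
    }

    void put(String message) {
        if (!available) {
            throw new IllegalStateException(name + " is unavailable");
        }
        stored.addLast(message);
    }

    /** Returns the oldest message without removing it, or null if empty. */
    String peek() {
        return stored.peekFirst();
    }

    void removeHead() {
        stored.removeFirst();
    }

    int size() {
        return stored.size();
    }
}

/** Moves messages from a source to a target, one "transaction" per message. */
class TransactionalBridge {
    private final PersistentDestination source;
    private final PersistentDestination target;

    TransactionalBridge(PersistentDestination source, PersistentDestination target) {
        this.source = source;
        this.target = target;
    }

    /** Returns true if one message was moved; on failure the source is left untouched. */
    boolean transferOne() {
        String message = source.peek();
        if (message == null) {
            return false;             // nothing to move
        }
        try {
            target.put(message);      // send inside the transaction
        } catch (IllegalStateException e) {
            return false;             // rollback: the message stays with the source
        }
        source.removeHead();          // commit: acknowledged, the source forgets it
        return true;
    }
}

class BridgeChainDemo {
    public static void main(String[] args) {
        PersistentDestination senderQueue = new PersistentDestination("senderQueue");
        PersistentDestination topic = new PersistentDestination("topic");
        PersistentDestination receiverQueue = new PersistentDestination("receiverQueue");
        TransactionalBridge toTopic = new TransactionalBridge(senderQueue, topic);
        TransactionalBridge toReceiver = new TransactionalBridge(topic, receiverQueue);

        senderQueue.put("audit-1");
        topic.available = false;
        System.out.println("first hop moved: " + toTopic.transferOne()
                + ", still in sender queue: " + senderQueue.size());

        topic.available = true;
        toTopic.transferOne();
        toReceiver.transferOne();
        System.out.println("receiver queue size: " + receiverQueue.size());
    }
}
```

While the Topic is unavailable the message stays in the sender's Queue; after the Topic returns, the two hops carry it through to the receiver's Queue.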
If more than one receiver is permitted to get the message, a new Message Bridge and a new receiver’s distributed Queue are configured in the system. In this case, the JMS Topic sends the message to as many durable subscribers (Message Bridges) as are permitted and configured in the system. Figure 3 does not show whether a receiver is deployed in a cluster, though this is highly recommended.
Finally, a receiver – the Audit Service Provider – deploys a pool of MDBs to get the messages from the receiver’s distributed Queue. Each MDB starts its own transaction and the Audit Service Provider persists an audit message into the Audit database in the context of this transaction. If transmission to the database results in an exception, the transaction is rolled back and the message stays with the receiver’s distributed Queue until the next MDB processes it. It is important to note that the MDB uses a Bean-Managed Transaction (BMT). The BMT allows the Audit Service Provider to roll back the transaction if the Provider experiences any problems with, for example, data transformation and throws an application exception.
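The explicit rollback on an application exception can be sketched as follows. This is a plain Java model and deliberately does not use the EJB or JMS APIs: AuditTransaction, AuditMessageHandler, and AuditFormatException are hypothetical names, the transaction is a small object that puts the message back on the queue when rolled back, and the "appCode:event" message format and the lookup table of application names are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Application-specific (checked) exception, e.g. raised by data transformation. */
class AuditFormatException extends Exception {
    AuditFormatException(String reason) {
        super(reason);
    }
}

/** Minimal stand-in for one transaction spanning the queue read and the database write. */
class AuditTransaction {
    private final Deque<String> queue;
    private final List<String> database;
    private String received;
    private String pendingRow;

    AuditTransaction(Deque<String> queue, List<String> database) {
        this.queue = queue;
        this.database = database;
    }

    String receive() {
        received = queue.pollFirst();
        return received;
    }

    void insert(String row) {
        pendingRow = row;             // staged, written only on commit
    }

    void commit() {
        if (pendingRow != null) {
            database.add(pendingRow);
        }
        received = null;
        pendingRow = null;
    }

    void rollback() {
        if (received != null) {
            queue.addFirst(received); // the message goes back to the queue
        }
        received = null;
        pendingRow = null;
    }
}

/** Plays the role of the MDB plus Audit Service Provider logic. */
class AuditMessageHandler {
    // Hypothetical lookup table used by the transformation step.
    final Map<String, String> applicationNames = new HashMap<>();

    String transform(String message) throws AuditFormatException {
        String[] parts = message.split(":", 2);
        if (parts.length != 2) {
            throw new AuditFormatException("expected appCode:event");
        }
        String appName = applicationNames.get(parts[0]);
        if (appName == null) {
            throw new AuditFormatException("unknown application " + parts[0]);
        }
        return appName + " | " + parts[1];
    }

    /** Returns true if a message was stored; false if the queue was empty or work rolled back. */
    boolean onMessage(AuditTransaction tx) {
        String message = tx.receive();
        if (message == null) {
            tx.rollback();
            return false;
        }
        try {
            tx.insert(transform(message));
            tx.commit();
            return true;
        } catch (AuditFormatException e) {
            tx.rollback();            // explicit rollback on an application exception
            return false;
        }
    }
}

class AuditHandlerDemo {
    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        List<String> database = new ArrayList<>();
        AuditMessageHandler handler = new AuditMessageHandler();
        queue.add("CRM:login by alice");

        boolean first = handler.onMessage(new AuditTransaction(queue, database));
        System.out.println("stored=" + first + " queued=" + queue.size());

        handler.applicationNames.put("CRM", "Customer Portal");
        boolean second = handler.onMessage(new AuditTransaction(queue, database));
        System.out.println("stored=" + second + " database=" + database);
    }
}
```

The first attempt fails in the transformation step, so the message returns to the queue and nothing is written; after the missing mapping is added, the next attempt stores the transformed record.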
The discussed system looks relatively complex. One can ask – what is the big deal here? Any Enterprise Service Bus (ESB) can solve this problem and hide all the details! Of course, you are right. However, even if an ESB provides guaranteed delivery, what do you do if your organization cannot afford an ESB, or it takes too long to buy the product?
We have described two solutions for transmitting audit or any sensitive data from the application to the database with guaranteed delivery. Both solutions are scalable and can effectively work in a distributed environment. Both solutions may be easily converted into services to be used in the service-oriented architecture. The most important aspect of the solutions is that they do not require expensive products and can be implemented on a regular WebLogic platform.
Many thanks go to Bruce Horner and Manmeet Ahluwalia for their significant contributions to the reliability design.
- OASIS Web Services Reliable Messaging specification: http://specs.xmlsoap.org/ws/2005/02/rm/ws-reliablemessaging.pdf
- OASIS Committees by Category Web Services: www.oasis-open.org/committees/tc_cat.php?cat=ws
- Orchard, D. “Making Sense of Web Services Standards”: http://dev2dev.bea.com/pub/a/2004/01/ws_orchard.html
- BEA AquaLogic Service Bus: www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/aqualogic/service_bus/
- National Security Institute’s Information, Security Resource: http://nsi.org/NSIProducts/SECURITYsense/SECURITYsense.html
- Poulin, M. “How to Deal with Security When Building Application Architecture.” JDJ. SYS-CON Publications, Inc. Vol.10, issue 9: http://java.sys-con.com/author/poulin.htm