Message Broker Failover

(S4)PACG 200 SP 05 Message broker 3.4

Overview

FSM Connector Message Broker is a .NET application that sits between the SAP ECC or S/4 backend and FSM Cloud and takes part in the data exchange between these systems. The data is sent as asynchronous messages. Message Broker sends messages to the Cloud and fetches messages from it, and supports the FSM messaging protocol to ensure that:
• messages are always delivered to the target system and no messages are lost
• messages are delivered to the target system in the same sequence in which they were generated on the source system
The delivery guarantee is based on several types of confirmations, which allow the source system to mark messages as received on the target. To deliver messages in the right sequence without sacrificing performance, the messages are put in queues, bundled together, etc. Because the Message Broker queues are held in memory, a restart of the service could cause message loss. To resolve this problem, Message Broker sends a so-called resend request to the ECC backend on start-up. ECC then resends all messages that have not been confirmed by the Cloud. There is a similar mechanism for the reverse channel: the Cloud resends messages to ECC until they have been confirmed.
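The resend-on-start-up idea can be illustrated with a minimal sketch. The class and method names below (`Backend`, `Broker`, `resend_unconfirmed`) are hypothetical stand-ins and not part of the FSM Connector or Message Broker API; the sketch only assumes that the backend remembers which messages the Cloud has confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for the ECC outbox (not the real API)."""
    outbox: list = field(default_factory=list)    # message ids in send order
    confirmed: set = field(default_factory=set)   # ids confirmed by the Cloud

    def send(self, msg_id):
        self.outbox.append(msg_id)

    def confirm(self, msg_id):
        self.confirmed.add(msg_id)

    def resend_unconfirmed(self):
        # Answer to a resend request: all unconfirmed messages, original order.
        return [m for m in self.outbox if m not in self.confirmed]


class Broker:
    """Hypothetical broker whose queue lives only in memory."""

    def __init__(self, backend):
        # After a (re)start the in-memory queue is empty, so the broker
        # asks the backend to resend everything that is still unconfirmed.
        self.queue = list(backend.resend_unconfirmed())


backend = Backend()
for msg_id in (1, 2, 3, 4):
    backend.send(msg_id)
backend.confirm(1)
backend.confirm(2)

# Simulate a restart: the new broker knows nothing about the old queue.
restarted = Broker(backend)
print(restarted.queue)  # [3, 4]
```

Messages 3 and 4 were sent but never confirmed, so they are the ones recovered after the simulated restart, in their original order.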

To achieve the goals described above, Message Broker must run as a stateful single-instance application. In other words, for each FSM account + company combination, only one Message Broker may be running at any given point in time. If several active instances were handling messages for the same account + company, the correct sequence of message processing could not be guaranteed. For example, instance A could process messages faster than instance B, so that messages processed by A arrive at the target system before messages processed by B, even if they were sent later.
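The ordering problem can be shown with a small simulation. The latencies and the `arrival_order` helper are made up for illustration; the sketch only assumes that each instance adds its own processing delay to every message it handles.

```python
def arrival_order(messages, latency_by_instance):
    """Return message ids in the order they reach the target system.

    messages: list of (msg_id, send_time, instance) tuples.
    latency_by_instance: processing delay per instance (illustrative numbers).
    """
    arrivals = [
        (send_time + latency_by_instance[instance], msg_id)
        for msg_id, send_time, instance in messages
    ]
    return [msg_id for _, msg_id in sorted(arrivals)]


# Two active instances: A is fast, B is slow.
two_instances = [(1, 0, "B"), (2, 1, "A"), (3, 2, "B"), (4, 3, "A")]
mixed = arrival_order(two_instances, {"A": 1, "B": 5})
print(mixed)  # [2, 4, 1, 3]

# A single active instance preserves the send order.
one_instance = [(1, 0, "A"), (2, 1, "A"), (3, 2, "A"), (4, 3, "A")]
ordered = arrival_order(one_instance, {"A": 1})
print(ordered)  # [1, 2, 3, 4]
```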

Solution

Currently, only two Message Broker instances can run in parallel: one active and one inactive.

To meet the requirements described above and, at the same time, provide a failover-capable solution, Message Broker (version 3.4 or later) can be started in inactive mode. An inactive instance is on standby: it does not participate in message exchange and waits for an activation signal. This affects both directions of messaging:
• Outbound messages, sent from ECC to FSM Cloud, must only be sent through the active Message Broker instance.
• Inbound messages, sent from FSM Cloud to ECC, are retrieved by polling. The Cloud never actively pushes messages to Message Broker. Instead, Message Broker polls the Cloud to check whether there is any new data to be processed. On the inactive Message Broker, polling must be disabled. Only the active instance may poll the Cloud.
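The inbound rule (only the active instance polls) can be sketched as a guard in front of the poll call. `PollingBroker` and `fake_fetch` are hypothetical names used for illustration only; the real Message Broker polling code is not shown in this document.

```python
class PollingBroker:
    """Hypothetical broker that polls the Cloud only while it is active."""

    def __init__(self, fetch_from_cloud, active=False):
        self._fetch = fetch_from_cloud  # callable returning pending messages
        self.active = active

    def poll_once(self):
        if not self.active:
            # Standby instance: polling is disabled, the Cloud is not contacted.
            return []
        return self._fetch()


# Fake Cloud inbox used only for this example.
pending = ["m1", "m2"]


def fake_fetch():
    fetched = list(pending)
    pending.clear()
    return fetched


standby = PollingBroker(fake_fetch, active=False)
primary = PollingBroker(fake_fetch, active=True)

standby_result = standby.poll_once()   # nothing fetched, messages stay in the Cloud
primary_result = primary.poll_once()   # the active instance picks them up
print(standby_result, primary_result)  # [] ['m1', 'm2']
```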

For the outbound case, there must be a dispatcher that tracks which Message Broker is active and sends the messages to that instance. To avoid another single point of failure and to simplify the architecture, outbound message dispatching was built into FSM Connector ((S4)PACG 200 SP 05). This is done via a heartbeat mechanism: heartbeat calls allow FSM Connector to keep track of the available Message Brokers.

Message Broker can be in one of the following states:
• Unavailable – Message Broker instance is down or broken, it should not be used to send the messages.
• Available inactive – Message Broker instance is running normally but is not selected as active instance. Presumably, because another instance is active.
• Available active – Message Broker instance is running normally and is used to send the messages.

In the SAP ERP backend system, a job must run periodically (every minute) to check the availability of all configured Message Broker instances by calling a new SOAP operation, heartbeat, on the Message Broker web service. Message Broker responds with the following information:

  • State – Message Broker state, can be one of:

    • Unavailable – Message Broker instance is down or broken, it should not be used to send the messages.

    • Inactive – Message Broker instance is running normally but is not selected as active instance.

    • Active – Message Broker instance is running normally and is used to send the messages.

  • Unavailability code – technical code (30-character string) with reason of unavailability.

  • Unavailability message – user-readable message in English with a reason of unavailability.
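As an illustration, the heartbeat response described above could be modelled as follows. The type and field names are assumptions made for this sketch and do not reflect the actual SOAP schema; the check also assumes that "30-character string" means a maximum length of 30. The example code `CLOUD_NOT_REACHABLE` is invented.

```python
from dataclasses import dataclass
from enum import Enum


class BrokerState(Enum):
    UNAVAILABLE = "unavailable"
    INACTIVE = "inactive"
    ACTIVE = "active"


@dataclass(frozen=True)
class HeartbeatResponse:
    """Hypothetical model of the heartbeat response (not the real schema)."""
    state: BrokerState
    unavailability_code: str = ""     # technical code, assumed max. 30 characters
    unavailability_message: str = ""  # user-readable reason in English

    def __post_init__(self):
        if len(self.unavailability_code) > 30:
            raise ValueError("unavailability code must not exceed 30 characters")


ok = HeartbeatResponse(BrokerState.ACTIVE)
failed = HeartbeatResponse(
    BrokerState.UNAVAILABLE,
    "CLOUD_NOT_REACHABLE",  # invented example code
    "Message Broker could not connect to the Cloud.",
)
print(ok.state.value, failed.state.value)  # active unavailable
```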

When an inactive Message Broker receives a heartbeat call with the active flag set, it should immediately activate itself and respond with the state active. If Message Broker fails to activate itself, it should respond with the state unavailable and the respective reason. Similarly, if an active Message Broker receives a call with the active flag unset, it should immediately deactivate itself and respond with the state inactive. Activation is performed synchronously: the ECC system should wait until it has finished, so that any activation errors can be returned as the web service call result and handled accordingly on the backend side. If FSM Connector receives the response unavailable to a heartbeat call, or the heartbeat call fails for any reason (e.g., a networking problem, Message Broker not listening on the specified port, etc.), it should mark the instance as inactive and activate another instance of Message Broker by sending it a heartbeat with the active flag set.
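The broker-side rules in the paragraph above (activate, deactivate, report activation errors synchronously) could be sketched like this. The class, the `activate`/`deactivate` hooks, and the code `ACTIVATION_FAILED` are hypothetical; the retry of activation from the unavailable state is also an assumption, as the text does not specify it.

```python
class MessageBroker:
    """Hypothetical broker-side heartbeat handler (illustrative only)."""

    def __init__(self, activate=lambda: None, deactivate=lambda: None):
        self.state = "inactive"
        self._activate = activate      # e.g. start polling, open queues
        self._deactivate = deactivate  # e.g. stop polling

    def heartbeat(self, active_flag):
        """Return (state, unavailability_code, unavailability_message)."""
        if active_flag and self.state != "active":
            try:
                # Synchronous activation: the caller waits for the result.
                self._activate()
            except Exception as exc:
                self.state = "unavailable"
                return (self.state, "ACTIVATION_FAILED", str(exc))
            self.state = "active"
        elif not active_flag and self.state == "active":
            self._deactivate()
            self.state = "inactive"
        return (self.state, "", "")


def failing_activation():
    raise RuntimeError("cloud unreachable")


healthy = MessageBroker()
broken = MessageBroker(activate=failing_activation)

print(healthy.heartbeat(True))   # ('active', '', '')
print(healthy.heartbeat(False))  # ('inactive', '', '')
print(broken.heartbeat(True))    # ('unavailable', 'ACTIVATION_FAILED', 'cloud unreachable')
```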
In short, for each cluster FSM Connector loops over the defined endpoints (Message Broker instances) in order of priority, sending a heartbeat with the active flag set, until the first of them responds with the state active. Instances that do not respond, or that respond with the state unavailable, are marked as unavailable. Afterwards, if any of the remaining Message Broker instances is still active, a heartbeat call with the active flag cleared should be sent to that instance so that it can deactivate itself.
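One heartbeat cycle for a single cluster, as described above, might look like the following sketch. The function name, the `send_heartbeat` callback, and the string states are assumptions for illustration; the actual logic runs in the SAP backend and is not shown in this document. Any answer other than active to an activation request is treated as unavailable here, which is also an assumption.

```python
def run_heartbeat_cycle(endpoints, send_heartbeat, last_states):
    """Run one heartbeat cycle for one cluster and return the new states.

    endpoints: endpoint names in order of priority.
    send_heartbeat(endpoint, active_flag): returns "active", "inactive" or
        "unavailable"; may raise on network errors.
    last_states: endpoint -> state known from the previous cycle.
    """
    states = dict(last_states)
    active = None

    # Step 1: try to activate endpoints by priority until one confirms.
    for endpoint in endpoints:
        try:
            state = send_heartbeat(endpoint, True)
        except Exception:
            state = "unavailable"  # a failed call counts as unavailable
        if state == "active":
            states[endpoint] = "active"
            active = endpoint
            break
        states[endpoint] = "unavailable"

    # Step 2: deactivate any other instance still believed to be active.
    # Inactive instances that should stay inactive are not pinged at all.
    for endpoint in endpoints:
        if endpoint != active and states.get(endpoint) == "active":
            try:
                states[endpoint] = send_heartbeat(endpoint, False)
            except Exception:
                states[endpoint] = "unavailable"
    return states


calls = []


def fake_send(endpoint, active_flag):
    """Fake transport for the example: the primary endpoint is down."""
    calls.append((endpoint, active_flag))
    if endpoint == "primary":
        raise ConnectionError("primary is down")
    return "active" if active_flag else "inactive"


states = run_heartbeat_cycle(
    ["primary", "secondary"],
    fake_send,
    {"primary": "active", "secondary": "inactive"},
)
print(states)  # {'primary': 'unavailable', 'secondary': 'active'}
```

In the example the primary instance fails to answer, so the secondary is activated. Once the primary answers again, the next cycle would activate it first and then send the secondary a heartbeat with the active flag cleared.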

A heartbeat call is not sent to Message Broker instances that are inactive and should remain inactive. This reduces the heartbeat overhead: under normal circumstances (the primary instance in each cluster is active), only one Message Broker instance per cluster is pinged.
