Most of us have used brokers in one way or another.
Whether you’re buying a house, insurance, a car, or even something as simple as coffee, brokers can facilitate the transactions between you and producers.
A broker’s primary role is to facilitate a transaction, representing either a buyer, or a seller, but not both at the same time.
However, the broker concept is not only used in the real world but also in the digital world – and we’re not just talking about online selling.
The broker plays an essential role in enterprise application integration (EAI), as it is used to facilitate the messaging system in data transmission between the consumer and the producer.
When millions of messages are transmitted between producer and consumer via different systems, managing the huge number of messages becomes a real challenge.
Thus, there is a need to ensure that data is sent and received by the correct systems.
The architectural pattern of message brokers in EAI supports to validate, transform, and route messages between systems by working as a mediator in communication between the applications.
Thus, it minimizes the mutual awareness between the applications while effectively implementing decoupling.
In a nutshell, a message broker performs the following actions:
Accept incoming messages from different applications.
Perform required actions on them
Helps to meet specific, non-functional, requirements
Reuse intermediary functions
Message brokers come in different variations based on their individual features. Some of the most popular brokers are JMS and Kafka, and both are used in EAI practice.
In fact, both have been pretty successful in allowing organizations to communicate to internal departments, third parties, and even customers through servers.
This blog is designed to differentiate between the two brokers, and how you can leverage their use for the benefit of the operations of an organization.
JMS stands for Java Message Service, which is a java-oriented middleware API and the first enterprise messaging API that has gained industry-wide support. The primary function of JMS is to send messages between two or more clients.
JMS is a messaging standard that allows Java components to create, read, send, and receive messages. At the same time, it leverages communications between different components in a distributed application to make it loosely coupled, asynchronous, and reliable.
Components of JMS application:
|JMS provider||A JMS provider, commonly known as Message-Oriented-Middleware (MOM), is a messaging system that works to implement JMS interfaces in providing functionalities, such as administrative and control features.|
|JMS client||A Java application that produces and receives messages.|
|JMS producer||A JMS producer works as a JMS client that creates and sends messages.|
|JMS consumer||This is a JMS client that receives messages.|
|JMS message||An object that contains the data during the transfer between JMS clients.|
|JMS Destination||This is either a JMS topic or a queue that works as a destination for messages between the consumer and producer.|
|JMS queue||This is a staging area during message transmission. Messages from the producer wait here before it is read by one of the consumers. Unlike the concept of a queue, the messages don't maintain any order here. The queue only guarantees that there is a single time process of the message.|
|JMS topic||The JMS topic facilitates the distribution for publishing messages which are delivered to multiple subscribers.|
Messaging models are basically programming models, and JMS follows an asynchronous messaging model between heterogeneous systems. It supports two types of messaging models:
Point to Point Messaging model (P2P)
Publish-Subscribe model (Pub-Sub)
Let's have a detailed look at each type of messaging model.
In the P2P model, the JMS queue is used as a destination. The model follows a message routing pattern where messages are routed to individual consumers.
These consumers maintain the queues of incoming messages. This messaging type is based on three concepts:
In this case, each message addresses a specific queue. While receiving the messages, the clients extract the messages from the destination queue.
In this model, any number of producers can send messages to the queue, and there is a guarantee of message delivery. However, only one consumer can consume the message. The queues retain all the messages until they are delivered successfully or have expired. Additionally, if the message is not registered to any consumer, the queue holds the message until it is registered to one.
The publish-subscribe model uses the JMS topic as the means of the messaging system.
In this model, neither the publisher nor the subscriber know each other. However, consumers can register to receive messages published in a particular JMS topic.
If the JMS consumers subscribe to a specific topic, it can consume all the messages under that topic. However, it is a time-bound activity where JMS subscribers can only consume published messages on a topic after it subscribes to that topic. In this case, if any message is published to the topic before the subscription or during when it is inactive, such a message cannot be delivered to the consumer. Unlike a queue, the topic does not store messages.
JMS has multiple capabilities.
It makes it easy to develop applications that follow an asynchronous messaging pattern for business data and events.
It defines an enterprise messaging API that efficiently and easily supports a wide range of enterprise messaging products.
It supports Java applications for enterprise messaging systems.
It is a message-oriented middleware (MOM) that provides a low-level abstraction between database and application adapters, business process automation, and event processing.
It provides a common set of messaging concepts and facilities.
It needs minimum work to implement the provider.
It maximizes the portability of messaging applications.
It provides the client interface for both pub-sub and P2P domains.
Supports Asynchronous Communication: As JMS follows an asynchronous messaging pattern, users can expect the JMS queue to perform well and provide high throughput. Apart from this, the JMS queue is able to stream messages for consumers. These messages are processed together in RAM whenever the consumer is present. JMS can also send multiple messages within a second – thousands even – by utilizing multiple threads and processes.
Great Industry Support: The JMS specification is widely available in message brokers. JMS was the first messaging API that had substantial industry support when applied to enterprise applications.
Reliability: Messaging in JMS is reliable as it ensures the delivery of messages to the intended consumer once sent by the producer. It also excludes duplicate delivery of messages.
Standardized Messaging API: The standard schemes and conventions for JMS have been widely accepted by other vendors, allowing JMS to address any portability issues more efficiently while facilitating simple application development.
Java’s Simplicity: The JMS API is easy to learn. Thus, developers will be able to write portable, messaging enterprise applications at a faster and more efficient rate.
Loose Coupling: JMS can decouple unrelated systems via system boundaries. It doesn’t have to share with a common database.
Can be Processed by Message Driven Beans: As Message Driven Beans is based on JMS, developers are able to implement asynchronous enterprise java beans for scalable and robust applications.
Interoperability Between Different Providers: JMS’ high interoperability allows two distinct applications to communicate with one another despite using different messaging providers.
JMS is Java-based. In multi-tiered applications using microservices, where multiple languages and frameworks are used, this can become a hindrance.
In JMS, although APIs are specified, the message format is not. This is a limitation of JMS. They just have to use the same API.
When the time comes to choose the right messaging API for the provider, JMS is the better choice than a tightly coupled API-like remote procedure call (RPC) in the following scenarios:
The provider wants no inter-dependency between the components handling information. This means you can easily replace the components.
The provider needs the application to continue to function whether or not all its components are up and running.
The inter-component information exchange is allowed by the business model without receiving an immediate response.
Message delivery between Java components and legacy systems.
Enabling asynchronous messaging in large web applications.
Message-based system integration applications.
Anything that consumes data and pushes data to another source can be integrated with JMS messaging.
For process operations and requests that are long-running.
With emerging protocols like MQTT and AMQP in IoT technology, we are able to observe some shifts within the messaging sphere. For example, sensors that perform temperature readings now need to interface with computer systems. JMS is used here as a conduit that bridges different messaging protocols within those systems.
JMS is also used for global server rollouts and reliable messaging. Large social media sites benefit from JMS when they engage in global server rollouts. Most of these sites need to roll out servers on such a large scale that reliable messaging may be a challenge. Hence, any project that needs reliable messaging from endpoints can use JMS.
Apache Kafka is an open-source software platform that utilizes stream-processing to provide a low-latency, high-throughput platform that handles real-time data feeds.
Kafka was originally developed by LinkedIn and used as a scalable messaging platform for the social media network’s central data pipeline to accommodate its growing membership. It was later donated to the Apache Software Foundation.
Kafka is written in Java and Scala, and it works by connecting to external systems for data import/export, which is done via Kafka Connect. It provides Kafka Streams, which is a Java stream-processing library. Transaction logs heavily influence the design of Kafka.
Apache Kafka also works as a messaging system that is distributed and follows the publish-subscribe model. It also acts as a robust queue capable of handling a high volume of data. Users can also pass messages between endpoints with the help of it.
Kafka is suitable for both online and offline message consumption. Furthermore, Kafka messages persist on the disk, and within the cluster, they can replicate by preventing data loss.
|Topic||This is a category name where messages are published. These are always multi-subscriber, which means a topic can have zero, one, or many consumers that subscribe to the data.|
|Partition||Partitions are created by splitting topics. There should be a minimum of one partition for each topic. Messages persist on partitions in an immutable sequential order. A partition is implemented as a set of equal-sized segment files. Topics can handle any amount of data as they have many partitions in it.|
|Leader||This is a node responsible for all reads/writes for a given partition. A leader is basically one server that acts per partition.|
|Follower||This is also a node that follows the instructions of the leader and acts as a normal consumer. Once the leader fails, one of the followers becomes the new leader automatically.|
|Broker||Brokers are systems that maintain the published data. Each broker may have zero or more partitions per topic.|
|Producer||Producers publish data on the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin manner or following some semantic partition function.|
|Consumer||Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance. If all the consumer instances have the same consumer group, then the records will effectively be load-balanced over the consumer instances. If all the consumer instances have different consumer groups, each record will be broadcast to all the consumer processes.|
Kafka keeps up with messages within topics. Producers create the data within the topics, and consumers read from those topics. As Kafka is distributed, partitions separate topics and are replicated across various nodes (leader & follower). While consuming from a topic, we can also configure a group with multiple consumers. Each of the consumers in a specific group can access messages from a particular subset of partitions within the topics they subscribe to. This will ensure that every message is delivered to one consumer in the group, and all of the messages that carry the same key make it to the same consumer.
Kafka's uniqueness is that it handles each topic partition as a log, which is nothing but an ordered set of messages. Also, every message within a given partition is assigned a unique, one-of-a-kind offset. Kafka doesn’t track which message is actually read by what consumers. Instead of holding just unread messages, Kafka holds all of the messages for a pre-specified amount of time.
Kafka is the better choice and replacement for a more traditional message broker where there is a requirement for very high throughput for distributed systems. Kafka is also well suited to large scale message processing applications because it has better throughput, built-in partitioning, replication, and fault-tolerance.
Website Activity Tracking
Different site activities (page views, searches, or other actions) can be published to central topics. One topic is used per activity type. These feeds can be subscribed for a range of use cases, including real-time processing, loading into Hadoop, real-time monitoring, and offline data warehousing systems for offline processing and reporting.
Kafka is also often used for operational monitoring of data.
Kafka is a good substitute for a log aggregation solution. Log aggregation means physically collecting log files off servers and putting them in a central repository, which is either a file server or HDFS for processing. Kafka creates an abstraction on the details of files and provides a cleaner abstraction of log or event data as a stream of messages. This allows easier support for multiple data sources, lower-latency processing, and distributed data consumption.
Apache Kafka has a library called Kafka Streams, a lightweight but powerful stream processing library that supports data processing pipelines which consist of multiple stages. In this case, the processor consumes raw input data from the Kafka topic, aggregates it and enriches it, or transforms it into new topics for further consumption.
Event sourcing is a form of application design where users can log state changes as a time-ordered sequence of records. Kafka supports huge, stored, log data, making it an excellent backend for an application built in this style.
Kafka serves the purpose of an external commit log for a distributed system. The log helps in replicating data between nodes and applying a re-sync mechanism to restore data for the failed nodes. It is the log compaction feature in Kafka that helps support this usage.
High-throughput: Kafka is capable of handling high-velocity and high-volume data, despite not having large hardware. It can also support the message throughput of thousands of messages per second.
Low Latency: Kafka can handle messages with very low latency in the milliseconds range.
Fault-Tolerant: Kafka has the inherent capability to be resistant to node failure within a cluster.
Durability: Since Kafka can perform message replication, messages are never lost. This ensures the persistence of data or messages on disk.
Scalable: You can scale up Kafka by adding extra nodes without incurring any downtime. Moreover, Kafka can handle messages in a fully transparent and seamless manner.
Distributed: Kafka has a distributed architecture which makes it scalable along with the capabilities like replication and partitioning.
Consumer Friendly: Kafka can be very versatile and be tailor-made for a variety of consumers. Moreover, Kafka can integrate with a variety of consumers instead of being written in various languages.
No Complete Set of Monitoring Tools: Kafka does not have a full set of monitoring and management tools. This is a point of concern for many enterprise support staff when choosing Kafka.
Issues with Message Tweaking: Kafka doesn’t run optimally when modifying the message, as the performance reduces significantly when doing this. On the contrary, it performs well if the message is unchanged.
Does not support wildcard topic selection: Kafka only matches the exact topic name. Hence, if there is any wildcard topic selection, it fails. This makes Kafka incapable of addressing certain use cases.
Reduced Performance due to compression: While Kafka doesn’t have issues on message sizes, if the messages get compressed by the broker and consumers, the throughput and overall performance may be affected.
Lacks some Messaging Paradigms: Kafka does not support messaging paradigms like queues, request/reply, point-to-point, etc. This can be problematic for specific use cases.
Here are the differences between JMS and Kafka based on specific parameters. You can use this to make a more informed decision for your choice of message broker.
|Order of Messages||There is no guarantee that the messages will be received in order.||The receiving of messages follows the order in which they are sent to the partition.|
|Filters||This is a JMS API message selector that allows the consumers to specify which messages they are interested in. This way, message filtering happens in JMS. Message selection can follow specific criteria. The filtering occurs at the producer.||There is no concept of the filter at the broker level. Hence, messages picked up by the consumer do not specify any criteria. The filtering can happen only at the consumer level.|
|Persistence of Messages||It provides either in- memory or disk-based storage of messages.||It stores the messages for a specified period whether or not it has been picked up by the consumer.|
|Push vs. Pull of Messages||The providers push the JMS message to queues and topics.||The consumers pull the message from the broker.|
|Load Balancing||Load balancing can be designed by implementing some clustering mechanism. Thus, once the producer sends the messages, the load will be distributed across the clusters.||Here load balancing happens automatically. Because once the Kafka nodes publish its metadata that indicates which servers are up and running in the cluster. Also, it tells the producer where the leader is. Thus, the client can send messages to the appropriate partition.|
JMS and Kafka are both wildly popular solutions for messaging. Whilst JMS has been around for longer it is still a very popular choice for certain use cases. Before you consider which one is better, it is best to do your homework and study the business requirements and your capabilities. Discover what you need from a messaging platform then see whether JSM or Kafka is a better fit for your requirements. Try not to look at individual market popularity – rather, focus on how they can solve your problems.
TORO Cloud introduces Martini™, a modern API centric platform that includes everything you need to consume, publish, and manage APIs for digital transformation. It integrates cloud-based and on-premise applications, automates business processes, log transactions, and creates reports. It can integrate message brokers of various kinds like ActiveMQ, RabbitMQ, Apache Artemis, and also support transacted messages. Using Martini™, you can integrate Apache Kafka. At the same time, it supports endpoints like JMS message listeners.