
Kafka Integration In Pega: Boosting Data Flow

  • Kafka, renowned for its high-throughput and low-latency capabilities, serves as a powerful platform for managing real-time data feeds. Leveraging Kafka as an input source, you can drive event strategies within the Pega Platform™.

  • Kafka data sets offer high performance and horizontal scalability for event and message queues. They can be partitioned to distribute load across the cluster.

  • The Stream service provides asynchronous data flow between processes within the Pega Platform™. It is a multi-node component built on Apache Kafka. You can use it, for example, to send correspondence data to delivery channels in Pega Customer Decision Hub, to handle customer responses for the Adaptive Decision Manager (ADM), or to trigger background processes in any Pega application.

  • Pega Platform is typically deployed on servers referred to as nodes, which collaborate within groups known as clusters.


Scaling a Kafka Cluster -


Adjusting the size of your Kafka cluster can involve any of the following operations:

  1. Reassigning partitions

  2. Incorporating a new node

  3. Restarting a temporarily inactive node

  4. Decommissioning a node

  5. Substituting a node

  6. Modifying the replication factor

  7. Altering the number of partitions

  8. Removing a dataset

  9. Truncating a dataset
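To make the effect of adding a node or changing the replication factor more concrete, here is a minimal sketch of how partition replicas could be spread across nodes. It is only an illustration: the round-robin rule, the function name assign_replicas, and the node names are assumptions made for this example, not the algorithm that Kafka or the Pega Stream service actually uses.

```python
def assign_replicas(num_partitions, nodes, replication_factor):
    """Spread each partition's replicas over consecutive nodes (round-robin).

    Hypothetical helper for illustration only; not Kafka's real assignment logic.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            nodes[(partition + offset) % len(nodes)]
            for offset in range(replication_factor)
        ]
    return assignment


# Six partitions, replication factor 2, before and after adding a fourth node.
before = assign_replicas(6, ["node-1", "node-2", "node-3"], 2)
after = assign_replicas(6, ["node-1", "node-2", "node-3", "node-4"], 2)

# Partitions whose replica set changed and would therefore need reassignment.
moved = [p for p in before if before[p] != after[p]]
print(moved)
```

The sketch shows why incorporating a node usually goes hand in hand with reassigning partitions: the new node only takes load once some replicas are moved onto it.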


Integrating Kafka with Pega -


Kafka Configuration Instance - Kafka configuration instances are specific data instances created within the Data-Admin-Kafka class of your application. These configurations serve the purpose of establishing a client connection between the Pega Platform and an external Kafka server or server cluster. Defining such configuration instances is necessary when setting up a Kafka dataset.


Follow these steps to create a Kafka configuration instance that represents a Kafka cluster within the Pega Platform:

  1. Access the Designer Studio interface. Click on the "+ Create" button. Navigate to "SysAdmin" and select "Kafka."

  2. Provide a brief description for the rule.

  3. Enter the rule ID, such as "MyKafkaInstance," in the Kafka field.

  4. Click "Create and open."

  5. Within the Details section: Click on "Add host."

  6. Input the address of the Kafka cluster into the Host field.

  7. Enter the port number in the Port field.

  8. Optionally, click "Add host" to configure additional host and port pairs for connection. Note: The Pega Platform automatically discovers all nodes within the cluster upon initial connection. Hence, you may input a single host and port combination to connect to the Kafka cluster. However, it's recommended to input at least two host and port combinations to ensure successful connection in case a node is unavailable during Pega Platform restart.

  9. Optional: Configure SSL authentication settings for communication between the Pega Platform and the Kafka cluster.

    1. In the Security settings section, select the "Use SSL configuration" checkbox.

    2. Choose a truststore file containing a Kafka certificate or create a truststore record by clicking the "Open" icon in the Truststore field.

    3. Enable client authentication by selecting the "Use client certificate" checkbox and provide the Pega Platform private key and private key password credentials in the Keystore and Key password fields, respectively.

  10. Optional: If a SASL authentication method is enabled in the Kafka cluster, configure SASL authentication settings for communication between the Pega Platform and the Kafka cluster. In the Authentication section, depending on the SASL authentication method configured in the Kafka cluster, perform one of the following actions:

    1. Enter login and password credentials.

    2. Input the Kerberos authentication key.

  11. Click "Save."
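For readers who know Kafka from outside Pega, the form fields in the steps above correspond roughly to standard Kafka client properties. The sketch below maps host and port pairs, the truststore, the client keystore, and login credentials onto those property names. The function build_client_properties, its parameters, the host names, and the file path are hypothetical and exist only for this example. How the Pega Platform applies these settings internally is not described here, and the Kerberos option is left out for brevity.

```python
from typing import Optional


def build_client_properties(
    hosts: list[tuple[str, int]],
    truststore_path: Optional[str] = None,
    keystore_path: Optional[str] = None,
    key_password: Optional[str] = None,
    sasl_username: Optional[str] = None,
    sasl_password: Optional[str] = None,
) -> dict[str, str]:
    """Map the configuration form fields onto standard Kafka client property names."""
    if not hosts:
        raise ValueError("at least one host and port pair is required")

    # Steps 5 to 8: one or more host:port pairs form the bootstrap list.
    props = {"bootstrap.servers": ",".join(f"{host}:{port}" for host, port in hosts)}

    use_ssl = truststore_path is not None
    use_sasl = sasl_username is not None
    if use_ssl and use_sasl:
        props["security.protocol"] = "SASL_SSL"
    elif use_ssl:
        props["security.protocol"] = "SSL"
    elif use_sasl:
        props["security.protocol"] = "SASL_PLAINTEXT"
    else:
        props["security.protocol"] = "PLAINTEXT"

    # Step 9: truststore, plus an optional client certificate.
    if use_ssl:
        props["ssl.truststore.location"] = truststore_path
        if keystore_path is not None:
            props["ssl.keystore.location"] = keystore_path
            if key_password is not None:
                props["ssl.key.password"] = key_password

    # Step 10, login and password variant (SASL PLAIN is assumed here).
    if use_sasl:
        props["sasl.mechanism"] = "PLAIN"
        props["sasl.jaas.config"] = (
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="{sasl_username}" password="{sasl_password}";'
        )
    return props


# Two brokers are listed so the connection still works if one is down.
example = build_client_properties(
    hosts=[("kafka-1.example.com", 9092), ("kafka-2.example.com", 9092)],
    truststore_path="/path/to/truststore.jks",
)
print(example)
```

Listing two brokers in the bootstrap list mirrors the recommendation in step 8: the client discovers the rest of the cluster after the first successful connection, so the extra entry only matters when the first broker is unavailable.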


Application Settings -


The Application Settings rule lets your Kafka dataset use a different topic in each environment (e.g., development, staging, production) without having to modify and save the dataset rule in every environment.
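The idea is the same as any environment-specific lookup: the dataset refers to a setting, and the setting resolves to a topic name per environment. A minimal sketch of that pattern follows. The environment names come from the paragraph above, while the topic names and the resolve_topic function are made up for illustration and are not part of any Pega API.

```python
# Hypothetical topic names, one per environment.
TOPIC_BY_ENVIRONMENT = {
    "development": "customer-events-dev",
    "staging": "customer-events-stg",
    "production": "customer-events",
}


def resolve_topic(environment: str) -> str:
    """Return the topic configured for the given environment."""
    try:
        return TOPIC_BY_ENVIRONMENT[environment]
    except KeyError:
        raise ValueError(f"no topic configured for environment {environment!r}") from None


print(resolve_topic("staging"))
```

The consuming code never changes between environments; only the lookup table does, which is the benefit the Application Settings rule provides for the dataset rule.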


Message Format -


Kafka datasets in Pega currently support two message formats: JSON and Avro. If you opt for Avro as the message format, you must configure an Avro schema beforehand.
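The practical difference between the two formats is where the structure lives. A JSON message carries its field names with it, while an Avro message relies on a schema that is declared ahead of time. The sketch below shows one record as a JSON payload and a matching Avro schema definition. The record fields, the schema name CustomerEvent, the namespace, and the missing_fields helper are all assumptions for this example, and no Avro library is used to actually encode the record.

```python
import json

# A sample event record (field names are hypothetical).
record = {"customerId": "C-1001", "channel": "web", "amount": 42.5}

# JSON: the payload is self-describing text encoded as bytes.
json_payload = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Avro: the structure is declared up front in a schema like this one.
avro_schema = {
    "type": "record",
    "name": "CustomerEvent",
    "namespace": "com.example.events",
    "fields": [
        {"name": "customerId", "type": "string"},
        {"name": "channel", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
}


def missing_fields(rec: dict, schema: dict) -> list[str]:
    """List schema fields that the record does not provide."""
    return [field["name"] for field in schema["fields"] if field["name"] not in rec]


print(json_payload)
print(missing_fields(record, avro_schema))
```

A check like missing_fields hints at why the Avro schema has to exist before any message is written: producers and consumers both depend on it to interpret the payload.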


Kafka Datasets -


Each Kafka server or server cluster you connect to stores records in streams categorized as topics.

  • Starting from Pega® Platform 7.3, you can establish connections to Apache Kafka servers using a dedicated dataset rule. Apache Kafka, known for its fault-tolerant and scalable nature, serves as a valuable data source for conducting real-time analysis of customer records, including messages and calls, as they happen. The optimal method for leveraging Kafka datasets within your application is by incorporating them into Data Flow rules that encompass event strategies.

  • In order to access each topic from Pega Platform, it is necessary to create a Kafka dataset rule. During the configuration of a Kafka dataset, you have the option to select an existing topic from the designated Kafka configuration instance, or alternatively, create a new topic if the Kafka cluster is set up for automatic topic creation.

  • You can also define partition keys that are applied during distributed data flow runs, and you can read historical Kafka records, that is, records created before the real-time data flow run that uses this Kafka dataset was started.
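A partition key matters because records with the same key land on the same partition, so one node sees all events for a given customer in order. Here is a minimal sketch of key-based partitioning. It uses CRC32 as a stand-in hash purely for illustration (Kafka's default partitioner uses a different hash function), and the customer IDs, event names, and partition count are invented for the example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the key; equal keys always map to the same partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is an assumption for this sketch, not Kafka's actual hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("C-1001", "login"), ("C-2002", "call"), ("C-1001", "purchase")]

# Group events by the partition their customer ID maps to.
by_partition: dict[int, list[tuple[str, str]]] = {}
for customer_id, event in events:
    by_partition.setdefault(partition_for(customer_id, 4), []).append((customer_id, event))

print(by_partition)
```

Both events for customer C-1001 end up in the same group, which is what lets an event strategy that runs across several nodes still evaluate each customer's events together.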


Communication between embedded Kafka and Pega Platform -


The communication schema between Kafka and Pega Platform consists of the following components:

  1. Database: This stores the metadata for brokers and topics.

  2. Pega node: The Charlatan server, which initiates as part of the Pega Platform stream service, is responsible for accepting and managing incoming requests.

  3. ZooKeeper protocol: A Kafka broker and a Charlatan server communicate through the ZooKeeper messaging protocol.

  4. Kafka: The standard Kafka distribution is used. Kafka treats the Charlatan server as a ZooKeeper instance.



