Does Kafka Guarantee Message Ordering?

Yes — Kafka guarantees message ordering within a partition. Ordering across multiple partitions is not guaranteed by default.


1. The Problem: Why Message Ordering Matters in Distributed Systems

In a monolithic application, operations execute sequentially within a single JVM — order is implicit. In a distributed, event-driven architecture built on Apache Kafka, that implicit contract disappears. Messages are produced by independent services, transported across a network, and consumed by multiple concurrent threads. Without intentional design, the sequence in which events were produced is not necessarily the sequence in which they are consumed.

Consider a real-world e-commerce order lifecycle:

Event 1 → ORDER_PLACED
Event 2 → PAYMENT_CONFIRMED
Event 3 → ORDER_SHIPPED

If a downstream service consumes ORDER_SHIPPED before PAYMENT_CONFIRMED, it dispatches an order that has not been paid for. If ORDER_PLACED arrives last, the system may attempt to update a record that does not yet exist in its local database. These are not edge cases — they are predictable failure modes in any system that does not enforce ordering at the infrastructure level.

Ordering is critical whenever events have a causal dependency — that is, when the correctness of processing Event N depends on Event N−1 having already been processed. Common examples in production systems include financial transaction ledgers, inventory state machines, user session pipelines, and audit logs.


2. Kafka Core Concepts

Before examining ordering guarantees, it is essential to understand the internal architecture that makes them possible.

What is Apache Kafka?

Apache Kafka is a distributed, fault-tolerant, append-only log system designed for high-throughput, low-latency event streaming. Originally developed at LinkedIn, it is now an Apache Software Foundation project used widely as the backbone of event-driven architectures.

Unlike traditional message queues where messages are deleted upon consumption, Kafka retains messages for a configurable period regardless of whether they have been consumed. This enables multiple consumers to independently replay or process the same stream of events.

Architectural Components

Component            Role in the System
-------------------  ------------------------------------------------------------------------------------------------------
Producer             The application that publishes messages to a Kafka topic
Topic                A logical stream of related events, identified by name (e.g., order-events)
Partition            A subdivision of a topic — each partition is an independent, ordered, append-only log
Broker               A Kafka server responsible for storing partitions and serving producers and consumers
Consumer             The application that reads and processes messages from a topic
Consumer Group       A set of consumer instances that collectively read a topic, with each partition assigned to exactly one member
Message Key          An optional byte array attached to a message; used to deterministically assign messages to a partition
Offset               An immutable, monotonically increasing integer that uniquely identifies each message within a partition
Replication Factor   The number of brokers that store a copy of each partition for fault tolerance

What is a Partition, Really?

A partition is the fundamental unit of parallelism and ordering in Kafka. Internally, each partition is stored as a sequence of log segment files on disk. Every message appended to a partition receives the next available offset — this process is atomic and sequential.

Partition 0 (on disk — append-only log):
  [Offset 0] ORDER_PLACED      for user-42  ← written first
  [Offset 1] PAYMENT_CONFIRMED for user-42  ← written second
  [Offset 2] ORDER_SHIPPED     for user-42  ← written third

Because appends are sequential and offsets are immutable, a consumer that reads Partition 0 from Offset 0 will always receive messages in exactly the order they were written. This is Kafka’s core ordering guarantee — and it is absolute within a single partition.
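The append-only behaviour described above can be modelled in a few lines of plain Java. This is a toy sketch (class and method names are illustrative, not Kafka's storage code): each append atomically assigns the next offset, and a read from offset 0 replays records in exactly the order they were written.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a single Kafka partition: an append-only, offset-indexed log. */
public class PartitionLog {

    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns the offset it was assigned (sequential, immutable). */
    public synchronized long append(String record) {
        records.add(record);
        return records.size() - 1;   // next available offset
    }

    /** Reads all records from the given offset onwards, in append order. */
    public synchronized List<String> readFrom(long offset) {
        return new ArrayList<>(records.subList((int) offset, records.size()));
    }

    public static void main(String[] args) {
        PartitionLog partition0 = new PartitionLog();
        partition0.append("ORDER_PLACED");       // offset 0
        partition0.append("PAYMENT_CONFIRMED");  // offset 1
        partition0.append("ORDER_SHIPPED");      // offset 2

        // A consumer replaying from offset 0 always sees write order:
        System.out.println(partition0.readFrom(0));
    }
}
```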

Why Multiple Partitions Break Global Ordering

A topic with multiple partitions distributes its messages across those partitions, potentially across different brokers on different machines. Each partition operates independently — there is no global clock or coordination mechanism between them.

Topic: order-events (3 partitions across 3 brokers)

Partition 0 (Broker A): [Offset 0] ORDER_PLACED      for user-42
Partition 1 (Broker B): [Offset 0] PAYMENT_CONFIRMED for user-42
Partition 2 (Broker C): [Offset 0] ORDER_SHIPPED     for user-42

When a consumer reads all three partitions, it receives one message from each partition in a non-deterministic interleaving. Broker B may respond faster than Broker A due to lower disk I/O or network latency. The consumer has no way to reconstruct the original sequence without additional coordination.

The foundational rule: Kafka guarantees ordering within a partition. It makes no guarantees about ordering across partitions.


3. How Kafka Enforces Ordering — The Partition Key Mechanism

The Role of the Message Key

The message key is the primary mechanism for enforcing ordering across a distributed topic. When a producer sends a message with a key, Kafka applies the default Murmur2 hash function to determine the target partition:

Target Partition = toPositive(murmur2(keyBytes)) % numberOfPartitions

Because this is a pure, deterministic function, the same key will always hash to the same partition — regardless of which producer instance sends it, which broker is the leader, or how many consumer instances are running.

Hash("user-42") % 3 = Partition 0  ← always, deterministically
Hash("user-99") % 3 = Partition 2  ← always, deterministically

This means all events for user-42 — across their entire lifecycle — are written sequentially to Partition 0 and consumed in that exact order.
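The routing property can be demonstrated with a small sketch. Note the hedges: Kafka's real partitioner uses murmur2 with the sign bit masked off (toPositive), whereas this sketch substitutes String.hashCode() purely to show determinism; the class and method names are illustrative.

```java
/**
 * Sketch of deterministic key-based partition routing.
 * String.hashCode() stands in for Kafka's murmur2; the determinism
 * argument is identical for any fixed hash function.
 */
public class KeyRouting {

    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit (as Kafka's toPositive does) rather than using
        // Math.abs, which fails for Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition, no matter
        // which producer instance computes it:
        System.out.println("user-42 -> partition " + partitionFor("user-42", 3));
        System.out.println("user-99 -> partition " + partitionFor("user-99", 3));
    }
}
```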

What Happens Internally When a Key is Present

Producer sends three events for user-42:

Step 1: Serialise message → compute murmur2("user-42") % 3 = 0
Step 2: Route to Partition 0 leader on Broker A
Step 3: Broker A appends to log at Offset 0, 1, 2 — sequentially
Step 4: Replication occurs to follower brokers
Step 5: Producer receives acknowledgment (based on acks config)
Step 6: Consumer reads Partition 0, receives Offset 0 → 1 → 2 in order

The ordering contract is maintained at every step through the combination of deterministic routing and append-only writes.

What Happens Internally Without a Key

When no key is provided, Kafka’s default partitioner distributes messages using a round-robin or sticky partitioning strategy (sticky partitioning was introduced in Kafka 2.4 to improve batching efficiency). In both strategies, consecutive messages for the same logical entity can land in different partitions:

Message 1 (ORDER_PLACED)      → Partition 0 (Broker A)
Message 2 (PAYMENT_CONFIRMED) → Partition 1 (Broker B)
Message 3 (ORDER_SHIPPED)     → Partition 2 (Broker C)

Since consumers read partitions independently and Broker B may respond faster, the consumer receives PAYMENT_CONFIRMED before ORDER_PLACED. The causal sequence is broken with no error, no exception, and no warning.
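A minimal sketch of what keyless sends degenerate to (a toy round-robin router, not Kafka's partitioner code): the key is ignored, so three consecutive events for one user are spread across three partitions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy round-robin partitioner: the effective behaviour of keyless sends. */
public class RoundRobinDemo {

    private int counter = 0;
    private final int numPartitions;

    RoundRobinDemo(int numPartitions) { this.numPartitions = numPartitions; }

    int nextPartition() {
        return counter++ % numPartitions;   // entity identity plays no role
    }

    public static void main(String[] args) {
        RoundRobinDemo router = new RoundRobinDemo(3);
        List<Integer> partitions = new ArrayList<>();
        for (String status : new String[]{"ORDER_PLACED", "PAYMENT_CONFIRMED", "ORDER_SHIPPED"}) {
            partitions.add(router.nextPartition());
        }
        // Three events for the same user, three different partitions:
        System.out.println(partitions);
    }
}
```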


4. The Spring Boot Implementation

Project Domain

We will build an order lifecycle event system. Two Spring Boot services communicate through a Kafka topic:

  • OrderProducerService — publishes OrderEvent messages representing state transitions in an order’s lifecycle
  • OrderConsumerService — reads and processes those events downstream

Domain Model

// OrderEvent.java
package com.example.kafka.model;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class OrderEvent {

    private String orderId;
    private String userId;
    private String status;      // ORDER_PLACED | PAYMENT_CONFIRMED | ORDER_SHIPPED
    private long   timestamp;

    @Override
    public String toString() {
        return String.format("[orderId=%-8s | userId=%-8s | status=%-20s | ts=%d]",
                orderId, userId, status, timestamp);
    }
}

userId is the field we will use as the Kafka partition key — ensuring all events for the same user are routed to the same partition and processed in sequence.


Maven Dependencies (pom.xml)

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>

    <!-- Spring Kafka: auto-configuration, KafkaTemplate, @KafkaListener -->
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>

    <!-- Jackson: JSON serialisation for OrderEvent -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>

    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>

Application Configuration (application.yml)

Every configuration property below has a direct bearing on ordering, reliability, or both. Each is explained inline.

spring:
  kafka:
    bootstrap-servers: localhost:9092

    # ─── PRODUCER CONFIGURATION ─────────────────────────────────────────────────
    producer:
      key-serializer:   org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer

      properties:
        # IDEMPOTENCE: Kafka assigns each producer a unique Producer ID (PID).
        # Every message carries a per-partition sequence number.
        # If a retry delivers a duplicate, the broker detects it via PID + SeqNo
        # and discards it — preventing both duplicates and out-of-order writes.
        enable.idempotence: true

        # ACKNOWLEDGMENT LEVEL: "all" (equivalent to -1) requires the partition
        # leader AND all in-sync replicas (ISR) to persist the message before
        # returning acknowledgment. This prevents data loss on broker failure
        # and ensures replay-consistent ordering after a failover.
        acks: all

        # IN-FLIGHT REQUESTS: The number of unacknowledged requests the producer
        # can have open simultaneously per broker connection.
        # With idempotence enabled, Kafka safely supports up to 5 in-flight
        # requests while preserving ordering. Without idempotence, this must
        # be set to 1 to avoid reordering on retry.
        max.in.flight.requests.per.connection: 5

        # RETRY POLICY: Retry indefinitely for transient broker or network errors.
        # Combined with idempotence, retries are safe and order-preserving.
        retries: 2147483647
        retry.backoff.ms: 100
        delivery.timeout.ms: 120000   # Upper bound on total time to report success or failure for a send, including retries: 2 minutes

    # ─── CONSUMER CONFIGURATION ─────────────────────────────────────────────────
    consumer:
      group-id: order-processing-group
      auto-offset-reset: earliest     # On first start, read from the beginning of the topic
      enable-auto-commit: false        # Disable auto-commit; offsets committed manually after processing

      key-deserializer:   org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer

      properties:
        spring.json.trusted.packages: "com.example.kafka.model"
        # POLL BATCH SIZE: Limit records fetched per poll. A lower value reduces
        # the window within which out-of-order processing can occur in multi-threaded
        # consumers. Tune based on processing latency per record.
        max.poll.records: 50
        fetch.min.bytes: 1
        fetch.max.wait.ms: 500

    # ─── LISTENER CONFIGURATION ─────────────────────────────────────────────────
    listener:
      # CONCURRENCY: Number of concurrent consumer threads.
      # Each thread is exclusively assigned to one partition.
      # Setting concurrency = number of partitions ensures one thread per partition,
      # which is the optimal configuration for maintaining per-partition ordering.
      concurrency: 3

      # ACK MODE: MANUAL_IMMEDIATE commits the offset synchronously immediately
      # after acknowledgment.acknowledge() is called. This ensures the offset
      # is only advanced after the message has been successfully processed —
      # preventing silent message loss on processing failure.
      ack-mode: MANUAL_IMMEDIATE

# ─── APPLICATION-LEVEL TOPIC CONFIGURATION ──────────────────────────────────────
app:
  kafka:
    topic:
      order-events: order-events
      partitions: 3
      replication-factor: 1    # Set to min(3, broker count) in production

Topic Provisioning

// KafkaTopicConfig.java
package com.example.kafka.config;

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    @Value("${app.kafka.topic.order-events}")
    private String topicName;

    @Value("${app.kafka.topic.partitions}")
    private int partitions;

    @Value("${app.kafka.topic.replication-factor}")
    private int replicationFactor;

    /**
     * Spring Boot's KafkaAdmin bean (auto-configured) detects NewTopic beans
     * and creates the topic on the broker at application startup if it does
     * not already exist. No manual kafka-topics.sh command is required.
     *
     * IMPORTANT: Never alter the partition count of this topic after deployment.
     * Doing so changes the hash-to-partition mapping for all existing keys,
     * breaking the ordering guarantee for in-flight and future messages.
     */
    @Bean
    public NewTopic orderEventsTopic() {
        return TopicBuilder
                .name(topicName)
                .partitions(partitions)
                .replicas(replicationFactor)
                .build();
    }
}

Programmatic Producer Configuration

The application.yml approach is sufficient for most applications. For teams that prefer to centralise all Kafka configuration in Java (e.g., for easier testing or environment-specific overrides), the equivalent programmatic configuration is:

// KafkaProducerConfig.java
package com.example.kafka.config;

import com.example.kafka.model.OrderEvent;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

import java.util.HashMap;
import java.util.Map;

@Configuration
public class KafkaProducerConfig {

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Bean
    public ProducerFactory<String, OrderEvent> producerFactory() {
        Map<String, Object> config = new HashMap<>();

        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,      bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,   StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);

        // Idempotent delivery: guarantees exactly-once writes per partition,
        // preserving order even in the presence of producer retries.
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,                      true);
        config.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,          5);
        config.put(ProducerConfig.ACKS_CONFIG,                                    "all");
        config.put(ProducerConfig.RETRIES_CONFIG,                                 Integer.MAX_VALUE);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG,                        100);

        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, OrderEvent> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}

5. Scenario A — Producing Without a Key (Ordering Violated)

This scenario demonstrates the consequence of omitting the partition key. It is a common oversight in initial implementations and a frequent source of subtle, environment-dependent bugs in production.

// OrderProducerService.java  — illustrating the incorrect approach
package com.example.kafka.producer;

import com.example.kafka.model.OrderEvent;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;

import java.util.concurrent.CompletableFuture;

@Slf4j
@Service
@RequiredArgsConstructor
public class OrderProducerService {

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    @Value("${app.kafka.topic.order-events}")
    private String topic;

    /**
     * INCORRECT: No partition key is specified.
     * Kafka applies its default partitioning strategy (round-robin or sticky),
     * distributing consecutive messages for the same user across different partitions.
     * Since partitions are consumed independently, the consumer receives these
     * messages in an undefined order — determined by broker latency, not
     * business logic.
     */
    public void sendWithoutKey(OrderEvent event) {
        CompletableFuture<SendResult<String, OrderEvent>> future =
                kafkaTemplate.send(topic, event); // no key — third argument omitted

        future.whenComplete((result, ex) -> {
            if (ex == null) {
                log.warn("[NO KEY] Partition: {} | Offset: {} | Status: {}",
                        result.getRecordMetadata().partition(),
                        result.getRecordMetadata().offset(),
                        event.getStatus());
            }
        });
    }
}

Observed behaviour in logs:

# Producer publishes three events for user-42, sequentially:
[NO KEY] Partition: 0 | Offset: 4 | Status: ORDER_PLACED
[NO KEY] Partition: 1 | Offset: 2 | Status: PAYMENT_CONFIRMED
[NO KEY] Partition: 2 | Offset: 6 | Status: ORDER_SHIPPED

# Broker B (Partition 1) responds with lower latency.
# Broker C (Partition 2) responds next.
# Broker A (Partition 0) responds last.

# Consumer receives:
[CONSUMER] Partition 2 | ORDER_SHIPPED     ← processed first  ❌
[CONSUMER] Partition 1 | PAYMENT_CONFIRMED ← processed second ❌
[CONSUMER] Partition 0 | ORDER_PLACED      ← processed last   ❌

The order in which the consumer processes these events is governed entirely by broker-level timing — not by the application’s intent. This failure mode is silent: no exception is thrown, no alert fires, and the system appears to function correctly until a downstream service acts on the out-of-order data.


6. Scenario B — Producing With a Key (Ordering Enforced)

// OrderProducerService.java  — the correct, production-grade approach
package com.example.kafka.producer;

import com.example.kafka.model.OrderEvent;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;

import java.util.concurrent.CompletableFuture;

@Slf4j
@Service
@RequiredArgsConstructor
public class OrderProducerService {

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    @Value("${app.kafka.topic.order-events}")
    private String topic;

    /**
     * CORRECT: userId is used as the partition key.
     *
     * Kafka applies murmur2(userId) % numPartitions to determine the target
     * partition. Because this is a deterministic function, every event for
     * the same user will always be routed to the same partition, regardless
     * of which producer instance or broker is involved.
     *
     * The broker appends messages to that partition sequentially.
     * The consumer reads that partition sequentially.
     * Per-user ordering is therefore guaranteed end-to-end.
     */
    public void sendOrderEvent(OrderEvent event) {
        String partitionKey = event.getUserId();

        CompletableFuture<SendResult<String, OrderEvent>> future =
                kafkaTemplate.send(topic, partitionKey, event);

        future.whenComplete((result, ex) -> {
            if (ex == null) {
                log.info("[SENT] key={} | Partition: {} | Offset: {} | Status: {}",
                        partitionKey,
                        result.getRecordMetadata().partition(),
                        result.getRecordMetadata().offset(),
                        event.getStatus());
            } else {
                log.error("[FAILED] key={} | Status: {} | Reason: {}",
                        partitionKey, event.getStatus(), ex.getMessage());
            }
        });
    }
}

REST Controller to Simulate Concurrent Order Pipelines

// OrderController.java
package com.example.kafka.controller;

import com.example.kafka.model.OrderEvent;
import com.example.kafka.producer.OrderProducerService;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/orders")
@RequiredArgsConstructor
public class OrderController {

    private final OrderProducerService producerService;

    /**
     * POST /orders/simulate
     *
     * Publishes a complete, interleaved order lifecycle for two users.
     * The two pipelines (user-42 and user-99) run concurrently in production —
     * this simulation interleaves their events to demonstrate that Kafka routes
     * each user's events to a distinct partition, preserving per-user ordering
     * while processing both pipelines in parallel.
     */
    @PostMapping("/simulate")
    public String simulateOrders() {

        // User A — events published in causal order
        producerService.sendOrderEvent(
            new OrderEvent("ORD-001", "user-42", "ORDER_PLACED",      System.nanoTime()));
        producerService.sendOrderEvent(
            new OrderEvent("ORD-001", "user-42", "PAYMENT_CONFIRMED", System.nanoTime()));
        producerService.sendOrderEvent(
            new OrderEvent("ORD-001", "user-42", "ORDER_SHIPPED",     System.nanoTime()));

        // User B — events interleaved with User A
        producerService.sendOrderEvent(
            new OrderEvent("ORD-002", "user-99", "ORDER_PLACED",      System.nanoTime()));
        producerService.sendOrderEvent(
            new OrderEvent("ORD-002", "user-99", "PAYMENT_CONFIRMED", System.nanoTime()));
        producerService.sendOrderEvent(
            new OrderEvent("ORD-002", "user-99", "ORDER_SHIPPED",     System.nanoTime()));

        return "Order events published successfully.";
    }
}

Observed behaviour in logs:

# Kafka's deterministic routing:
# murmur2("user-42") % 3 = Partition 0
# murmur2("user-99") % 3 = Partition 2

[SENT] key=user-42 | Partition: 0 | Offset: 0 | Status: ORDER_PLACED
[SENT] key=user-99 | Partition: 2 | Offset: 0 | Status: ORDER_PLACED
[SENT] key=user-42 | Partition: 0 | Offset: 1 | Status: PAYMENT_CONFIRMED
[SENT] key=user-99 | Partition: 2 | Offset: 1 | Status: PAYMENT_CONFIRMED
[SENT] key=user-42 | Partition: 0 | Offset: 2 | Status: ORDER_SHIPPED
[SENT] key=user-99 | Partition: 2 | Offset: 2 | Status: ORDER_SHIPPED

# Consumer Thread 1 reads Partition 0 (user-42):
[CONSUMER P-0] Offset 0 → ORDER_PLACED       ✅
[CONSUMER P-0] Offset 1 → PAYMENT_CONFIRMED  ✅
[CONSUMER P-0] Offset 2 → ORDER_SHIPPED      ✅

# Consumer Thread 3 reads Partition 2 (user-99):
[CONSUMER P-2] Offset 0 → ORDER_PLACED       ✅
[CONSUMER P-2] Offset 1 → PAYMENT_CONFIRMED  ✅
[CONSUMER P-2] Offset 2 → ORDER_SHIPPED      ✅

Both users are processed in parallel, with strict per-user event ordering maintained throughout. ✅


7. Consumer Service — Reliable, Ordered Message Processing

// OrderConsumerService.java
package com.example.kafka.consumer;

import com.example.kafka.model.OrderEvent;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class OrderConsumerService {

    /**
     * concurrency = "3" instructs Spring Kafka to create three consumer threads,
     * each exclusively assigned to one partition. This is the correct
     * configuration for a 3-partition topic — it ensures no two threads contend
     * for the same partition and that messages within each partition are processed
     * strictly in offset order.
     *
     * The ConsumerRecord wrapper gives access to both the partition key
     * (record.key()) and Kafka metadata (partition, offset) — essential for
     * structured logging and debugging in production environments.
     */
    @KafkaListener(
        topics      = "${app.kafka.topic.order-events}",
        groupId     = "order-processing-group",
        concurrency = "3"
    )
    public void consume(ConsumerRecord<String, OrderEvent> record,
                        Acknowledgment acknowledgment) {

        String     userId = record.key();
        OrderEvent event  = record.value();
        int        part   = record.partition();
        long       offset = record.offset();

        log.info("[CONSUMER] Partition: {} | Offset: {} | userId: {} | Status: {}",
                part, offset, userId, event.getStatus());

        processOrderEvent(event);

        /**
         * Manual offset commitment: the offset is advanced only after
         * processOrderEvent() returns successfully. If an exception is thrown,
         * this line is never reached — Kafka will re-deliver the message on
         * the next poll, ensuring no event is silently skipped due to a
         * processing failure.
         */
        acknowledgment.acknowledge();
    }

    private void processOrderEvent(OrderEvent event) {
        switch (event.getStatus()) {
            case "ORDER_PLACED"       -> log.info("Order initialised: {}", event.getOrderId());
            case "PAYMENT_CONFIRMED"  -> log.info("Payment verified for order: {}", event.getOrderId());
            case "ORDER_SHIPPED"      -> log.info("Fulfilment triggered for order: {}", event.getOrderId());
            default                   -> log.warn("Unrecognised status '{}' for order: {}",
                                                   event.getStatus(), event.getOrderId());
        }
    }
}

8. Idempotent Producer — Ordering Guarantees Under Failure Conditions

The Problem: Retries and Reordering

A producer retry is not inherently safe. Consider the following failure scenario without idempotence:

Step 1: Producer sends Batch A [PAYMENT_CONFIRMED] to broker.
Step 2: Broker writes Batch A. Network drops before acknowledgment.
Step 3: Producer does not receive ack. Timeout fires → producer retries Batch A.
Step 4: Meanwhile, producer sends Batch B [ORDER_SHIPPED].
Step 5: Batch B is acknowledged first.
Step 6: Retry of Batch A arrives and is written after Batch B.

Result: Partition log = [ORDER_SHIPPED (Offset 1), PAYMENT_CONFIRMED (Offset 2)]
Consumer reads: ORDER_SHIPPED before PAYMENT_CONFIRMED ❌

The Solution: Producer ID and Sequence Numbers

When enable.idempotence = true, the Kafka broker assigns each producer session a unique Producer ID (PID). The producer attaches a monotonically increasing sequence number to every message it sends, scoped to a specific (PID, Partition) pair.

Producer PID = 101

Message 1: PID=101, Partition=0, SeqNo=0, Status=ORDER_PLACED
Message 2: PID=101, Partition=0, SeqNo=1, Status=PAYMENT_CONFIRMED
Message 3: PID=101, Partition=0, SeqNo=2, Status=ORDER_SHIPPED

The broker maintains the last accepted sequence number per (PID, Partition). Upon receiving a message:

  • If SeqNo == lastAccepted + 1 → accept and write.
  • If SeqNo <= lastAccepted → duplicate detected, silently discard.
  • If SeqNo > lastAccepted + 1 → sequence gap detected, return an error (OutOfOrderSequenceException).

Retry scenario with idempotence:

Producer sends PAYMENT_CONFIRMED with PID=101, SeqNo=1.
Network drops. Producer retries: PAYMENT_CONFIRMED, PID=101, SeqNo=1.
Broker: "SeqNo 1 from PID 101 already written." → Discard. ✅
Ordering preserved. No duplicate in the log. ✅
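The broker-side check described above can be sketched as follows. This is an illustrative model, not Kafka's implementation: one lastAccepted sequence number is tracked per (PID, partition), and the three branches mirror the three rules listed earlier.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of broker-side idempotence: per-(PID, partition) sequence checking. */
public class SequenceCheck {

    enum Result { ACCEPT, DUPLICATE, OUT_OF_ORDER }

    private final Map<String, Integer> lastAccepted = new HashMap<>();

    Result onMessage(long pid, int partition, int seqNo) {
        String slot = pid + "-" + partition;
        int last = lastAccepted.getOrDefault(slot, -1);
        if (seqNo == last + 1) {            // the expected next message: accept and write
            lastAccepted.put(slot, seqNo);
            return Result.ACCEPT;
        }
        if (seqNo <= last) {                // retry of an already-written message: discard
            return Result.DUPLICATE;
        }
        return Result.OUT_OF_ORDER;         // sequence gap: broker returns an error
    }

    public static void main(String[] args) {
        SequenceCheck broker = new SequenceCheck();
        System.out.println(broker.onMessage(101, 0, 0)); // first write
        System.out.println(broker.onMessage(101, 0, 1)); // second write
        System.out.println(broker.onMessage(101, 0, 1)); // retry of SeqNo 1, discarded
        System.out.println(broker.onMessage(101, 0, 3)); // gap (SeqNo 2 missing)
    }
}
```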

This is already configured in our application.yml and KafkaProducerConfig.java. No application code changes are required.


9. Custom Partitioner — Domain-Driven Partition Assignment

The default key-hash strategy distributes messages uniformly across partitions. In some production scenarios, business requirements demand more targeted routing — for example, isolating high-value transactions to a dedicated partition for priority processing or SLA compliance.

// PriorityOrderPartitioner.java
package com.example.kafka.config;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

/**
 * A domain-aware partitioner that routes messages based on business attributes
 * of the event payload, rather than solely on the message key.
 *
 * Routing strategy:
 *   - Orders with amount > 10,000 → Partition 0 (dedicated high-value lane)
 *   - All other orders            → Partitions 1..N-1 (distributed by key hash)
 *
 * This ensures high-value orders receive dedicated consumer resources and
 * are not delayed by the volume of standard-value transactions.
 */
public class PriorityOrderPartitioner implements Partitioner {

    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {

        int totalPartitions = cluster.partitionCountForTopic(topic);

        try {
            Map<?, ?> payload    = objectMapper.readValue(valueBytes, Map.class);
            Object    amountObj  = payload.get("amount");

            if (amountObj != null) {
                double amount = Double.parseDouble(amountObj.toString());
                if (amount > 10_000) {
                    return 0;   // Priority lane: dedicated partition for high-value orders
                }
            }
        } catch (Exception e) {
            // Fall through to default routing on deserialisation failure
        }

        // Standard routing: distribute across non-priority partitions using the key hash.
        // The sign bit is masked rather than using Math.abs, which breaks on Integer.MIN_VALUE.
        if (keyBytes == null) return 1;
        return ((key.hashCode() & 0x7fffffff) % (totalPartitions - 1)) + 1;
    }

    @Override public void close()                           { }
    @Override public void configure(Map<String, ?> configs) { }
}

Register the custom partitioner in application.yml:

spring:
  kafka:
    producer:
      properties:
        partitioner.class: com.example.kafka.config.PriorityOrderPartitioner

10. Common Ordering Failures — Root Cause Analysis

Understanding why ordering breaks is as important as knowing how to enforce it. The following section covers the most frequently encountered failure patterns in production Kafka systems, along with their root causes and recommended remediation.


10.1 Absent Partition Key — Silent Round-Robin Distribution

Root cause: When no key is provided, Kafka’s default partitioner distributes messages across all available partitions using a round-robin or sticky strategy. Logically related messages for the same entity land in different partitions, and the consumer’s read order becomes a function of broker-level latency rather than the intended event sequence.

// Problematic pattern — no key provided:
kafkaTemplate.send(topic, event);           // key = null → non-deterministic routing

// Correct pattern — entity identifier used as key:
kafkaTemplate.send(topic, event.getUserId(), event);   // deterministic, ordered routing

10.2 Partition Count Modification on a Live Topic

Root cause: Kafka’s partition assignment formula is murmur2(key) % numPartitions. If numPartitions changes, the same key evaluates to a different partition. Messages written before the change reside in the old partition; messages written after reside in a new one. Consumers reading both partitions receive an interleaved, out-of-order view of the event history for that key.

# Current state: user-42 → murmur2("user-42") % 3 = Partition 0
kafka-topics.sh --alter --topic order-events --partitions 6

# Post-change: user-42 → murmur2("user-42") % 6 = Partition 3
# Historical events: Partition 0
# New events:        Partition 3
# Consumer sees two independent timelines for the same user ❌

Recommendation: Determine the partition count during initial system design based on projected throughput and consumer parallelism. Over-provisioning (e.g., starting at 12 partitions when 3 are sufficient today) is far less disruptive than modifying partition count after deployment.
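The remapping effect can be demonstrated with a small stand-alone sketch. Kafka's real partitioner uses murmur2; this demo (class name and helper are illustrative, not Kafka API) substitutes a plain non-negative hash, because any hash function exhibits the same behaviour once the final step is a modulo over the partition count:

```java
// Illustrative sketch — not Kafka's murmur2. Shows why changing the
// partition count remaps every key: the routing ends in `hash % numPartitions`.
public class PartitionRemapDemo {

    // Mirrors the shape of Kafka's routing: non-negative hash modulo partition count.
    static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & Integer.MAX_VALUE) % numPartitions;  // mask sign bit
    }

    public static void main(String[] args) {
        int keyHash = "user-42".hashCode();   // stable for a given key
        int before  = partitionFor(keyHash, 3);
        int after   = partitionFor(keyHash, 6);
        System.out.printf("3 partitions -> P%d, 6 partitions -> P%d%n", before, after);
        // The two results generally differ, splitting one key's history
        // across two partitions — exactly the failure described above.
    }
}
```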


10.3 Consumer Concurrency Exceeding Partition Count

Root cause: Each partition can be assigned to at most one consumer within a consumer group. If concurrency exceeds the number of partitions, the surplus consumer threads receive no partition assignment and remain idle for the lifetime of the group. The active threads may experience uneven load, and the idle threads represent wasted resources.

# Misconfiguration — surplus idle threads with no partition assignment:
listener:
  concurrency: 10     # 10 threads, but topic has only 3 partitions
                      # → 7 threads permanently idle

# Correct configuration — one thread per partition:
listener:
  concurrency: 3      # 3 threads, 3 partitions → balanced, no idle resources
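The idle-thread arithmetic can be made concrete with a toy assignment model (class and method names are illustrative, not Spring or Kafka API). Each partition is dealt to exactly one thread; once partitions run out, the remaining threads receive nothing:

```java
import java.util.*;

// Toy model of partition assignment within a consumer group:
// each partition goes to exactly one thread; surplus threads stay empty.
public class AssignmentDemo {

    static Map<Integer, List<Integer>> assign(int threads, int partitions) {
        Map<Integer, List<Integer>> byThread = new HashMap<>();
        for (int t = 0; t < threads; t++) byThread.put(t, new ArrayList<>());
        for (int p = 0; p < partitions; p++) byThread.get(p % threads).add(p);
        return byThread;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> a = assign(10, 3);   // the misconfiguration above
        long idle = a.values().stream().filter(List::isEmpty).count();
        System.out.println("Idle threads: " + idle);     // 7 of 10 threads do nothing
    }
}
```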

10.4 Offset Auto-Commit Causing Message Loss Under Failure

Root cause: When enable-auto-commit: true, Kafka commits the consumer offset on a periodic schedule (default: every 5 seconds), regardless of whether the application has successfully processed the fetched messages. If a processing failure occurs after an auto-commit but before the next poll, those messages are considered consumed and will not be redelivered. This breaks both reliability and, indirectly, ordering — because downstream state derived from later messages may be processed without the context of the lost ones.

// Risk-prone pattern — auto-commit does not guarantee processing completion:
@KafkaListener(topics = "order-events")
public void consume(OrderEvent event) {
    // If an exception is thrown here, the offset may already be committed.
    // The message is silently lost; downstream state becomes inconsistent.
    processOrderEvent(event);
}

// Reliable pattern — offset is committed only after verified processing:
@KafkaListener(topics = "${app.kafka.topic.order-events}")
public void consume(ConsumerRecord<String, OrderEvent> record,
                    Acknowledgment acknowledgment) {
    try {
        processOrderEvent(record.value());
        acknowledgment.acknowledge();   // Offset committed after successful processing
    } catch (Exception ex) {
        log.error("Processing failure for offset {}. Message will be re-delivered.",
                  record.offset(), ex);
        // acknowledgment is not called → Kafka re-delivers on next poll
    }
}

10.5 In-Flight Request Reordering Without Idempotence

Root cause: With max.in.flight.requests.per.connection > 1 and idempotence disabled, a producer may have multiple batches in transit simultaneously. If Batch 1 fails and is retried while Batch 2 has already been acknowledged, Batch 1’s retry is appended after Batch 2 in the partition log — reversing the intended write order.

# Produces ordering risk — multiple in-flight requests without idempotence:
producer:
  properties:
    enable.idempotence: false
    max.in.flight.requests.per.connection: 5   # ← unsafe combination

# Safe option 1 — idempotence ON (recommended; allows up to 5 in-flight):
producer:
  properties:
    enable.idempotence: true
    max.in.flight.requests.per.connection: 5   # ← safe with idempotence

# Safe option 2 — strictly one in-flight (maximum ordering safety, reduced throughput):
producer:
  properties:
    enable.idempotence: false
    max.in.flight.requests.per.connection: 1   # ← safe without idempotence
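Outside of Spring's YAML binding, the same safe combination can be expressed as plain producer properties. The property names below are the standard Kafka producer keys; the builder class is an illustrative sketch, and the resulting Properties would be handed to a KafkaProducer or producer factory:

```java
import java.util.Properties;

// The "safe option 1" settings from the YAML above, as plain Kafka
// producer properties (standard property names, illustrative wrapper class).
public class SafeProducerProps {

    public static Properties build() {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");
        p.setProperty("acks", "all");
        p.setProperty("max.in.flight.requests.per.connection", "5"); // safe with idempotence
        p.setProperty("retries", String.valueOf(Integer.MAX_VALUE)); // retry until timeout
        return p;
    }
}
```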

11. Consumer Group Mechanics and Ordered Parallel Processing

How Kafka Achieves Both Ordering and Parallelism

These two properties appear contradictory: parallelism suggests multiple workers processing simultaneously; ordering requires sequential execution. Kafka resolves this through partition-level isolation.

Each partition within a consumer group is assigned to exactly one consumer instance. That consumer reads the partition’s messages in strict offset order. Multiple consumers — each owning a different partition — execute in parallel without any shared state or coordination overhead.

Topic: order-events — 3 partitions

Consumer Group: order-processing-group
  ├─ Consumer Thread 1 ──► Partition 0 (users whose key hashes to 0: user-42, user-15 ...)
  ├─ Consumer Thread 2 ──► Partition 1 (users whose key hashes to 1: user-88, user-03 ...)
  └─ Consumer Thread 3 ──► Partition 2 (users whose key hashes to 2: user-99, user-21 ...)

Thread 1 processes user-42’s events in order. Thread 3 processes user-99’s events in order. Both run concurrently — there is no contention between them. This is the model that enables Kafka to scale horizontally while preserving per-entity event ordering.
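This model can be sketched in plain Java with one single-threaded executor per partition (class name is illustrative, not a Kafka API). Tasks submitted to the same lane run strictly in submission order; different lanes run concurrently with no shared state:

```java
import java.util.*;
import java.util.concurrent.*;

// Model of partition-level isolation: one single-threaded executor per
// partition. Same partition → sequential; different partitions → parallel.
public class PartitionedExecutorDemo {

    private final ExecutorService[] lanes;

    PartitionedExecutorDemo(int partitions) {
        lanes = new ExecutorService[partitions];
        for (int i = 0; i < partitions; i++) lanes[i] = Executors.newSingleThreadExecutor();
    }

    void submit(int partition, Runnable task) { lanes[partition].submit(task); }

    void shutdown() throws InterruptedException {
        for (ExecutorService e : lanes) { e.shutdown(); e.awaitTermination(5, TimeUnit.SECONDS); }
    }

    public static void main(String[] args) throws Exception {
        PartitionedExecutorDemo demo = new PartitionedExecutorDemo(3);
        List<Integer> partition0 = Collections.synchronizedList(new ArrayList<>());
        for (int seq = 0; seq < 5; seq++) {
            final int s = seq;
            demo.submit(0, () -> partition0.add(s));   // same partition → ordered
            demo.submit(s % 3, () -> { });             // other lanes run in parallel
        }
        demo.shutdown();
        System.out.println(partition0);   // [0, 1, 2, 3, 4] — submission order preserved
    }
}
```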

Partition Rebalancing on Consumer Failure

If a consumer instance fails or is restarted, Kafka triggers a rebalance. The group coordinator reassigns the orphaned partition to a surviving consumer. That consumer resumes from the last committed offset — the point at which the previous consumer last called acknowledgment.acknowledge(). No messages are lost or skipped, and the ordering guarantee is maintained across the transition.

This is precisely why manual offset management (MANUAL_IMMEDIATE ack-mode) is critical in ordered-processing scenarios. An auto-committed offset may represent a point beyond what was actually processed, causing gaps in the event history post-rebalance.


12. Ordering Guarantee Reference

Scenario | Ordering Guaranteed? | Notes
Single partition, any configuration | ✅ Global | Throughput limited to one broker’s write capacity
Multiple partitions + stable entity key | ✅ Per entity | Standard production pattern
Multiple partitions + no key | ❌ No guarantee | Default partitioner distributes non-deterministically
Idempotent producer + acks=all | ✅ Preserved on retry | Required for production reliability
concurrency equals partition count | ✅ Optimal | One thread per partition; no idle resources
Partition count altered post-deployment | ❌ Broken | Key-to-partition mapping changes; ordering violated
enable-auto-commit: true | ⚠️ Loss risk | Processing failure may silently advance offset
max.in.flight > 1 without idempotence | ❌ Retry reordering | Unsafe; set to 1 or enable idempotence

13. Configuration Reference

Property | Recommended Value | Impact on Ordering
enable.idempotence | true | Eliminates duplicate writes and retry-induced reordering
acks | all | Ensures all replicas confirm before success; prevents loss on failover
max.in.flight.requests.per.connection | 5 (with idempotence) / 1 (without) | Controls retry-reordering risk
retries | Integer.MAX_VALUE | Prevents silent message drop on transient failure
retry.backoff.ms | 100 | Avoids aggressive retry storms
delivery.timeout.ms | 120000 | Maximum time before a send is considered failed
enable-auto-commit | false | Prevents offset advancement before processing is complete
ack-mode | MANUAL_IMMEDIATE | Offset committed only after explicit acknowledgment
concurrency | Equal to partition count | One thread per partition; no idle threads
max.poll.records | 50 | Limits in-flight records per consumer thread

14. Questions and Answers

Q: Does Kafka guarantee message ordering?

Kafka guarantees message ordering within a partition. Messages assigned to the same partition are written and consumed in strict offset order. Across multiple partitions, there is no global ordering guarantee — consumers read partitions independently and the relative order is determined by broker-level timing.

Q: How would you implement ordered processing in a Spring Boot Kafka application?

Assign a stable entity identifier (such as userId or orderId) as the partition key in kafkaTemplate.send(topic, key, value). Configure the producer with enable.idempotence=true and acks=all. Set concurrency in the @KafkaListener equal to the number of partitions, and use MANUAL_IMMEDIATE ack-mode to ensure offsets are committed only after successful processing.

Q: What is the consequence of not providing a partition key?

Without a key, Kafka uses round-robin or sticky partitioning, distributing consecutive messages for the same entity across different partitions. Since partitions are consumed independently, the consumer receives those messages in an order determined by broker latency — not by the producer’s intent. This failure is silent: no exception is raised and no alert fires.

Q: What is an idempotent producer and how does it relate to ordering?

An idempotent producer is one configured with enable.idempotence=true. Kafka assigns the producer a unique Producer ID (PID) and tracks a sequence number per (PID, partition) pair. If a retry delivers a message with a sequence number the broker has already accepted, the duplicate is discarded without being written. This eliminates both duplicate records and the reordering that can result from retry races.
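The broker-side bookkeeping can be sketched in a few lines (class name illustrative; the real broker additionally validates that sequences arrive gap-free and rejects out-of-order gaps): track the highest accepted sequence per (PID, partition) and drop anything at or below it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of broker-side idempotence: remember the highest
// sequence accepted per (producerId, partition); a retried batch whose
// sequence was already accepted is discarded, not re-appended.
public class DedupSketch {

    private final Map<String, Integer> lastSeq = new HashMap<>();

    /** Returns true if the write is appended, false if it is a duplicate retry. */
    boolean append(long producerId, int partition, int sequence) {
        String key = producerId + "-" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (sequence <= last) return false;   // duplicate retry → dropped
        lastSeq.put(key, sequence);
        return true;
    }
}
```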

Q: Why is modifying the partition count of a live topic dangerous?

Partition assignment uses murmur2(key) % numPartitions. Changing numPartitions changes the target partition for every existing key. Historical messages for a key remain in the old partition while new messages go to a different one. Consumers reading both partitions receive an interleaved, causally inconsistent view of the event history.

Q: What is the relationship between concurrency and partition count?

concurrency in a @KafkaListener creates that many consumer threads. Kafka assigns each partition to exactly one thread, though a single thread may own several partitions. If concurrency < numPartitions, some threads read multiple partitions sequentially — reducing parallelism. If concurrency > numPartitions, surplus threads receive no assignment and remain idle. The optimal setting is concurrency = numPartitions.

Q: Why should enable-auto-commit be set to false in ordered-processing systems?

Auto-commit advances the offset on a periodic schedule, regardless of processing outcome. If a processing failure occurs after an offset is auto-committed, the message will not be redelivered — it is permanently lost from the consumer’s perspective. Manual acknowledgment decouples offset advancement from the polling cycle and ties it directly to verified processing success, ensuring that no message is silently skipped and that ordered processing can resume correctly after a failure.


15. Conclusion

Apache Kafka does not make ordering easy by accident — it makes it possible by design. The append-only partition log, deterministic key-based routing, and exclusive partition-to-consumer assignment are all deliberate architectural choices that, when used correctly, deliver strong per-entity ordering guarantees at scale.

In a Spring Boot application, enforcing those guarantees requires five intentional decisions:

  1. Select a meaningful partition key — use a stable entity identifier (userId, orderId) that represents the ordering boundary for your business domain.
  2. Enable idempotent delivery — configure enable.idempotence=true and acks=all to protect ordering and eliminate duplicates under failure conditions.
  3. Align concurrency with partition count — set concurrency equal to the number of partitions so each partition is served by a dedicated thread.
  4. Use manual offset management — configure enable-auto-commit=false and ack-mode=MANUAL_IMMEDIATE so offsets are only advanced after verified processing completion.
  5. Treat partition count as immutable post-deployment — plan partition capacity upfront and avoid alterations on live topics to preserve key-to-partition routing consistency.

Applied together, these decisions ensure that Kafka delivers on its ordering guarantees reliably, predictably, and at production scale.
