Does Kafka Guarantee Message Ordering?
Yes — Kafka guarantees message ordering within a partition. Ordering across multiple partitions is not guaranteed by default.
1. The Problem: Why Message Ordering Matters in Distributed Systems
In a monolithic application, operations execute sequentially within a single JVM — order is implicit. In a distributed, event-driven architecture built on Apache Kafka, that implicit contract disappears. Messages are produced by independent services, transported across a network, and consumed by multiple concurrent threads. Without intentional design, the sequence in which events were produced is not necessarily the sequence in which they are consumed.
Consider a real-world e-commerce order lifecycle:
Event 1 → ORDER_PLACED
Event 2 → PAYMENT_CONFIRMED
Event 3 → ORDER_SHIPPED
If a downstream service consumes ORDER_SHIPPED before PAYMENT_CONFIRMED, it dispatches an order that has not been paid for. If ORDER_PLACED arrives last, the system may attempt to update a record that does not yet exist in its local database. These are not edge cases — they are predictable failure modes in any system that does not enforce ordering at the infrastructure level.
Ordering is critical whenever events have a causal dependency — that is, when the correctness of processing Event N depends on Event N−1 having already been processed. Common examples in production systems include financial transaction ledgers, inventory state machines, user session pipelines, and audit logs.
2. Kafka Core Concepts
Before examining ordering guarantees, it is essential to understand the internal architecture that makes them possible.
What is Apache Kafka?
Apache Kafka is a distributed, fault-tolerant, append-only log system designed for high-throughput, low-latency event streaming. Originally developed at LinkedIn, it is now an Apache Software Foundation project used widely as the backbone of event-driven architectures.
Unlike traditional message queues where messages are deleted upon consumption, Kafka retains messages for a configurable period regardless of whether they have been consumed. This enables multiple consumers to independently replay or process the same stream of events.
Architectural Components
| Component | Role in the System |
|---|---|
| Producer | The application that publishes messages to a Kafka topic |
| Topic | A logical stream of related events, identified by name (e.g., order-events) |
| Partition | A topic is subdivided into one or more partitions — each is an independent, ordered, append-only log |
| Broker | A Kafka server responsible for storing partitions and serving producers and consumers |
| Consumer | The application that reads and processes messages from a topic |
| Consumer Group | A set of consumer instances that collectively read a topic, with each partition assigned to exactly one member |
| Message Key | An optional byte array attached to a message; used to deterministically assign messages to a partition |
| Offset | An immutable, monotonically increasing integer that uniquely identifies each message within a partition |
| Replication Factor | The number of brokers that store a copy of each partition for fault tolerance |
What is a Partition, Really?
A partition is the fundamental unit of parallelism and ordering in Kafka. Internally, each partition is stored as a sequence of log segment files on disk. Every message appended to a partition receives the next available offset — this process is atomic and sequential.
Partition 0 (on disk — append-only log):
[Offset 0] ORDER_PLACED for user-42 ← written first
[Offset 1] PAYMENT_CONFIRMED for user-42 ← written second
[Offset 2] ORDER_SHIPPED for user-42 ← written third
Because appends are sequential and offsets are immutable, a consumer that reads Partition 0 from Offset 0 will always receive messages in exactly the order they were written. This is Kafka’s core ordering guarantee — and it is absolute within a single partition.
Why Multiple Partitions Break Global Ordering
A topic with multiple partitions distributes its messages across those partitions, potentially across different brokers on different machines. Each partition operates independently — there is no global clock or coordination mechanism between them.
Topic: order-events (3 partitions across 3 brokers)
Partition 0 (Broker A): [Offset 0] ORDER_PLACED for user-42
Partition 1 (Broker B): [Offset 0] PAYMENT_CONFIRMED for user-42
Partition 2 (Broker C): [Offset 0] ORDER_SHIPPED for user-42
When a consumer reads all three partitions, it receives one message from each partition in a non-deterministic interleaving. Broker B may respond faster than Broker A due to lower disk I/O or network latency. The consumer has no way to reconstruct the original sequence without additional coordination.
The foundational rule: Kafka guarantees ordering within a partition. It makes no guarantees about ordering across partitions.
3. How Kafka Enforces Ordering — The Partition Key Mechanism
The Role of the Message Key
The message key is the primary mechanism for enforcing ordering across a distributed topic. When a producer sends a message with a key, Kafka applies the default Murmur2 hash function to determine the target partition:
Target Partition = toPositive(murmur2(keyBytes)) % numberOfPartitions
(where toPositive masks the sign bit, effectively taking the absolute value)
Because this is a pure, deterministic function, the same key will always hash to the same partition — regardless of which producer instance sends it, which broker is the leader, or how many consumer instances are running.
Hash("user-42") % 3 = Partition 0 ← always, deterministically
Hash("user-99") % 3 = Partition 2 ← always, deterministically
This means all events for user-42 — across their entire lifecycle — are written sequentially to Partition 0 and consumed in that exact order.
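This deterministic routing can be illustrated with a tiny stand-alone sketch. Note that `String.hashCode()` here is a stand-in purely for illustration; Kafka's real default partitioner applies murmur2 to the serialized key bytes:

```java
// Illustration of deterministic key-to-partition routing.
// ASSUMPTION: String.hashCode() stands in for Kafka's murmur2 hash.
public class KeyRoutingDemo {

    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit (as Kafka's Utils.toPositive does), then take the modulus.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always resolves to the same partition.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3)); // prints: true
    }
}
```

Because the function depends only on the key and the partition count, the result is stable across producer restarts and broker failovers — and it is also why changing the partition count later silently remaps every key.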
What Happens Internally When a Key is Present
Producer sends three events for user-42:
Step 1: Serialise message → compute murmur2("user-42") % 3 = 0
Step 2: Route to Partition 0 leader on Broker A
Step 3: Broker A appends to log at Offset 0, 1, 2 — sequentially
Step 4: Replication occurs to follower brokers
Step 5: Producer receives acknowledgment (based on acks config)
Step 6: Consumer reads Partition 0, receives Offset 0 → 1 → 2 in order
The ordering contract is maintained at every step through the combination of deterministic routing and append-only writes.
What Happens Internally Without a Key
When no key is provided, Kafka’s default partitioner distributes messages using a round-robin or sticky partitioning strategy (sticky partitioning was introduced in Kafka 2.4 to improve batching efficiency). In both strategies, consecutive messages for the same logical entity can land in different partitions:
Message 1 (ORDER_PLACED) → Partition 0 (Broker A)
Message 2 (PAYMENT_CONFIRMED) → Partition 1 (Broker B)
Message 3 (ORDER_SHIPPED) → Partition 2 (Broker C)
Since consumers read partitions independently and Broker B may respond faster, the consumer receives PAYMENT_CONFIRMED before ORDER_PLACED. The causal sequence is broken with no error, no exception, and no warning.
4. The Spring Boot Implementation
Project Domain
We will build an order lifecycle event system. Two Spring Boot services communicate through a Kafka topic:
- `OrderProducerService` — publishes `OrderEvent` messages representing state transitions in an order's lifecycle
- `OrderConsumerService` — reads and processes those events downstream
Domain Model
userId is the field we will use as the Kafka partition key — ensuring all events for the same user are routed to the same partition and processed in sequence.
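The original code listing is not reproduced here; a minimal sketch of the event model, with illustrative field names (`orderId`, `userId`, `status`, `timestamp`), would be:

```java
// Hypothetical domain model for the order lifecycle events in this article.
// Field names are illustrative assumptions; userId is the partition key.
public record OrderEvent(String orderId, String userId, String status, long timestamp) {}
```

A Java `record` is a natural fit here: events are immutable value objects, and the generated accessors (`event.userId()`, `event.status()`) keep the producer and consumer code terse.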
Maven Dependencies (pom.xml)
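The original dependency listing is missing; a minimal sketch, assuming the standard Spring Boot starters with versions managed by the Spring Boot parent BOM, would be:

```xml
<!-- Hedged sketch: minimal dependency set for a web + Kafka service.
     Versions are assumed to come from the spring-boot-starter-parent BOM. -->
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
</dependencies>
```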
Application Configuration (application.yml)
Every configuration property below has a direct bearing on ordering, reliability, or both. Each is explained inline.
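The original YAML is not included; the following sketch assembles the properties discussed throughout this article (and listed in the configuration reference in Section 13). Host names, topic names, and serializer choices are assumptions:

```yaml
# Hedged sketch — property values mirror the recommendations in Section 13.
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
      acks: all                                  # all in-sync replicas must confirm the write
      properties:
        enable.idempotence: true                 # PID + sequence numbers; safe retries
        max.in.flight.requests.per.connection: 5 # safe only because idempotence is on
        retries: 2147483647                      # Integer.MAX_VALUE; never drop silently
        retry.backoff.ms: 100
        delivery.timeout.ms: 120000
    consumer:
      group-id: order-processing-group
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      enable-auto-commit: false                  # offsets advance only on manual ack
      auto-offset-reset: earliest
      max-poll-records: 50
    listener:
      ack-mode: MANUAL_IMMEDIATE                 # commit immediately after acknowledge()
```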
Topic Provisioning
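The topic-creation code is missing from this section; a hedged sketch using Spring's `TopicBuilder` (class and bean names are assumptions) would be:

```java
// Hedged sketch: declarative topic creation. Spring Boot's auto-configured
// KafkaAdmin picks up NewTopic beans and creates the topic if absent.
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfig {

    @Bean
    public NewTopic orderEventsTopic() {
        return TopicBuilder.name("order-events")
                .partitions(3)   // fixed up front — changing this later breaks key routing
                .replicas(3)     // one copy per broker for fault tolerance
                .build();
    }
}
```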
Programmatic Producer Configuration
The application.yml approach is sufficient for most applications. For teams that prefer to centralise all Kafka configuration in Java (e.g., for easier testing or environment-specific overrides), the equivalent programmatic configuration is:
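The programmatic configuration referenced above is not reproduced in this copy; a hedged sketch that mirrors the YAML properties (class name `KafkaProducerConfig` matches the one the article later refers to) would be:

```java
// Hedged sketch: programmatic equivalent of the producer YAML.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, OrderEvent> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);   // safe retries
        props.put(ProducerConfig.ACKS_CONFIG, "all");                // full replica confirmation
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        return new DefaultKafkaProducerFactory<>(props);
    }

    @Bean
    public KafkaTemplate<String, OrderEvent> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
```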
5. Scenario A — Producing Without a Key (Ordering Violated)
This scenario demonstrates the consequence of omitting the partition key. It is a common oversight in initial implementations and a frequent source of subtle, environment-dependent bugs in production.
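The producer code for this scenario is missing from this copy; a minimal sketch, assuming spring-kafka 3.x (where `send` returns a `CompletableFuture`) and the hypothetical `OrderEvent` accessors, would be:

```java
// Hedged sketch: the two-argument send(topic, value) attaches NO key,
// so the default sticky/round-robin partitioner chooses the partition.
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderProducerService {

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public OrderProducerService(KafkaTemplate<String, OrderEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publishWithoutKey(OrderEvent event) {
        kafkaTemplate.send("order-events", event)   // no key -> no routing guarantee
                .whenComplete((result, ex) -> {
                    if (ex == null) {
                        System.out.printf("[NO KEY] Partition: %d | Offset: %d | Status: %s%n",
                                result.getRecordMetadata().partition(),
                                result.getRecordMetadata().offset(),
                                event.status());
                    }
                });
    }
}
```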
Observed behaviour in logs:
# Producer publishes three events for user-42, sequentially:
[NO KEY] Partition: 0 | Offset: 4 | Status: ORDER_PLACED
[NO KEY] Partition: 1 | Offset: 2 | Status: PAYMENT_CONFIRMED
[NO KEY] Partition: 2 | Offset: 6 | Status: ORDER_SHIPPED
# Broker B (Partition 1) responds with lower latency.
# Broker C (Partition 2) responds next.
# Broker A (Partition 0) responds last.
# Consumer receives:
[CONSUMER] Partition 2 | ORDER_SHIPPED ← processed first ❌
[CONSUMER] Partition 1 | PAYMENT_CONFIRMED ← processed second ❌
[CONSUMER] Partition 0 | ORDER_PLACED ← processed last ❌
The order in which the consumer processes these events is governed entirely by broker-level timing — not by the application’s intent. This failure mode is silent: no exception is thrown, no alert fires, and the system appears to function correctly until a downstream service acts on the out-of-order data.
6. Scenario B — Producing With a Key (Ordering Enforced)
REST Controller to Simulate Concurrent Order Pipelines
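The controller code is missing from this copy; a hedged sketch using the three-argument `send(topic, key, value)` overload, with illustrative endpoint and class names, would be:

```java
// Hedged sketch: fires the full lifecycle for one user, keyed by userId,
// so every event for that user hashes to the same partition.
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderSimulationController {

    private static final String[] LIFECYCLE =
            {"ORDER_PLACED", "PAYMENT_CONFIRMED", "ORDER_SHIPPED"};

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    public OrderSimulationController(KafkaTemplate<String, OrderEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @PostMapping("/orders/{userId}/simulate")
    public String simulate(@PathVariable String userId) {
        for (String status : LIFECYCLE) {
            // Keyed send: murmur2(userId) always resolves to the same partition.
            kafkaTemplate.send("order-events", userId,
                    new OrderEvent("ord-" + userId, userId, status,
                            System.currentTimeMillis()));
        }
        return "Published lifecycle for " + userId;
    }
}
```

Invoking the endpoint concurrently for `user-42` and `user-99` produces the interleaved-but-per-user-ordered log output shown below.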
Observed behaviour in logs:
# Kafka's deterministic routing:
# murmur2("user-42") % 3 = Partition 0
# murmur2("user-99") % 3 = Partition 2
[SENT] key=user-42 | Partition: 0 | Offset: 0 | Status: ORDER_PLACED
[SENT] key=user-99 | Partition: 2 | Offset: 0 | Status: ORDER_PLACED
[SENT] key=user-42 | Partition: 0 | Offset: 1 | Status: PAYMENT_CONFIRMED
[SENT] key=user-99 | Partition: 2 | Offset: 1 | Status: PAYMENT_CONFIRMED
[SENT] key=user-42 | Partition: 0 | Offset: 2 | Status: ORDER_SHIPPED
[SENT] key=user-99 | Partition: 2 | Offset: 2 | Status: ORDER_SHIPPED
# Consumer Thread 1 reads Partition 0 (user-42):
[CONSUMER P-0] Offset 0 → ORDER_PLACED ✅
[CONSUMER P-0] Offset 1 → PAYMENT_CONFIRMED ✅
[CONSUMER P-0] Offset 2 → ORDER_SHIPPED ✅
# Consumer Thread 3 reads Partition 2 (user-99):
[CONSUMER P-2] Offset 0 → ORDER_PLACED ✅
[CONSUMER P-2] Offset 1 → PAYMENT_CONFIRMED ✅
[CONSUMER P-2] Offset 2 → ORDER_SHIPPED ✅
Both users are processed in parallel, with strict per-user event ordering maintained throughout. ✅
7. Consumer Service — Reliable, Ordered Message Processing
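The consumer code for this section is missing from this copy; a hedged sketch, assuming spring-kafka 3.x header constants and the `MANUAL_IMMEDIATE` ack-mode configured earlier, would be:

```java
// Hedged sketch: one listener container with concurrency=3 gives each of the
// three partitions a dedicated thread; offsets advance only after acknowledge().
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Service;

@Service
public class OrderEventConsumer {

    @KafkaListener(topics = "order-events",
                   groupId = "order-processing-group",
                   concurrency = "3")
    public void onOrderEvent(OrderEvent event,
                             @Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
                             @Header(KafkaHeaders.OFFSET) long offset,
                             Acknowledgment ack) {
        // Process first; only then advance the offset. If processing throws,
        // the offset is NOT committed and the message is redelivered.
        System.out.printf("[CONSUMER P-%d] Offset %d -> %s%n",
                partition, offset, event.status());
        ack.acknowledge(); // MANUAL_IMMEDIATE: commits this offset right now
    }
}
```

The order of operations inside the listener is the whole point: acknowledging before processing would recreate the same loss window that auto-commit opens.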
8. Idempotent Producer — Ordering Guarantees Under Failure Conditions
The Problem: Retries and Reordering
A producer retry is not inherently safe. Consider the following failure scenario without idempotence:
Step 1: Producer sends Batch A [PAYMENT_CONFIRMED] to broker.
Step 2: Broker writes Batch A. Network drops before acknowledgment.
Step 3: Producer does not receive ack. Timeout fires → producer retries Batch A.
Step 4: Meanwhile, producer sends Batch B [ORDER_SHIPPED].
Step 5: Batch B is acknowledged first.
Step 6: Retry of Batch A arrives and is written after Batch B.
Result: Partition log = [ORDER_SHIPPED (Offset 1), PAYMENT_CONFIRMED (Offset 2)]
Consumer reads: ORDER_SHIPPED before PAYMENT_CONFIRMED ❌
The Solution: Producer ID and Sequence Numbers
When enable.idempotence = true, the Kafka broker assigns each producer session a unique Producer ID (PID). The producer attaches a monotonically increasing sequence number to every message it sends, scoped to a specific (PID, Partition) pair.
Producer PID = 101
Message 1: PID=101, Partition=0, SeqNo=0, Status=ORDER_PLACED
Message 2: PID=101, Partition=0, SeqNo=1, Status=PAYMENT_CONFIRMED
Message 3: PID=101, Partition=0, SeqNo=2, Status=ORDER_SHIPPED
The broker maintains the last accepted sequence number per (PID, Partition). Upon receiving a message:
- If `SeqNo == lastAccepted + 1` → accept and write.
- If `SeqNo <= lastAccepted` → duplicate detected, silently discard.
- If `SeqNo > lastAccepted + 1` → sequence gap detected, return an error (`OutOfOrderSequenceException`).
Retry scenario with idempotence:
Producer sends PAYMENT_CONFIRMED with PID=101, SeqNo=1.
Network drops. Producer retries: PAYMENT_CONFIRMED, PID=101, SeqNo=1.
Broker: "SeqNo 1 from PID 101 already written." → Discard. ✅
Ordering preserved. No duplicate in the log. ✅
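The broker's acceptance rule can be sketched as a small, self-contained state machine. This is a simplification: a real broker tracks this state per `(PID, partition)` with producer epochs and a window of recent batches, while here only the last accepted sequence number is kept:

```java
// Simplified sketch of the broker-side sequence check for idempotent producers.
import java.util.HashMap;
import java.util.Map;

public class SequenceCheckDemo {

    public enum Outcome { ACCEPT, DUPLICATE, OUT_OF_ORDER }

    // Last accepted sequence number, keyed by (PID, partition).
    private final Map<String, Integer> lastAccepted = new HashMap<>();

    public Outcome onMessage(long pid, int partition, int seqNo) {
        String key = pid + "-" + partition;
        int last = lastAccepted.getOrDefault(key, -1);
        if (seqNo == last + 1) {             // next expected -> write it
            lastAccepted.put(key, seqNo);
            return Outcome.ACCEPT;
        }
        if (seqNo <= last) {                 // retry of an already-written batch
            return Outcome.DUPLICATE;
        }
        return Outcome.OUT_OF_ORDER;         // gap -> OutOfOrderSequenceException
    }
}
```

Running the retry scenario through this sketch shows why the duplicate never reaches the log: the retried `SeqNo=1` compares as already accepted and is discarded without a write.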
This is already configured in our application.yml and KafkaProducerConfig.java. No application code changes are required.
9. Custom Partitioner — Domain-Driven Partition Assignment
The default key-hash strategy distributes messages uniformly across partitions. In some production scenarios, business requirements demand more targeted routing — for example, isolating high-value transactions to a dedicated partition for priority processing or SLA compliance.
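The partitioner implementation is missing from this copy; a hedged sketch of one possible approach, routing priority traffic (signalled here by a hypothetical key prefix) to a fixed partition while preserving the default key-hash behaviour for everything else, would be:

```java
// Hedged sketch of a domain-driven partitioner. The "vip-" prefix convention
// and class name are illustrative assumptions.
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class PriorityOrderPartitioner implements Partitioner {

    private static final int PRIORITY_PARTITION = 0;

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        // High-value traffic gets a dedicated partition for SLA isolation.
        if (key instanceof String k && k.startsWith("vip-")) {
            return PRIORITY_PARTITION;
        }
        // Everyone else follows the standard murmur2 key-hash strategy.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}
```

Note the trade-off: any key routed to the fixed partition bypasses hash distribution, so the priority partition's throughput must be monitored separately.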
Register the custom partitioner in application.yml:
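The registration snippet is missing from this copy; assuming the hypothetical class above lives in `com.example.kafka`, it would be wired through Spring Boot's producer-properties passthrough:

```yaml
# Hedged sketch — the fully-qualified class name is an assumption.
spring:
  kafka:
    producer:
      properties:
        partitioner.class: com.example.kafka.PriorityOrderPartitioner
```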
10. Common Ordering Failures — Root Cause Analysis
Understanding why ordering breaks is as important as knowing how to enforce it. The following section covers the most frequently encountered failure patterns in production Kafka systems, along with their root causes and recommended remediation.
10.1 Absent Partition Key — Silent Round-Robin Distribution
Root cause: When no key is provided, Kafka’s default partitioner distributes messages across all available partitions using a round-robin or sticky strategy. Logically related messages for the same entity land in different partitions, and the consumer’s read order becomes a function of broker-level latency rather than the intended event sequence.
10.2 Partition Count Modification on a Live Topic
Root cause: Kafka’s partition assignment formula is murmur2(key) % numPartitions. If numPartitions changes, the same key evaluates to a different partition. Messages written before the change reside in the old partition; messages written after reside in a new one. Consumers reading both partitions receive an interleaved, out-of-order view of the event history for that key.
Recommendation: Determine the partition count during initial system design based on projected throughput and consumer parallelism. Over-provisioning (e.g., starting at 12 partitions when 3 are sufficient today) is far less disruptive than modifying partition count after deployment.
10.3 Consumer Concurrency Exceeding Partition Count
Root cause: Each partition can be assigned to at most one consumer within a consumer group. If concurrency exceeds the number of partitions, the surplus consumer threads receive no partition assignment and remain idle for the lifetime of the group. The active threads may experience uneven load, and the idle threads represent wasted resources.
10.4 Offset Auto-Commit Causing Message Loss Under Failure
Root cause: When enable-auto-commit: true, Kafka commits the consumer offset on a periodic schedule (default: every 5 seconds), regardless of whether the application has successfully processed the fetched messages. If a processing failure occurs after an auto-commit but before the next poll, those messages are considered consumed and will not be redelivered. This breaks both reliability and, indirectly, ordering — because downstream state derived from later messages may be processed without the context of the lost ones.
10.5 In-Flight Request Reordering Without Idempotence
Root cause: With max.in.flight.requests.per.connection > 1 and idempotence disabled, a producer may have multiple batches in transit simultaneously. If Batch 1 fails and is retried while Batch 2 has already been acknowledged, Batch 1’s retry is appended after Batch 2 in the partition log — reversing the intended write order.
11. Consumer Group Mechanics and Ordered Parallel Processing
How Kafka Achieves Both Ordering and Parallelism
These two properties appear contradictory: parallelism suggests multiple workers processing simultaneously; ordering requires sequential execution. Kafka resolves this through partition-level isolation.
Each partition within a consumer group is assigned to exactly one consumer instance. That consumer reads the partition’s messages in strict offset order. Multiple consumers — each owning a different partition — execute in parallel without any shared state or coordination overhead.
Topic: order-events — 3 partitions
Consumer Group: order-processing-group
├─ Consumer Thread 1 ──► Partition 0 (users whose key hashes to 0: user-42, user-15 ...)
├─ Consumer Thread 2 ──► Partition 1 (users whose key hashes to 1: user-88, user-03 ...)
└─ Consumer Thread 3 ──► Partition 2 (users whose key hashes to 2: user-99, user-21 ...)
Thread 1 processes user-42’s events in order. Thread 3 processes user-99’s events in order. Both run concurrently — there is no contention between them. This is the model that enables Kafka to scale horizontally while preserving per-entity event ordering.
Partition Rebalancing on Consumer Failure
If a consumer instance fails or is restarted, Kafka triggers a rebalance. The group coordinator reassigns the orphaned partition to a surviving consumer. That consumer resumes from the last committed offset — the point at which the previous consumer last called acknowledgment.acknowledge(). No messages are lost or skipped, and the ordering guarantee is maintained across the transition.
This is precisely why manual offset management (MANUAL_IMMEDIATE ack-mode) is critical in ordered-processing scenarios. An auto-committed offset may represent a point beyond what was actually processed, causing gaps in the event history post-rebalance.
12. Ordering Guarantee Reference
| Scenario | Ordering Guaranteed? | Notes |
|---|---|---|
| Single partition, any configuration | ✅ Global | Throughput limited to one broker’s write capacity |
| Multiple partitions + stable entity key | ✅ Per entity | Standard production pattern |
| Multiple partitions + no key | ❌ No guarantee | Default partitioner distributes non-deterministically |
| Idempotent producer + `acks=all` | ✅ Preserved on retry | Required for production reliability |
| `concurrency` equals partition count | ✅ Optimal | One thread per partition; no idle resources |
| Partition count altered post-deployment | ❌ Broken | Key-to-partition mapping changes; ordering violated |
| `enable-auto-commit: true` | ⚠️ Loss risk | Processing failure may silently advance offset |
| `max.in.flight > 1` without idempotence | ❌ Retry reordering | Unsafe; set to 1 or enable idempotence |
13. Configuration Reference
| Property | Recommended Value | Impact on Ordering |
|---|---|---|
| `enable.idempotence` | `true` | Eliminates duplicate writes and retry-induced reordering |
| `acks` | `all` | Ensures all replicas confirm before success; prevents loss on failover |
| `max.in.flight.requests.per.connection` | `5` (with idempotence) / `1` (without) | Controls retry-reordering risk |
| `retries` | `Integer.MAX_VALUE` | Prevents silent message drop on transient failure |
| `retry.backoff.ms` | `100` | Avoids aggressive retry storms |
| `delivery.timeout.ms` | `120000` | Maximum time before a send is considered failed |
| `enable-auto-commit` | `false` | Prevents offset advancement before processing is complete |
| `ack-mode` | `MANUAL_IMMEDIATE` | Offset committed only after explicit acknowledgment |
| `concurrency` | Equal to partition count | One thread per partition; no idle threads |
| `max.poll.records` | `50` | Limits in-flight records per consumer thread |
14. Questions and Answers
Q: Does Kafka guarantee message ordering?
Kafka guarantees message ordering within a partition. Messages assigned to the same partition are written and consumed in strict offset order. Across multiple partitions, there is no global ordering guarantee — consumers read partitions independently and the relative order is determined by broker-level timing.
Q: How would you implement ordered processing in a Spring Boot Kafka application?
Assign a stable entity identifier (such as `userId` or `orderId`) as the partition key in `kafkaTemplate.send(topic, key, value)`. Configure the producer with `enable.idempotence=true` and `acks=all`. Set `concurrency` in the `@KafkaListener` equal to the number of partitions, and use `MANUAL_IMMEDIATE` ack-mode to ensure offsets are committed only after successful processing.
Q: What is the consequence of not providing a partition key?
Without a key, Kafka uses round-robin or sticky partitioning, distributing consecutive messages for the same entity across different partitions. Since partitions are consumed independently, the consumer receives those messages in an order determined by broker latency — not by the producer’s intent. This failure is silent: no exception is raised and no alert fires.
Q: What is an idempotent producer and how does it relate to ordering?
An idempotent producer is one configured with `enable.idempotence=true`. Kafka assigns the producer a unique Producer ID (PID) and tracks a sequence number per `(PID, partition)` pair. If a retry delivers a message with a sequence number the broker has already accepted, the duplicate is discarded without being written. This eliminates both duplicate records and the reordering that can result from retry races.
Q: Why is modifying the partition count of a live topic dangerous?
Partition assignment uses `murmur2(key) % numPartitions`. Changing `numPartitions` changes the target partition for every existing key. Historical messages for a key remain in the old partition while new messages go to a different one. Consumers reading both partitions receive an interleaved, causally inconsistent view of the event history.
Q: What is the relationship between concurrency and partition count?
`concurrency` in a `@KafkaListener` creates that many consumer threads, and Kafka assigns each partition to exactly one of them. If `concurrency < numPartitions`, some threads own multiple partitions and read them sequentially, reducing parallelism. If `concurrency > numPartitions`, surplus threads receive no assignment and remain idle. The optimal setting is `concurrency = numPartitions`.
Q: Why should enable-auto-commit be set to false in ordered-processing systems?
Auto-commit advances the offset on a periodic schedule, regardless of processing outcome. If a processing failure occurs after an offset is auto-committed, the message will not be redelivered — it is permanently lost from the consumer’s perspective. Manual acknowledgment decouples offset advancement from the polling cycle and ties it directly to verified processing success, ensuring that no message is silently skipped and that ordered processing can resume correctly after a failure.
15. Conclusion
Apache Kafka does not make ordering easy by accident — it makes it possible by design. The append-only partition log, deterministic key-based routing, and exclusive partition-to-consumer assignment are all deliberate architectural choices that, when used correctly, deliver strong per-entity ordering guarantees at scale.
In a Spring Boot application, enforcing those guarantees requires five intentional decisions:
1. Select a meaningful partition key — use a stable entity identifier (`userId`, `orderId`) that represents the ordering boundary for your business domain.
2. Enable idempotent delivery — configure `enable.idempotence=true` and `acks=all` to protect ordering and eliminate duplicates under failure conditions.
3. Align concurrency with partition count — set `concurrency` equal to the number of partitions so each partition is served by a dedicated thread.
4. Use manual offset management — configure `enable-auto-commit=false` and `ack-mode=MANUAL_IMMEDIATE` so offsets are only advanced after verified processing completion.
5. Treat partition count as immutable post-deployment — plan partition capacity upfront and avoid alterations on live topics to preserve key-to-partition routing consistency.
Applied together, these decisions ensure that Kafka delivers on its ordering guarantees reliably, predictably, and at production scale.