Do You Really Need Kafka? When to Use and When Not To

April 17, 2026 · 12 min read

Kristen

Most teams don't actually need Kafka — they just assume they do.

If you're here, you're probably designing a data pipeline and asking: "Do I really need Kafka (an event streaming platform / message bus), or is there a simpler architecture?"

In the next 5 minutes, you'll get:

A decision checklist you can reuse for any pipeline
3 real-world scenarios (Kafka vs no Kafka)
A step-by-step plan to ship real-time data without operating Kafka

TL;DR: When to use Kafka (and when to skip it)

Use Kafka if you need multiple independent consumers, replay/backfill, or event-driven workflows at scale (classic pub/sub + durable log).
Skip Kafka if you're mostly doing data movement from A → B (one destination, modest scale, low replay needs).
If you're unsure, use the checklist in the next section and decide in under 5 minutes.

5-Minute Kafka Decision Checklist

Answer these questions and count how many times you say YES:

Do you have 3+ independent consumers (analytics + search + downstream services)?
Do you need replay/backfill from a retained log (hours/days) for recovery or new consumers?
Is throughput likely to reach thousands+ events/sec, or do you have bursty traffic that needs buffering?
Do you need decoupled deployment (producers shouldn't change when consumers change)?
Do you need fan-out routing where each consumer can fail/retry independently?
Are you willing to operate Kafka (or pay for managed Kafka) and monitor it properly?
Do you need ordering guarantees per key and a clear at-least-once vs exactly-once plan?
Do you need a schema evolution strategy (Schema Registry or strict compatibility rules)?

How to decide (rule of thumb):

Figure: Kafka decision checklist flow (count your YES answers to decide whether to use Kafka or skip it)

0–2 YES → Kafka is usually overkill. Prefer direct CDC or a queue.
3–4 YES → It depends. Kafka can be a fit if you expect more consumers or need replay.
5–6 YES → Kafka is likely the right backbone (or a Kafka-like streaming system).
7–8 YES → Kafka is very likely the right backbone — budget for doing it properly (operations + schema + consumer correctness).
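The rule of thumb above can be written down as a tiny scoring helper. This is only a sketch: the function name `kafka_recommendation` and the wording of the returned strings are illustrative, and the thresholds are simply the four bands listed above.

```python
def kafka_recommendation(answers: list[bool]) -> str:
    """Map the eight checklist answers (True = YES) to the rule of thumb."""
    if len(answers) != 8:
        raise ValueError("expected exactly 8 answers, one per checklist question")

    yes_count = sum(answers)  # True counts as 1, False as 0
    if yes_count <= 2:
        return "Kafka is usually overkill: prefer direct CDC or a queue"
    if yes_count <= 4:
        return "It depends: Kafka fits if you expect more consumers or need replay"
    if yes_count <= 6:
        return "Kafka is likely the right backbone"
    return "Kafka is very likely the right backbone: budget for doing it properly"


# Example: YES to multiple consumers, replay, and willingness to operate Kafka.
answers = [True, True, False, False, False, True, False, False]
print(kafka_recommendation(answers))
```

Treat the output as a conversation starter, not a verdict. A single hard requirement (for example, replay for audit) can outweigh the raw count.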

3 Real-World Scenarios (Kafka vs No Kafka)

Scenario 1: "Just replicate data into a warehouse"

Example: SQL Server → (CDC) → ClickHouse/Iceberg/StarRocks

Goal: analytics and reporting
Consumers: usually 1 (the warehouse)
Recommendation: Skip Kafka and use a direct CDC pipeline
Why: Kafka adds an extra hop (topics, partitions, retention, ops) without giving you much fan-out value

Scenario 2: "One event must feed many systems"

Example: Orders → analytics + search + fraud + notifications

Goal: event-driven workflows + multiple independent consumers
Recommendation: Use Kafka
Why: fan-out + buffering + replay are first-class, and each consumer can fail/retry independently

Scenario 3: "Bursty traffic and consumers sometimes fall behind"

If you only need simple buffering for one consumer, a queue is often enough.
If you need multiple consumers + replay/backfill, Kafka is the safer long-term choice.

Still unsure? Jump to "Kafka vs No Kafka Architecture" and compare the trade-offs.

What Kafka Actually Adds to Your Pipeline

The checklist above tells you whether Kafka fits. Now let's map Kafka's concrete capabilities to real requirements — so you can justify (or reject) Kafka with clarity.

Decoupling Producers and Consumers

Without Kafka, data pipelines are typically tightly coupled:

Without Kafka: a database pushes data directly to multiple downstream services

This may look simple at first, but it quickly becomes problematic:

Every new downstream system requires changes upstream
Your upstream database has to bear the access pressure from multiple consumers
Failures in downstream systems can impact upstream stability

With Kafka in place:

With Kafka: producers publish once to Kafka and multiple consumers subscribe independently

Kafka becomes a middle layer that cleanly separates producers from consumers.

In real engineering terms, this means:

Upstream systems only write data once
Downstream systems can be developed, deployed, and scaled independently
Adding a new consumer doesn't require changing existing pipelines

This decoupling capability is one of Kafka's most fundamental advantages.

Buffering and Backpressure Handling

In real-world systems, downstream instability is normal:

Data warehouses may slow down during heavy writes
Search systems may rebuild indexes
Some services may temporarily go offline

In a direct pipeline, this often leads to:

Failed data writes → data loss
Upstream gets blocked → entire pipeline stalls

Kafka provides a critical capability here: buffering.

Data is written to Kafka first, then consumed by downstream at its own pace:

Downstream is slow → Kafka stores the data temporarily
Downstream recovers → consumption resumes from the original position

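This absorbs the speed mismatch between producers and consumers: a slow consumer falls behind in the log instead of blocking the producer, as long as it catches up within the retention and disk limits you configure.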

Fan-out to Multiple Systems

In most real applications, the same data often serves multiple purposes. For example, a single order event might be:

Written to a real-time analytics database (e.g., ClickHouse)
Synchronized to a search system
Sent to a risk control system
Used for real-time monitoring

Without Kafka: All this routing logic has to be handled inside your CDC tool or application

Which leads to:

Increasing complexity
Tight coupling between systems
Difficult debugging and failure isolation

With Kafka: One stream → multiple independent consumers

Each downstream system processes data independently, without affecting others.
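To make "one stream, multiple independent consumers" concrete, here is a toy in-memory model. It is not Kafka and does not use a Kafka client. `TopicLog`, `publish`, and `poll` are hypothetical names, and the only idea borrowed from Kafka is that each consumer group tracks its own offset into a shared append-only log.

```python
class TopicLog:
    """Toy append-only log with one committed offset per consumer group."""

    def __init__(self) -> None:
        self.records: list[dict] = []
        self.offsets: dict[str, int] = {}

    def publish(self, record: dict) -> None:
        # Producers write once; they know nothing about who reads.
        self.records.append(record)

    def poll(self, group: str, handler) -> int:
        """Deliver unread records to `handler`, stopping at the first failure.

        Returns how many records this call processed.
        """
        start = self.offsets.get(group, 0)
        processed = 0
        for record in self.records[start:]:
            try:
                handler(record)
            except Exception:
                break  # this group retries later; other groups are unaffected
            processed += 1
        self.offsets[group] = start + processed
        return processed


log = TopicLog()
for order_id in (1, 2, 3):
    log.publish({"order_id": order_id})

analytics_seen: list[int] = []
search_seen: list[int] = []
search_state = {"down": True}  # simulated outage of the search system


def analytics_handler(record: dict) -> None:
    analytics_seen.append(record["order_id"])


def search_handler(record: dict) -> None:
    if search_state["down"]:
        raise ConnectionError("search index unavailable")
    search_seen.append(record["order_id"])


log.poll("analytics", analytics_handler)  # analytics reads all three orders
log.poll("search", search_handler)        # search fails; its offset stays at 0
search_state["down"] = False
log.poll("search", search_handler)        # search catches up from its own offset
```

The search outage never touched the analytics consumer, and the producer published each order exactly once. That isolation is the property you would have to rebuild yourself if routing lived inside the CDC tool or the application.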

Replay and Fault Recovery

This is a capability many teams underestimate — until they need it.

Kafka retains data for a configurable period (e.g., days or weeks), which means: You can replay historical data

This is extremely useful when:

A downstream bug corrupts data
You need to recompute metrics
A new system needs historical backfill

Without Kafka, your options are limited:

Re-run full data loads (expensive)
Or accept data loss

With Kafka: You simply re-consume from a specific offset
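As a rough illustration of what "re-consume from a specific offset" buys you, the sketch below recomputes a metric after a consumer bug is fixed. The retained log is modeled as a plain Python list, and `consume` and `revenue_with` are hypothetical helpers. With a real Kafka client you would reset or seek the consumer's position instead.

```python
# Retained events, as they might still sit in the log days after being produced.
retained_log = [
    {"offset": 0, "order_id": "A", "unit_price": 10, "quantity": 2},
    {"offset": 1, "order_id": "B", "unit_price": 5, "quantity": 1},
    {"offset": 2, "order_id": "C", "unit_price": 20, "quantity": 3},
]


def consume(log, from_offset, handler) -> None:
    """Re-read every retained event at or after `from_offset`."""
    for event in log:
        if event["offset"] >= from_offset:
            handler(event)


def revenue_with(line_total, from_offset: int = 0) -> int:
    """Sum `line_total(event)` over the replayed events."""
    total = 0

    def add(event) -> None:
        nonlocal total
        total += line_total(event)

    consume(retained_log, from_offset, add)
    return total


# The original consumer had a bug: it ignored quantity.
buggy_total = revenue_with(lambda e: e["unit_price"])

# After the fix, replay from offset 0 and recompute. No reload from the source database.
fixed_total = revenue_with(lambda e: e["unit_price"] * e["quantity"])

# A consumer that only broke partway through can replay from a later offset.
partial_total = revenue_with(lambda e: e["unit_price"] * e["quantity"], from_offset=1)
```

Replay only works for data still inside the retention window, which is why retention should be sized from your recovery needs rather than left at a default.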

When You Definitely Need Kafka

The previous section covered capabilities. Now let's be practical. This section covers scenarios where Kafka is clearly the right choice.


1. You Have Multiple Downstream Systems

When your data needs to be consumed by multiple systems:

MySQL → Kafka → ClickHouse / Elasticsearch / Analytics

Kafka becomes highly valuable because:

Each system consumes data independently
No need to modify upstream pipelines
New consumers can be added easily

As a rule of thumb: if you have 3 or more downstream systems, Kafka is worth serious consideration.

2. You're Building an Event-Driven Architecture

If your system relies on events, for example:

User places an order → order triggers inventory, payment, notification
User behavior → actions trigger real-time recommendations

Kafka can act as: An event backbone

Services communicate through events instead of direct calls, improving scalability and flexibility.

Practical tip: if events come from a database write, consider the transactional outbox pattern (aka "outbox pattern") so producers publish reliably without dual-write bugs.
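A minimal sketch of the outbox idea, using SQLite from the Python standard library so it runs anywhere. The table layout, `place_order`, and `relay_once` are assumptions for illustration. In production the `publish` callback would be a Kafka producer call, and the relay is often a CDC tool reading the outbox table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        topic     TEXT NOT NULL,
        payload   TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def place_order(order_id: int) -> None:
    """Write the business row and its event in ONE transaction."""
    with conn:  # commits on success, rolls back both inserts on any error
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"order_id": order_id, "status": "placed"})),
        )


def relay_once(publish) -> int:
    """Publish pending outbox rows in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: list[tuple[str, str]] = []
place_order(1)
place_order(2)
relay_once(lambda topic, payload: sent.append((topic, payload)))
```

Note the delivery guarantee this gives you: if the relay crashes after publishing but before marking the row, the event is sent again on the next run. That is at-least-once delivery, so downstream consumers still need to be idempotent.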

3. High Throughput and Scale

When data volume reaches a certain scale:

Tens of thousands or even hundreds of thousands of messages per second
Distributed systems across multiple nodes

Kafka's partitioning model enables:

Horizontal scaling
Parallel processing

This is where Kafka truly shines compared to simpler tools.
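To see why partitioning gives you both parallelism and per-key ordering, consider this simplified model of key-based routing. The `partition_for` helper is hypothetical, and MD5 is used only as a convenient stable hash. Kafka's default partitioner uses a different hash function (murmur2), but the principle, same key to same partition, is the same.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition with a stable hash (stand-in for Kafka's)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


NUM_PARTITIONS = 3
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
    ("order-1", "shipped"),
]

# Each partition is an independent, ordered list that one consumer can own.
partitions: dict[int, list[tuple[str, str]]] = {p: [] for p in range(NUM_PARTITIONS)}
for key, action in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, action))

# All events for one key live in one partition, in the order they were produced.
order_1_partition = partitions[partition_for("order-1", NUM_PARTITIONS)]
order_1_history = [action for key, action in order_1_partition if key == "order-1"]
```

Two consequences follow. Ordering holds per key, not across the whole topic, and changing the partition count later changes where keys land, so pick the count deliberately.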

If you also need stream processing (windowed aggregations, joins, real-time enrichment), Kafka is commonly paired with Flink/Spark Streaming/Kafka Streams.

4. You Need Replay or Audit Logs

If your system requires:

Data traceability (audit logs)
The ability to reprocess historical data

Kafka's log-based design becomes critical.

At this point, Kafka is not just a transport layer — it becomes a "data record layer".

When Kafka Is Overkill

This is where many teams make the wrong call.


1. You Only Have One Destination

If your pipeline looks simple like this:

SQL Server → Data Warehouse

Then Kafka's value is actually very limited:

No need for multi-consumer fan-out
No complex routing
No replay requirement

In this case, Kafka is just an extra layer.

2. Your Data Volume Is Small

If you're handling dozens to a few hundred events per second, you don't need Kafka's distributed capabilities. Introducing Kafka would only add deployment complexity and operational cost.

3. You're Doing Simple ETL or Reporting

For use cases like daily batch sync and reporting pipelines, real-time streaming is not critical. Kafka becomes unnecessary complexity.

4. Your Team Doesn't Want to Operate Kafka

This is a very practical consideration.

Kafka comes with real operational costs:

Cluster deployment and maintenance
Monitoring (often using Prometheus and Grafana)
Partition management
Failure recovery

Without prior experience, teams often underestimate this cost.

Kafka vs No Kafka Architecture

Here's the real trade-off:

Dimension	With Kafka	Without Kafka
Complexity	High	Low
Scalability	High	Moderate
Latency	Slightly higher	Lower
Cost	Higher	Lower
Flexibility	High	Limited

This is not about better vs worse. It's about fit for purpose.

Common Kafka Alternatives (and when they fit)

If you're thinking "Kafka feels heavy," you're not alone. Depending on your requirements, these alternatives can be a better fit:

Direct CDC / database replication (CDC tools like BladePipe): best for DB → warehouse or DB → search when you mainly need fresh data movement.
Traditional queues (RabbitMQ/SQS): best for task distribution and simple buffering (usually not for long replay/backfill).
Redis Streams: good for smaller-scale streaming and simple consumer groups (ops is simpler, replay windows are usually shorter).
Cloud event streaming (Kinesis / Pub/Sub): good when you want managed scaling and you're already on that cloud.
Pulsar / Redpanda: Kafka-like streaming alternatives if Kafka ops or licensing constraints are driving the decision.

If you skip Kafka, the next section helps you avoid the common failure modes (duplicates, backfill, schema changes, and observability gaps).

If You Skip Kafka: How to Ship This Reliably (Step-by-Step)

Skipping Kafka is totally valid — but you still need to replace the useful guarantees Kafka would have provided (buffering, replay, and consumer isolation). Use this plan to ship a reliable pipeline without Kafka.

Step-by-step plan

Define the outcome: destination(s), freshness SLA (seconds/minutes), and acceptable data loss (usually "none").
Choose the delivery shape: direct CDC/replication, a queue, or a managed streaming service (based on the checklist above).
Define the data contract: primary key, delete semantics (tombstone vs hard delete), and how schema changes are handled.
Guarantee correctness at the edge: make writes/consumption idempotent; assume retries and duplicates will happen.
Plan bootstrap + replay: initial snapshot/backfill, plus a repeatable way to re-run history when consumers break.
Add observability: lag, throughput, error rate, and a clear "stuck" alert.
Harden connectivity & permissions: least privilege, network allow-lists, and TLS.
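For the observability step, the two signals that matter most are lag and "stuck". The sketch below shows one way to define them. `LagSample`, `lag`, and `is_stuck` are hypothetical names, and positions can be whatever your pipeline exposes (log sequence numbers, row counts, offsets).

```python
from dataclasses import dataclass


@dataclass
class LagSample:
    timestamp: float        # seconds since some epoch
    source_position: int    # latest position written at the source
    consumer_position: int  # latest position the pipeline has applied


def lag(sample: LagSample) -> int:
    """How far the destination is behind the source."""
    return max(0, sample.source_position - sample.consumer_position)


def is_stuck(samples: list[LagSample], window_seconds: float) -> bool:
    """Stuck = there is work to do, but no progress for `window_seconds`."""
    if not samples:
        return False
    latest = samples[-1]
    if lag(latest) == 0:
        return False  # caught up and idle is healthy, not stuck

    # Walk back to the oldest sample with the same consumer position.
    oldest_same = latest
    for sample in reversed(samples):
        if sample.consumer_position != latest.consumer_position:
            break
        oldest_same = sample
    return latest.timestamp - oldest_same.timestamp >= window_seconds


stalled = [LagSample(0, 100, 90), LagSample(60, 120, 90), LagSample(120, 150, 90)]
moving = [LagSample(0, 100, 90), LagSample(60, 120, 110), LagSample(120, 150, 140)]
idle = [LagSample(0, 100, 100), LagSample(300, 100, 100)]
```

Separating "lagging but moving" from "lagging and frozen" keeps the alert actionable: the first is a capacity question, the second usually means something is broken.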

Common failure modes (and what to do)

Duplicates show up → idempotent consumers (upserts by primary key), deterministic keys, and retry-safe processing.
Deletes don't propagate → make delete semantics explicit in the contract and test them end-to-end.
Schema changes break consumers → choose a strategy (ignore/fail/propagate) and enforce compatibility rules.
Backfill is impossible under pressure → keep a tested backfill path (and a retention window if you need replay).
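The first two failure modes (duplicates and lost deletes) can be handled by the same mechanism: a versioned upsert with tombstones. The sketch below uses SQLite so it is runnable as-is. The `customers` table, the `version` field (for example a source log position or commit sequence), and the event shape are assumptions, so adapt them to what your CDC source actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        email   TEXT,
        version INTEGER NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0  -- tombstone flag instead of hard delete
    )
    """
)


def apply_change(event: dict) -> None:
    """Apply one change event; safe to call repeatedly or with stale events."""
    deleted = 1 if event["op"] == "delete" else 0
    with conn:
        conn.execute(
            """
            INSERT INTO customers (id, email, version, deleted)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(id) DO UPDATE SET
                email   = excluded.email,
                version = excluded.version,
                deleted = excluded.deleted
            WHERE excluded.version > customers.version  -- ignore old or duplicate events
            """,
            (event["id"], event.get("email"), event["version"], deleted),
        )


events = [
    {"op": "upsert", "id": 1, "email": "a@example.com", "version": 1},
    {"op": "upsert", "id": 1, "email": "b@example.com", "version": 2},
    {"op": "upsert", "id": 1, "email": "b@example.com", "version": 2},  # duplicate
    {"op": "delete", "id": 1, "version": 3},
    {"op": "upsert", "id": 1, "email": "b@example.com", "version": 2},  # late retry
]
for event in events:
    apply_change(event)

final_row = conn.execute(
    "SELECT id, email, version, deleted FROM customers WHERE id = 1"
).fetchone()
```

Keeping a tombstone row rather than hard-deleting is what stops the late retry from resurrecting the customer. If you must hard-delete, you need some other record of the highest version applied per key.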

If you skip Kafka, these guides may help you reduce effort and cost: Data Movement Guides List

If You Choose Kafka: Minimum "Do It Right" Checklist

Kafka pays off when you run it intentionally. This is the "minimum viable correctness + ops" bar.

Data model & contracts

Keys & ordering: choose message keys (usually primary key) and define ordering expectations (ordering is per-partition, not global).
Delivery semantics: assume at-least-once and make consumers idempotent; be explicit about what "exactly-once" would mean in your system.
Schema evolution: use Schema Registry or a strict compatibility policy (and test schema changes before rollout).
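If you go the "strict compatibility policy" route, the policy should be executable rather than a wiki page. Below is a deliberately simplified check that could run in CI. It is not Schema Registry's algorithm, and the schema representation (a dict of field name to `type` and `required`) and the function `backward_violations` are made up for this example.

```python
Schema = dict[str, dict]  # field name -> {"type": str, "required": bool}


def backward_violations(old: Schema, new: Schema) -> list[str]:
    """List reasons a consumer on `new` could fail to read data written with `old`.

    Simplified rules (an assumption, not a standard):
      * a field that exists in both schemas must keep its type
      * a field added in `new` must be optional, because old data lacks it
    """
    problems: list[str] = []
    for name, spec in new.items():
        if name not in old:
            if spec["required"]:
                problems.append(f"new required field: {name}")
        elif spec["type"] != old[name]["type"]:
            problems.append(
                f"type change on {name}: {old[name]['type']} -> {spec['type']}"
            )
    return problems


v1: Schema = {
    "order_id": {"type": "int", "required": True},
    "amount": {"type": "float", "required": True},
}
v2_ok: Schema = {**v1, "coupon": {"type": "string", "required": False}}
v2_bad: Schema = {
    "order_id": {"type": "string", "required": True},
    "amount": {"type": "float", "required": True},
    "currency": {"type": "string", "required": True},
}

ok_problems = backward_violations(v1, v2_ok)
bad_problems = backward_violations(v1, v2_bad)
```

Failing the build when the list is non-empty gives you the "test schema changes before rollout" step without any extra infrastructure. A real registry covers more cases (defaults, nested types, forward compatibility).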

Topics & retention

Partitions: size for throughput and parallelism; document why the partition count is what it is.
Retention/compaction: pick based on replay/backfill needs; don't "set and forget".
Retries & poison messages: decide what you do with poison messages (retries vs DLQ), and how you recover.
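One common shape for the "retries vs DLQ" decision is bounded retries followed by a dead-letter queue, sketched below without any Kafka client. `process_with_dlq` and the message shape are hypothetical. A production version would also add backoff between attempts and publish dead letters to a separate topic.

```python
def process_with_dlq(messages, handler, max_attempts: int = 3):
    """Try each message up to `max_attempts` times, then park it in a DLQ.

    Returns (processed, dead_letters) so one poison message cannot block
    everything behind it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    processed, dead_letters = [], []
    for message in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                handler(message)
            except Exception as exc:  # broad on purpose: any failure is retried
                last_error = exc
            else:
                processed.append(message)
                break
        else:
            # Every attempt failed: keep the message and the reason for later repair.
            dead_letters.append(
                {"message": message, "error": repr(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


attempts: dict[int, int] = {}


def handler(message: dict) -> None:
    attempts[message["id"]] = attempts.get(message["id"], 0) + 1
    if message.get("amount") is None:
        raise ValueError("missing amount")  # poison: will never succeed
    if message["id"] == 2 and attempts[2] == 1:
        raise TimeoutError("transient failure")  # succeeds on retry


messages = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 5},
    {"id": 3, "amount": None},
]
processed, dead_letters = process_with_dlq(messages, handler)
```

The transient failure recovers on retry, while the malformed message ends up in the DLQ with its error attached. The remaining question from the checklist is recovery: decide who inspects the DLQ and how repaired messages get re-published.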

Operations (the part teams underestimate)

Monitoring: consumer lag, broker health, throughput, error rates, and storage growth.
On-call & upgrades: define an upgrade plan and practice failure recovery (don't wait for the first incident).
Security: TLS/SASL, least privilege, and secret rotation.

If you choose Kafka, these three guides may help you "do it right":

Final Thoughts

Kafka is powerful — but it's not the default answer.

The real goal isn't to "use Kafka." The goal is to: Build the simplest system that solves your problem reliably

Remember:

Kafka = flexibility, scalability, complexity
No Kafka = simplicity, speed, lower cost

If you choose the right architecture upfront, you won't need to redesign your pipeline later.

And that's what actually matters.

FAQ

Do I need Kafka for CDC?

Not always. If you're primarily doing DB → warehouse or DB → search with one main destination, direct CDC is often enough. Kafka helps most when you need multiple consumers, replay/backfill, and independent retries.

Is Kafka a message queue?

Kafka can be used like a queue, but it's closer to a durable log + pub/sub system. That's why replay/backfill and fan-out are strong — and also why topic/partition design matters.

Does Kafka guarantee exactly-once delivery?

Not by default. Most pipelines are at-least-once, and you design consumers to be idempotent. Exactly-once requires careful end-to-end design and is easy to get wrong.

What's the fastest way to decide?

Use the 5-Minute Kafka Decision Checklist near the top. If you answer YES to replay, multi-consumer fan-out, and operational readiness, Kafka is usually a good fit.
