What is Kafka?
Apache Kafka is a distributed commit log for fast, fault-tolerant communication between producers and consumers using message-based topics. Kafka provides the messaging backbone for building a new generation of distributed applications capable of handling billions of events and millions of transactions.
How it works
Kafka provides a powerful set of primitives for connecting your distributed application: messages, topics, partitions, producers, consumers, and log compaction.
Kafka is a message-passing system. Messages are events and can have keys.
A Kafka cluster is made up of brokers that run Kafka processes.
Topics are streams of messages of a particular category.
Partitions are append-only, ordered logs of a topic’s messages. Each message has an offset denoting its position in the partition. Kafka replicates partitions across the cluster for fault tolerance and message durability.
Producers are client processes that send messages to a broker on a topic and partition. Producers can use a partitioning function on keys to control message distribution.
Consumers read messages from topics' partitions on brokers, tracking the last offset read to coordinate and recover from failures. Consumers can be deployed in groups for scalability.
Log compaction keeps the most recent value for every key so clients can restore state.
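To make these primitives concrete, here is a minimal in-memory sketch in Python. It is a toy model, not a Kafka client: the names Topic, Consumer, and compact are hypothetical, and CRC32 is only a stand-in for whatever partitioning function a real producer uses. It shows how hashing a key picks a partition, how offsets mark positions in an append-only log, how a consumer tracks the next offset to read, and how compaction keeps the latest value per key.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of append-only, ordered partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hash the key so equal keys always land in the same partition.
        # CRC32 is an assumption made for this sketch only.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        # Append to the end of the chosen partition's log.
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1  # (partition, offset)

    def read(self, partition, offset, max_messages=10):
        # Return up to max_messages records starting at the given offset.
        return self.partitions[partition][offset:offset + max_messages]


class Consumer:
    """Toy consumer that remembers the next offset to read per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset

    def poll(self, partition):
        start = self.offsets[partition]
        batch = self.topic.read(partition, start)
        # Advance past what was read so a later poll resumes from here.
        self.offsets[partition] = start + len(batch)
        return batch


def compact(log):
    """Keep only the most recent record per key, with its original offset."""
    latest = {}
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset  # later records overwrite earlier ones
    return [(offset, log[offset]) for offset in sorted(latest.values())]


# Usage: three updates for one key go to the same partition, in order.
topic = Topic(num_partitions=3)
partition, _ = topic.send("user-1", "signup")
topic.send("user-1", "upgrade")
topic.send("user-1", "cancel")

consumer = Consumer(topic)
events = consumer.poll(partition)              # all three records, in order
state = compact(topic.partitions[partition])   # [(2, ("user-1", "cancel"))]
```

In this model, a client that restarts could rebuild its state from the compacted log alone, because the final value for every key survives. Replication, brokers, and consumer groups are deliberately left out to keep the sketch short.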
“Like anything we implement on Heroku, the time it took to set up Apache Kafka on the platform was incredibly fast. It requires less management, and we have peace of mind knowing that once it’s set up correctly, the Heroku team will keep it running smoothly.”
“Apache Kafka on Heroku offers a single solution that powers both event notification between apps and event data flows for site analytics. We no longer have to manually configure apps or manage additional event streaming mechanisms. It saves us time and reduces complexity.”
“One of the biggest benefits of Apache Kafka on Heroku is the developer experience. We can use the same familiar tools and unified management experience for Kafka as we do for our Heroku apps and other add-ons, and we now have a system that more closely matches our team structure.”
Build data-intensive apps
See it in action
See what Kafka on Heroku can do. Check out our recent demo.
Tutorials and other resources
- Kafka Stream Processing Demo
- Heroku Metrics: There and Back Again
- Powering the Heroku Platform API: A Distributed Systems Approach Using Streams and Apache Kafka
- Apache Kafka 0.10: Evaluating Performance in Distributed Systems
- Apache Kafka, Data Pipelines, and Functional Reactive Programming with Node.js
Apache Kafka can be used to stream billions of events per day — but do you know where to use it in your app architecture? Find out at our technical session. See a live demo and hear answers to questions from Heroku product experts.
Listen to our podcast with Software Engineering Daily from October 25th, 2016.
Apache Kafka is a durable, distributed message broker that’s a great choice for managing large volumes of inbound events, building data pipelines, and acting as the communication bus for microservices. In this Software Engineering Daily podcast, Heroku engineer Tom Crayford talks about building the Apache Kafka on Heroku service, the challenges we faced, and why we focused on Kafka in the first place.