Kafka can be used in several ways, such as:
>> To transmit data between two systems, we can build real-time streaming data pipelines with it.
>> We can also build a real-time streaming platform with Kafka that actually reacts to the data.
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines and streaming analytics.
Key Features:-
>> High Throughput: Supports millions of messages per second on modest hardware.
>> Scalability: Highly scalable distributed system that can be expanded with no downtime.
>> Replication: Messages are replicated across the cluster, supporting multiple subscribers and rebalancing consumers in case of failure.
>> Durability: Supports persistence of messages to disk.
>> Stream Processing: Used with real-time streaming frameworks such as Apache Spark and Apache Storm.
>> Zero Data Loss: With proper configuration, Kafka can ensure zero data loss.
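The zero-data-loss point depends on producer and broker settings. Below is a minimal sketch of the producer-side properties usually involved; plain string keys are used instead of the `ProducerConfig` constants so the snippet has no Kafka dependency, and the values shown are typical choices, not requirements stated in this article.

```java
import java.util.HashMap;
import java.util.Map;

public class DurableProducerProps {
    // Producer properties commonly associated with avoiding data loss.
    // The string keys mirror the ProducerConfig constants in kafka-clients.
    public static Map<String, Object> durableProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("acks", "all");                // wait for all in-sync replicas to acknowledge
        props.put("enable.idempotence", "true"); // avoid duplicates when retrying
        props.put("retries", Integer.MAX_VALUE); // keep retrying transient failures
        return props;
    }
}
```

On the broker side, a replication factor greater than one and a sensible `min.insync.replicas` complete the picture.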
How does Kafka work?
Kafka combines two messaging models, queuing and publish-subscribe, to provide the key benefits of each to consumers.
Queuing allows for data processing to be distributed across many consumer instances, making it highly scalable.
However, traditional queues aren't multi-subscriber: once a message is consumed, it is gone. Publish-subscribe, on the other hand, delivers every message to all subscribers but can't distribute work across them. Kafka's partitioned consumer-group model combines both: consumers in the same group split the partitions between them, while each group independently receives the full stream.
Kafka is a distributed system used for creating real-time streaming data pipelines and applications.
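The queuing-versus-publish-subscribe behavior can be illustrated with a toy model. This is not Kafka's real assignment protocol, just a round-robin sketch showing that partitions are divided among the consumers within one group (queuing), while each additional group gets its own full copy of the topic (publish-subscribe).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignmentDemo {
    // Toy round-robin assignment: partitions are spread across the members
    // of ONE group; a second group would be assigned independently and so
    // sees every partition again.
    public static Map<String, List<Integer>> assign(List<String> members, int partitions) {
        Map<String, List<Integer>> result = new HashMap<>();
        for (String m : members) {
            result.put(m, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }
}
```

With 5 partitions, two members of group A each handle a subset of partitions, while a single member of group B is assigned all five — both groups consume the complete topic independently.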
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.IntegerDeserializer;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
@EnableKafka
public class Config {

    // Container factory that backs @KafkaListener-annotated methods.
    @Bean
    ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory(
            ConsumerFactory<Integer, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        return factory;
    }

    @Bean
    public ConsumerFactory<Integer, String> consumerFactory() {
        return new DefaultKafkaConsumerFactory<>(consumerProps());
    }

    private Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // ...
        return props;
    }

    // Sender and Listener are application classes (not shown here).
    @Bean
    public Sender sender(KafkaTemplate<Integer, String> template) {
        return new Sender(template);
    }

    @Bean
    public Listener listener() {
        return new Listener();
    }

    @Bean
    public ProducerFactory<Integer, String> producerFactory() {
        return new DefaultKafkaProducerFactory<>(senderProps());
    }

    private Map<String, Object> senderProps() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10); // small batching delay in ms
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // ...
        return props;
    }

    @Bean
    public KafkaTemplate<Integer, String> kafkaTemplate(ProducerFactory<Integer, String> producerFactory) {
        return new KafkaTemplate<>(producerFactory);
    }
}
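The `Sender` and `Listener` beans in the configuration refer to application classes that are not shown. A plausible minimal sketch is below; the class bodies are assumptions, and the topic name `applog` is borrowed from the topic-creation command later in this article. (This fragment needs spring-kafka and a running broker, so it is illustrative only.)

```java
// Hypothetical Sender/Listener pair matching the beans in Config.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;

public class Sender {
    private final KafkaTemplate<Integer, String> template;

    public Sender(KafkaTemplate<Integer, String> template) {
        this.template = template;
    }

    public void send(Integer key, String message) {
        template.send("applog", key, message); // topic, key, value
    }
}

class Listener {
    @KafkaListener(id = "group", topics = "applog")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```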
>> Immutable, append-only logs
>> Fast, resilient, scalable, high throughput
>> Relies on the file system for storing and caching messages
>> Resilient and fault-tolerant through replication
>> Disk caching and memory-mapped files instead of GC-eligible heap memory
>> Scales by partitions
>> Ordered inside a partition
>> As an event store: a great match for event-driven microservices
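Ordering is guaranteed only within a partition, and records with the same key always land in the same partition. Kafka's default partitioner hashes the serialized key with murmur2; the sketch below uses `hashCode()` as a stand-in to show the idea, not the exact algorithm.

```java
public class PartitionSketch {
    // Simplified stand-in for Kafka's default partitioner:
    // the same key always maps to the same partition, which is
    // what preserves per-key ordering.
    // (Real Kafka uses murmur2 over the serialized key bytes.)
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```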
kafka-topics --create --zookeeper zookeeper:2181 --topic applog --partitions 5 --replication-factor 3