Data is ingested from Producers and retained for 24 hours by default. Retention can be extended to 7 days, or up to 365 days (additional charges apply), so that other applications can process the data later.
A Kinesis Data Stream is a set of Shards; each Shard contains a sequence of Data Records.
Each Data Record is composed of a Sequence Number, a Partition Key, and a Data Blob; the Data Blob is an immutable sequence of bytes.
Kinesis Data Streams is real time: records typically reach consumers with a latency of roughly 200 ms.
KCL (Kinesis Client Library) helps you consume and process data from a stream by enumerating the shards and instantiating a record processor for each one.
There are two ways to get data from a stream using KCL: the classic mechanism works by Poll, while Enhanced Fan-Out works by Push. With Enhanced Fan-Out, consumers subscribe to a shard and data is then pushed to them automatically.
Each shard is processed by exactly one KCL worker at a time; a single worker can process several shards.
One shard provides a capacity of 1 MB/sec of data input and 2 MB/sec of data output, and accepts up to 1,000 PUT records per second.
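The per-shard limits above translate directly into a sizing estimate. A minimal sketch, assuming the workload is spread evenly across shards (a hot partition key would break that assumption); the function name shards_needed and its parameters are illustrative, not part of any AWS SDK:

```python
import math

# Per-shard limits taken from the note above.
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0
WRITE_RECORDS_PER_SEC = 1000


def shards_needed(write_mb_per_sec, read_mb_per_sec, records_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
    )


# 5 MB/sec in, 8 MB/sec out, 3,500 records/sec: input bandwidth is the bottleneck.
print(shards_needed(5, 8, 3500))   # 5
# 0.5 MB/sec in, 6 MB/sec out, 200 records/sec: output bandwidth is the bottleneck.
print(shards_needed(0.5, 6, 200))  # 3
```

Whichever limit is hit first decides the shard count, so a stream with many small records can need more shards than its MB/sec figure alone suggests.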
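Ordering is maintained per shard: records with the same Partition Key go to the same shard and are read in the order they were written.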
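Per-shard ordering works because the Partition Key decides which shard a record lands in: Kinesis hashes the key with MD5 into a 128-bit integer and picks the shard whose hash key range contains that value. A rough sketch of the idea, assuming the hash space is split evenly between shards and the digest is read as a big-endian integer (real streams can have uneven ranges after resharding); all function names here are hypothetical:

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to an integer in [0, 2**128 - 1] via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count):
    """Split the hash space into contiguous (start, end) ranges, one per shard."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value falls outside every range")


ranges = even_shard_ranges(4)
# The same key always maps to the same shard, so its records stay in order.
print(shard_for_key("user-42", ranges) == shard_for_key("user-42", ranges))  # True
```

Records with different Partition Keys may land on different shards, and no ordering is guaranteed between shards.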
Destinations are:
Redshift
DynamoDB
EMR
S3
Firehose
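In provisioned capacity mode, a Kinesis Data Stream does NOT have auto scaling capabilities.
To scale, you need to manually add shards (and pay for them). On-demand capacity mode, by contrast, adjusts capacity automatically.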
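Adding shards by hand can be scripted against the UpdateShardCount API. A minimal sketch, assuming a boto3-style Kinesis client is passed in; add_shards and the stand-in FakeKinesisClient are hypothetical helpers, and the fake exists only so the sketch runs without AWS credentials:

```python
def add_shards(client, stream_name, extra_shards):
    """Raise the stream's open shard count by extra_shards; return the new target."""
    if extra_shards < 1:
        raise ValueError("extra_shards must be at least 1")
    summary = client.describe_stream_summary(StreamName=stream_name)
    current = summary["StreamDescriptionSummary"]["OpenShardCount"]
    target = current + extra_shards
    client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target,
        ScalingType="UNIFORM_SCALING",
    )
    return target


class FakeKinesisClient:
    """Stand-in that mimics the two client calls used above (assumption: 2 open shards)."""

    def __init__(self):
        self.calls = []

    def describe_stream_summary(self, StreamName):
        return {"StreamDescriptionSummary": {"OpenShardCount": 2}}

    def update_shard_count(self, **kwargs):
        self.calls.append(kwargs)
        return {}


fake = FakeKinesisClient()
print(add_shards(fake, "orders", 2))  # 4
```

Against a real stream, the client would come from boto3.client("kinesis") instead of the fake; the stream name "orders" is a placeholder. The service limits how far a single call can move the shard count, so large changes may need several steps.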