3Vs of Big Data
Volume: Terabytes to Petabytes to Exabytes
Variety: Structured, Semi-structured, Unstructured
Velocity: Batch, Streaming, etc.
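A minimal sketch of the batch vs streaming distinction under Velocity. The function names and the sample readings are hypothetical and only illustrate the idea: batch processing computes a result once all data is available, while streaming updates the result as each record arrives.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch: the full data set is available, so compute the result in one go."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: keep running totals and emit an updated mean per record."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings used for both styles of processing.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_mean(readings))            # 25.0
print(list(streaming_mean(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

Both approaches end at the same value; the streaming version simply makes intermediate results available without waiting for the whole data set.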
Terminology: Data warehouse vs Data lake
Data warehouse: Typically PBs of storage + compute
Data is stored in a format ready for specific analysis (processed data)
Examples: Teradata, BigQuery (GCP), Redshift (AWS), Azure Synapse Analytics
Traditionally uses specialized hardware (cloud warehouses are offered as managed services)
Data lake: Typically retains all raw data (compressed)
Typically object storage is used as the data lake
Amazon S3, Google Cloud Storage, Azure Data Lake Storage Gen2, etc.
Flexibility while saving cost
Perform ad-hoc analysis on demand
Analytics & intelligence services (even data warehouses) can directly read from the data lake
Azure Synapse Analytics, BigQuery (GCP), Redshift Spectrum (AWS), Amazon Athena, etc.
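A rough sketch of the warehouse vs lake distinction, using only the Python standard library. A temporary local directory stands in for object storage and an in-memory SQLite database stands in for a warehouse; both are assumptions for illustration, not how the services listed above are implemented. The event fields and names (raw_events, read_lake, purchases) are hypothetical.

```python
import gzip
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw events; note that records do not all share the same fields.
raw_events = [
    {"user_id": "a", "action": "view", "ms": 120},
    {"user_id": "b", "action": "buy", "ms": 340, "amount": 19.99},
    {"user_id": "a", "action": "buy", "ms": 95, "amount": 5.0},
]

# Data lake: retain every raw record, compressed, with no schema enforced.
lake_dir = Path(tempfile.mkdtemp())  # local stand-in for an object storage bucket
lake_file = lake_dir / "events.jsonl.gz"
with gzip.open(lake_file, "wt", encoding="utf-8") as f:
    for event in raw_events:
        f.write(json.dumps(event) + "\n")


def read_lake(path):
    """Read raw records straight from the compressed lake file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


# Ad-hoc analysis on demand: the question is decided at read time.
views = sum(1 for e in read_lake(lake_file) if e["action"] == "view")

# Data warehouse: load processed data into a fixed schema for a specific analysis.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE purchases (user_id TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(e["user_id"], e["amount"]) for e in read_lake(lake_file) if e["action"] == "buy"],
)
revenue_by_user = dict(
    warehouse.execute(
        "SELECT user_id, SUM(amount) FROM purchases GROUP BY user_id ORDER BY user_id"
    )
)

print(views)            # 1
print(revenue_by_user)  # {'a': 5.0, 'b': 19.99}
```

The lake keeps the "view" events even though the purchases table never uses them, which is what leaves room for new ad-hoc questions later.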