| Service | Scenario |
|---|---|
| Amazon Redshift | Run complex SQL queries against a data warehouse that houses structured and semi-structured data pulled in from a variety of sources. |
| Amazon EMR | Managed Hadoop. Large-scale data processing with high customization (machine learning, graph analytics). Important tools in the Hadoop ecosystem are natively supported (Pig, Hive, Spark, or Presto). |
| Amazon S3 | Can be used as a data lake. |
| AWS Lake Formation | Makes it easy to set up a secure data lake. |
| Amazon Redshift Spectrum | Run queries directly against S3 without first loading the data from S3 into a data warehouse. Scale compute and storage independently. |
| Amazon Athena | Quick ad hoc queries without provisioning a compute cluster (serverless). Amazon Redshift Spectrum is recommended instead if you frequently run queries against structured data. |
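
The scenarios above can be read as a rough decision procedure. The following is a minimal sketch of that reading in Python, not an official AWS selection rule: the `Workload` class, its field names, and the `suggest_service` function are hypothetical names introduced only for illustration, and the order of the checks is an assumption about how the scenarios are meant to be prioritized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of an analytics workload (illustrative fields only)."""
    needs_custom_processing: bool = False  # e.g. machine learning or graph analytics on Hadoop/Spark
    data_stays_in_s3: bool = False         # query the data in place instead of loading it into a warehouse
    frequent_queries: bool = False         # queries run regularly rather than ad hoc
    structured_data: bool = True           # data has a well-defined schema


def suggest_service(w: Workload) -> str:
    """Map a workload to one of the services listed above (assumed priority order)."""
    # Highly customized large-scale processing: managed Hadoop.
    if w.needs_custom_processing:
        return "Amazon EMR"
    # Data is queried directly in S3.
    if w.data_stays_in_s3:
        # Frequent queries against structured data: Redshift Spectrum is recommended.
        if w.frequent_queries and w.structured_data:
            return "Amazon Redshift Spectrum"
        # Otherwise quick, serverless ad hoc queries.
        return "Amazon Athena"
    # Default: complex SQL against data loaded into the warehouse.
    return "Amazon Redshift"
```

For example, `suggest_service(Workload(data_stays_in_s3=True))` returns `"Amazon Athena"`, while also setting `frequent_queries=True` returns `"Amazon Redshift Spectrum"`. Real workloads usually involve further factors such as cost, concurrency, and existing infrastructure, which this sketch deliberately leaves out.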