- General-purpose data processing engine designed for big data.
- Written in Scala.
- Spark is a platform for cluster computing.
- Spark lets you spread data and computations over clusters with multiple nodes (think of each node as a separate computer).
- Very large datasets are split into smaller partitions, so each node works on only its own subset of the data.
- Data processing and computation are performed in parallel over the nodes in the cluster.
- However, with greater computing power comes greater complexity.
- Can be used for analytics, data integration, machine learning, and stream processing.
- Master and Worker:
- Master:
- Connected to the rest of the computers in the cluster, which are called workers
- Sends the workers data and calculations to run
- Worker:
- Run the calculations they receive and send their results back to the master (see the sketch after this list)
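- A minimal PySpark sketch of this master/worker split (assuming a local `SparkSession`; the data, partition count, and function are made up for illustration):

```python
from pyspark.sql import SparkSession

# Assumption: local SparkSession for illustration; on a real cluster the
# master URL would point at the cluster's master instead of "local[*]".
spark = (SparkSession.builder
         .master("local[*]")
         .appName("master-worker-sketch")
         .getOrCreate())
sc = spark.sparkContext

# The driver (master side) splits the data into 4 partitions; each worker
# computes squares for its own partition and sends results back.
rdd = sc.parallelize(range(100), numSlices=4)
squares = rdd.map(lambda x: x * x)
print(squares.sum())  # result collected back on the master/driver

spark.stop()
```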
- Spark's core data structure is the Resilient Distributed Dataset (RDD)
- Instead of RDDs, it is easier to work with the Spark DataFrame abstraction built on top of RDDs (operations on DataFrames are automatically optimized).
- Spark DataFrames are immutable; any modification returns a new DataFrame instance.
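- A small sketch of that immutability (assuming a running `SparkSession`, see the next bullet; the column names and values are made up):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, 2.0), (2, 3.5)], ["id", "value"])

# withColumn does not change df in place; it returns a brand-new DataFrame,
# so the result must be captured in a new (or reassigned) variable.
df2 = df.withColumn("value_doubled", col("value") * 2)

df.show()   # still only has id, value
df2.show()  # has id, value, value_doubled
```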
- You start by creating a `SparkSession` (or `SparkContext`) entry point
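- A minimal sketch of creating the entry point (the app name is arbitrary):

```python
from pyspark.sql import SparkSession

# SparkSession is the modern entry point; getOrCreate() reuses an existing
# session if one is already running (e.g. inside the shell).
spark = SparkSession.builder.appName("my-notes-app").getOrCreate()

# The lower-level SparkContext is still reachable from the session.
sc = spark.sparkContext
print(spark.version, sc.master)
```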
- 2 modes:
- local mode: a single computer
- cluster mode: a cluster of computers
- You typically build in local mode and then deploy in cluster mode (no code change is required)
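- A sketch of the same script running unchanged in both modes; only the `--master` passed to `spark-submit` changes (the file name and cluster manager below are just examples):

```python
# my_job.py (hypothetical file name)
from pyspark.sql import SparkSession

# No master is hardcoded here, so spark-submit decides where it runs:
#   local mode:   spark-submit --master local[*] my_job.py
#   cluster mode: spark-submit --master yarn my_job.py
spark = SparkSession.builder.appName("same-code-both-modes").getOrCreate()

df = spark.range(1_000_000)
print(df.count())

spark.stop()
```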
- Spark shell :
- interactive environment for Spark jobs
- allows interacting with data on disk or in memory
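- A sketch of a `pyspark` shell session (the CSV path and column names are hypothetical; the shell pre-creates a `SparkSession` named `spark`):

```python
# Launched from a terminal with:  pyspark

# Data on disk: read a (hypothetical) CSV into a DataFrame.
df = spark.read.csv("data/flights.csv", header=True, inferSchema=True)

# Data in memory: cache it, then query it interactively.
df.cache()
df.count()
df.select("origin", "dest").show(5)
```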