Apache Spark Terms [Application / Job / Task]

May 5, 2022

A Spark cluster operates with drivers and executors.

The driver is the entry point of a Spark application, such as the Spark shell (Scala, Python, or R). It is where the SparkContext is created. The driver translates RDD operations into an execution graph and splits that graph into stages. It schedules tasks, controls their execution, and stores metadata about all the RDDs and their partitions.
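As a minimal sketch of where the driver's work begins (assuming Spark is on the classpath; the app name, master URL, and input path are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TermsDemo {
  def main(args: Array[String]): Unit = {
    // The driver starts here: this is where the SparkContext is created.
    val conf = new SparkConf().setAppName("terms-demo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // The driver translates this chain of RDD transformations into an execution graph.
    val counts = sc.textFile("input.txt")      // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // The action below makes the driver split the graph into stages and schedule tasks.
    counts.collect().foreach(println)

    sc.stop()
  }
}
```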

An executor stores cached data in its JVM heap or on local disk. It reads data from external sources, performs all the data processing, and writes results back to external sources.
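A small sketch of the caching side, reusing `sc` from above (the HDFS path is hypothetical). `MEMORY_AND_DISK` keeps cached partitions in the executors' JVM heap and spills to local disk when memory runs out:

```scala
import org.apache.spark.storage.StorageLevel

val logs = sc.textFile("hdfs:///logs/app.log")   // hypothetical path

// Cached partitions live on the executors: JVM heap first, local disk on spill.
logs.persist(StorageLevel.MEMORY_AND_DISK)

// The executors do the actual reading and processing when an action runs.
val errorCount = logs.filter(_.contains("ERROR")).count()
```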


Let's decompose an application, starting from the sketch below.
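As a compact orientation (under the standard Spark model): each action submits a job, each shuffle boundary splits a job into stages, and each stage runs one task per partition.

```scala
val nums = sc.parallelize(1 to 1000, numSlices = 4)  // 4 partitions -> 4 tasks per stage

val pairs = nums.map(n => (n % 10, n))               // narrow transformation: same stage
val sums  = pairs.reduceByKey(_ + _)                 // shuffle: new stage boundary

sums.count()    // action #1 -> Job 0 (two stages, one task per partition in each)
sums.collect()  // action #2 -> Job 1
```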

