# Explain what accumulators are in Spark
In Apache Spark, an accumulator is a shared variable used to aggregate information (e.g., sums, counts, maxima, minima) across tasks running in parallel on a cluster.
It provides an efficient and fault-tolerant way to accumulate results from worker nodes back to the driver program: tasks running on executors can only add to an accumulator, while only the driver program can read its value.
```scala
scala> val accum = sc.longAccumulator("My Accumulator")
accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)

scala> sc.parallelize(Array(1, 2, 3)).foreach(x => accum.add(x))

scala> accum.value
res2: Long = 6
```
Or, equivalently, in a standalone application:
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("accumulators in spark")
  .master("local")
  .getOrCreate()

// Named long accumulator, registered with the SparkContext
val longAcc = spark.sparkContext.longAccumulator("SumAccumulator")
val rdd = spark.sparkContext.parallelize(Array(1, 2, 3, 4))
rdd.foreach(x => longAcc.add(x)) // each task adds its partition's elements
println(longAcc.value)           // prints 10 after the action completes
```
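The built-in numeric accumulators cover sums and counts (LongAccumulator also exposes an avg), but aggregations such as max or min need a custom accumulator built on Spark's AccumulatorV2 API. Below is a minimal sketch of one; the class name MaxAccumulator is illustrative, and it reuses the spark and rdd values from the example above.

```scala
import org.apache.spark.util.AccumulatorV2

// Minimal max-tracking accumulator (illustrative sketch)
class MaxAccumulator extends AccumulatorV2[Long, Long] {
  private var _max: Long = Long.MinValue

  override def isZero: Boolean = _max == Long.MinValue
  override def copy(): AccumulatorV2[Long, Long] = {
    val acc = new MaxAccumulator
    acc._max = _max
    acc
  }
  override def reset(): Unit = { _max = Long.MinValue }
  override def add(v: Long): Unit = { _max = math.max(_max, v) }
  override def merge(other: AccumulatorV2[Long, Long]): Unit = {
    _max = math.max(_max, other.value) // combine partial results from tasks
  }
  override def value: Long = _max
}

val maxAcc = new MaxAccumulator
spark.sparkContext.register(maxAcc, "MaxAccumulator") // named, so visible in the web UI
rdd.foreach(x => maxAcc.add(x.toLong))
println(maxAcc.value) // prints 4
```

One caveat worth remembering: Spark guarantees exactly-once application of accumulator updates only inside actions (such as foreach above); updates made inside transformations like map may be re-applied if a task is retried or a stage is recomputed.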