Spark operators: RDD actions (1) - first, count, reduce, collect
First
def first(): T
first returns the first element of the RDD. No sorting is applied.
- scala> var rdd1 = sc.makeRDD(Seq(("A", "1"), ("B", "2"), ("C", "3")), 2)
- rdd1: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[33] at makeRDD at <console>:21
- scala> rdd1.first
- res14: (String, String) = (A,1)
- scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))
- rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:21
- scala> rdd1.first
- res8: Int = 10
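The transcript above only covers non-empty RDDs. On an empty RDD, first throws an exception rather than returning a value. The following is a minimal sketch of a safer variant that uses take(1) instead, assuming a standalone program with a local SparkContext rather than the shell's sc. The names FirstSketch and firstOrNone are hypothetical and not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstSketch {
  // Hypothetical helper: the first element as an Option, None when the data is empty.
  def firstOrNone(data: Seq[Int]): Option[Int] = {
    // Assumption: a local master with 2 threads is enough for this illustration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("first-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(data)
      // take(1) returns an empty array on an empty RDD instead of throwing.
      rdd.take(1).headOption
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(firstOrNone(Seq(10, 4, 2, 12, 3)))
    println(firstOrNone(Seq.empty[Int]))
  }
}
```

Wrapping the result in an Option is one possible design. Catching the exception from first would work as well, but take(1) avoids using exceptions for an ordinary case.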
Count
def count(): Long
count returns the number of elements in the RDD.
- scala> var rdd1 = sc.makeRDD(Seq(("A", "1"), ("B", "2"), ("C", "3")), 2)
- rdd1: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[34] at makeRDD at <console>:21
- scala> rdd1.count
- res15: Long = 3
Reduce
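def reduce(f: (T, T) ⇒ T): T
reduce combines the elements of the RDD pairwise using the function f and returns the result. f should be commutative and associative, because partitions are reduced in parallel and their results are merged in no fixed order.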
- scala> var rdd1 = sc.makeRDD(1 to 10, 2)
- rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[36] at makeRDD at <console>:21
- scala> rdd1.reduce(_ + _)
- res18: Int = 55
- scala> var rdd2 = sc.makeRDD(Seq(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1)))
- rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[38] at makeRDD at <console>:21
- scala> rdd2.reduce((x, y) => {
-      | (x._1 + y._1, x._2 + y._2)
-      | })
- res21: (String, Int) = (CBBAA,6)
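The string part of the result above, CBBAA, does not follow the input order. String concatenation is not commutative, and reduce merges partition results in whatever order they arrive. The following is a minimal sketch of one way to get a stable result, assuming a standalone program with a local SparkContext and 2 partitions. The names ReduceSketch and sumAndKeys are hypothetical. It reduces only the numeric values and builds the key string from collect, which returns elements in partition order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceSketch {
  // Hypothetical helper: (sum of the values, keys concatenated in RDD order).
  def sumAndKeys(pairs: Seq[(String, Int)]): (Int, String) = {
    // Assumption: local master, 2 partitions, mirroring the shell examples.
    val conf = new SparkConf().setMaster("local[2]").setAppName("reduce-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(pairs, 2)
      // Integer addition is commutative and associative, so the sum is stable.
      val total = rdd.map(_._2).reduce(_ + _)
      // collect keeps partition order, so the concatenated keys are stable too.
      val keys = rdd.map(_._1).collect().mkString
      (total, keys)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1))
    println(sumAndKeys(data))
  }
}
```

Splitting the work this way costs two jobs instead of one. For small data that is usually acceptable in exchange for a deterministic result.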
Collect
def collect(): Array[T]
collect converts an RDD to an array by returning all of its elements to the driver.
For more on Spark operators, see the Spark operator series.
- scala> var rdd1 = sc.makeRDD(1 to 10, 2)
- rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[36] at makeRDD at <console>:21
- scala> rdd1.collect
- res23: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
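Because collect brings every element back to the driver, it suits small RDDs like the one above. For a large RDD, a common alternative is to fetch only a few elements with take and get the size with count. The following is a minimal sketch, assuming a standalone program with a local SparkContext. The names CollectSketch and preview are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectSketch {
  // Hypothetical helper: the first n elements plus the total element count.
  def preview(n: Int): (Seq[Int], Long) = {
    // Assumption: local master and the same 1 to 10 data as the shell example.
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(1 to 10, 2)
      // take(n) only fetches as many partitions as needed to return n elements.
      val head = rdd.take(n).toSeq
      // count returns just a number to the driver, not the elements themselves.
      val total = rdd.count()
      (head, total)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (head, total) = preview(3)
    println(s"first elements: ${head.mkString(", ")}; total: $total")
  }
}
```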