Spark Operator: RDD Actions (2) - take, top, takeOrdered

Take

def take(num: Int): Array[T]
take returns the first num elements of the RDD (the elements at indices 0 through num-1), without sorting.
scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.take(1)
res0: Array[Int] = Array(10)

scala> rdd1.take(2)
res1: Array[Int] = Array(10, 4)
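Outside of spark-shell you need to create the SparkContext yourself. Below is a minimal self-contained sketch (the object name and local master setting are illustrative, not from the original post); it also illustrates that take collects to the driver in partition order, without sorting.

import org.apache.spark.{SparkConf, SparkContext}

object TakeSketch {
  def main(args: Array[String]): Unit = {
    // local SparkContext just for demonstration
    val sc = new SparkContext(new SparkConf().setAppName("take-sketch").setMaster("local[2]"))
    val rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))
    // take preserves partition order and does not sort: with two partitions
    // ([10, 4] and [2, 12, 3]) the first three elements are 10, 4, 2
    println(rdd1.take(3).mkString(", "))   // expected: 10, 4, 2
    sc.stop()
  }
}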

Top

def top(num: Int)(implicit ord: Ordering[T]): Array[T]
top returns the first num elements of the RDD according to the default ordering (descending) or according to the implicit ordering supplied.
scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.top(1)
res2: Array[Int] = Array(12)

scala> rdd1.top(2)
res3: Array[Int] = Array(12, 10)

// specify a custom ordering
scala> implicit val myOrd = implicitly[Ordering[Int]].reverse
myOrd: scala.math.Ordering[Int] = scala.math.Ordering$$anon$4@767499ef

scala> rdd1.top(1)
res4: Array[Int] = Array(2)

scala> rdd1.top(2)
res5: Array[Int] = Array(2, 3)
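Since ord is an implicit parameter, you can also pass an Ordering explicitly instead of declaring an implicit val. A small sketch (the pairs RDD is illustrative, not from the original post) that takes the two pairs with the largest values:

// assuming sc is an existing SparkContext (e.g. in spark-shell)
val pairs = sc.makeRDD(Seq(("a", 3), ("b", 1), ("c", 5), ("d", 2)))

// pass the Ordering explicitly: compare pairs by their Int value
val top2 = pairs.top(2)(Ordering.by[(String, Int), Int](_._2))
// top2: Array((c,5), (a,3))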

TakeOrdered

def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]
takeOrdered is similar to top, except that it returns the elements in the opposite order of top (ascending by default).
// note: the reversed ordering myOrd defined in the previous example is still in scope here,
// which is why top returns the smallest elements and takeOrdered the largest
scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.top(1)
res4: Array[Int] = Array(2)

scala> rdd1.top(2)
res5: Array[Int] = Array(2, 3)

scala> rdd1.takeOrdered(1)
res6: Array[Int] = Array(12)

scala> rdd1.takeOrdered(2)
res7: Array[Int] = Array(12, 10)
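To separate the two operators from the reversed implicit ordering used above, here is a quick sketch of their default behavior, assuming a fresh session with no custom implicit Ordering in scope:

// assuming sc is an existing SparkContext and no custom implicit Ordering is in scope
val rdd = sc.makeRDD(Seq(10, 4, 2, 12, 3))

rdd.top(2)                                 // Array(12, 10) -- largest elements, descending
rdd.takeOrdered(2)                         // Array(2, 3)   -- smallest elements, ascending
rdd.takeOrdered(2)(Ordering[Int].reverse)  // Array(12, 10) -- reversed ordering behaves like top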
For more information on Spark operators, refer to the Spark operator series.
