Spark operators: key-value RDD transformation operations (5) - leftOuterJoin, rightOuterJoin, subtractByKey

leftOuterJoin

def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, Option[W]))]
def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))]
leftOuterJoin is similar to a left outer join in SQL: the result is based on the records of the source (left) RDD, and keys with no match in the other RDD get None. It can only join two RDDs at a time; to join more than two, chain it several times (a sketch of chaining follows the example below).
The parameter numPartitions specifies the number of partitions of the result RDD.
The parameter partitioner specifies the partitioner of the result RDD.
  1. ("A", "1"), ("B", "2"), ("C", "3")), 2)
  2. ("A", "a"), ("C", "c"), ("D", "d")), 2)
  3.  
  4. Scala> rdd1.leftOuterJoin (rdd2) .collect
  5. (B, (2, None)), (A, (1, Some (a))), (C, (3, Some) (C))))
  6.  
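To join more than two RDDs, leftOuterJoin can simply be chained. A minimal sketch, assuming a hypothetical third RDD rdd3; the nested tuple shape in the comment follows from the return types above:

  // hypothetical third RDD used only for this illustration
  var rdd3 = sc.makeRDD(Array(("A", "x"), ("B", "y")), 2)

  scala> rdd1.leftOuterJoin(rdd2).leftOuterJoin(rdd3).collect
  // each record has the shape (K, ((V, Option[W1]), Option[W2])), e.g. (B,((2,None),Some(y)))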

rightOuterJoin

def rightOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (Option[V], W))]
def rightOuterJoin[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Option[V], W))]
def rightOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Option[V], W))]
rightOuterJoin is similar to a right outer join in SQL: the result is based on the records of the other (parameter) RDD, and keys with no match in the source RDD get None. It can only join two RDDs at a time; to join more than two, chain it several times.
The parameter numPartitions specifies the number of partitions of the result RDD (see the sketch after the example below).
The parameter partitioner specifies the partitioner of the result RDD.
  1. ("A", "1"), ("B", "2"), ("C", "3")), 2)
  2. ("A", "a"), ("C", "c"), ("D", "d")), 2)
  3. Scala> rdd1.rightOuterJoin (rdd2) .collect
  4. (A, (some (1), a)), (C, (some (3), () () () () () ), C)))
  5.  
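The numPartitions and partitioner overloads change only how the result is partitioned, not its contents. A minimal sketch using Spark's HashPartitioner; the comments state the partition counts these calls request rather than output copied from a session:

  import org.apache.spark.HashPartitioner

  scala> rdd1.rightOuterJoin(rdd2, 4).partitions.size
  // result RDD has 4 partitions, as requested by numPartitions

  scala> rdd1.rightOuterJoin(rdd2, new HashPartitioner(3)).partitions.size
  // result RDD has 3 partitions, one per HashPartitioner partition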

subtractByKey

def subtractByKey[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W]): RDD[(K, V)]
def subtractByKey[W](other: RDD[(K, W)], numPartitions: Int)(implicit arg0: ClassTag[W]): RDD[(K, V)]
def subtractByKey[W](other: RDD[(K, W)], p: Partitioner)(implicit arg0: ClassTag[W]): RDD[(K, V)]
subtractByKey is similar to subtract among the basic transformation operations (http://lxw1234.com/archives/2015/07/345.htm), but it compares keys: it returns the elements of the source RDD whose keys do not appear in the other RDD (a sketch of this key-only comparison follows the example below).
The parameter numPartitions specifies the number of partitions of the result RDD.
The parameter partitioner specifies the partitioner of the result RDD.
  1. ("A", "1"), ("B", "2"), ("C", "3")), 2)
  2. ("A", "a"), ("C", "c"), ("D", "d")), 2)
  3.  
  4. Scala> rdd1.subtractByKey (rdd2) .collect
  5. Res13: Array [(String, String)] = Array ((B, 2))
  6.  
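Because subtractByKey compares only keys, a record from the source RDD is dropped even when the values on the two sides differ. A minimal sketch, assuming a hypothetical rdd4 that shares only key A with rdd1 but carries a different value:

  // hypothetical RDD sharing only key A with rdd1
  var rdd4 = sc.makeRDD(Array(("A", "zzz")), 2)

  scala> rdd1.subtractByKey(rdd4).collect
  // key A is dropped even though the values differ; B and C remain, e.g. Array((B,2), (C,3))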
For more information on Spark operators, refer to the Spark operator series: http://lxw1234.com/archives/2015/07/363.htm
