Spark operators: RDD actions (5) - saveAsTextFile, saveAsSequenceFile, saveAsObjectFile

saveAsTextFile

def saveAsTextFile(path: String): Unit
def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit
saveAsTextFile stores an RDD as a text file in a file system.
The codec parameter specifies a compression codec class to apply to the output.
  var rdd1 = sc.makeRDD(1 to 10, 2)
  scala> rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/") // save to HDFS

  hadoop fs -ls /tmp/lxw1234.com
  Found 2 items
  -rw-r--r--   2 lxw1234 supergroup   0 2015-07-10 09:15 /tmp/lxw1234.com/_SUCCESS
  -rw-r--r--   2 lxw1234 supergroup  21 2015-07-10 09:15 /tmp/lxw1234.com/part-00000

  hadoop fs -cat /tmp/lxw1234.com/part-00000
  1
  2
  3
  4
  5
  6
  7
  8
  9
  10

Note: if you save to a local file system with rdd1.saveAsTextFile("file:///tmp/lxw1234.com"), each partition is written to the local directory of the machine where its Executor runs; the output is not gathered on any single machine.
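If you do need all of the output in one place, a common workaround is to reduce the RDD to a single partition before saving. A minimal sketch, assuming a running SparkContext sc (the path is illustrative):

  // A minimal sketch: coalesce(1) yields a single part file, though it is still
  // written on whichever machine runs the task. Only reasonable for small RDDs,
  // since all data moves through one executor.
  var rdd1 = sc.makeRDD(1 to 10, 2)
  rdd1.coalesce(1).saveAsTextFile("file:///tmp/lxw1234.com.single/")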
// save with a specified compression codec
  rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/", classOf[com.hadoop.compression.lzo.LzopCodec])

  hadoop fs -ls /tmp/lxw1234.com
  -rw-r--r--   2 lxw1234 supergroup   0 2015-07-10 09:20 /tmp/lxw1234.com/_SUCCESS
  -rw-r--r--   2 lxw1234 supergroup  71 2015-07-10 09:20 /tmp/lxw1234.com/part-00000.lzo

  hadoop fs -text /tmp/lxw1234.com/part-00000.lzo
  1
  2
  3
  4
  5
  6
  7
  8
  9
  10

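Reading the compressed output back needs nothing special on the Spark side: sc.textFile resolves the codec from the file extension through the Hadoop input format. A minimal sketch, assuming the LZO codec is still configured on the cluster:

  // A minimal sketch: the .lzo extension is resolved to LzopCodec, provided the
  // codec is registered on the classpath (e.g. via io.compression.codecs).
  val back = sc.textFile("hdfs://cdh5/tmp/lxw1234.com/")
  back.collect().foreach(println) // prints 1 .. 10 as strings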

saveAsSequenceFile

saveAsSequenceFile saves an RDD to HDFS in the SequenceFile format.
Its usage is the same as saveAsTextFile.
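Note that saveAsSequenceFile is defined on RDDs of key-value pairs whose types can be converted to Hadoop Writables (it comes from SequenceFileRDDFunctions). A minimal sketch, with an illustrative path:

  // A minimal sketch, assuming a running SparkContext `sc`; the path is illustrative.
  // saveAsSequenceFile needs a pair RDD whose key and value types map to Writables.
  var rdd1 = sc.makeRDD(1 to 10, 2)
  rdd1.map(x => (x, x * 100)).saveAsSequenceFile("hdfs://cdh5/tmp/lxw1234.com/seq/")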

saveAsObjectFile

def saveAsObjectFile(path: String): Unit
saveAsObjectFile serializes the elements of an RDD and saves them to a file.
On HDFS, the output is written in SequenceFile format by default.
  var rdd1 = sc.makeRDD(1 to 10, 2)
  rdd1.saveAsObjectFile("hdfs://cdh5/tmp/lxw1234.com/")

  hadoop fs -cat /tmp/lxw1234.com/part-00000
  SEQ!org.apache.hadoop.io.NullWritable"org.apache.hadoop.io.BytesWritableT
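The counterpart for reading such output back is sc.objectFile, which deserializes the elements again. A minimal sketch, with the element type given explicitly:

  // A minimal sketch: read the serialized elements back; the path matches the
  // save above, and the type parameter tells Spark what to deserialize into.
  val restored = sc.objectFile[Int]("hdfs://cdh5/tmp/lxw1234.com/")
  restored.collect().foreach(println) // 1 .. 10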
