Spark operators: RDD actions (5): saveAsTextFile, saveAsSequenceFile, saveAsObjectFile
saveAsTextFile
def saveAsTextFile(path: String): Unit
def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit
saveAsTextFile stores an RDD as text files in a file system.
The codec parameter specifies the compression codec class to use.
Note: If you save to the local file system, for example with rdd1.saveAsTextFile("file:///tmp/lxw1234.com"), each Executor writes its output only to the local directory of the machine it runs on.
- var rdd1 = sc.makeRDD(1 to 10, 2)
- scala> rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/") // save to HDFS
- hadoop fs -ls /tmp/lxw1234.com
- Found 2 items
- -rw-r--r-- 2 lxw1234 supergroup 0 2015-07-10 09:15 /tmp/lxw1234.com/_SUCCESS
- -rw-r--r-- 2 lxw1234 supergroup 21 2015-07-10 09:15 /tmp/lxw1234.com/part-00000
- hadoop fs -cat /tmp/lxw1234.com/part-00000
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
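The saved text can be read back with sc.textFile. The following is a minimal sketch rather than the author's setup: it assumes a local master and a scratch directory created with Files.createTempDirectory (the names outDir, partFiles and restored are hypothetical) so that it runs without an HDFS cluster.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the spark-shell context if one exists; otherwise start a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("saveAsTextFile-sketch").setMaster("local[2]"))

// Hypothetical scratch location. The target directory must not exist yet,
// otherwise saveAsTextFile fails.
val outDir = Files.createTempDirectory("rdd-text-").resolve("out")

val rdd1 = sc.makeRDD(1 to 10, 2)
rdd1.saveAsTextFile(outDir.toUri.toString) // writes a directory, not a single file

// Each partition is written to its own part-NNNNN file.
val partFiles = outDir.toFile.list().filter(_.startsWith("part-")).sorted

// textFile reads all part files in the directory back as lines of text.
val restored = sc.textFile(outDir.toUri.toString).map(_.toInt).collect().sorted
```

Because the path is a directory, pass the directory itself to sc.textFile rather than an individual part file when you want the whole RDD back.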
// Save with a specified compression codec
- rdd1.saveAsTextFile("hdfs://cdh5/tmp/lxw1234.com/", classOf[com.hadoop.compression.lzo.LzopCodec])
- hadoop fs -ls /tmp/lxw1234.com
- -rw-r--r-- 2 lxw1234 supergroup 0 2015-07-10 09:20 /tmp/lxw1234.com/_SUCCESS
- -rw-r--r-- 2 lxw1234 supergroup 71 2015-07-10 09:20 /tmp/lxw1234.com/part-00000.lzo
- hadoop fs -text /tmp/lxw1234.com/part-00000.lzo
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
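LzopCodec comes from a separate LZO library that may not be installed. As a hedged variant of the example above, the sketch below swaps in GzipCodec, which ships with Hadoop, and again assumes a local master and a hypothetical scratch directory (gzDir).

```scala
import java.nio.file.Files
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the spark-shell context if one exists; otherwise start a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("saveAsTextFile-codec-sketch").setMaster("local[2]"))

// Hypothetical scratch location; it must not exist before the save.
val gzDir = Files.createTempDirectory("rdd-gzip-").resolve("out")

val rdd1 = sc.makeRDD(1 to 10, 2)
// GzipCodec is used here instead of the LzopCodec shown in the text.
rdd1.saveAsTextFile(gzDir.toUri.toString, classOf[GzipCodec])

// The codec's default extension (.gz) is appended to each part file name.
val gzFiles = gzDir.toFile.list().filter(_.startsWith("part-")).sorted

// textFile selects the decompression codec from the file extension.
val restored = sc.textFile(gzDir.toUri.toString).map(_.toInt).collect().sorted
```

No codec argument is needed when reading: the compressed part files are decompressed transparently.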
saveAsSequenceFile
saveAsSequenceFile saves an RDD to HDFS in the SequenceFile format. It is used in the same way as saveAsTextFile, but it is only available on RDDs of key-value pairs.
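The original gives no example for saveAsSequenceFile, so here is a minimal sketch under stated assumptions: a local master, a hypothetical scratch directory (seqDir), and a small made-up pair RDD. It writes the pairs and reads them back with sc.sequenceFile.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the spark-shell context if one exists; otherwise start a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("saveAsSequenceFile-sketch").setMaster("local[2]"))

// Hypothetical scratch location; it must not exist before the save.
val seqDir = Files.createTempDirectory("rdd-seq-").resolve("out")

// Illustrative data: the method needs an RDD of (key, value) pairs.
val pairs = sc.makeRDD(Seq((1, "a"), (2, "b"), (3, "c")), 2)
pairs.saveAsSequenceFile(seqDir.toUri.toString)

// Read the pairs back, converting the stored Writable types to Int and String.
val restored = sc.sequenceFile[Int, String](seqDir.toUri.toString).collect().sortBy(_._1)
```

One difference from saveAsTextFile: in the Scala API the optional codec is passed as an Option, for example Some(classOf[GzipCodec]), rather than as a bare class.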
saveAsObjectFile
def saveAsObjectFile(path: String): Unit
saveAsObjectFile serializes the elements of the RDD as objects and stores them in files.
On HDFS, the data is saved in the SequenceFile format by default.
- var rdd1 = sc.makeRDD(1 to 10, 2)
- rdd1.saveAsObjectFile("hdfs://cdh5/tmp/lxw1234.com/")
- hadoop fs -cat /tmp/lxw1234.com/part-00000
- SEQ! org.apache.hadoop.io.NullWritable "org.apache.hadoop.io.BytesWritableT
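As the header above suggests, the output is a binary SequenceFile with NullWritable keys and BytesWritable values holding the serialized objects, so it is not meant to be inspected with cat. A minimal sketch of reading it back with sc.objectFile follows, assuming a local master and a hypothetical scratch directory (objDir).

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the spark-shell context if one exists; otherwise start a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("saveAsObjectFile-sketch").setMaster("local[2]"))

// Hypothetical scratch location; it must not exist before the save.
val objDir = Files.createTempDirectory("rdd-obj-").resolve("out")

val rdd1 = sc.makeRDD(1 to 10, 2)
rdd1.saveAsObjectFile(objDir.toUri.toString)

// objectFile deserializes the stored objects; the element type must be given explicitly.
val restored = sc.objectFile[Int](objDir.toUri.toString).collect().sorted
```

The type parameter passed to objectFile has to match the element type that was saved, since the file itself stores only serialized bytes.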