Apache Spark DataFrames Getting Started Guide: Creating a DataFrame
Creating a DataFrame from a CSV file

This article will show you how to create a DataFrame from a CSV file.

How to do it
Creating a DataFrame from a CSV file consists of the following steps:
1. Add the spark-csv support library to the build.sbt file.
2. Create a SparkConf object, which holds all the environment information Spark needs to run.
3. Create a SparkContext object, the core entry point into Spark; through it we can then create a SQLContext object.
4. Use the SQLContext object to load the CSV file.
Spark does not support parsing CSV files out of the box, but Databricks has developed a library that can parse them, so we need to add this library to our dependency file (pom.xml or build.sbt).
If your project uses SBT, add the following dependency to the build.sbt file:
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.3.0"
If your project uses Maven, add the following dependency to the pom.xml file:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.3.0</version>
</dependency>
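If you just want to experiment interactively rather than build a project, the same library can also be pulled in when starting the Spark shell with the --packages flag (shown here as a convenience; adjust the Scala and library versions to match your installation):

spark-shell --packages com.databricks:spark-csv_2.10:1.3.0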
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

// Holds all the environment information Spark needs to run
val conf = new SparkConf().setAppName("csvDataFrame").setMaster("local[2]")
// The core entry point into Spark
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Load the CSV file: the first line is read as the header and columns are separated by '|'
val students = sqlContext.csvFile(filePath = "StudentData.csv", useHeader = true, delimiter = '|')
The students variable that comes back is of type org.apache.spark.sql.DataFrame.
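Once the DataFrame is created, a quick way to check that the file was parsed correctly is to print its schema and look at a few rows. This is only a small illustrative check, assuming the column names of the sample StudentData.csv given in the appendix:

// Print the inferred schema (all columns are read as strings by default)
students.printSchema()
// Show the first few rows of the DataFrame
students.show(5)
// Count the number of records
println(students.count())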
How it works
The csvFile method receives filePath, the path of the CSV file to load. If the CSV file comes with a header row, we can set useHeader to true so that the first line is read as the column names; delimiter specifies the character that separates the columns of the CSV file. Besides the csvFile function, we can also use sqlContext's load to read the CSV file:
// The sample file is pipe-delimited, so pass the delimiter along with the header option
val options = Map("header" -> "true", "delimiter" -> "|", "path" -> "E:\\StudentData.csv")
val newStudents = sqlContext.read.options(options).format("com.databricks.spark.csv").load()
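The DataFrame returned by load can be queried in exactly the same way as the one returned by csvFile. As a small illustration (again assuming the columns of the sample StudentData.csv in the appendix), we can select a couple of columns and filter rows with the standard DataFrame API:

// Keep only the name and email columns
newStudents.select("studentName", "email").show(3)
// Filter by a column value (all columns are strings unless a schema is supplied)
newStudents.filter(newStudents("studentName") === "Olga").show()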
Appendix
To make testing easier, here is part of the StudentData.csv data set:

id|studentName|phone|email
1|Burke|1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk
2|Kamal|1-668-571-5046|pede.Suspendisse@interdumenim.edu
3|Olga|1-956-311-1686|Aenean.eget.metus@dictumcursusNunc.edu
4|Belle|1-246-894-6340|vitae.aliquet.nec@neque.co.uk
5|Trevor|1-300-527-4967|dapibus.id@acturpisegestas.net
6|Laurel|1-691-379-9921|adipiscing@consectetueripsum.edu
7|Sara|1-608-140-1995|Donec.nibh@enimEtiamimperdiet.edu
8|Kaseem|1-881-586-2689|cursus.et.magna@euismod.org
9|Lev|1-916-367-5608|Vivamus.nisi@ipsumdolor.com
10|Maya|1-271-683-2698|accumsan.convallis@ornarelectusjusto.edu
11|Emi|1-467-270-1337|est@nunc.com
12|Caleb|1-683-212-0896|Suspendisse@Quisque.edu
13|Florence|1-603-575-2444|sit.amet.dapibus@lacusAliquamrutrum.ca
14|Anika|1-856-828-7883|euismod@ligulaelit.co.uk
15|Tarik|1-398-171-2268|turpis@felisorci.com
16|Amena|1-878-250-3129|lorem.luctus.ut@scelerisque.com
17|Blossom|1-154-406-9596|Nunc.commodo.auctor@eratSed.co.uk
18|Guy|1-869-521-3230|senectus.et.netus@lectusrutrum.com
19|Malachi|1-608-637-2772|Proin.mi.Aliquam@estarcu.net
20|Edward|1-711-710-6552|lectus@aliquetlibero.co.uk
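If you want to try the examples without downloading anything, a few of the rows above can be written to a local StudentData.csv with plain Scala. This is only a convenience sketch; adjust the path to wherever your Spark application expects the file:

import java.io.PrintWriter

// A couple of sample rows, enough to try out the examples above
val sampleRows = Seq(
  "id|studentName|phone|email",
  "1|Burke|1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk",
  "2|Kamal|1-668-571-5046|pede.Suspendisse@interdumenim.edu"
)

val writer = new PrintWriter("StudentData.csv")
writer.write(sampleRows.mkString("\n"))
writer.close()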