Apache Spark DataFrames Getting Started Guide: Creating a DataFrame
Creating a DataFrame from a CSV file
This article shows you how to create a DataFrame from a CSV file.
How to do it?
Creating a DataFrame from a CSV file consists of the following steps:
1. Add the spark-csv support library to the build.sbt file;
2. Create a SparkConf object, which holds all the environment information Spark needs to run;
3. Create a SparkContext object, the core entry point into Spark, and then use it to create a SQLContext object;
4. Use the SQLContext object to load the CSV file;
5. Spark does not support parsing CSV files out of the box, but Databricks has developed a library (spark-csv) that does, so we need to declare this dependency in the project's dependency file (pom.xml or build.sbt).
If your project uses SBT, add the following dependency to the build.sbt file:
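A minimal sketch of the declaration; the exact version (and matching Scala suffix) is an assumption here, so pick the release that matches your Spark and Scala versions:

```scala
// build.sbt — pull in Databricks' spark-csv library for Spark 1.x
// (the version below is illustrative; check for the release matching your setup)
libraryDependencies += "com.databricks" %% "spark-csv" % "1.5.0"
```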
How it works
The csvFile method takes the path of the CSV file to load (filePath). If the CSV file has a header line, set useHeader to true so that the first line is read as the column names; delimiter specifies the character separating the columns in the CSV file.
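Putting the steps together, a minimal sketch might look like the following, assuming Spark 1.x with the spark-csv library on the classpath; the application name, master URL, and file path are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._ // adds the csvFile method to SQLContext

object CsvToDataFrame {
  def main(args: Array[String]): Unit = {
    // SparkConf holds the environment information Spark needs to run (step 2)
    val conf = new SparkConf().setAppName("csvDataFrame").setMaster("local[2]")
    // SparkContext is the core entry point; SQLContext is created from it (step 3)
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Load the CSV file (step 4): read the first line as column names,
    // with a comma as the column delimiter
    val students = sqlContext.csvFile(
      filePath = "StudentData.csv", useHeader = true, delimiter = ',')
    students.printSchema()
  }
}
```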
In addition to the csvFile function, we can also use the load method on sqlContext to load the CSV file:
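With the Spark 1.x API, the equivalent call through load names the data source and passes the options as a map; again, the file path here is illustrative:

```scala
// Alternative to csvFile: load the CSV through the spark-csv data source
// (sqlContext.load is the Spark 1.x API; later versions use sqlContext.read)
val students = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "StudentData.csv", "header" -> "true"))
students.printSchema()
```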
Appendix
To make testing easier, I have provided part of the StudentData.csv data set: