Apache Spark DataFrames Getting Started Guide: Working with DataFrame
Part 2: Operating on DataFrames
In the previous article, we introduced how to create a DataFrame. This article describes how to manipulate the data in a DataFrame and how to print it out.
Printing the schema of a DataFrame
Once a DataFrame has been created, the first thing we usually want to look at is its schema. We can view it with the printSchema function, which prints the name and type of each column:
students.printSchema
root
 |-- id: string (nullable = true)
 |-- studentName: string (nullable = true)
 |-- phone: string (nullable = true)
 |-- email: string (nullable = true)
If the pipe-delimited file is loaded without specifying | as the delimiter, the whole header line is treated as a single column name, and the same call prints:
students.printSchema
root
 |-- id|studentName|phone|email: string (nullable = true)
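For comparison, here is a minimal sketch assuming Spark 2.x or later, where SparkSession takes over the role of the sqlContext used in this article. The two-row students DataFrame is a hypothetical in-memory stand-in for the CSV file, with every column typed as string to mirror the schema above. Besides printSchema, the columns, dtypes and schema members expose the same information as values that code can check:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StringType

// Local session for experimenting (assumes Spark 2.x+ on the classpath, e.g. in spark-shell).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("schema-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical in-memory stand-in for the students CSV file: all columns are strings.
val students = Seq(
  ("1", "Burke", "1-300-746-8446", "ullamcorper.velit.in@ametnullaDonec.co.uk"),
  ("2", "Kamal", "1-668-571-5046", "pede.Suspendisse@interdumenim.edu")
).toDF("id", "studentName", "phone", "email")

// Prints the same "root |-- name: type (nullable = ...)" tree to standard output.
students.printSchema()

// The same information as values instead of console output.
val columnNames: Array[String] = students.columns
students.dtypes.foreach { case (name, typeName) => println(s"$name: $typeName") }
val idType = students.schema("id").dataType
```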
Sampling the data in a DataFrame
After printing the schema, the next thing to do is look at the data loaded into the DataFrame. There are several ways to sample data from a newly created DataFrame; let's go through them. The simplest is the show method, which has four versions:
(1) The first takes the number of rows to display:
def show(numRows: Int)
(2) The second takes no arguments; in this case show displays the first 20 rows by default:
def show()
(3) The third takes a Boolean that controls whether string values longer than 20 characters are truncated:
def show(truncate: Boolean)
(4) The last takes both the number of rows and whether to truncate long column values:
def show(numRows: Int, truncate: Boolean)
In fact, the first three versions are all implemented by calling this last one. Unlike the other functions covered here, show not only displays the requested rows but also prints the column headers, and it writes its output directly to standard output. Here is how it is used:
students.show() // prints 20 rows
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
|  1|      Burke|1-300-746-8446|ullamcorper.velit...|
|  2|      Kamal|1-668-571-5046|pede.Suspendisse@...|
|  3|       Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|
|  5|     Trevor|1-300-527-4967|dapibus.id@acturp...|
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...|
| 11|        Emi|1-467-270-1337|        est@nunc.com|
| 12|      Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|   Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|      Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|      Tarik|1-398-171-2268|turpis@felisorci.com|
| 16|      Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17|    Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18|        Guy|1-869-521-3230|senectus.et.netus...|
| 19|    Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20|     Edward|1-711-710-6552|lectus@aliquetlib...|
+---+-----------+--------------+--------------------+
only showing top 20 rows
students.show(15)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
|  1|      Burke|1-300-746-8446|ullamcorper.velit...|
|  2|      Kamal|1-668-571-5046|pede.Suspendisse@...|
|  3|       Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|
|  5|     Trevor|1-300-527-4967|dapibus.id@acturp...|
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...|
| 11|        Emi|1-467-270-1337|        est@nunc.com|
| 12|      Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|   Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|      Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|      Tarik|1-398-171-2268|turpis@felisorci.com|
+---+-----------+--------------+--------------------+
only showing top 15 rows
students.show(true)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
|  1|      Burke|1-300-746-8446|ullamcorper.velit...|
|  2|      Kamal|1-668-571-5046|pede.Suspendisse@...|
|  3|       Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|
|  5|     Trevor|1-300-527-4967|dapibus.id@acturp...|
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...|
| 11|        Emi|1-467-270-1337|        est@nunc.com|
| 12|      Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|   Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|      Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|      Tarik|1-398-171-2268|turpis@felisorci.com|
| 16|      Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17|    Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18|        Guy|1-869-521-3230|senectus.et.netus...|
| 19|    Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20|     Edward|1-711-710-6552|lectus@aliquetlib...|
+---+-----------+--------------+--------------------+
only showing top 20 rows
students.show(false)
+---+-----------+--------------+-----------------------------------------+
|id |studentName|phone         |email                                    |
+---+-----------+--------------+-----------------------------------------+
|1  |Burke      |1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk|
|2  |Kamal      |1-668-571-5046|pede.Suspendisse@interdumenim.edu        |
|3  |Olga       |1-956-311-1686|Aenean.eget.metus@dictumcursusNunc.edu   |
|4  |Belle      |1-246-894-6340|vitae.aliquet.nec@neque.co.uk            |
|5  |Trevor     |1-300-527-4967|dapibus.id@acturpisegestas.net           |
|6  |Laurel     |1-691-379-9921|adipiscing@consectetueripsum.edu         |
|7  |Sara       |1-608-140-1995|Donec.nibh@enimEtiamimperdiet.edu        |
|8  |Kaseem     |1-881-586-2689|cursus.et.magna@euismod.org              |
|9  |Lev        |1-916-367-5608|Vivamus.nisi@ipsumdolor.com              |
|10 |Maya       |1-271-683-2698|accumsan.convallis@ornarelectusjusto.edu |
|11 |Emi        |1-467-270-1337|est@nunc.com                             |
|12 |Caleb      |1-683-212-0896|Suspendisse@Quisque.edu                  |
|13 |Florence   |1-603-575-2444|sit.amet.dapibus@lacusAliquamrutrum.ca   |
|14 |Anika      |1-856-828-7883|euismod@ligulaelit.co.uk                 |
|15 |Tarik      |1-398-171-2268|turpis@felisorci.com                     |
|16 |Amena      |1-878-250-3129|lorem.luctus.ut@scelerisque.com          |
|17 |Blossom    |1-154-406-9596|Nunc.commodo.auctor@eratSed.co.uk        |
|18 |Guy        |1-869-521-3230|senectus.et.netus@lectusrutrum.com       |
|19 |Malachi    |1-608-637-2772|Proin.mi.Aliquam@estarcu.net             |
|20 |Edward     |1-711-710-6552|lectus@aliquetlibero.co.uk               |
+---+-----------+--------------+-----------------------------------------+
only showing top 20 rows
students.show(10, false)
+---+-----------+--------------+-----------------------------------------+
|id |studentName|phone         |email                                    |
+---+-----------+--------------+-----------------------------------------+
|1  |Burke      |1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk|
|2  |Kamal      |1-668-571-5046|pede.Suspendisse@interdumenim.edu        |
|3  |Olga       |1-956-311-1686|Aenean.eget.metus@dictumcursusNunc.edu   |
|4  |Belle      |1-246-894-6340|vitae.aliquet.nec@neque.co.uk            |
|5  |Trevor     |1-300-527-4967|dapibus.id@acturpisegestas.net           |
|6  |Laurel     |1-691-379-9921|adipiscing@consectetueripsum.edu         |
|7  |Sara       |1-608-140-1995|Donec.nibh@enimEtiamimperdiet.edu        |
|8  |Kaseem     |1-881-586-2689|cursus.et.magna@euismod.org              |
|9  |Lev        |1-916-367-5608|Vivamus.nisi@ipsumdolor.com              |
|10 |Maya       |1-271-683-2698|accumsan.convallis@ornarelectusjusto.edu |
+---+-----------+--------------+-----------------------------------------+
only showing top 10 rows
Besides show, we can use head, first and take. head(n) and take(n) return the first n rows as an Array[Row], while head() and first() return only the first Row. Unlike show, these methods do not print anything themselves, so we print the rows explicitly:
students.head(5).foreach(println)
[1,Burke,1-300-746-8446,ullamcorper.velit.in@ametnullaDonec.co.uk]
[2,Kamal,1-668-571-5046,pede.Suspendisse@interdumenim.edu]
[3,Olga,1-956-311-1686,Aenean.eget.metus@dictumcursusNunc.edu]
[4,Belle,1-246-894-6340,vitae.aliquet.nec@neque.co.uk]
[5,Trevor,1-300-527-4967,dapibus.id@acturpisegestas.net]
println(students.head())
[1,Burke,1-300-746-8446,ullamcorper.velit.in@ametnullaDonec.co.uk]
println(students.first())
[1,Burke,1-300-746-8446,ullamcorper.velit.in@ametnullaDonec.co.uk]
students.take(5).foreach(println)
[1,Burke,1-300-746-8446,ullamcorper.velit.in@ametnullaDonec.co.uk]
[2,Kamal,1-668-571-5046,pede.Suspendisse@interdumenim.edu]
[3,Olga,1-956-311-1686,Aenean.eget.metus@dictumcursusNunc.edu]
[4,Belle,1-246-894-6340,vitae.aliquet.nec@neque.co.uk]
[5,Trevor,1-300-527-4967,dapibus.id@acturpisegestas.net]
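A minimal sketch of these calls side by side, again assuming Spark 2.x or later and a hypothetical three-row DataFrame built in memory. It highlights the difference in return types: show prints and returns Unit, while head, first and take hand rows back to the driver as Row objects whose fields can be read by position or by column name:

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("sampling-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample rows copied from the article's output.
val students = Seq(
  ("1", "Burke", "ullamcorper.velit.in@ametnullaDonec.co.uk"),
  ("2", "Kamal", "pede.Suspendisse@interdumenim.edu"),
  ("3", "Olga", "Aenean.eget.metus@dictumcursusNunc.edu")
).toDF("id", "studentName", "email")

students.show(2)          // truncates strings longer than 20 characters
students.show(2, false)   // prints the full email addresses

val firstTwo: Array[Row] = students.head(2) // collects the first two rows to the driver
val firstRow: Row = students.first()        // equivalent to head()
val taken: Array[Row] = students.take(2)    // take(n) returns the same rows as head(n)

firstTwo.foreach(println)                   // each Row prints as [value,value,...]
val firstName = firstRow.getString(1)             // field by position
val firstEmail = firstRow.getAs[String]("email")  // field by column name
```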
Selecting columns from a DataFrame
As you can see, every column in the DataFrame has a name. The select function lets us pick the columns we need from a DataFrame and returns a new DataFrame. Let's look at a few cases.
1. Select a single column. Suppose we only want the email column. Because DataFrames are immutable, select returns a new DataFrame:
val emailDataFrame: DataFrame = students.select("email")
emailDataFrame.show(3)
+--------------------+
|               email|
+--------------------+
|ullamcorper.velit...|
|pede.Suspendisse@...|
|Aenean.eget.metus...|
+--------------------+
only showing top 3 rows
2. Select multiple columns by passing several column names:
val studentEmailDF = students.select("studentName", "email")
studentEmailDF.show(5)
+-----------+--------------------+
|studentName|               email|
+-----------+--------------------+
|      Burke|ullamcorper.velit...|
|      Kamal|pede.Suspendisse@...|
|       Olga|Aenean.eget.metus...|
|      Belle|vitae.aliquet.nec...|
|     Trevor|dapibus.id@acturp...|
+-----------+--------------------+
only showing top 5 rows
Note that the column names passed to select must be names listed in the printSchema output. If a column name is invalid, an org.apache.spark.sql.AnalysisException is thrown:
val studentEmailDF = students.select("studentName", "iteblog")
studentEmailDF.show(5)
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'iteblog' given input columns id, studentName, phone, email;
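select also accepts Column objects, which is the form needed once expressions (such as the renaming shown later) are involved. The sketch below assumes Spark 2.x or later and a hypothetical in-memory DataFrame; it shows both forms and catches the AnalysisException raised for an unknown column. In Spark 2.x and 3.x the error surfaces as soon as select is called, because DataFrames are analyzed eagerly; the exact message text differs between versions:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.functions.col

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("select-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the students DataFrame.
val students = Seq(
  ("1", "Burke", "1-300-746-8446", "ullamcorper.velit.in@ametnullaDonec.co.uk")
).toDF("id", "studentName", "phone", "email")

val byName = students.select("studentName", "email")          // column names as strings
val byColumn = students.select(col("studentName"), $"email")  // Column objects
byColumn.show()

// An unknown column is rejected during analysis, before any Spark job runs.
val unknownColumnRejected: Boolean =
  try {
    students.select("iteblog")
    false
  } catch {
    case e: AnalysisException =>
      println(s"AnalysisException: ${e.getMessage}")
      true
  }
```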
Filtering rows by a condition
Now that we know how to select columns from a DataFrame, let's look at how to filter its rows by a condition. Since a DataFrame holds Row-based data, we can treat it much like an ordinary Scala collection and filter it with a condition. To make the results easy to see, each filter below is followed by a call to show:
students.filter("id > 5").show(10)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...|
| 11|        Emi|1-467-270-1337|        est@nunc.com|
| 12|      Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|   Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14|      Anika|1-856-828-7883|euismod@ligulaeli...|
| 15|      Tarik|1-398-171-2268|turpis@felisorci.com|
+---+-----------+--------------+--------------------+
only showing top 10 rows
students.filter("studentName =''").show(7)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 21|           |1-598-439-7549|consectetuer.adip...|
| 32|           |1-184-895-9602|accumsan.laoreet@...|
| 45|           |1-245-752-0481|Suspendisse.eleif...|
| 83|           |1-858-810-2204|sociis.natoque@eu...|
| 94|           |1-443-410-7878|Praesent.eu.nulla...|
+---+-----------+--------------+--------------------+
Conditions can be combined with OR, for example to find names that are empty or contain the text NULL:
students.filter("studentName ='' OR studentName = 'NULL'").show(7)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 21|           |1-598-439-7549|consectetuer.adip...|
| 32|           |1-184-895-9602|accumsan.laoreet@...|
| 33|       NULL|1-105-503-0141|Donec@Inmipede.co.uk|
| 45|           |1-245-752-0481|Suspendisse.eleif...|
| 83|           |1-858-810-2204|sociis.natoque@eu...|
| 94|           |1-443-410-7878|Praesent.eu.nulla...|
+---+-----------+--------------+--------------------+
SQL functions such as SUBSTR can also be used inside the condition. The following keeps the students whose names start with M:
students.filter("SUBSTR(studentName,0,1) ='M'").show(7)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 10|       Maya|1-271-683-2698|accumsan.convalli...|
| 19|    Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 24|    Marsden|1-477-629-7528|Donec.dignissim.m...|
| 37|      Maggy|1-910-887-6777|facilisi.Sed.nequ...|
| 61|     Maxine|1-422-863-3041|aliquet.molestie....|
| 77|      Maggy|1-613-147-4380| pellentesque@mi.net|
| 97|    Maxwell|1-607-205-1273|metus.In@musAenea...|
+---+-----------+--------------+--------------------+
only showing top 7 rows
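The same filters can also be written with Column expressions instead of SQL strings, which keeps the predicate in ordinary Scala code. A hedged sketch, assuming Spark 2.x or later and a few hypothetical rows modeled on the article's data (id is kept as a string, as in the CSV). The text "NULL" in the article's file is an ordinary string, unlike a real SQL null, which would need isNull. SQL substring positions start at 1, and Spark treats a start position of 0 like 1, which is why SUBSTR(studentName,0,1) above still works:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()
import spark.implicits._

// Hypothetical rows; "" and "NULL" mimic the empty and "NULL" names in the article's file.
val students = Seq(
  ("5", "Trevor"), ("6", "Laurel"), ("10", "Maya"),
  ("19", "Malachi"), ("21", ""), ("33", "NULL")
).toDF("id", "studentName")

// SQL-string predicate, as used in the article.
val bySqlString = students.filter("id > 5")
// Column-expression equivalent; the cast makes the numeric comparison explicit.
val byColumn = students.filter(col("id").cast("int") > 5)

// Empty names or the literal text "NULL" (a real null would need col("studentName").isNull).
val blankOrNullText =
  students.filter(col("studentName") === "" || col("studentName") === "NULL")

// Names starting with "M": a Column method and a 1-based SQL substring.
val startsWithM = students.filter(col("studentName").startsWith("M"))
val viaSubstr = students.filter("substr(studentName, 1, 1) = 'M'")

byColumn.show()
```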
Sorting a DataFrame
Using the sort function, we can sort the DataFrame by the columns we specify:
students.sort(students("studentName").desc).show(7)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 50|      Yasir|1-282-511-4445|eget.odio.Aliquam...|
| 52|       Xena|1-527-990-8606|in.faucibus.orci@...|
| 86|     Xandra|1-677-708-5691|libero@arcuVestib...|
| 43|     Wynter|1-440-544-1851|amet.risus.Donec@...|
| 31|    Wallace|1-144-220-8159| lorem.lorem@non.net|
| 66|      Vance|1-268-680-0857|pellentesque@netu...|
| 41|     Tyrone|1-907-383-5293|non.bibendum.sed@...|
+---+-----------+--------------+--------------------+
only showing top 7 rows
We can also sort by several columns. Here rows are ordered by studentName first, and rows with the same name are ordered by id:
students.sort("studentName", "id").show(10)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 21|           |1-598-439-7549|consectetuer.adip...|
| 32|           |1-184-895-9602|accumsan.laoreet@...|
| 45|           |1-245-752-0481|Suspendisse.eleif...|
| 83|           |1-858-810-2204|sociis.natoque@eu...|
| 94|           |1-443-410-7878|Praesent.eu.nulla...|
| 91|       Abel|1-530-527-7467|    urna@veliteu.edu|
| 69|       Aiko|1-682-230-7013|turpis.vitae.puru...|
| 47|       Alma|1-747-382-6775|    nec.enim@non.org|
| 26|      Amela|1-526-909-2605| in@vitaesodales.edu|
| 16|      Amena|1-878-250-3129|lorem.luctus.ut@s...|
+---+-----------+--------------+--------------------+
only showing top 10 rows
Columns given by name are sorted in ascending order, so the previous call is equivalent to:
students.sort(students("studentName").asc, students("id").asc).show(10)
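A hedged sketch of the same sorts using the asc/desc helpers from org.apache.spark.sql.functions, assuming Spark 2.x or later and hypothetical rows. orderBy is an alias for sort. Because id was loaded as a string, sorting on it is lexicographic ("16" comes before "9"); casting it to an integer gives numeric order:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, col, desc}

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("sort-sketch").getOrCreate()
import spark.implicits._

// Hypothetical rows; id is a string column, as it was when loaded from the CSV file.
val students = Seq(
  ("50", "Yasir"), ("9", "Lev"), ("52", "Xena"),
  ("91", "Abel"), ("21", ""), ("16", "Amena")
).toDF("id", "studentName")

val byNameDesc = students.sort(col("studentName").desc)
val byNameThenId = students.orderBy(asc("studentName"), desc("id")) // orderBy is an alias for sort

val byStringId = students.sort("id")                   // lexicographic: "16", "21", "50", "52", "9", "91"
val byNumericId = students.sort(col("id").cast("int")) // numeric: 9, 16, 21, 50, 52, 91

byNameThenId.show()
```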
Renaming columns
If we don't like a DataFrame's default column names, we can rename a column with as while selecting it. The following renames the studentName column to name and keeps the email column name unchanged:
students.select(students("studentName").as("name"), students("email")).show(10)
+--------+--------------------+
|    name|               email|
+--------+--------------------+
|   Burke|ullamcorper.velit...|
|   Kamal|pede.Suspendisse@...|
|    Olga|Aenean.eget.metus...|
|   Belle|vitae.aliquet.nec...|
|  Trevor|dapibus.id@acturp...|
|  Laurel|adipiscing@consec...|
|    Sara|Donec.nibh@enimEt...|
|  Kaseem|cursus.et.magna@e...|
|     Lev|Vivamus.nisi@ipsu...|
|    Maya|accumsan.convalli...|
+--------+--------------------+
only showing top 10 rows
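Two other ways to rename, sketched under the same assumptions (Spark 2.x or later, hypothetical rows): alias is a synonym for as, and withColumnRenamed renames one column while keeping all the others in place. withColumnRenamed is a no-op when the named column does not exist, so a typo there goes unnoticed rather than raising an error:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("rename-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the students DataFrame.
val students = Seq(
  ("1", "Burke", "ullamcorper.velit.in@ametnullaDonec.co.uk"),
  ("2", "Kamal", "pede.Suspendisse@interdumenim.edu")
).toDF("id", "studentName", "email")

// Rename while selecting: only the listed columns survive.
val renamedOnSelect = students.select(col("studentName").alias("name"), col("email"))
// Rename in place: every other column is kept, in its original position.
val renamedInPlace = students.withColumnRenamed("studentName", "name")
// Renaming a column that does not exist silently leaves the DataFrame unchanged.
val unchanged = students.withColumnRenamed("iteblog", "name")

renamedInPlace.show()
```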
Treating a DataFrame as a relational table
One of the strengths of DataFrames is that we can treat them as relational tables and run SQL queries against them. This takes just two steps:
(1) Register the DataFrame as a table named students:
students.registerTempTable("students")
(2) Run SQL queries against the registered table:
sqlContext.sql("select * from students where studentName!='' order by email desc").show(7)
+---+-----------+--------------+--------------------+
| id|studentName|         phone|               email|
+---+-----------+--------------+--------------------+
| 87|      Selma|1-601-330-4409|vulputate.velit@p...|
| 96|   Channing|1-984-118-7533|viverra.Donec.tem...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|
| 78|       Finn|1-213-781-6969|vestibulum.massa@...|
| 53|     Kasper|1-155-575-9346|velit.eget@pedeCu...|
| 63|      Dylan|1-417-943-8961|vehicula.aliquet@...|
| 35|     Cadman|1-443-642-5919|ut.lacus@adipisci...|
+---+-----------+--------------+--------------------+
only showing top 7 rows
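In Spark 2.0 and later, registerTempTable is deprecated in favor of createOrReplaceTempView, and queries go through SparkSession.sql rather than sqlContext.sql. A minimal sketch of the same two steps with a few hypothetical rows modeled on the article's data; the email of the row with an empty name is a made-up placeholder:

```scala
import org.apache.spark.sql.SparkSession

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("sql-sketch").getOrCreate()
import spark.implicits._

// Hypothetical rows; "placeholder@example.com" is invented for the empty-name row.
val students = Seq(
  ("1", "Burke", "ullamcorper.velit.in@ametnullaDonec.co.uk"),
  ("4", "Belle", "vitae.aliquet.nec@neque.co.uk"),
  ("11", "Emi", "est@nunc.com"),
  ("21", "", "placeholder@example.com")
).toDF("id", "studentName", "email")

// Step 1: register the DataFrame under a name visible to SQL in this session.
students.createOrReplaceTempView("students")

// Step 2: query it with plain SQL; the result is another DataFrame.
val result = spark.sql(
  "SELECT * FROM students WHERE studentName != '' ORDER BY email DESC")
result.show()
```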
Joining two DataFrames
We already know how to register a DataFrame as a table. Now let's look at how to perform SQL-style joins between two DataFrames.
1. Inner join: the inner join is the default join type and returns only the rows that match in both DataFrames. Take a look at the following example:
import com.databricks.spark.csv._ // provides sqlContext.csvFile (spark-csv package)
val students1 = sqlContext.csvFile(filePath = "E:\\StudentPrep1.csv", useHeader = true, delimiter = '|')
val students2 = sqlContext.csvFile(filePath = "E:\\StudentPrep2.csv", useHeader = true, delimiter = '|')
val studentsJoin = students1.join(students2, students1("id") === students2("id"))
studentsJoin.show(studentsJoin.count.toInt)
+---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+
| id|studentName|         phone|               email| id|       studentName|         phone|               email|
+---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+
|  1|      Burke|1-300-746-8446|ullamcorper.velit...|  1|BurkeDifferentName|1-300-746-8446|ullamcorper.velit...|
|  2|      Kamal|1-668-571-5046|pede.Suspendisse@...|  2|KamalDifferentName|1-668-571-5046|pede.Suspendisse@...|
|  3|       Olga|1-956-311-1686|Aenean.eget.metus...|  3|              Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|  4|BelleDifferentName|1-246-894-6340|vitae.aliquet.nec...|
|  5|     Trevor|1-300-527-4967|dapibus.id@acturp...|  5|            Trevor|1-300-527-4967|dapibusDifferentE...|
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|  6|LaurelInvalidPhone|     000000000|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|  7|              Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|  8|            Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|  9|               Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...| 10|              Maya|1-271-683-2698|accumsan.convalli...|
+---+-----------+--------------+--------------------+---+------------------+--------------+--------------------+
2. Right outer join: returns every row of the right DataFrame; where the left DataFrame has no matching row, its columns are filled with null:
val studentsRightOuterJoin = students1.join(students2, students1("id") === students2("id"), "right_outer")
studentsRightOuterJoin.show(studentsRightOuterJoin.count.toInt)
+----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+
|  id|studentName|         phone|               email| id|         studentName|         phone|               email|
+----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+
|   1|      Burke|1-300-746-8446|ullamcorper.velit...|  1|  BurkeDifferentName|1-300-746-8446|ullamcorper.velit...|
|   2|      Kamal|1-668-571-5046|pede.Suspendisse@...|  2|  KamalDifferentName|1-668-571-5046|pede.Suspendisse@...|
|   3|       Olga|1-956-311-1686|Aenean.eget.metus...|  3|                Olga|1-956-311-1686|Aenean.eget.metus...|
|   4|      Belle|1-246-894-6340|vitae.aliquet.nec...|  4|  BelleDifferentName|1-246-894-6340|vitae.aliquet.nec...|
|   5|     Trevor|1-300-527-4967|dapibus.id@acturp...|  5|              Trevor|1-300-527-4967|dapibusDifferentE...|
|   6|     Laurel|1-691-379-9921|adipiscing@consec...|  6|  LaurelInvalidPhone|     000000000|adipiscing@consec...|
|   7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|  7|                Sara|1-608-140-1995|Donec.nibh@enimEt...|
|   8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|  8|              Kaseem|1-881-586-2689|cursus.et.magna@e...|
|   9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|  9|                 Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
|  10|       Maya|1-271-683-2698|accumsan.convalli...| 10|                Maya|1-271-683-2698|accumsan.convalli...|
|null|       null|          null|                null|999|LevUniqueToSecondRDD|1-916-367-5608|Vivamus.nisi@ipsu...|
+----+-----------+--------------+--------------------+---+--------------------+--------------+--------------------+
3. Left outer join: returns every row of the left DataFrame; where the right DataFrame has no matching row, its columns are filled with null:
val studentsLeftOuterJoin = students1.join(students2, students1("id") === students2("id"), "left_outer")
studentsLeftOuterJoin.show(studentsLeftOuterJoin.count.toInt)
+---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+
| id|studentName|         phone|               email|  id|       studentName|         phone|               email|
+---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+
|  1|      Burke|1-300-746-8446|ullamcorper.velit...|   1|BurkeDifferentName|1-300-746-8446|ullamcorper.velit...|
|  2|      Kamal|1-668-571-5046|pede.Suspendisse@...|   2|KamalDifferentName|1-668-571-5046|pede.Suspendisse@...|
|  3|       Olga|1-956-311-1686|Aenean.eget.metus...|   3|              Olga|1-956-311-1686|Aenean.eget.metus...|
|  4|      Belle|1-246-894-6340|vitae.aliquet.nec...|   4|BelleDifferentName|1-246-894-6340|vitae.aliquet.nec...|
|  5|     Trevor|1-300-527-4967|dapibus.id@acturp...|   5|            Trevor|1-300-527-4967|dapibusDifferentE...|
|  6|     Laurel|1-691-379-9921|adipiscing@consec...|   6|LaurelInvalidPhone|     000000000|adipiscing@consec...|
|  7|       Sara|1-608-140-1995|Donec.nibh@enimEt...|   7|              Sara|1-608-140-1995|Donec.nibh@enimEt...|
|  8|     Kaseem|1-881-586-2689|cursus.et.magna@e...|   8|            Kaseem|1-881-586-2689|cursus.et.magna@e...|
|  9|        Lev|1-916-367-5608|Vivamus.nisi@ipsu...|   9|               Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10|       Maya|1-271-683-2698|accumsan.convalli...|  10|              Maya|1-271-683-2698|accumsan.convalli...|
| 11|    iteblog|        999999| iteblog@iteblog.com|null|              null|          null|                null|
+---+-----------+--------------+--------------------+----+------------------+--------------+--------------------+
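The join variants side by side, as a sketch assuming Spark 2.x or later and hypothetical two-column versions of students1 and students2. Besides inner, left_outer and right_outer, the join type string also accepts full_outer. Joining on Seq("id") instead of a column expression keeps a single id column in the result, avoiding the duplicated id columns visible in the tables above:

```scala
import org.apache.spark.sql.SparkSession

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: ids 1 and 2 exist on both sides, 11 only on the left, 999 only on the right.
val students1 = Seq(("1", "Burke"), ("2", "Kamal"), ("11", "iteblog")).toDF("id", "studentName")
val students2 = Seq(
  ("1", "BurkeDifferentName"),
  ("2", "KamalDifferentName"),
  ("999", "LevUniqueToSecondRDD")
).toDF("id", "studentName")

val onId = students1("id") === students2("id")

val inner = students1.join(students2, onId)                     // default join type: inner
val leftOuter = students1.join(students2, onId, "left_outer")   // all left rows
val rightOuter = students1.join(students2, onId, "right_outer") // all right rows
val fullOuter = students1.join(students2, onId, "full_outer")   // all rows from both sides

// Left rows without a match have nulls in every right-hand column.
val leftOnly = leftOuter.filter(students2("id").isNull)

// Equi-join on a column name: the result has one id column instead of two.
val innerUsing = students1.join(students2, Seq("id"))
innerUsing.show()
```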
Saving a DataFrame to a file
Finally, let's look at how to save a DataFrame to a file. Earlier we loaded the CSV file with the load function; to save data, we can use the save function. This takes two steps:
1. First, create a Map object holding the properties that the save function needs. Here I specify the output path and that the CSV header should be written:
val saveOptions = Map("header" -> "true", "path" -> "iteblog.csv")
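2. Then select the data to save (here with studentName renamed to name) and write it out with the spark-csv data source:
import org.apache.spark.sql.SaveMode
val copyOfStudents = students.select(students("studentName").as("name"), students("email"))
copyOfStudents.write.format("com.databricks.spark.csv").mode(SaveMode.Overwrite).options(saveOptions).save()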
Note that the path option specified above names the directory the output is written into, not the name of the final output file; Spark writes the data as one or more part-* files inside that directory.
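In Spark 2.0 and later, CSV support is built in, so the external com.databricks.spark.csv format is no longer required. The sketch below assumes Spark 2.x or later and writes to a temporary location chosen only for illustration. It uses the built-in writer and then checks the point made above: the path names a directory holding part files, and reading that directory back recovers the rows:

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

// Local session (assumes Spark 2.x+ on the classpath).
val spark = SparkSession.builder().master("local[*]").appName("save-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the students DataFrame.
val students = Seq(
  ("1", "Burke", "ullamcorper.velit.in@ametnullaDonec.co.uk"),
  ("2", "Kamal", "pede.Suspendisse@interdumenim.edu")
).toDF("id", "studentName", "email")

// Hypothetical output location under a fresh temporary directory.
val outputPath: Path = Files.createTempDirectory("students-out").resolve("iteblog.csv")

students.select(col("studentName").as("name"), col("email"))
  .write
  .mode(SaveMode.Overwrite)
  .option("header", "true")
  .csv(outputPath.toString)

// "iteblog.csv" is a directory; the data lives in part-* files inside it.
val isDirectory = Files.isDirectory(outputPath)
val partFiles = outputPath.toFile.listFiles().map(_.getName).filter(_.startsWith("part-"))
partFiles.foreach(println)

// Reading the directory back yields the saved rows with the header as column names.
val readBack = spark.read.option("header", "true").csv(outputPath.toString)
```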