Difference between Spark HiveContext and SQLContext
Goal:

This article explains the difference between Spark HiveContext and SQLContext.

Env:

The tests below were done on Spark 1.5.2.

Solution:

Per the Spark SQL programming guide, HiveContext is a superset of SQLContext. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. For example, one of the key differences is that with HiveContext you can use the new window function feature.
In Spark 1.5.2, once you launch spark-shell, the default SQL context is already a HiveContext, although the startup banner below still says "SQL context":
```
SQL context available as sqlContext.
```
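One way to confirm this (a sketch, assuming a spark-shell built with Hive support, where `sqlContext` is provided by the shell) is to check the runtime class of the default context:

```scala
// In spark-shell (Spark 1.5.x), `sqlContext` is pre-created by the shell.
// On a Hive-enabled build it is actually a HiveContext, despite the banner.
import org.apache.spark.sql.hive.HiveContext

println(sqlContext.getClass.getName)
println(sqlContext.isInstanceOf[HiveContext])
```

On a build without Hive support (compiled without `-Phive`), the second line would print `false` and the default context is a plain SQLContext.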
```scala
import org.apache.spark.sql.hive.HiveContext

// In spark-shell, `sc` is already a SparkContext, so it can be passed directly.
val hiveContext = new HiveContext(sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
```
```scala
scala> hiveContext.sql("SELECT value, dense_rank() OVER (PARTITION BY key ORDER BY value DESC) as rank FROM src").collect
res7: Array[org.apache.spark.sql.Row] = Array([val_431,1], [val_431,1], [val_431,1], [val_431,1], [val_431,1], [val_431,1], [val_432,1], [val_432,1], [val_33,1], [val_33,1], [val_233,1], [val_233,1], [val_233,1], [val_233,1], [val_34,1], [val_34,1], [val_35,1], [val_35,1], [val_35,1], [val_35,1], [val_35,1], [val_35,1], [val_235,1], [val_235,1], [val_435,1], [val_435,1], [val_436,1], [val_436,1], [val_37,1], [val_37,1], [val_37,1], [val_37,1], [val_237,1], [val_237,1], [val_237,1], [val_237,1], [val_437,1], [val_437,1], [val_238,1], [val_238,1], [val_238,1], [val_238,1], [val_438,1], [val_438,1], [val_438,1], [val_438,1], [val_438,1], [val_438,1], [val_239,1], [val_239,1], [val_239,1], [val_239,1], [val_439,1], [val_439,1], [val_439,1], [val_439,1], [val_41,1], [val_41,1], [val_241,1], ...
```
```scala
scala> sqlContext.sql("SELECT value, dense_rank() OVER (PARTITION BY key ORDER BY value DESC) as rank FROM src").collect
java.lang.RuntimeException: [1.33] failure: ``union'' expected but `(' found

SELECT value, dense_rank() OVER (PARTITION BY key ORDER BY value DESC) as rank FROM src
```
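The same limitation shows up in the DataFrame API: in Spark 1.5, window expressions are only planned with Hive support, so the query above can be rewritten against `hiveContext` but not against a plain SQLContext. A minimal sketch, assuming the same `src` Hive table and the `hiveContext` created earlier:

```scala
// Sketch: equivalent of the SQL window query via the DataFrame API.
// Requires a HiveContext in Spark 1.5; a plain SQLContext cannot plan
// window expressions. `denseRank` is the 1.x name of the function
// (renamed to `dense_rank` in Spark 2.0).
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, denseRank, desc}

val w = Window.partitionBy("key").orderBy(desc("value"))
hiveContext.table("src")
  .select(col("value"), denseRank().over(w).as("rank"))
  .show()
```

Running the same `select` on a DataFrame obtained from the plain `sqlContext` would fail at analysis time, mirroring the parser error shown above.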