Reading Data from an HBase Table Using Spark
HBase is a distributed database modeled after Google’s Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).
HBase is a column-oriented database built on top of HDFS. It is open source and horizontally scalable, and it is used to access very large tables — billions of rows by millions of columns — on clusters of commodity hardware.
Let us consider a table named “student_info” in HBase, with the column family “details” and the column qualifiers “sid”, “firstName”, “lastName”, “branch”, and “emailId”. Create a POJO class for these fields as below:
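A minimal POJO for those qualifiers might look like the sketch below. The class name `StudentInfo` and the getter/setter layout are assumptions (the original code is not shown); the field names mirror the column qualifiers, and the class implements `Serializable` so Spark can ship instances to executors.

```java
import java.io.Serializable;

// POJO mapping one row of the hypothetical "student_info" table.
// Field names mirror the column qualifiers in the "details" column family.
public class StudentInfo implements Serializable {
    private String sid;
    private String firstName;
    private String lastName;
    private String branch;
    private String emailId;

    public StudentInfo() { }

    public StudentInfo(String sid, String firstName, String lastName,
                       String branch, String emailId) {
        this.sid = sid;
        this.firstName = firstName;
        this.lastName = lastName;
        this.branch = branch;
        this.emailId = emailId;
    }

    public String getSid() { return sid; }
    public void setSid(String sid) { this.sid = sid; }
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getBranch() { return branch; }
    public void setBranch(String branch) { this.branch = branch; }
    public String getEmailId() { return emailId; }
    public void setEmailId(String emailId) { this.emailId = emailId; }
}
```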
Create a JavaSparkContext object using a SparkConf object.
Read data from the HBase table by setting the table name in the HBase configuration, using the code below:
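One common way to do this is Spark’s `newAPIHadoopRDD` with HBase’s `TableInputFormat`, as sketched below. The ZooKeeper quorum value is an assumption for a local setup; `sc` is the `JavaSparkContext` created earlier.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;

// Build an HBase configuration and point it at the table to scan.
Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.set("hbase.zookeeper.quorum", "localhost"); // assumption: local ZooKeeper
hbaseConf.set(TableInputFormat.INPUT_TABLE, "student_info");

// Each element is (row key, Result) — one pair per HBase row.
JavaPairRDD<ImmutableBytesWritable, Result> studentRDD =
        sc.newAPIHadoopRDD(hbaseConf,
                TableInputFormat.class,
                ImmutableBytesWritable.class,
                Result.class);
```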
Now studentRDD holds all the records from the table as a Spark RDD, and we can perform any aggregation or other Spark operation on top of it.