Hive Installation and Quick Start Guide

1. Objective

This Hive tutorial walks through simple steps for installing and running Hive on Ubuntu. Hive is a data warehousing infrastructure built on top of Hadoop. This quick start will help you set up and configure Hive and run several HiveQL queries to learn the basic concepts of Hive.

2. Introduction

Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and ad-hoc analysis. To run Hive successfully, Java and Hadoop must therefore already be installed and working on your Linux OS. For the installation procedure for Java and Hadoop, refer to the Hadoop installation guide.
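
Before continuing, it can help to confirm that both prerequisites are reachable from the shell. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of Hive or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Report on both prerequisites without aborting the script.
for tool in java hadoop; do
    require_cmd "$tool" || true
done
```

If either tool is reported as missing, install it (or fix your PATH) before moving on to the Hive installation.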

3. Hive Installation

To install Hive on your system, follow the steps below and execute them on your Linux OS:

3.1. Download Hive

In this tutorial we will use hive-0.13.1-cdh5.3.2 (you can also use a later version of Hive). Download Hive using the following link: http://apache.petsads.us/hive/hive-0.13.1-cdh5.3.2/ apache-hive-0.13.1-cdh5.3.2.tar.gz. The file is saved to your Downloads directory.
After a successful download, you should see the following:
apache-hive-0.13.1-cdh5.3.2 hive-0.13.1-cdh5.3.2.tar.gz

3.1.1. Untar the file

Move the downloaded archive to your home directory and extract it by executing the following command:
$ tar zxvf hive-0.13.1-cdh5.3.2.tar.gz
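
A corrupt or partial download only shows up when extraction fails halfway. As a hedged sketch, the extraction step can be wrapped so that the archive is checked first; the helper name extract_archive is hypothetical, and the archive location assumes the file is still in ~/Downloads as described in section 3.1.

```shell
#!/bin/sh
# Hypothetical helper: extract a .tar.gz archive into a target directory
# after checking that the archive is readable.
extract_archive() {
    src=$1
    dest=$2
    # 'tar tzf' lists the contents without extracting; it fails on a corrupt file.
    tar tzf "$src" >/dev/null || { echo "cannot read $src"; return 1; }
    mkdir -p "$dest"
    tar xzf "$src" -C "$dest"
}

# Assumption: the archive is still in ~/Downloads; skipped if it is not there.
archive="$HOME/Downloads/hive-0.13.1-cdh5.3.2.tar.gz"
if [ -f "$archive" ]; then
    extract_archive "$archive" "$HOME"
fi
```

Extracting with -C directly into the home directory avoids having to move the archive first.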

3.2. Setting up Hive Environment Variables

3.2.1. Editing .bashrc file

To set up the Hive environment, append the following lines to the end of the ~/.bashrc file. HIVE_HOME must be defined before the line that adds $HIVE_HOME/bin to PATH:
export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_HOME=/home/dataflair/hadoop-2.6.0-cdh5.5.1
export HIVE_HOME=/home/dataflair/hive-0.13.1-cdh5.3.2
export PATH=$PATH:$HIVE_HOME/bin
Note: Enter the correct path and version for your own installation. Here “/home/dataflair/hive-0.13.1-cdh5.3.2” is the path of my Hive directory and “hive-0.13.1-cdh5.3.2” is its name; replace both with yours. Save the file after adding these lines.
To apply the changes to the current shell, use the following command:
$ source ~/.bashrc
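
A wrong path in HIVE_HOME is the most common reason the hive command is not found afterwards. The sketch below checks that the directory contains the bin/hive launcher script; the helper name check_hive_home is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that a directory looks like a Hive installation,
# i.e. that it contains the bin/hive launcher script.
check_hive_home() {
    if [ -f "$1/bin/hive" ]; then
        echo "ok: $1"
    else
        echo "no bin/hive under: $1"
        return 1
    fi
}

# HIVE_HOME is the variable exported in ~/.bashrc; it may be unset in this shell.
check_hive_home "${HIVE_HOME:-}" || true
```

If the check reports a problem, correct the export line in ~/.bashrc and source the file again.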

4. Launching Hive

$ hive
The following output is displayed:
Logging initialized using configuration in jar:file:/home/dataflair/HADOOP/hive-0.13.1-cdh5.3.2/lib/hive-common-0.13.1-cdh5.3.2.jar!/hive-log4j.properties
hive>

5. Exit from Hive:

hive> exit;
Congratulations! Hive is now successfully installed on your system, and you can start executing commands.
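
Besides the interactive prompt, the hive launcher can run a single statement and exit, which is convenient for a quick smoke test. This is a minimal sketch: the wrapper name run_hql is hypothetical, while -e (execute a quoted statement) and -S (silent mode) are options of the Hive command line.

```shell
#!/bin/sh
# Hypothetical wrapper: run one HiveQL statement and exit.
# 'hive -e' executes the quoted string; '-S' suppresses informational logging.
run_hql() {
    hive -S -e "$1"
}

# Only attempt the call when the hive launcher is actually on the PATH.
if command -v hive >/dev/null 2>&1; then
    run_hql 'show databases;'
fi
```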

Before using Hive, you should change its metastore layer. Follow this tutorial to change the Hive metastore from Derby to MySQL.

6. Hive Queries

Below are some basic Hive queries that you will need while using Hive.

6.1. Show Databases

Syntax:
show databases;
Usage:
show databases;
This query lists the databases present in your Hive installation. If you have just installed Hive and have not created any database, a database named “default” exists and is shown by the above query.

6.2. Create Database

Syntax:
create database database_name;
Usage:
create database test;
This creates a new database named “test”. You can verify it by running the “show databases;” query.

6.3. Use

The USE statement selects the database that subsequent statements run against.
Syntax:
USE database_name;
Usage:
USE test;

6.4. Current Database

Syntax:
set hive.cli.print.current.db=true;
This setting makes the Hive prompt display the name of the database in which you are currently working.
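
A value set at the prompt lasts only for the current session. One way to make it persistent, sketched below, is to put the line into ~/.hiverc, which the Hive CLI reads at startup. The helper name add_line_once is hypothetical; it appends the line only if it is not already present, so running it repeatedly is harmless.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet.
    grep -qxF "$line" "$file" || echo "$line" >> "$file"
}

# Only touch ~/.hiverc when Hive is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
    add_line_once 'set hive.cli.print.current.db=true;' "$HOME/.hiverc"
fi
```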

6.5. DROP

The DROP statement is used to delete a database.
Syntax:
DROP database database_name;
Usage:
DROP database test1;

6.6. CREATE TABLE

This command is used to create a new table.
Syntax:
CREATE TABLE TABLE_NAME (Parameters)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Usage:
create table employee ( Name String comment 'Employee Name', Id int, MobileNumber String, Salary Float) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
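
Longer statements like this one are easier to keep in a script file than to retype at the prompt. The sketch below writes the employee definition from the usage example to a file and runs it with hive -f. The helper name write_employee_ddl and the file location are illustrative assumptions, and IF NOT EXISTS is added here so the script can be rerun safely.

```shell
#!/bin/sh
# Hypothetical helper: write the employee table definition to a script file.
write_employee_ddl() {
    # The quoted 'EOF' keeps the shell from expanding backslashes or variables.
    cat > "$1" <<'EOF'
CREATE TABLE IF NOT EXISTS employee (
  Name STRING COMMENT 'Employee Name',
  Id INT,
  MobileNumber STRING,
  Salary FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
EOF
}

ddl_file="${TMPDIR:-/tmp}/create_employee.hql"   # illustrative location
write_employee_ddl "$ddl_file"

# 'hive -f' runs every statement in the file; skipped when hive is not installed.
if command -v hive >/dev/null 2>&1; then
    hive -f "$ddl_file"
fi
```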

6.7. View tables

Syntax:
show tables;
This lists all the tables in the current database.

6.8. Alter Table

It is used to change the attributes of a table.
Syntax: Depending on which attribute we want to change, we can use any of the following forms.
ALTER TABLE TableName RENAME TO new_name
ALTER TABLE TableName ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE TableName CHANGE column_name new_name new_type
ALTER TABLE TableName REPLACE COLUMNS (col_spec[, col_spec ...])
Hive does not provide an ALTER TABLE ... DROP COLUMN form; to remove a column, use REPLACE COLUMNS and list only the columns you want to keep.
Usage:
ALTER TABLE employee RENAME TO demo1;
Note that the examples in the following sections still refer to the table as employee.

6.9. Describe table

Syntax:
desc TableName;
Usage:
desc employee;
This command lists the columns of the table along with their data types.

6.10. Load data

Syntax:
LOAD DATA LOCAL INPATH 'path_of_the_file' OVERWRITE INTO TABLE table_name;
Usage:
LOAD DATA LOCAL INPATH '/home/dataflair/Desktop/details.txt' OVERWRITE INTO TABLE employee;
This command loads the data from the given local file into the specified Hive table. Because of the OVERWRITE keyword, any data already in the table is replaced.
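
The input file has to match the table's row format. Since employee was created with fields terminated by ',', each line must hold four comma-separated values in the order Name, Id, MobileNumber, Salary. The sketch below generates such a file; the rows are made-up sample data, and the helper name write_sample_rows and the file location are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write a few comma-separated rows to the given file.
# Field order follows the table: Name, Id, MobileNumber, Salary.
write_sample_rows() {
    printf '%s\n' \
        'Asha,1,5550100,42000.0' \
        'Ravi,2,5550101,38500.5' \
        'Meena,3,5550102,51000.0' > "$1"
}

sample="${TMPDIR:-/tmp}/details.txt"   # illustrative location
write_sample_rows "$sample"

# Confirm every row has exactly four comma-separated fields.
awk -F',' 'NF != 4 { bad = 1 } END { exit bad }' "$sample" && echo "all rows have 4 fields"
```

Point LOAD DATA LOCAL INPATH at the generated file to load these rows into the table.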
