Configure Apache Phoenix in CDH 5.4

Apache Phoenix is an open source, relational database layer on top of NoSQL stores such as Apache HBase. Phoenix provides a JDBC driver that hides the intricacies of the NoSQL store, enabling users to create, delete, and alter SQL tables, views, indexes, and sequences; upsert and delete rows singly and in bulk; and query data through SQL.
Installation:
The following steps configure Apache Phoenix in Cloudera Distribution for Hadoop (CDH) using Cloudera Manager:
1. Log in to Cloudera Manager, click Hosts, then Parcels.
2. Select Edit Settings.
3. Click the + sign next to an existing Remote Parcel Repository URL, add the URL http://archive.cloudera.com/cloudera-labs/phoenix/parcels/1.0/, and click Save Changes.
4. Select Hosts, then Parcels.
5. In the list of Parcel Names, CLABS_PHOENIX is now available. Select it and choose Download.
6. The first cluster is selected by default; to distribute to a different cluster, select it first. Find CLABS_PHOENIX in the list and click Distribute.
7. If you want to use secondary indexing, add the required properties to the hbase-site.xml advanced configuration snippet. Go to the HBase service, click Configuration, and search for HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml. Paste in the XML shown after this list, then save the changes.
8. Restart the HBase service.
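As referenced in step 7, the property Phoenix secondary indexing requires in hbase-site.xml is the indexed WAL edit codec. A minimal snippet, based on the upstream Phoenix documentation (check the Cloudera Labs release notes for your parcel version in case additional properties are needed), looks like this:

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>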
Using Apache Phoenix Utilities:
Several command-line utilities for Apache Phoenix are installed into /usr/bin.
Prerequisites:
Before using the Phoenix utilities, set the JAVA_HOME environment variable in your terminal session, and ensure that the java executable is in your path. Adjust the following commands to your operating system’s configuration.
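On a Linux host where the JDK is installed under /usr/java/jdk1.7.0_67-cloudera (an assumed path; substitute the location of your own JDK), that looks like:

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$JAVA_HOME/bin:$PATH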
phoenix-sqlline.py
An interactive command-line interface for executing SQL. It takes a single argument: the ZooKeeper quorum of the corresponding HBase cluster.
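For example, assuming a ZooKeeper quorum reachable at zookeeper1.example.com on the default port 2181 (a placeholder; substitute one or more hosts from your own quorum), you can start the shell and then issue standard Phoenix SQL at the prompt:

phoenix-sqlline.py zookeeper1.example.com:2181

-- run at the sqlline prompt
CREATE TABLE test (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
UPSERT INTO test VALUES (1, 'Phoenix');
SELECT * FROM test;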
phoenix-psql.py
A command-line interface to load CSV data or execute SQL scripts. It takes two arguments: the ZooKeeper quorum and the CSV or SQL file to process.
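For example, with the same placeholder quorum, and with create_table.sql and data.csv standing in for your own script and data file (both file names are hypothetical), the utility is run once per file:

phoenix-psql.py zookeeper1.example.com:2181 create_table.sql
phoenix-psql.py zookeeper1.example.com:2181 data.csv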
phoenix-performance
A command-line interface that creates a given number of rows and runs timed queries against the data. It takes two arguments: the ZooKeeper quorum and the number of rows to create.
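For example, to generate and query one million rows against the same placeholder quorum:

phoenix-performance zookeeper1.example.com:2181 1000000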
