Hands-on HBase shell

1. General

status
Show a cluster status summary. The two commands below are equivalent.
hbase(main):002:0> status
3 servers, 0 dead, 1.6667 average load
hbase(main):006:0> status 'summary'
3 servers, 0 dead, 1.6667 average load

Show region summary for each region server.
hbase(main):008:0> status 'simple'
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=40, maxHeapMB=995
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=50, maxHeapMB=995
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=32, maxHeapMB=995
0 dead servers
Aggregate load: 0, regions: 5

Show the details of each region on every region server.
hbase(main):007:0> status 'detailed'
version 0.94.8
0 regionsInTransition
master coprocessors: []
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=39, maxHeapMB=995
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=48, maxHeapMB=995
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=33, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=10, currentCompactedKVs=10, compactionProgressPct=1.0
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=8900, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=11, currentCompactedKVs=11, compactionProgressPct=1.0
            numberOfStores=3, numberOfStorefiles=2, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=22, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=31, maxHeapMB=995
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=2, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
0 dead servers

version
Show HBase version and release information.
hbase(main):010:0> version
0.94.8-xxx-x.x.x.x, rUnknown, Thu Oct 24 15:43:56 CST 2013

whoami
Show the OS user and authentication method.
hbase(main):012:0> whoami
root (auth:SIMPLE)

2. Table Management


3. Data Manipulation

get
Prepare a table with 3 versions of the same row and column.
create 't1','f1'
put 't1','row1','f1:col1','1'
put 't1','row1','f1:col1','2'
put 't1','row1','f1:col1','3'

> scan 't1', {VERSIONS => 3}
ROW                                  COLUMN+CELL
 row1                                column=f1:col1, timestamp=1400281044882, value=3
 row1                                column=f1:col1, timestamp=1400281044435, value=2
 row1                                column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds

Get the latest version within a time range.
Note that the upper bound of TIMERANGE is exclusive, so the version at 1400281044882 is not shown.
To include it, use 1400281044882+1 as the upper bound.
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882]}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0100 seconds

Get the version at that timestamp:
> get 't1', 'row1', {TIMESTAMP => 1400281044435}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0090 seconds

Get multiple versions in a time range:
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
2 row(s) in 0.0100 seconds

> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882+1], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044882, value=3
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
3 row(s) in 0.0110 seconds
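
The TIMERANGE behaviour above can be sketched as a half-open interval [min, max). The following is only a toy Python model, not HBase code: the helper name get_in_timerange and the dictionary layout are assumptions made for illustration, and the timestamps are the ones from the t1 example.

```python
# Toy model of a half-open time range [min_ts, max_ts).
# `versions` mirrors the three cells written to t1/row1/f1:col1 above.
versions = {
    1400281044882: "3",
    1400281044435: "2",
    1400281044395: "1",
}


def get_in_timerange(cells, min_ts, max_ts, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first,
    keeping only timestamps with min_ts <= ts < max_ts."""
    hits = [(ts, value) for ts, value in cells.items() if min_ts <= ts < max_ts]
    hits.sort(reverse=True)  # newest version first, as the shell prints them
    return hits[:max_versions]


# Upper bound equal to the newest timestamp: that version is excluded.
print(get_in_timerange(versions, 1400281044395, 1400281044882))
# Upper bound + 1: all three versions fall inside the range.
print(get_in_timerange(versions, 1400281044395, 1400281044882 + 1, max_versions=3))
```

The first call returns only the version with value 2, which matches the shell output; adding 1 to the upper bound brings the newest version back.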

Add a filter to a get, similar to a WHERE condition in SQL:
> get 't1', 'row1', {FILTER => "ValueFilter(=, 'binary:2')"}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0130 seconds

count
Prepare a table with one row deleted.
create 't2','f1'
put 't2','row1','f1:col1','1'
put 't2','row1','f1:col1','2'
put 't2','row1','f1:col1','3'
put 't2','row1','f1:col1','4'
put 't2','row2','f1:col1','r2'
put 't2','row3','f1:col1','r3'
put 't2','row4','f1:col1','r4'
put 't2','row5','f1:col1','r5'
put 't2','row6','f1:col1','r6'
put 't2','row7','f1:col1','r7'
delete 't2', 'row7', 'f1:col1'
count only sees live rows; deleted rows are not included.
The RAW option ("raw scan") has no effect on count.
> count 't2', INTERVAL => 2, CACHE => 1
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0250 seconds

> count 't2', INTERVAL => 2, CACHE => 1, RAW=> true
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0180 seconds

delete
delete removes every version whose timestamp is equal to or older than the given timestamp.
> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
 row1                                   column=f1:col1, timestamp=1400281044435, value=2
 row1                                   column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds

> delete 't1', 'row1', 'f1:col1', 1400281044435
0 row(s) in 0.0290 seconds

> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
1 row(s) in 0.0110 seconds
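
The effect of the timestamped delete above can be sketched in a few lines of Python. This is a toy model under the assumption that a cell's versions are held in a plain dictionary; the function name delete_up_to is hypothetical and it ignores how HBase actually stores delete markers.

```python
# Toy model: a timestamped delete hides every version with timestamp <= ts.
t1_row1_col1 = {
    1400281044882: "3",
    1400281044435: "2",
    1400281044395: "1",
}


def delete_up_to(cells, ts):
    """Return the versions that remain visible after deleting at timestamp ts."""
    return {t: value for t, value in cells.items() if t > ts}


remaining = delete_up_to(t1_row1_col1, 1400281044435)
print(remaining)  # only the newest version (value 3) is left
```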

deleteall
deleteall can delete an entire row.
It can also act like "delete" and remove a single version.
> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row1                                  column=f1:col1, timestamp=1400522695653, value=2
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0460 seconds

> deleteall 't2','row1','f1:col1',1400522695653
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0210 seconds

> deleteall 't2','row1'
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0170 seconds

The COUNTERS feature treats columns as counters. It is very lightweight.
Prepare the table:
create 't3','f1'
put 't3','row1','f1:col1',0
incr cannot increment a field that is not 64 bits wide:
> incr 't3', 'row1', 'f1:col1'
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field that isn't 64 bits 

incr can create a new row with a 64-bit value:
>  incr 't3', 'row2', 'f1:col1'

> scan 't3'
 row        column=f1:col1, timestamp=1400530005533, value=0
 row2       column=f1:col1, timestamp=1400530215183, value=\x00\x00\x00\x00\x00\x00\x00\x01
2 row(s) in 0.0640 seconds

> get_counter 't3','row2','f1:col1'

> incr 't3', 'row2', 'f1:col1',10
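
Why the first incr fails can be illustrated with a small Python sketch. It assumes the shell's put stores the value as text (the scan above shows value=0, a single byte), while a counter is an 8-byte big-endian long, as the \x00...\x01 output suggests. The incr function here is a hypothetical stand-in for illustration, not the HBase implementation.

```python
import struct

# Assumption: put 't3','row1','f1:col1',0 stores the text "0", i.e. 1 byte.
as_text = b"0"
# A counter cell holds an 8-byte big-endian signed long.
as_long = struct.pack(">q", 1)


def incr(raw, amount=1):
    """Increment a counter cell; reject anything that is not exactly 64 bits."""
    if len(raw) != 8:
        raise ValueError("attempted to increment field that isn't 64 bits wide")
    (current,) = struct.unpack(">q", raw)
    return struct.pack(">q", current + amount)


print(as_long)                                   # b'\x00\x00\x00\x00\x00\x00\x00\x01'
print(struct.unpack(">q", incr(as_long, 10))[0])  # 11

try:
    incr(as_text)
except ValueError as err:
    print("rejected:", err)
```

The same reasoning explains why incr on a row that does not exist yet works: the cell is created directly in the 8-byte form.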

scan can show the version history of a cell,
and put with an explicit timestamp can rewrite that history:
> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0370 seconds

> put 't2','row2','f1:col1', 'new_r2', 1400523142819
0 row(s) in 0.0330 seconds

> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=new_r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0360 seconds

> get 't2','row2','f1:col1'
COLUMN                                 CELL
 f1:col1                               timestamp=1400525511563, value=123
1 row(s) in 0.0160 seconds
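
The behaviour above can be modelled by treating the timestamp as part of the cell key. The sketch below is a toy Python model with hypothetical helper names (put, get_latest); it only illustrates why writing to an existing timestamp replaces that version while a plain get still returns the newest one.

```python
# Toy model: versions of t2/row2/f1:col1 keyed by timestamp.
row2_col1 = {
    1400525511563: "123",
    1400523142819: "r2",
}


def put(cells, value, ts):
    """Writing at an existing timestamp overwrites that version in place."""
    cells[ts] = value


def get_latest(cells):
    """A plain get returns the version with the highest timestamp."""
    ts = max(cells)
    return ts, cells[ts]


put(row2_col1, "new_r2", 1400523142819)
print(row2_col1[1400523142819])  # new_r2: the old version was replaced
print(get_latest(row2_col1))     # (1400525511563, '123'): the latest is unchanged
```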

scan
Similar to offset + limit:

create 't4','f1'
put 't4','row1','f1:col1','c1'
put 't4','row1','f1:col2','c2'
put 't4','row2','f1:col1','2_col1'
put 't4','row3','f1:col1','3_col1'

>   scan 't4', {COLUMNS => ['f1:col1'], LIMIT => 2, STARTROW => 'row2'}
ROW                    COLUMN+CELL
 row2                  column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                  column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0190 seconds

Time range scan (note that the upper bound is exclusive):
> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
1 row(s) in 0.0170 seconds

> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724+1]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                         column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

More filters, used like a WHERE condition:
> scan 't4', {FILTER => "(PrefixFilter ('row') AND (QualifierFilter (>=, 'binary:1'))) AND (TimestampsFilter ( 1400609307642, 1400609307724))"}
ROW                       COLUMN+CELL
 row1                     column=f1:col1, timestamp=1400609307642, value=c1
 row3                     column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

ColumnPaginationFilter(limit, offset)
returns, for each row, up to "limit" columns starting at position "offset", so it can show the 1st, 2nd, ... column of every row:
> scan 't4'
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row3                            column=f1:col4, timestamp=1400611193699, value=3_col4
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row4                            column=f1:col6, timestamp=1400611194134, value=4_col6
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
 row8                            column=f1:col5, timestamp=1400615943808, value=8_col5
 row8                            column=f1:col6, timestamp=1400615943836, value=8_col6
 row8                            column=f1:col7, timestamp=1400615943856, value=8_col7
 row8                            column=f1:col8, timestamp=1400615943874, value=8_col8
 row8                            column=f1:col9, timestamp=1400615943895, value=8_col9
8 row(s) in 0.0320 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
8 row(s) in 0.0300 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 1)}
ROW                              COLUMN+CELL
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
7 row(s) in 0.0170 seconds
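
The two ColumnPaginationFilter scans above can be reproduced with a short Python sketch. It is a toy model, not the filter's implementation: the function name column_pagination and the reduced sample of t4 are assumptions for illustration. It also shows why row8 returns col10 at offset 1: qualifiers are ordered as strings, so "col10" sorts before "col5".

```python
# Reduced sample of t4: row key -> column qualifiers in family f1.
rows = {
    "row1": ["col1", "col2"],
    "row2": ["col1"],
    "row3": ["col1", "col3", "col4"],
    "row8": ["col1", "col5", "col6", "col7", "col8", "col9", "col10"],
}


def column_pagination(table, limit, offset):
    """For each row, keep `limit` columns starting at position `offset`
    in sorted qualifier order; rows with nothing left are dropped."""
    page = {}
    for row, qualifiers in table.items():
        ordered = sorted(qualifiers)  # string order: col1 < col10 < col5
        picked = ordered[offset:offset + limit]
        if picked:
            page[row] = picked
    return page


print(column_pagination(rows, 1, 0))  # first column of every row
print(column_pagination(rows, 1, 1))  # second column; row2 has none and disappears
```

With offset 1, row2 drops out because it has a single column, which is why the second scan above returns 7 rows instead of 8.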

truncate
truncate is actually disable + drop + create.
You can verify this by running the HBase shell in debug mode:
hbase shell -d
truncate 't4'
The debug log then contains the following lines:
INFO client.HBaseAdmin: Started disable of t4
DEBUG client.HBaseAdmin: Sleeping= 1000ms, waiting for all regions to be disabled in t4
INFO client.HBaseAdmin: Disabled t4
 - Dropping table...
INFO client.HBaseAdmin: Deleted t4
 - Creating table...
4. HBase surgery tools
5. Cluster replication tools
6. Security tools

