Hands on HBase shell

1. General

status: Show a cluster status summary. The two commands below are equivalent.
hbase(main):002:0> status
3 servers, 0 dead, 1.6667 average load
hbase(main):006:0> status 'summary'
3 servers, 0 dead, 1.6667 average load

Show a region summary for each region server:
hbase(main):008:0> status 'simple'
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=40, maxHeapMB=995
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=50, maxHeapMB=995
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=32, maxHeapMB=995
0 dead servers
Aggregate load: 0, regions: 5

Show the details of each region for each table:
hbase(main):007:0> status 'detailed'
version 0.94.8
0 regionsInTransition
master coprocessors: []
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=39, maxHeapMB=995
        test2,,1397857356085.5f07460cd224b77eb71e4386d054dde6.
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=48, maxHeapMB=995
        -ROOT-,,0
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=33, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=10, currentCompactedKVs=10, compactionProgressPct=1.0
        .META.,,1
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=8900, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=11, currentCompactedKVs=11, compactionProgressPct=1.0
        test3,,1400011790458.49daae95b96cceb1aafd595e0078bcde.
            numberOfStores=3, numberOfStorefiles=2, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=22, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=31, maxHeapMB=995
        test,,1397754779440.4fc6d9e31d226b5b213dd1c28f06f48e.
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=2, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
0 dead servers
version: Show HBase version and release information.
hbase(main):010:0> version
0.94.8-xxx-x.x.x.x, rUnknown, Thu Oct 24 15:43:56 CST 2013
whoami: Show the OS user and authentication method.
hbase(main):012:0> whoami
root (auth:SIMPLE)

2. Table Management

alter
create
describe
disable
disable_all
is_disabled
drop
drop_all
enable
enable_all
is_enabled
exists
list
show_filters
alter_status
alter_async
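
These commands combine naturally. Below is a minimal sketch of a typical table lifecycle using a hypothetical table 't5' (options and alter behavior vary by HBase version; some versions require the table to be disabled before alter):
create 't5', {NAME => 'f1', VERSIONS => 5}
describe 't5'
disable 't5'
alter 't5', {NAME => 'f1', VERSIONS => 3}
enable 't5'
is_enabled 't5'
exists 't5'
disable 't5'
drop 't5'
list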

3. Data Manipulation

get: First, prepare a table with 3 versions of the same row and column:
create 't1','f1'
put 't1','row1','f1:col1','1'
put 't1','row1','f1:col1','2'
put 't1','row1','f1:col1','3'

> scan 't1', {VERSIONS => 3}
ROW                                  COLUMN+CELL
 row1                                column=f1:col1, timestamp=1400281044882, value=3
 row1                                column=f1:col1, timestamp=1400281044435, value=2
 row1                                column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds
Get the latest version in a time range:
(Note: the latest version is not shown because the upper bound of TIMERANGE is exclusive.
To include the latest version, the upper bound would have to be 1400281044882+1.)
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882]}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0100 seconds

Get the version at that timestamp:
> get 't1', 'row1', {TIMESTAMP => 1400281044435}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0090 seconds

Get multiple versions in a time range:
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
2 row(s) in 0.0100 seconds

> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882+1], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044882, value=3
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
3 row(s) in 0.0110 seconds
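
To fetch multiple versions without a time range, VERSIONS alone should be enough (a sketch against the same table):
> get 't1', 'row1', {COLUMN => 'f1:col1', VERSIONS => 3}
This should return all three versions of f1:col1, newest first.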

Add a filter to a get, similar to a WHERE condition in the SQL world:
> get 't1', 'row1', {FILTER => "ValueFilter(=, 'binary:2')"}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0130 seconds
count: Prepare a table with one row deleted:
create 't2','f1'
put 't2','row1','f1:col1','1'
put 't2','row1','f1:col1','2'
put 't2','row1','f1:col1','3'
put 't2','row1','f1:col1','4'
put 't2','row2','f1:col1','r2'
put 't2','row3','f1:col1','r3'
put 't2','row4','f1:col1','r4'
put 't2','row5','f1:col1','r5'
put 't2','row6','f1:col1','r6'
put 't2','row7','f1:col1','r7'
delete 't2', 'row7', 'f1:col1'
count only counts live rows; deleted rows are excluded. (INTERVAL controls how often the current count is printed; CACHE sets the scanner caching.) Passing RAW => true does not change the result:
> count 't2', INTERVAL => 2, CACHE => 1
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0250 seconds

> count 't2', INTERVAL => 2, CACHE => 1, RAW=> true
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0180 seconds
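
To actually see the deleted cell, a raw scan on the table should work (a sketch; the exact marker type shown depends on the HBase version):
> scan 't2', {RAW => true, VERSIONS => 10}
Row7's deleted cell and its delete marker should both appear in the output, at least until a major compaction removes them.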
delete: delete will remove versions equal to or older than the given timestamp.
> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
 row1                                   column=f1:col1, timestamp=1400281044435, value=2
 row1                                   column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds

> delete 't1', 'row1', 'f1:col1', 1400281044435
0 row(s) in 0.0290 seconds

> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
1 row(s) in 0.0110 seconds
deleteall: deleteall can delete one row entirely.
It can also act like delete and remove one single version:
> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row1                                  column=f1:col1, timestamp=1400522695653, value=2
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0460 seconds

> deleteall 't2','row1','f1:col1',1400522695653
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0210 seconds

> deleteall 't2','row1'
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0170 seconds
incr / get_counter: The COUNTERS feature can treat columns as counters; it is very lightweight.
Prepare the table:
create 't3','f1'
put 't3','row1','f1:col1',0
incr cannot increment a field that isn't 64 bits wide (the put above stored the string '0', a single byte, not an 8-byte long):
> incr 't3', 'row1', 'f1:col1'
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field that isn't 64 bits 

incr can create a new row with a proper 64-bit value:
>  incr 't3', 'row2', 'f1:col1'
COUNTER VALUE = 1

> scan 't3'
ROW         COLUMN+CELL
 row1       column=f1:col1, timestamp=1400530005533, value=0
 row2       column=f1:col1, timestamp=1400530215183, value=\x00\x00\x00\x00\x00\x00\x00\x01
2 row(s) in 0.0640 seconds

> get_counter 't3','row2','f1:col1'
COUNTER VALUE = 1

> incr 't3', 'row2', 'f1:col1',10
COUNTER VALUE = 11
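
To make row1 incrementable, one option is to rewrite it as an 8-byte big-endian long. A sketch (the escaped string below is an assumption about how the shell passes binary values through; binary literals in the shell can be fiddly):
put 't3', 'row1', 'f1:col1', "\x00\x00\x00\x00\x00\x00\x00\x00"
incr 't3', 'row1', 'f1:col1'
After this, incr should return COUNTER VALUE = 1 for row1 as well.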
put: scan can see the history;
put can change the history!
> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0370 seconds

> put 't2','row2','f1:col1', 'new_r2', 1400523142819
0 row(s) in 0.0330 seconds

> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=new_r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0360 seconds

> get 't2','row2','f1:col1'
COLUMN                                 CELL
 f1:col1                               timestamp=1400525511563, value=123
1 row(s) in 0.0160 seconds
scan: STARTROW plus LIMIT is similar to offset+limit in SQL:

create 't4','f1'
put 't4','row1','f1:col1','c1'
put 't4','row1','f1:col2','c2'
put 't4','row2','f1:col1','2_col1'
put 't4','row3','f1:col1','3_col1'

>   scan 't4', {COLUMNS => ['f1:col1'], LIMIT => 2, STARTROW => 'row2'}
ROW                    COLUMN+CELL
 row2                  column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                  column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0190 seconds
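
STOPROW can bound the scan from the other end; note that the stop row is exclusive (a sketch against the same table):
> scan 't4', {COLUMNS => ['f1:col1'], STARTROW => 'row1', STOPROW => 'row3'}
This should return row1 and row2 only.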

Time-range scan (look at the upper bound, which is exclusive):
> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
1 row(s) in 0.0170 seconds

> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724+1]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                         column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

More filters, just like a WHERE condition:
> scan 't4', {FILTER => "(PrefixFilter ('row') AND (QualifierFilter (>=, 'binary:1'))) AND (TimestampsFilter ( 1400609307642, 1400609307724))"}
ROW                       COLUMN+CELL
 row1                     column=f1:col1, timestamp=1400609307642, value=c1
 row3                     column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

ColumnPaginationFilter(limit, offset) returns limit columns per row, starting at column number offset;
it can show the 1st, 2nd, ... column of each row and group them:
> scan 't4'
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row3                            column=f1:col4, timestamp=1400611193699, value=3_col4
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row4                            column=f1:col6, timestamp=1400611194134, value=4_col6
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
 row8                            column=f1:col5, timestamp=1400615943808, value=8_col5
 row8                            column=f1:col6, timestamp=1400615943836, value=8_col6
 row8                            column=f1:col7, timestamp=1400615943856, value=8_col7
 row8                            column=f1:col8, timestamp=1400615943874, value=8_col8
 row8                            column=f1:col9, timestamp=1400615943895, value=8_col9
8 row(s) in 0.0320 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
8 row(s) in 0.0300 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 1)}
ROW                              COLUMN+CELL
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
7 row(s) in 0.0170 seconds
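
The same filter can probably also be written in the filter string language used by the ValueFilter example earlier (an assumption; whether the string parser knows ColumnPaginationFilter depends on the HBase version):
> scan 't4', {FILTER => "ColumnPaginationFilter(1, 1)"}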
truncate: truncate is actually disable+drop+create.
This can be verified by running the hbase shell in debug mode:
hbase shell -d
truncate 't4'
Then you can see the following in the debug log:
INFO client.HBaseAdmin: Started disable of t4
DEBUG client.HBaseAdmin: Sleeping= 1000ms, waiting for all regions to be disabled in t4
INFO client.HBaseAdmin: Disabled t4
 - Dropping table...
INFO client.HBaseAdmin: Deleted t4
 - Creating table...

4. HBase surgery tools

5. Cluster replication tools

6. Security tools