Hands on HBase Shell

1. General

status
Show a cluster status summary. The two commands below are equivalent:
hbase(main):002:0> status
3 servers, 0 dead, 1.6667 average load
hbase(main):006:0> status 'summary'
3 servers, 0 dead, 1.6667 average load

Show a region summary for each region server:
hbase(main):008:0> status 'simple'
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=40, maxHeapMB=995
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=50, maxHeapMB=995
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=32, maxHeapMB=995
0 dead servers
Aggregate load: 0, regions: 5

Show details of every region, grouped by region server:
hbase(main):007:0> status 'detailed'
version 0.94.8
0 regionsInTransition
master coprocessors: []
3 live servers
    hdw2.openkb.com:60020 1397595530014
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=39, maxHeapMB=995
        test2,,1397857356085.5f07460cd224b77eb71e4386d054dde6.
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw1.openkb.com:60020 1397595528761
        requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=48, maxHeapMB=995
        -ROOT-,,0
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=33, writeRequestsCount=1, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=10, currentCompactedKVs=10, compactionProgressPct=1.0
        .META.,,1
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=8900, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=11, currentCompactedKVs=11, compactionProgressPct=1.0
        test3,,1400011790458.49daae95b96cceb1aafd595e0078bcde.
            numberOfStores=3, numberOfStorefiles=2, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=22, writeRequestsCount=7, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
    hdw3.openkb.com:60020 1397595529985
        requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=31, maxHeapMB=995
        test,,1397754779440.4fc6d9e31d226b5b213dd1c28f06f48e.
            numberOfStores=1, numberOfStorefiles=1, storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0, readRequestsCount=2, writeRequestsCount=0, rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
0 dead servers
version
Show the HBase version and release information.
hbase(main):010:0> version
0.94.8-xxx-x.x.x.x, rUnknown, Thu Oct 24 15:43:56 CST 2013
whoami
Show the current OS user and authentication method.
hbase(main):012:0> whoami
root (auth:SIMPLE)

2. Table Management

alter
create
describe
disable
disable_all
is_disabled
drop
drop_all
enable
enable_all
is_enabled
exists
list
show_filters
alter_status
alter_async
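
A quick sketch of how these commands fit together (standard HBase shell usage; the table name 't5' and family 'f1' are made up for illustration, and depending on configuration, alter may require the table to be disabled first):

```
> create 't5', {NAME => 'f1', VERSIONS => 3}
> list                                       # 't5' now appears
> describe 't5'
> is_enabled 't5'                            # true after create
> alter 't5', {NAME => 'f1', VERSIONS => 5}  # change the schema
> disable 't5'                               # required before drop
> is_disabled 't5'                           # true
> drop 't5'
> exists 't5'                                # false
```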

3. Data Manipulation

get
Prepare a table with 3 versions of the same row and column:
create 't1','f1'
put 't1','row1','f1:col1','1'
put 't1','row1','f1:col1','2'
put 't1','row1','f1:col1','3'

> scan 't1', {VERSIONS => 3}
ROW                                  COLUMN+CELL
 row1                                column=f1:col1, timestamp=1400281044882, value=3
 row1                                column=f1:col1, timestamp=1400281044435, value=2
 row1                                column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds
Get the latest version within a time range.
(Note: the upper bound of TIMERANGE is exclusive, so the latest version is not shown.
To include it, the upper bound must be 1400281044882+1.)
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882]}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0100 seconds

Get the version at that timestamp:
> get 't1', 'row1', {TIMESTAMP => 1400281044435}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0090 seconds

Get multiple versions in a time range:
> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
2 row(s) in 0.0100 seconds

> get 't1', 'row1', {TIMERANGE => [1400281044395, 1400281044882+1], VERSIONS => 3}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044882, value=3
 f1:col1                                timestamp=1400281044435, value=2
 f1:col1                                timestamp=1400281044395, value=1
3 row(s) in 0.0110 seconds

Add a filter to a get, similar to a WHERE condition in SQL:
> get 't1', 'row1', {FILTER => "ValueFilter(=, 'binary:2')"}
COLUMN                                  CELL
 f1:col1                                timestamp=1400281044435, value=2
1 row(s) in 0.0130 seconds
count
Prepare a table with one row deleted:
create 't2','f1'
put 't2','row1','f1:col1','1'
put 't2','row1','f1:col1','2'
put 't2','row1','f1:col1','3'
put 't2','row1','f1:col1','4'
put 't2','row2','f1:col1','r2'
put 't2','row3','f1:col1','r3'
put 't2','row4','f1:col1','r4'
put 't2','row5','f1:col1','r5'
put 't2','row6','f1:col1','r6'
put 't2','row7','f1:col1','r7'
delete 't2', 'row7', 'f1:col1'
count only counts live rows; deleted rows are excluded, and adding RAW => true has no effect on the result:
> count 't2', INTERVAL => 2, CACHE => 1
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0250 seconds

> count 't2', INTERVAL => 2, CACHE => 1, RAW => true
Current count: 2, row: row2
Current count: 4, row: row4
Current count: 6, row: row6
6 row(s) in 0.0180 seconds
delete
delete removes all versions equal to or older than the given timestamp:
> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
 row1                                   column=f1:col1, timestamp=1400281044435, value=2
 row1                                   column=f1:col1, timestamp=1400281044395, value=1
1 row(s) in 0.0260 seconds

> delete 't1', 'row1', 'f1:col1', 1400281044435
0 row(s) in 0.0290 seconds

> scan 't1', {VERSIONS => 3}
ROW                                     COLUMN+CELL
 row1                                   column=f1:col1, timestamp=1400281044882, value=3
1 row(s) in 0.0110 seconds
deleteall
deleteall deletes an entire row. Given a column and timestamp, it can also act like delete and remove individual versions:
> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row1                                  column=f1:col1, timestamp=1400522695653, value=2
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0460 seconds

> deleteall 't2','row1','f1:col1',1400522695653
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row1                                  column=f1:col1, timestamp=1400522715837, value=4
 row1                                  column=f1:col1, timestamp=1400522696025, value=3
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
6 row(s) in 0.0210 seconds

> deleteall 't2','row1'
0 row(s) in 0.0130 seconds

> scan 't2',VERSIONS=>3
ROW                                    COLUMN+CELL
 row2                                  column=f1:col1, timestamp=1400523142819, value=r2
 row3                                  column=f1:col1, timestamp=1400523142844, value=r3
 row4                                  column=f1:col1, timestamp=1400523142867, value=r4
 row5                                  column=f1:col1, timestamp=1400523142885, value=r5
 row6                                  column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0170 seconds
incr / get_counter
HBase's COUNTERS feature treats columns as very lightweight atomic counters.
Prepare the table:
create 't3','f1'
put 't3','row1','f1:col1',0
incr cannot increment a field that is not 64 bits wide (the put above stored the string '0', not an 8-byte long):
> incr 't3', 'row1', 'f1:col1'
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Attempted to increment field that isn't 64 bits 

incr can create a new counter cell, stored as a 64-bit value:
>  incr 't3', 'row2', 'f1:col1'
COUNTER VALUE = 1

> scan 't3'
ROW         COLUMN+CELL
 row1       column=f1:col1, timestamp=1400530005533, value=0
 row2       column=f1:col1, timestamp=1400530215183, value=\x00\x00\x00\x00\x00\x00\x00\x01
2 row(s) in 0.0640 seconds

> get_counter 't3','row2','f1:col1'
COUNTER VALUE = 1

> incr 't3', 'row2', 'f1:col1',10
COUNTER VALUE = 11
put
scan can see the history; put can rewrite it. Supplying an explicit timestamp overwrites that version in place:
> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0370 seconds

> put 't2','row2','f1:col1', 'new_r2', 1400523142819
0 row(s) in 0.0330 seconds

> scan 't2',VERSIONS=>3
ROW                                 COLUMN+CELL
 row2                               column=f1:col1, timestamp=1400525511563, value=123
 row2                               column=f1:col1, timestamp=1400523142819, value=new_r2
 row3                               column=f1:col1, timestamp=1400523142844, value=r3
 row4                               column=f1:col1, timestamp=1400523142867, value=r4
 row5                               column=f1:col1, timestamp=1400523142885, value=r5
 row6                               column=f1:col1, timestamp=1400523142917, value=r6
5 row(s) in 0.0360 seconds

> get 't2','row2','f1:col1'
COLUMN                                 CELL
 f1:col1                               timestamp=1400525511563, value=123
1 row(s) in 0.0160 seconds
scan
STARTROW plus LIMIT behaves like OFFSET + LIMIT in SQL:

create 't4','f1'
put 't4','row1','f1:col1','c1'
put 't4','row1','f1:col2','c2'
put 't4','row2','f1:col1','2_col1'
put 't4','row3','f1:col1','3_col1'

>   scan 't4', {COLUMNS => ['f1:col1'], LIMIT => 2, STARTROW => 'row2'}
ROW                    COLUMN+CELL
 row2                  column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                  column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0190 seconds

Time-range scan (note the exclusive upper bound):
> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
1 row(s) in 0.0170 seconds

> scan 't4', {COLUMNS => 'f1:col1', TIMERANGE => [1400609307664, 1400609307724+1]}
ROW                           COLUMN+CELL
 row2                         column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                         column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

Filters can be combined, much like a SQL WHERE clause:
> scan 't4', {FILTER => "(PrefixFilter ('row') AND (QualifierFilter (>=, 'binary:1'))) AND (TimestampsFilter ( 1400609307642, 1400609307724))"}
ROW                       COLUMN+CELL
 row1                     column=f1:col1, timestamp=1400609307642, value=c1
 row3                     column=f1:col1, timestamp=1400609307724, value=3_col1
2 row(s) in 0.0140 seconds

ColumnPaginationFilter(limit, offset)
returns, for each row, up to limit columns starting at the given column offset (the 1st, 2nd, ... column of each row):
> scan 't4'
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row3                            column=f1:col4, timestamp=1400611193699, value=3_col4
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row4                            column=f1:col6, timestamp=1400611194134, value=4_col6
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
 row8                            column=f1:col5, timestamp=1400615943808, value=8_col5
 row8                            column=f1:col6, timestamp=1400615943836, value=8_col6
 row8                            column=f1:col7, timestamp=1400615943856, value=8_col7
 row8                            column=f1:col8, timestamp=1400615943874, value=8_col8
 row8                            column=f1:col9, timestamp=1400615943895, value=8_col9
8 row(s) in 0.0320 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
ROW                              COLUMN+CELL
 row1                            column=f1:col1, timestamp=1400609307642, value=c1
 row2                            column=f1:col1, timestamp=1400609307691, value=2_col1
 row3                            column=f1:col1, timestamp=1400609307724, value=3_col1
 row4                            column=f1:col1, timestamp=1400616374899, value=4_col1
 row5                            column=f1:col1, timestamp=1400616374933, value=5_col1
 row6                            column=f1:col1, timestamp=1400616374951, value=6_col1
 row7                            column=f1:col1, timestamp=1400616374970, value=7_col1
 row8                            column=f1:col1, timestamp=1400616375410, value=8_col1
8 row(s) in 0.0300 seconds

>  scan 't4', {FILTER =>org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 1)}
ROW                              COLUMN+CELL
 row1                            column=f1:col2, timestamp=1400609307664, value=c2
 row3                            column=f1:col3, timestamp=1400611139601, value=3_col3
 row4                            column=f1:col5, timestamp=1400611193718, value=4_col5
 row5                            column=f1:col4, timestamp=1400611260500, value=5_col4
 row6                            column=f1:col4, timestamp=1400611260542, value=6_col4
 row7                            column=f1:col4, timestamp=1400611260927, value=7_col4
 row8                            column=f1:col10, timestamp=1400615943929, value=8_col10
7 row(s) in 0.0170 seconds
truncate
truncate is actually disable + drop + create, which you can verify by running the shell in debug mode:
hbase shell -d
truncate 't4'
The debug log then shows:
INFO client.HBaseAdmin: Started disable of t4
DEBUG client.HBaseAdmin: Sleeping= 1000ms, waiting for all regions to be disabled in t4
INFO client.HBaseAdmin: Disabled t4
 - Dropping table...
INFO client.HBaseAdmin: Deleted t4
 - Creating table...
4. HBase surgery tools
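
A few commonly used commands in this group (a sketch only; exact availability and behavior depend on the HBase version):

```
> flush 't4'             # flush the table's memstores to storefiles
> compact 't4'           # request a minor compaction
> major_compact 't4'     # request a major compaction
> split 't4'             # ask the master to split the table's regions
> balancer               # run the region balancer once
> zk_dump                # dump the cluster's ZooKeeper state
```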
5. Cluster replication tools
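
Typical peer-management commands (a sketch; the peer id '1' and the ZooKeeper quorum address are hypothetical placeholders):

```
> add_peer '1', 'slave-zk-host:2181:/hbase'   # register a replication peer
> list_peers
> disable_peer '1'                            # pause replication to the peer
> enable_peer '1'
> remove_peer '1'
```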
6. Security tools
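
Basic ACL commands (a sketch; these require the AccessController coprocessor to be enabled on the cluster, and the user name 'user1' is hypothetical):

```
> grant 'user1', 'RW', 't4'     # grant Read and Write on table t4
> user_permission 't4'          # list permissions on t4
> revoke 'user1', 't4'          # remove the grant
```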