How to populate a large HBase table

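Sometimes you need a large sample table in HBase for performance testing. Here is one way to build it:
1. Create the table with pre-split points chosen for its expected size, so that it starts with several regions instead of one.
create 'mytesttable', 'mycf', { SPLITS => ['200000', '400000', '600000', '800000'] }
Row keys are compared as byte strings, so these split points divide the 8-digit keys from step 2 by their leading digit: keys starting with 0 or 1 go to the first region, 2 or 3 to the second, and so on.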
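
The split points above divide a 6-digit keyspace (0 to 999,999) evenly into five regions. As a sketch for other table sizes, the Ruby below computes evenly spaced, zero-padded split points. `split_points` and its parameters are hypothetical names chosen for illustration; because the HBase shell is JRuby, the method can also be pasted into the shell.

```ruby
# Hypothetical helper: evenly spaced, zero-padded split keys.
# keyspace: number of distinct numeric keys (0...keyspace)
# regions:  desired number of regions (returns regions - 1 split points)
# width:    row-key width in digits, used for zero padding
def split_points(keyspace, regions, width)
  (1...regions).map { |i| (i * keyspace / regions).to_s.rjust(width, '0') }
end

p split_points(1_000_000, 5, 6)    # ["200000", "400000", "600000", "800000"]
p split_points(100_000_000, 5, 8)  # ["20000000", "40000000", "60000000", "80000000"]

# In the HBase shell the result could be passed to SPLITS, e.g.:
#   create 'mytesttable', 'mycf', { SPLITS => split_points(100_000_000, 5, 8) }
```

For the 8-digit keys written in step 2, the split points from the second call put every key in the same region as the original 6-digit ones, since keys are compared byte by byte and each shorter split key is a prefix of the longer one.
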
2. Insert as much data as you want from the HBase shell. The loop below writes up to 100,000,000 rows with sequential 8-digit keys (00000000 to 99999999); press Ctrl-C once you have enough.

(0...100_000_000).each do |i|
  key = format('%08d', i)   # 00000000, 00000001, ..., 99999999
  put 'mytesttable', key, 'mycf:col1', "data-value-is-#{key}"
end
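
To check which pre-split region a key from the loop lands in, here is a small Ruby sketch. It assumes HBase's byte-wise lexicographic row-key ordering, which plain Ruby String comparison matches for ASCII digit keys; `row_key` and `region_index` are hypothetical helpers, not HBase shell commands.

```ruby
# Same key format as the insert loop: 8 digits, zero-padded.
def row_key(i)
  format('%08d', i)
end

# A region starts at its split point (inclusive), so a key's region index
# is the number of split points that compare <= the key.
def region_index(key, splits)
  splits.count { |s| s <= key }
end

splits = ['200000', '400000', '600000', '800000']
p region_index(row_key(0), splits)           # 0
p region_index(row_key(19_999_999), splits)  # 0  ("19999999")
p region_index(row_key(20_000_000), splits)  # 1  ("20000000")
p region_index(row_key(99_999_999), splits)  # 4
```

Because the loop writes keys in ascending order, the first 20,000,000 rows all begin with 0 or 1, so any run that stops before that point, including the 1,283,384 rows counted in step 3, writes only to the first region.
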
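3. Check the row count of the table. INTERVAL sets how often (in rows) the shell prints the running count, and CACHE sets how many rows the scanner fetches per round trip; the default cache of 10 rows makes counting a large table slow.
hbase(main):005:0> count 'mytesttable', CACHE => 1000000, INTERVAL => 1000000
Current count: 1000000, row: 00999999
1283384 row(s) in 13.8240 seconds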
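
Because the keys are sequential and start at 00000000, each progress line is predictable: when the running count reaches N, the current row is key N - 1. The Ruby sketch below reproduces such lines for a given total and INTERVAL. `progress_lines` is a hypothetical helper (not an HBase shell command), and the line format is copied from the output above.

```ruby
# Hypothetical helper: the progress lines `count` would print for a table
# whose keys are the sequential 8-digit strings written in step 2.
# Assumed behaviour: a line is printed each time the running count
# reaches a multiple of interval.
def progress_lines(total_rows, interval)
  (interval..total_rows).step(interval).map do |n|
    # The n-th row (1-based) has key n - 1, zero-padded to 8 digits.
    format('Current count: %d, row: %08d', n, n - 1)
  end
end

puts progress_lines(1_283_384, 1_000_000)
# Current count: 1000000, row: 00999999
```
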
