How to populate a large HBase table

May 24, 2017

Sometimes you need a large sample table in HBase to run performance tests. Here is one way to build one:
1. Create a table that is pre-split into regions according to the expected size of the table.

create 'mytesttable', 'mycf', { SPLITS => ['200000', '400000', '600000', '800000'] }
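If you would rather derive the split keys from the expected key space than type them by hand, here is a minimal plain-Ruby sketch. The helper `split_points` is hypothetical (it is not part of HBase), and it assumes row keys are zero-padded decimal strings of a fixed width, as in step 2. Because the HBase shell is a JRuby interpreter, a helper like this could presumably be pasted into the shell and its result passed to `SPLITS`. The sketch also shows why the 6-character split keys above still work for 8-character row keys: HBase compares row keys as raw bytes, so '200000' sorts immediately before '20000000'.

```ruby
# Hypothetical helper (not an HBase API): evenly spaced split keys for a key
# space of zero-padded decimal strings that are key_width characters long.
def split_points(num_regions, key_width)
  total = 10**key_width              # number of possible keys
  step = total / num_regions         # keys per region (integer division)
  # num_regions regions need num_regions - 1 boundaries
  (1...num_regions).map { |i| format("%0#{key_width}d", i * step) }
end

puts split_points(5, 8).inspect
# => ["20000000", "40000000", "60000000", "80000000"]

# Ruby's String comparison, like HBase's row key ordering, is byte by byte,
# so a shorter split key sorts just before the keys it is a prefix of.
puts ['20000000', '19999999', '200000'].sort.inspect
# => ["19999999", "200000", "20000000"]
```

In the shell this would become something like `create 'mytesttable', 'mycf', { SPLITS => split_points(5, 8) }`, which yields the same five region boundaries as the hand-written list.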

2. Insert as much data as you want from the HBase shell.
Press Ctrl-C whenever you want to stop inserting.

# Eight nested loops enumerate every 8-digit row key, 00000000 to 99999999 (10^8 rows)
for a in '0'..'9' do
for b in '0'..'9' do
for c in '0'..'9' do
for d in '0'..'9' do
for e in '0'..'9' do
for f in '0'..'9' do
for g in '0'..'9' do
for h in '0'..'9' do
put 'mytesttable', "#{a}#{b}#{c}#{d}#{e}#{f}#{g}#{h}", "mycf:col1", "data-value-is-#{a}#{b}#{c}#{d}#{e}#{f}#{g}#{h}"
end end end end end end end end
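The eight nested loops above simply walk through zero-padded 8-digit keys in ascending order. A more compact way to express the same sequence is to format an integer counter. The sketch below uses a hypothetical helper, `row_keys`, and is written as plain Ruby so it can run outside the HBase shell; the `put` call is shown only in a comment, since it exists only inside the shell.

```ruby
# Hypothetical helper: yields every zero-padded decimal key of the given width
# in ascending order, i.e. the same sequence the nested loops produce.
def row_keys(width)
  # Without a block, return an Enumerator so callers can use first, to_a, etc.
  return enum_for(:row_keys, width) unless block_given?
  (0...10**width).each { |i| yield format("%0#{width}d", i) }
end

# In the HBase shell the block would wrap the original put, for example:
#   row_keys(8) { |key| put 'mytesttable', key, 'mycf:col1', "data-value-is-#{key}" }
# Here we only look at the first keys of a 3-digit key space.
puts row_keys(3).first(3).inspect
# => ["000", "001", "002"]
```

Changing the width argument resizes the key space without adding or removing loop levels.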

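3. Check the row count of the table. CACHE sets how many rows the count scan fetches from the region server per call, and INTERVAL sets how often the running count is printed.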

hbase(main):005:0> count 'mytesttable', CACHE=>1000000 ,INTERVAL => 1000000
Current count: 1000000, row: 00999999
1283384 row(s) in 13.8240 seconds
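Here the insert loop was stopped after 1,283,384 rows. Assuming the table was empty beforehand, those rows are keys 00000000 through 01283383. Because the loop writes keys in ascending order, all of them sort before the first split key '200000', so they should all sit in the first of the five pre-split regions. The sketch below checks this arithmetic in plain Ruby. `rows_per_region` is a hypothetical helper, and it assumes digit-only split keys that are no longer than the row key width.

```ruby
# Hypothetical helper: how many of the first n sequential keys (zero-padded to
# width digits, starting at 0) fall into each region defined by splits.
def rows_per_region(n, splits, width)
  # Padding a digit-only split key with trailing zeros gives the smallest
  # width-digit key that sorts at or after it in byte order.
  bounds = [0] + splits.map { |s| s.ljust(width, '0').to_i } + [10**width]
  bounds.each_cons(2).map do |lo, hi|
    # Overlap of the inserted range [0, n) with the region's range [lo, hi)
    [[n, hi].min - lo, 0].max
  end
end

counts = rows_per_region(1_283_384, ['200000', '400000', '600000', '800000'], 8)
puts counts.inspect
# => [1283384, 0, 0, 0, 0]
```

Only a full run of all 10^8 keys would spread the rows evenly, at 20,000,000 per region.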
