HBase Region Split
Please read "Apache HBase Region Splitting and Merging" first:
https://fr.hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
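This is a quick explanation of the HBase region split policies.
Regions are the basic element of availability and distribution for tables, and are comprised of a Store per Column Family. The hierarchy of objects is as follows: a Table is made up of Regions, each Region holds one Store per Column Family, and each Store has a MemStore and zero or more StoreFiles, which are in turn made up of Blocks.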
Pre-split
There are two predefined split algorithms: HexStringSplit and UniformSplit.
1. HexStringSplit
The format of a HexStringSplit region boundary is the ASCII representation of an MD5 checksum, or any other uniformly distributed hexadecimal value. Rows are hex-encoded long values in the range "00000000" => "FFFFFFFF" and are left-padded with zeros so that they sort lexicographically in the same order as their binary values.
Sample:
The command below creates a table with 10 regions using the HexStringSplit algorithm:
    hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1
    DEBUG util.RegionSplitter: Creating table test_table with 1 column families. Presplitting to 10 regions
    [root]# hadoop fs -ls /apps/hbase/data/test_table
    Found 12 items
    -rw-r--r-- 3 hbase hadoop 673 2014-05-21 09:54 /apps/hbase/data/test_table/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/339d0eb61160df679c6ea628ee80b0d6
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/86da408d174d83aae3fb0bcdb68145c8
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/b0129aac1ec9f20a6a4ffe27b125cd27
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/b94f184ee55374ed5d5db71b88a7bc05
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/c003cd8b2ff3b4a9c6c653ce1a3c0fce
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/ca8cc09027606d6c51f189d61fe6eb4f
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/d41006f677c222b62695035364c528d6
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/e8e2f820883ccd5771d1470f3a36b88f
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/ef2cb46e051fdcccb070b1e637bb5fd5
    drwxr-xr-x - hbase hadoop 0 2014-05-21 09:54 /apps/hbase/data/test_table/f7d3a744d584f44a890e398618c85c4f
    hadoop fs -cat /apps/hbase/data/test_table/339d0eb61160df679c6ea628ee80b0d6/.regioninfo
    STARTKEY => '99999996', ENDKEY => 'b333332f'

    hadoop fs -cat /apps/hbase/data/test_table/86da408d174d83aae3fb0bcdb68145c8/.regioninfo
    STARTKEY => '', ENDKEY => '19999999'
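The boundary keys seen in the .regioninfo files can be reproduced with a little integer arithmetic. The following is a minimal sketch, not HBase's own code: it assumes the key space "00000000" => "FFFFFFFF" is divided by integer division into equal steps, and the function name `hex_string_split` is made up for this illustration.

```python
def hex_string_split(num_regions, first=0x00000000, last=0xFFFFFFFF, width=8):
    """Return the num_regions - 1 boundary keys as zero-padded hex strings.

    Assumption: a simplified re-implementation for illustration only.
    """
    # Integer division: any remainder ends up in the last region.
    size_of_each_split = (last - first) // num_regions
    return [
        format(first + size_of_each_split * i, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


splits = hex_string_split(10)
for key in splits:
    print(key)
```

For 10 regions the list contains '19999999', '99999996' and 'b333332f', which are the start and end keys shown in the .regioninfo output above.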
2. UniformSplit
A SplitAlgorithm that divides the space of possible keys evenly. Useful when the keys are approximately uniform random bytes (e.g. hashes). Rows are raw byte values in the range 00 => FF and are right-padded with zeros to keep the same memcmp() order. This is the natural algorithm to use for a byte[] environment and saves space, but is not necessarily the easiest for readability.
Sample:
    hbase org.apache.hadoop.hbase.util.RegionSplitter test_table3 UniformSplit -c 3 -f f1
    DEBUG util.RegionSplitter: Creating table test_table3 with 1 column families. Presplitting to 3 regions
    [root@hdm ~]# hadoop fs -ls /apps/hbase/data/test_table3
    Found 5 items
    -rw-r--r-- 3 hbase hadoop 675 2014-05-21 14:09 /apps/hbase/data/test_table3/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 14:09 /apps/hbase/data/test_table3/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 14:09 /apps/hbase/data/test_table3/02bcd58dc337bc28fac74ee0e36a11a2
    drwxr-xr-x - hbase hadoop 0 2014-05-21 14:09 /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb
    drwxr-xr-x - hbase hadoop 0 2014-05-21 14:09 /apps/hbase/data/test_table3/e9f130fc2ebbafb20e5ebc45ea3bc7bd
    hadoop fs -cat /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/.regioninfo
    STARTKEY => '', ENDKEY => 'UUUUUUUU'

    hadoop fs -cat /apps/hbase/data/test_table3/02bcd58dc337bc28fac74ee0e36a11a2/.regioninfo
    STARTKEY => 'UUUUUUUU', ENDKEY => '\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA'

    hadoop fs -cat /apps/hbase/data/test_table3/e9f130fc2ebbafb20e5ebc45ea3bc7bd/.regioninfo
    STARTKEY => '\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xAA', ENDKEY => ''
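The odd-looking key 'UUUUUUUU' is simply eight bytes of 0x55, i.e. one third of the way through an 8-byte key space ('U' is ASCII 0x55). A rough sketch of the arithmetic follows, assuming 8-byte keys and equal integer steps. `uniform_split` is a hypothetical helper, not the HBase implementation.

```python
def uniform_split(num_regions, key_len=8):
    """Return the num_regions - 1 boundary keys as raw big-endian bytes.

    Assumption: keys are key_len bytes wide and the space 00..00 => FF..FF
    is cut into equal steps; this is an illustration, not HBase code.
    """
    key_space = 1 << (8 * key_len)        # number of distinct key_len-byte keys
    interval = key_space // num_regions   # width of each region
    return [(interval * i).to_bytes(key_len, "big") for i in range(1, num_regions)]


splits = uniform_split(3)
print(splits)  # [b'UUUUUUUU', b'\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa']
```

These two values match the region boundaries reported in the three .regioninfo files above.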
    put 'test_table3','1','f1:col1','data_1_col1'
    # hadoop fs -ls /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 14:12 /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/f1/585ad8fa038c434880848a260160eed2

    put 'test_table3','2','f1:col1','data_2_col1'
    # hadoop fs -ls /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/f1
    Found 2 items
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 14:13 /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/f1/1d26504ba52444309dc03c0a4ef92283
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 14:12 /apps/hbase/data/test_table3/e24e8865d98d52605f166d2e9afa07eb/f1/585ad8fa038c434880848a260160eed2

    put 'test_table3','zzz','f1:col1','data_zzz_col1'
    # hadoop fs -ls /apps/hbase/data/test_table3/02bcd58dc337bc28fac74ee0e36a11a2/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 705 2014-05-21 15:05 /apps/hbase/data/test_table3/02bcd58dc337bc28fac74ee0e36a11a2/f1/a10e61a74a394e5d93a20eb61372d674
3. Desired split points
If you already have the split points at hand, you can also use the HBase shell to create the table with the desired split points.
Sample:
    create 'test_table2', 'f1', {SPLITS => ['a', 'b', 'c']}
    # hadoop fs -ls /apps/hbase/data/test_table2/
    Found 6 items
    -rw-r--r-- 3 hbase hadoop 675 2014-05-21 13:06 /apps/hbase/data/test_table2/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 13:06 /apps/hbase/data/test_table2/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 13:06 /apps/hbase/data/test_table2/17eb744fc9788cab51f92d4e9ed740d7
    drwxr-xr-x - hbase hadoop 0 2014-05-21 13:06 /apps/hbase/data/test_table2/b8ef19896ac8e43ab5c050c01f129329
    drwxr-xr-x - hbase hadoop 0 2014-05-21 13:06 /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd
    drwxr-xr-x - hbase hadoop 0 2014-05-21 13:06 /apps/hbase/data/test_table2/c9baa85d4d5302d8fa53e807741d323d
    hadoop fs -cat /apps/hbase/data/test_table2/c9baa85d4d5302d8fa53e807741d323d/.regioninfo
    STARTKEY => '', ENDKEY => 'a'

    hadoop fs -cat /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/.regioninfo
    STARTKEY => 'a', ENDKEY => 'b'

    hadoop fs -cat /apps/hbase/data/test_table2/17eb744fc9788cab51f92d4e9ed740d7/.regioninfo
    STARTKEY => 'b', ENDKEY => 'c'

    hadoop fs -cat /apps/hbase/data/test_table2/b8ef19896ac8e43ab5c050c01f129329/.regioninfo
    STARTKEY => 'c', ENDKEY => ''
    put 'test_table2','a','f1:col1','data_a_col1'
    # hadoop fs -ls /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 13:11 /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/f1/4c54189cb64f452a98e722a6bfef23b7

    put 'test_table2','b','f1:col1','data_b_col1'
    # hadoop fs -ls /apps/hbase/data/test_table2/17eb744fc9788cab51f92d4e9ed740d7/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 13:37 /apps/hbase/data/test_table2/17eb744fc9788cab51f92d4e9ed740d7/f1/4159bd0e73dd4dcaad49efbead735851

    put 'test_table2','123','f1:col1','data_123_col1'
    # hadoop fs -ls /apps/hbase/data/test_table2/c9baa85d4d5302d8fa53e807741d323d/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 705 2014-05-21 13:38 /apps/hbase/data/test_table2/c9baa85d4d5302d8fa53e807741d323d/f1/37c2532ae31a4904ad593887ce9dd70c

    put 'test_table2','abcd','f1:col1','data_abcd_col1'
    # hadoop fs -ls /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/f1
    Found 2 items
    -rw-r--r-- 3 hbase hadoop 697 2014-05-21 13:11 /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/f1/4c54189cb64f452a98e722a6bfef23b7
    -rw-r--r-- 3 hbase hadoop 709 2014-05-21 13:39 /apps/hbase/data/test_table2/be78b0afc4ba7a4118234630104bfbbd/f1/a27ae4c6a09644f7b7c9c23344a878fc
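The puts above show that a row lands in the region whose start key is less than or equal to the row key and whose end key is greater than it (start inclusive, end exclusive). Here is a small sketch of that lookup as a binary search over the sorted split points. `region_for` is a made-up helper for illustration, and keys are compared as raw bytes, which is an assumption consistent with the results above.

```python
from bisect import bisect_right


def region_for(row_key, split_points):
    """Return the (start_key, end_key) pair of the region holding row_key.

    An empty key (b"") stands for "no boundary", as in the .regioninfo output.
    """
    splits = sorted(split_points)
    # Number of split points that are <= row_key.
    idx = bisect_right(splits, row_key)
    start = splits[idx - 1] if idx > 0 else b""
    end = splits[idx] if idx < len(splits) else b""
    return start, end


splits = [b"a", b"b", b"c"]
for row in (b"a", b"b", b"123", b"abcd"):
    print(row, "->", region_for(row, splits))
```

With the split points 'a', 'b', 'c', rows 'a' and 'abcd' fall into the region ['a', 'b'), row 'b' into ['b', 'c'), and row '123' into the first region, matching the store files that appeared after each put.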
Auto Split
Once a region reaches a certain limit, it is automatically split into two regions.
Three of the predefined auto split policies are ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, and KeyPrefixRegionSplitPolicy.
    hbase.regionserver.region.split.policy
    A split policy determines when a region should be split. The various other split policies that are available currently are: ConstantSizeRegionSplitPolicy, DisabledRegionSplitPolicy, DelimitedKeyPrefixRegionSplitPolicy, KeyPrefixRegionSplitPolicy etc.
    Default: org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
1. ConstantSizeRegionSplitPolicy
A RegionSplitPolicy implementation which splits a region as soon as any of its store files exceeds a maximum configurable size ("hbase.hregion.max.filesize", default = 10 GB). This was the default split policy before 0.94.0; from 0.94.0 on, the default split policy is IncreasingToUpperBoundRegionSplitPolicy.
    hbase.hregion.max.filesize
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has grown to exceed this value, the hosting HRegion is split in two.
    Default: 10737418240
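The constant-size rule boils down to a single comparison per store. The sketch below only models that rule as described above. `should_split` and the list of per-store sizes are hypothetical, and the real policy works on the region server's internal store objects.

```python
MAX_FILESIZE = 10737418240  # default hbase.hregion.max.filesize (10 GB)


def should_split(store_sizes, max_filesize=MAX_FILESIZE):
    """Return True if any store (one per column family) exceeds the limit.

    store_sizes: sizes in bytes, one entry per column family (assumed input).
    """
    return any(size > max_filesize for size in store_sizes)


print(should_split([2 * 1024**3, 11 * 1024**3]))  # one store over 10 GB
print(should_split([2 * 1024**3, 9 * 1024**3]))   # all stores under 10 GB
```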
2. IncreasingToUpperBoundRegionSplitPolicy
For 0.94: the split size is the square of the number of regions of the same table hosted on this region server, times the region flush size, or the maximum region split size, whichever is smaller.
    Min (R^2 * "hbase.hregion.memstore.flush.size", "hbase.hregion.max.filesize"), where R is the number of regions of the same table hosted on the same region server.
    hbase.hregion.memstore.flush.size
    Memstore will be flushed to disk if size of the memstore exceeds this number of bytes. Value is checked by a thread that runs every hbase.server.thread.wakefrequency.
    Default: 134217728
For 0.98: the split size is the cube of the number of regions of the same table hosted on this region server, times twice the region flush size, or the maximum region split size, whichever is smaller.
    Min (R^3 * 2 * "hbase.hregion.memstore.flush.size", "hbase.hregion.max.filesize"), where R is the number of regions of the same table hosted on the same region server.
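To get a feel for how quickly the threshold grows, the two formulas can be evaluated with the default values quoted above (flush size 134217728, max file size 10737418240). This is only a worked calculation of the formulas as stated, with made-up function names, and it assumes R >= 1.

```python
FLUSH_SIZE = 134217728      # default hbase.hregion.memstore.flush.size (128 MB)
MAX_FILESIZE = 10737418240  # default hbase.hregion.max.filesize (10 GB)


def split_size_094(r, flush=FLUSH_SIZE, max_size=MAX_FILESIZE):
    """0.94 formula: Min(R^2 * flush size, max file size)."""
    return min(r ** 2 * flush, max_size)


def split_size_098(r, flush=FLUSH_SIZE, max_size=MAX_FILESIZE):
    """0.98 formula: Min(R^3 * 2 * flush size, max file size)."""
    return min(r ** 3 * 2 * flush, max_size)


MB = 1024 * 1024
for r in range(1, 10):
    print(r, split_size_094(r) // MB, "MB", split_size_098(r) // MB, "MB")
```

With these defaults the 0.94 formula reaches the 10 GB cap at R = 9 (81 * 128 MB is more than 10 GB), while the 0.98 formula already reaches it at R = 4 (64 * 256 MB = 16 GB).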
In short, the exact formula depends on the HBase version.
3. KeyPrefixRegionSplitPolicy
A custom RegionSplitPolicy that groups rows by a prefix of the row key. This ensures that a region is not split "inside" a prefix of a row key, i.e. rows can be co-located in a region by their prefix. The "prefix_split_key_policy.prefix_length" attribute of the table defines the prefix length.
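One way to picture the effect: whatever split point the region would normally choose is cut down to the first prefix_length bytes, so the boundary always falls between two prefixes. The sketch below illustrates that idea on made-up row keys. `prefix_split_point` and `daughter` are hypothetical helpers, not HBase API.

```python
def prefix_split_point(split_point, prefix_length):
    """Truncate a candidate split point to at most prefix_length bytes."""
    return split_point[:prefix_length]


def daughter(row_key, split_key):
    """Rows >= split_key go to the upper daughter region, the rest to the lower."""
    return "upper" if row_key >= split_key else "lower"


rows = [b"user0_009", b"user1_001", b"user1_002", b"user1_003", b"user2_001"]
candidate = b"user1_002"  # e.g. a midpoint chosen without regard to prefixes

plain = {row: daughter(row, candidate) for row in rows}
grouped = {row: daughter(row, prefix_split_point(candidate, 5)) for row in rows}

print(plain[b"user1_001"], plain[b"user1_002"])      # user1 rows are separated
print(grouped[b"user1_001"], grouped[b"user1_002"])  # user1 rows stay together
```

With a prefix length of 5 the split key becomes "user1", so every "user1_..." row ends up in the same daughter region.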
Force Split
    hbase(main):004:0> help 'split'
    Split entire table or pass a region to split individual region. With the
    second parameter, you can specify an explicit split key for the region.
    Examples:
        split 'tableName'
        split 'regionName' # format: 'tableName,startKey,id'
        split 'tableName', 'splitKey'
        split 'regionName', 'splitKey'
    create 'testforce','f1'
    put 'testforce','row1','f1:col1','data1'
    put 'testforce','row2','f1:col1','data2'
    put 'testforce','row3','f1:col1','data3'
    put 'testforce','row4','f1:col1','data4'
    flush 'testforce'

    # hadoop fs -ls /apps/hbase/data/testforce/1a146d535be7662bb1102e44961ddb7e/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 808 2014-05-21 15:56 /apps/hbase/data/testforce/1a146d535be7662bb1102e44961ddb7e/f1/dbe88fd159324a9499405a8536c66c4b

    [root@hdm ~]# hadoop fs -ls /apps/hbase/data/testforce
    Found 3 items
    -rw-r--r-- 3 hbase hadoop 671 2014-05-21 15:55 /apps/hbase/data/testforce/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:55 /apps/hbase/data/testforce/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:56 /apps/hbase/data/testforce/1a146d535be7662bb1102e44961ddb7e

    hbase(main):034:0> split '1a146d535be7662bb1102e44961ddb7e','row2'
    0 row(s) in 0.0420 seconds

    # hadoop fs -ls /apps/hbase/data/testforce
    Found 5 items
    -rw-r--r-- 3 hbase hadoop 671 2014-05-21 15:55 /apps/hbase/data/testforce/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:55 /apps/hbase/data/testforce/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:58 /apps/hbase/data/testforce/1a146d535be7662bb1102e44961ddb7e
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:58 /apps/hbase/data/testforce/2645e315abec969864d5d5610b004c60
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:58 /apps/hbase/data/testforce/439b0a8e5306b24370fd5d61ff1eeb03

    # hadoop fs -ls /apps/hbase/data/testforce
    Found 4 items
    -rw-r--r-- 3 hbase hadoop 671 2014-05-21 15:55 /apps/hbase/data/testforce/.tableinfo.0000000001
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:55 /apps/hbase/data/testforce/.tmp
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:59 /apps/hbase/data/testforce/2645e315abec969864d5d5610b004c60
    drwxr-xr-x - hbase hadoop 0 2014-05-21 15:59 /apps/hbase/data/testforce/439b0a8e5306b24370fd5d61ff1eeb03

    # hadoop fs -ls /apps/hbase/data/testforce/439b0a8e5306b24370fd5d61ff1eeb03/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 715 2014-05-21 15:58 /apps/hbase/data/testforce/439b0a8e5306b24370fd5d61ff1eeb03/f1/3ba626091f7948eb9b19a328fe108716

    # hadoop fs -ls /apps/hbase/data/testforce/2645e315abec969864d5d5610b004c60/f1
    Found 1 items
    -rw-r--r-- 3 hbase hadoop 645 2014-05-21 15:58 /apps/hbase/data/testforce/2645e315abec969864d5d5610b004c60/f1/eb093dbeb36a44c29d049135f0fcbfe8

    # hadoop fs -cat /apps/hbase/data/testforce/439b0a8e5306b24370fd5d61ff1eeb03/f1/3ba626091f7948eb9b19a328fe108716
    row2f1col1F data2 row3f1col1F data3 row4f1col1F data4

    # hadoop fs -cat /apps/hbase/data/testforce/2645e315abec969864d5d5610b004c60/f1/eb093dbeb36a44c29d049135f0fcbfe8
    row1f1col1F data1
