你应该知道的PostgreSQL分区表知识
为什么需要分区表?
首先我们需要思考一个问题,为什么需要分区表?
对,没错,就是因为Big Table的性能差,而且容量有上限,所以才需要
分而治之
。一张表最大的容量是2^32次方的Page页面,而Page默认值是8K,所以一张表最大的上限是32TB,如果你把Page大小改成32KB,则单表大小可以达到128TB。但是在使用分区表的情况下,一个表可以获得2^32个子表,每个子表的大小又能达到2 ^ 32 * 8192字节。所以使用了分区表就不用担心容量上限的问题了。
至于性能差,我们这里可以看一下
2ndquadrant
的一个PPT中关于Big Table的基准测试。
从这两张图可以看出,当大量的insert情况下,如果存在索引,且上面存在自动增长列的时候,表一直增大也没有性能问题。而一旦表上的索引,没有自动增长列处于随机插入的时候,表大小达到6GB后,TPS就急剧下降好几倍,这种下降严重影响性能。
PostgreSQL分区表介绍?
PostgreSQL的分区表有两种形式,一种是
声明式分区
,在PostgreSQL 10版本开始支持。另外一种是
表继承
的方式,是10版本以前主要使用方式。两者的区别就是使用继承的方式需要使用约束来定义分区和规则,还需要使用触发器将数据路由到适合的分区,用户必须编写和维护这些代码。
以下是
表继承
的实现步骤:
- 创建父表,所有分区都将从该表继承。
- 创建几个子表(每个子表代表数据分区),每个表均从父表继承。
- 向分区表添加约束,以定义每个分区中的行值。
-
分别在父表和子表上创建索引。(索引不会从父表传播到子表)。
-
在主表上创建一个合适的触发器函数,以便将插入到父表中的内容重定向到相应的分区表中。
- 创建一个调用该触发器函数的触发器。
注:当子表规则发生变化时,要重新定义触发器函数。
表继承
相当的复杂。而
声明式分区
则简单许多。它的实现步骤如下:
-
使用
PARTITION BY
子句创建一个分区表,该子句包含了分区方法,可以使用Range、List、Hash三种方式。 - 创建分区表后,可以手动创建子分区。
- 然后在子分区上创建索引。
声明式分区
大大简化了
表继承
的若干过程,对用户来说更简单。插入自动化(勿需触发器)和直接对分区表进行操作更加简单和人性化。
下图是各个版本对分区功能的支持,在PostgreSQL 10版本中,声明式分区功能还比较弱,要到11版本声明式分区才会有大幅的提升。
PostgreSQL分区表基本使用
接下来我们来使用一下PostgreSQL的分区表功能,我们采用的是
声明式分区
方式。
List分区
列表分区明确指定根据某字段的某个具体值进行分区,默认分区(可选值)保存不属于任何指定分区的列表值。
CREATE TABLE students (id INTEGER, status character varying(30), name character varying(30)) PARTITION BY LIST(status);
CREATE TABLE stu_active PARTITION OF students FOR VALUES IN ('ACTIVE');
CREATE TABLE stu_exp PARTITION OF students FOR VALUES IN ('EXPIRED');
CREATE TABLE stu_others PARTITION OF students DEFAULT;
postgres=# \d+ students
Partitioned table "public.students"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
name | character varying(30) | | | | extended | |
Partition key: LIST (status)
Partitions: stu_active FOR VALUES IN ('ACTIVE'),
stu_exp FOR VALUES IN ('EXPIRED'),
stu_others DEFAULT
postgres=# \d+ stu_active
Table "public.stu_active"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
name | character varying(30) | | | | extended | |
Partition of: students FOR VALUES IN ('ACTIVE')
Partition constraint: ((status IS NOT NULL) AND ((status)::text = 'ACTIVE'::character varying(30)))
Access method: heap
postgres=# INSERT INTO students VALUES (1,'ACTIVE','zhangyit'), (2,'EXPIRED','lias'), (3,'EXPIRED','wangwei'), (4,'REACTIVATED','liulah');
postgres=# select tableoid::regclass,* from students;
tableoid | id | status | name
------------+----+-------------+----------
stu_active | 1 | ACTIVE | zhangyit
stu_exp | 2 | EXPIRED | lias
stu_exp | 3 | EXPIRED | wangwei
stu_others | 4 | REACTIVATED | liulah
Range分区
范围分区就是根据某个字段值的范围来进行分区。
CREATE TABLE students (id INTEGER, status character varying(30), grade INTEGER) PARTITION BY RANGE(grade);
CREATE TABLE stu_fail PARTITION OF students FOR VALUES FROM (MINVALUE) TO (60);
CREATE TABLE stu_pass PARTITION OF students FOR VALUES FROM (60) TO (75);
CREATE TABLE stu_excellent PARTITION OF students FOR VALUES FROM (75) TO (MAXVALUE);
postgres=# \d+ students
Partitioned table "public.students"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition key: RANGE (grade)
Partitions: stu_excellent FOR VALUES FROM (75) TO (MAXVALUE),
stu_fail FOR VALUES FROM (MINVALUE) TO (60),
stu_pass FOR VALUES FROM (60) TO (75)
postgres=# \d+ stu_fail
Table "public.stu_fail"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES FROM (MINVALUE) TO (60)
Partition constraint: ((grade IS NOT NULL) AND (grade < 60))
Access method: heap
postgres=# \d+ stu_pass
Table "public.stu_pass"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES FROM (60) TO (75)
Partition constraint: ((grade IS NOT NULL) AND (grade >= 60) AND (grade < 75))
Access method: heap
postgres=# \d+ stu_excellent
Table "public.stu_excellent"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES FROM (75) TO (MAXVALUE)
Partition constraint: ((grade IS NOT NULL) AND (grade >= 75))
Access method: heap
postgres=# INSERT INTO students VALUES (1,'ACTIVE',58), (2,'ACTIVE',60), (3,'ACTIVE',75), (4,'ACTIVE',90);
INSERT 0 4
postgres=# select tableoid::regclass,* from students;
tableoid | id | status | grade
---------------+----+--------+-------
stu_fail | 1 | ACTIVE | 58
stu_pass | 2 | ACTIVE | 60
stu_excellent | 3 | ACTIVE | 75
stu_excellent | 4 | ACTIVE | 90
注意:minvalue的上限值是小于,不是小于等于。而maxvalue的区间的下限是大于等于,不是大于。
Hash分区
通过对每个分区使用取模和余数来创建hash分区,modulus指定了对N取模,而remainder指定了除完后的余数。
CREATE TABLE students (id INTEGER, status character varying(30), grade INTEGER) PARTITION BY HASH(id);
CREATE TABLE stu_part1 PARTITION OF students FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE stu_part2 PARTITION OF students FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE stu_part3 PARTITION OF students FOR VALUES WITH (modulus 3, remainder 2);
postgres=# \d+ stu_part1
Table "public.stu_part1"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES WITH (modulus 3, remainder 0)
Partition constraint: satisfies_hash_partition('20026077'::oid, 3, 0, id)
Access method: heap
postgres=# \d+ stu_part2
Table "public.stu_part2"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES WITH (modulus 3, remainder 1)
Partition constraint: satisfies_hash_partition('20026077'::oid, 3, 1, id)
Access method: heap
postgres-# \d+ stu_part3
Table "public.stu_part3"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES WITH (modulus 3, remainder 2)
Partition constraint: satisfies_hash_partition('20026077'::oid, 3, 2, id)
Access method: heap
postgres=# INSERT INTO students VALUES (1,'ACTIVE',58), (2,'ACTIVE',60), (3,'ACTIVE',75), (4,'ACTIVE',90);
INSERT 0 4
postgres=# select tableoid::regclass,* from students;s
tableoid | id | status | grade
-----------+----+--------+-------
stu_part1 | 2 | ACTIVE | 60
stu_part1 | 4 | ACTIVE | 90
stu_part2 | 3 | ACTIVE | 75
stu_part3 | 1 | ACTIVE | 58
我们还可以建立子分区,也就是多级分区,使用PostgreSQL声明性分区的创建子分区方法有:LIST-LIST,LIST-RANGE,LIST-HASH,RANGE-RANGE,RANGE-LIST,RANGE-HASH,HASH-HASH,HASH-LIST和HASH-RANGE。
分区表转换成普通表
我们可以使用
DETACH PARTITION
命令把分区表转换成普通表。
postgres=# alter table students detach partition stu_part3;
ALTER TABLE
postgres=# \d+ students
Partitioned table "public.students"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition key: HASH (id)
Partitions: stu_part1 FOR VALUES WITH (modulus 3, remainder 0),
stu_part2 FOR VALUES WITH (modulus 3, remainder 1)
postgres=# \d+ stu_part3
Table "public.stu_part3"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Access method: heap
普通表转换成分区表
我们也可以把某个普通表在附加到分区表上去,使用
attach partition
命令。
postgres=# alter table students attach partition stu_part3 FOR VALUES WITH (modulus 3, remainder 2);
ALTER TABLE
postgres=# \d+ students
Partitioned table "public.students"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition key: HASH (id)
Partitions: stu_part1 FOR VALUES WITH (modulus 3, remainder 0),
stu_part2 FOR VALUES WITH (modulus 3, remainder 1),
stu_part3 FOR VALUES WITH (modulus 3, remainder 2)
postgres=# \d+ stu_part3
Table "public.stu_part3"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
status | character varying(30) | | | | extended | |
grade | integer | | | | plain | |
Partition of: students FOR VALUES WITH (modulus 3, remainder 2)
Partition constraint: satisfies_hash_partition('20026077'::oid, 3, 2, id)
Access method: heap
附加上去的表必须具有相同的列/类型,不能存在额外的列。
在运行ATTACH PARTITION
命令之前,建议在要附加的表上创建一个与所需分区约束匹配的约束。这样,系统将能够跳过扫描以验证隐式分区约束。如果没有约束,则将扫描附加表以验证分区约束正确性,同时对该分区持有ACCESS EXCLUSIVE
锁定,并在父表上持有SHARE UPDATE EXCLUSIVE
锁。 在完成ATTACH PARTITION
后,可能需要删除冗余CHECK约束。
分区裁剪
分区裁剪就是当已知分区不包含我们要的数据时,在查询的时候就不扫描这个分区,以获得更高的性能。在PostgreSQL 10中,这种消除是通过
constraint_exclusion
机制来实现的,这种机制是一种线性算法,需要一个一个地查看每个分区的元数据,以检查该分区是否与查询WHERE子句匹配,很显然这种效率并不高。而在PostgreSQL 11版本引入了新的算法,不再是详尽的搜索,而使用二分查找法搜索,可以快速识别匹配的LIST和RANGE分区,而HASH分区则通过散列函数查找匹配的分区。
PostgreSQL 11版本该功能通过参数
enable_partition_pruning
进行控制。默认是打开的。
postgres=# explain analyze select * from students where id in (1,2);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..37.82 rows=14 width=86) (actual time=0.048..0.066 rows=2 loops=1)
-> Seq Scan on stu_part1 (cost=0.00..18.88 rows=7 width=86) (actual time=0.046..0.049 rows=1 loops=1)
Filter: (id = ANY ('{1,2}'::integer[]))
Rows Removed by Filter: 1
-> Seq Scan on stu_part3 (cost=0.00..18.88 rows=7 width=86) (actual time=0.013..0.014 rows=1 loops=1)
Filter: (id = ANY ('{1,2}'::integer[]))
Planning Time: 0.309 ms
Execution Time: 0.118 ms
(8 rows)
postgres=# set enable_partition_pruning=off;
postgres=# explain analyze select * from students where id in (1,2);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..56.73 rows=21 width=86) (actual time=0.054..0.070 rows=2 loops=1)
-> Seq Scan on stu_part1 (cost=0.00..18.88 rows=7 width=86) (actual time=0.049..0.052 rows=1 loops=1)
Filter: (id = ANY ('{1,2}'::integer[]))
Rows Removed by Filter: 1
-> Seq Scan on stu_part2 (cost=0.00..18.88 rows=7 width=86) (actual time=0.006..0.006 rows=0 loops=1)
Filter: (id = ANY ('{1,2}'::integer[]))
Rows Removed by Filter: 1
-> Seq Scan on stu_part3 (cost=0.00..18.88 rows=7 width=86) (actual time=0.005..0.006 rows=1 loops=1)
Filter: (id = ANY ('{1,2}'::integer[]))
Planning Time: 0.322 ms
Execution Time: 0.133 ms
(11 rows)
参数为on的时候,只扫描了2个分区。而将参数改成off之后,会对所有表分区进行扫描。
Run-time 分区剪裁
PostgreSQL 11还有一项新技术就是Run-time分区剪裁,就是在执行时对分区进行裁剪。在某些情况下,这是一个优点。例如PREPARE查询,子查询的值、嵌套循环连接的内层参数值,这些都只有在SQL运行的时候才能知道具体的值。Run-time 分区剪裁发生有以下两种类型。
第一类,就是执行期间参数不变的情况。
下面案例将参数$ 1设置为70,并且该值在整个查询执行过程中不会更改,初始化Append节点时会进行剪裁。
[postgres@paas-telepg-test3 ~]$ psql
psql (12.3)
Type "help" for help.
postgres=# PREPARE a1 (INT) AS SELECT * FROM students WHERE grade<$1;
PREPARE
postgres=# explain analyze execute a1(70);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Append (cost=0.00..40.12 rows=474 width=86) (actual time=0.012..0.012 rows=0 loops=1)
-> Seq Scan on stu_fail (cost=0.00..18.88 rows=237 width=86) (actual time=0.007..0.007 rows=0 loops=1)
Filter: (grade < 70)
-> Seq Scan on stu_pass (cost=0.00..18.88 rows=237 width=86) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (grade < 70)
Planning Time: 1.636 ms
Execution Time: 0.128 ms
(7 rows)
第二类,就是执行期间值可能动态发生变化,例如子查询可能查到多个值,因此在运行期间每次参数值更改时都将执行修剪。如果要排除分区,则该子计划的EXPLAIN输出中将显示(never executed)。
postgres=# explain analyze SELECT * FROM students WHERE id = (select 3);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.01..56.70 rows=12 width=86) (actual time=0.046..0.048 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=1)
-> Seq Scan on stu_part1 (cost=0.00..18.88 rows=4 width=86) (never executed)
Filter: (id = $0)
-> Seq Scan on stu_part2 (cost=0.00..18.88 rows=4 width=86) (actual time=0.026..0.028 rows=1 loops=1)
Filter: (id = $0)
-> Seq Scan on stu_part3 (cost=0.00..18.88 rows=4 width=86) (never executed)
Filter: (id = $0)
Planning Time: 0.387 ms
Execution Time: 0.125 ms
(11 rows)
根据
2ndquadrant
提供的Run-time分区剪裁的基准测试结果,在分区数量少的情况下,PG12的性能要明显好于11,但是分区达到一定的数量之后,PG12的性能也会下降。
enable_partitionwise_join
允许查询优化器使用面向分区的连接,当分区表和分区表进行连接的时候,直接让分区和分区进行连接。面向分区的连接当前只适用于连接条件包括所有分区键的情况,连接条件必须是相同的数据类型并且子分区集合要完全匹配。由于面向分区的连接在规划期间会使用一些CPU Time和内存,所以默认值为off。
接下来我们来造点数据测试一下这个功能,分别创建a1和a2两张分区表,造300万数据。
CREATE TABLE a1 (a int primary key, b text) PARTITION BY RANGE (a);
CREATE TABLE a1_1 PARTITION OF a1 FOR VALUES FROM (1) TO (1000001);
CREATE TABLE a1_2 PARTITION OF a1 FOR VALUES FROM (1000001) TO (2000001);
CREATE TABLE a1_3 PARTITION OF a1 FOR VALUES FROM (2000001) TO (3000001);
CREATE TABLE a2 (c int primary key, d text) PARTITION BY RANGE (c);
CREATE TABLE a2_1 PARTITION OF a2 FOR VALUES FROM (1) TO (1000001);
CREATE TABLE a2_2 PARTITION OF a2 FOR VALUES FROM (1000001) TO (2000001);
CREATE TABLE a2_3 PARTITION OF a2 FOR VALUES FROM (2000001) TO (3000001);