Understanding PostgreSQL TOAST

Author: Hu Bin, Senior Engineer, Tencent Cloud

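TOAST stands for "The Oversized-Attribute Storage Technique" and is the mechanism PostgreSQL (PG) uses to store large field values. To understand TOAST, we first need the concept of a page (block). In PG, the page is the basic unit of data in the on-disk files. Its size is fixed: it can only be chosen at compile time and cannot be changed afterwards, and the default is 8 KB. PG also does not allow a single row to span pages, so for very long rows it invokes TOAST, which works by compressing the value and by splitting it into chunks. When a value is chunked, the actual data is stored as multiple rows of a separate system-managed table called the TOAST table. This is known as out-of-line storage.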
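
The page size of a running server can be checked directly. A minimal sketch, assuming a psql session against any reasonably recent PG version; block_size is a read-only setting reported in bytes:

```sql
-- Read-only setting fixed when the server was built; 8192 on a default build.
show block_size;

-- The same value as an integer, usable inside other queries.
select current_setting('block_size')::int as page_bytes;
```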

Before going into the details, note that every table column in PG uses one of four TOAST strategies:

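PLAIN: prevents both compression and out-of-line storage. This is the only strategy available to data types that never need TOAST (for example int), and it is not a suitable choice for types such as text whose values may grow beyond the page size.
EXTENDED: allows both compression and out-of-line storage. The value is normally compressed first, and if it is still too large it is moved out of line.
EXTERNAL: allows out-of-line storage but not compression. For columns such as strings where operations often touch only part of the value, this strategy can perform better, because the needed chunks can be fetched without reading and decompressing the whole value.
MAIN: allows compression but not out-of-line storage. In practice, so that oversized data can still be stored, out-of-line storage is used as a last resort when nothing else (such as compression) makes the row fit. It is therefore more accurate to read this as "avoid out-of-line storage whenever possible".

Now let's study the details of TOAST through hands-on experiments.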

First, create a blog table:

postgres=# create table blog(id int, title text, content text);
CREATE TABLE
postgres=# \d+ blog;
                          Table "public.blog"
 Column  |  Type   | Modifiers | Storage  | Stats target | Description 
---------+---------+-----------+----------+--------------+-------------
 id      | integer |           | plain    |              | 
 title   | text    |           | extended |              | 
 content | text    |           | extended |              |
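
The Storage column printed by \d+ is read from the system catalog pg_attribute. As a hedged sketch, the same information can be queried directly; attstorage holds a one-letter code (p = plain, e = external, m = main, x = extended):

```sql
-- List the TOAST strategy of every user column of blog.
select attname,
       atttypid::regtype as type,
       attstorage               -- p, e, m or x
from pg_attribute
where attrelid = 'blog'::regclass
  and attnum > 0                -- skip system columns
  and not attisdropped
order by attnum;
```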

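As shown, integer columns default to the plain TOAST strategy, while text columns default to extended. The PG documentation tells us that if a table has any column that may need TOAST, the system automatically creates a TOAST table to handle out-of-line storage. So where is that table?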

postgres=# select relname,relfilenode,reltoastrelid from pg_class where relname='blog';
 relname | relfilenode | reltoastrelid 
---------+-------------+---------------
 blog    |       16441 |         16444
(1 row)
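
As a cross-check, the TOAST table's qualified name does not have to be assembled by hand. A small sketch, assuming the same blog table: casting reltoastrelid to regclass prints the name, and selecting oid next to relfilenode shows whether the two values coincide.

```sql
-- oid identifies the table; relfilenode names its file on disk.
-- The two normally start out equal but can diverge after a table rewrite.
select oid,
       relfilenode,
       reltoastrelid::regclass as toast_table
from pg_class
where relname = 'blog';
```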

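The query above shows that the relfilenode of blog is 16441 and that the OID of its TOAST table is 16444 (for the concepts of oid and pg_class, see the official PG documentation). For a freshly created table the relfilenode normally equals the table's OID, so 16441 is also the OID of blog here. The corresponding TOAST table is therefore named pg_toast.pg_toast_16441 (note that the suffix is the OID of blog, not the OID of the TOAST table). Let's look at its definition: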

postgres=# \d+ pg_toast.pg_toast_16441;
TOAST table "pg_toast.pg_toast_16441"
   Column   |  Type   | Storage 
------------+---------+---------
 chunk_id   | oid     | plain
 chunk_seq  | integer | plain
 chunk_data | bytea   | plain

The TOAST table has three columns:

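chunk_id: the OID identifying a particular TOASTed value. All rows sharing the same chunk_id together make up one TOASTed field value of one row in the original table (here, blog).
chunk_seq: the position of this chunk within the whole value.
chunk_data: the actual stored data.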
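
To see how the three columns fit together, the chunks can be grouped per value. A minimal sketch, assuming the TOAST table name pg_toast.pg_toast_16441 found above (the number differs on every system, and reading pg_toast tables may require elevated privileges):

```sql
-- One output row per TOASTed value: how many chunks it has
-- and how many bytes those chunks store in total.
select chunk_id,
       count(*)                as chunks,
       sum(length(chunk_data)) as stored_bytes
from pg_toast.pg_toast_16441
group by chunk_id
order by chunk_id;
```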

Now let's verify this in practice:

postgres=# insert into blog values(1, 'title', '0123456789');
INSERT 0 1
postgres=# select * from blog;
 id | title |  content   
----+-------+------------
  1 | title | 0123456789
(1 row)

postgres=# select * from pg_toast.pg_toast_16441;
 chunk_id | chunk_seq | chunk_data 
----------+-----------+------------
(0 rows)

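Because content holds only 10 characters, it is neither compressed nor stored out of line. Next we use the following SQL statement to grow content, doubling its length each time, and watch the length of content to see what happens: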

postgres=# update blog set content=content||content where id=1;
UPDATE 1
postgres=# select id,title,length(content) from blog;
 id | title | length 
----+-------+--------
  1 | title |     20
(1 row)
postgres=# select * from pg_toast.pg_toast_16441;
 chunk_id | chunk_seq | chunk_data 
----------+-----------+------------
(0 rows)
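
The doubling can also be scripted rather than typed by hand. The following is only a sketch under stated assumptions: it reuses the TOAST table name pg_toast.pg_toast_16441 from this session, it assumes the TOAST table is still empty when it starts, and the cap of 20 iterations is an arbitrary safety limit introduced here.

```sql
do $$
declare
  toast_rows bigint;
begin
  -- Double content until the TOAST table receives its first rows,
  -- giving up after 20 doublings.
  for i in 1..20 loop
    update blog set content = content || content where id = 1;
    select count(*) into toast_rows from pg_toast.pg_toast_16441;
    exit when toast_rows > 0;
  end loop;
  raise notice 'TOAST rows: %, content length: %',
    toast_rows, (select length(content) from blog where id = 1);
end
$$;
```

Because the whole block runs in one transaction, the intermediate lengths are not displayed; running the update by hand, as in the session above, shows each step.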

Repeat the steps above until the pg_toast_16441 table contains data:

postgres=# select id,title,length(content) from blog;
 id | title | length 
----+-------+--------
  1 | title | 327680
(1 row)

postgres=# select chunk_id,chunk_seq,length(chunk_data) from pg_toast.pg_toast_16441;
 chunk_id | chunk_seq | length 
----------+-----------+--------
    16439 |         0 |   1996
    16439 |         1 |   1773
(2 rows)

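The TOAST table only receives rows once content reaches 327680 characters, far beyond the 8 KB page size. It then holds two rows, each slightly under 2 KB and together only 3769 bytes. This is because the extended strategy compresses the value first and resorts to out-of-line storage only when the compressed value is still too large.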
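
The effect of compression can be made visible by comparing the logical size of the value with the size actually stored. A hedged sketch using built-in functions: octet_length returns the uncompressed byte length, while pg_column_size reports the bytes used to store the value, possibly compressed.

```sql
select id,
       length(content)         as chars,
       octet_length(content)   as raw_bytes,
       pg_column_size(content) as stored_bytes
from blog;
```

A stored_bytes value far below raw_bytes indicates that the value was compressed before being written.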

Next we change the TOAST strategy of content to EXTERNAL in order to disable compression.

postgres=# alter table blog alter content set storage external;
ALTER TABLE
postgres=# \d+ blog;
                          Table "public.blog"
 Column  |  Type   | Modifiers | Storage  | Stats target | Description 
---------+---------+-----------+----------+--------------+-------------
 id      | integer |           | plain    |              | 
 title   | text    |           | extended |              | 
 content | text    |           | external |              |

Then we insert another row:

postgres=# insert into blog values(2, 'title', '0123456789');
INSERT 0 1
postgres=# select id,title,length(content) from blog;
 id | title | length 
----+-------+--------
  1 | title | 327680
  2 | title |     10
(2 rows)

Then repeat the doubling steps above, this time for the new row, until new rows appear in the TOAST table:

postgres=# update blog set content=content||content where id=2;
UPDATE 1
postgres=# select id,title,length(content) from blog;
 id | title | length 
----+-------+--------
  2 | title |   2560
  1 | title | 327680
(2 rows)

postgres=# select chunk_id,chunk_seq,length(chunk_data) from pg_toast.pg_toast_16441;
 chunk_id | chunk_seq | length 
----------+-----------+--------
    16447 |         0 |   1996
    16447 |         1 |   1773
    16448 |         0 |   1996
    16448 |         1 |    564
(4 rows)
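
The chunk lengths in this output can be reproduced with simple arithmetic. A small sketch, assuming that 1996 bytes (the size of the full chunks seen above) is the maximum chunk size on this build:

```sql
-- Split an uncompressed 2560-byte value into chunks of at most 1996 bytes.
select 2560 / 1996 as full_chunks,       -- integer division: 1
       2560 % 1996 as last_chunk_bytes;  -- remainder: 564
```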

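This time, once content reaches 2560 characters (according to the official documentation, the threshold is roughly 2 KB), two new rows with chunk_id 16448 appear in the TOAST table, and the chunk_data lengths of those two rows add up to exactly 2560. From the operations above we can draw the following conclusions: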