Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Дата
Msg-id CALj2ACX9vVYHYkX8e6w058EuAs8JL5EsnzadTxGhpiE_Ep_ByA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: New Table Access Methods for Multi and Single Inserts  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.

I added table_modify_begin, table_modify_buffer_insert, table_modify_buffer_flush and table_modify_end. Table Access Method (AM) authors now can define their own buffering strategy and flushing decisions based on their tuple storage kinds and various other AM specific factors. I also added a default implementation that falls back to single inserts when no implementation is provided for these AM by AM authors. See the attached v19-0001 patch.

> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)

I added a flush callback named TableModifyBufferFlushCallback; when provided by callers invoked after tuples are flushed to disk from the buffers but before the AM frees them up. Index insertions and AFTER ROW INSERT triggers can be executed in this callback. See the v19-0001 patch for how AM invokes the flush callback, and see either v19-0003 or v19-0004 or v19-0005 for how a caller can supply the callback and required context to execute index insertions and AR triggers.

> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).

I haven't thought about these things yet. I can only focus on them after seeing how the attached patches go from here.

> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.

Heap implementation for thes new Table AMs uses a separate memory context for all of the operations. Please have a look and let me know if we need anything more.

> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.

I used multi inserts for all of these including TABLE REWRITE commands such as ALTER TABLE. See the attached v19-0002 patch. Check the testing section below for benefits.

FWIW, following are some of the TABLE REWRITE commands that can get benefitted:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTER COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
and so on.

> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.

I modified the COPY FROM code to use the new Table AMs, and performed some tests which show no signs of regression. Check the testing section below for more details. See the attached v19-0005 patch. With this, table_multi_insert can be deprecated.

> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.

I did that in v19-0003. I did place the table_modify_end call in multiple places including ExecEndModifyTable. I didn't find any issues with it. Please have a look and let me know if we need the end call in more places. Check the testing section below for benefits.

> 10. Use these new methods for logical apply.

I used multi inserts for Logical Replication apply. in v19-0004. Check the testing section below for benefits.

FWIW, open-source pglogical does have multi insert support, check code around https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960.

> 11. Deprecate the multi_insert API.

I did remove both table_multi_insert and table_finish_bulk_insert in v19-0006. Perhaps, removing them isn't a great idea, but adding a deprecation WARNING/ERROR until some more PG releases might be worth looking at.

> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

If the design, code and benefits that these new Table AMs bring to the table look good, I hope to see it for PG 18.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The attached patches already have implemented these new Table AMs for Heap. I don't think we need a separate implementation to demonstrate. If others feel so, I'm open to thoughts here.

Having said above, I'd like to reiterate the motivation behind the new Table AMs for multi and single inserts.

1. A scan-like API with state being carried across is thought to be better as suggested by Andres Freund - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx@alap3.anarazel.de.
2. Allowing a Table AM to optimize operations across multiple inserts, define its own buffering strategy and take its own flushing decisions based on their tuple storage kinds and various other AM specific factors.
3. Improve performance of various SQL commands with multi inserts for Heap AM.

The attached v19 patches might need some more detailed comments, some documentation and some specific tests ensuring the multi inserts for Heap are kicked-in for various commands. I'm open to thoughts here.

I did some testing to see how various commands benefit with multi inserts using these new Table AM for heap. It's not only the improvement in performance these commands see, but also the amount of WAL that gets generated reduces greatly. After all, multi inserts optimize the insertions by writing less WAL. IOW, writing WAL record per page if multiple rows fit into a single data page as opposed to WAL record per row.

Test case 1: 100 million rows, 2 columns (int and float)

Command                        | HEAD (sec) | PATCHED (sec) | Faster by % | Faster by X
------------------------------ | ---------- | ------------- | ----------- | -----------
CREATE TABLE AS                | 121        | 77            | 36.3        | 1.57
CREATE MATERIALIZED VIEW       | 101        | 49            | 51.4        | 2.06
REFRESH MATERIALIZED VIEW      | 113        | 54            | 52.2        | 2.09
ALTER TABLE (TABLE REWRITE)    | 124        | 81            | 34.6        | 1.53
COPY FROM                      | 71         | 72            | 0           | 1
INSERT INTO ... SELECT         | 117        | 62            | 47          | 1.88
LOGICAL REPLICATION APPLY      | 393        | 306           | 22.1        | 1.28

Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) | Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- | ------------ | -----------
CREATE TABLE AS                | 6.8              | 2.4                 | 64.7         | 2.83
CREATE MATERIALIZED VIEW       | 7.2              | 2.3                 | 68           | 3.13
REFRESH MATERIALIZED VIEW      | 10               | 5.1                 | 49           | 1.96
ALTER TABLE (TABLE REWRITE)    | 8                | 3.2                 | 60           | 2.5
COPY FROM                      | 2.9              | 3                   | 0            | 1
INSERT INTO ... SELECT         | 8                | 3                   | 62.5         | 2.66
LOGICAL REPLICATION APPLY      | 7.5              | 2.3                 | 69.3         | 3.26

Test case 2: 1 billion rows, 1 column (int)

Command                        | HEAD (sec) | PATCHED (sec) | Faster by % | Faster by X
------------------------------ | ---------- | ------------- | ----------- | -----------
CREATE TABLE AS                | 794        | 386           | 51.38       | 2.05
CREATE MATERIALIZED VIEW       | 1006       | 563           | 44.03       | 1.78
REFRESH MATERIALIZED VIEW      | 977        | 603           | 38.28       | 1.62
ALTER TABLE (TABLE REWRITE)    | 1189       | 714           | 39.94       | 1.66
COPY FROM                      | 321        | 330           | -0.02       | 0.97
INSERT INTO ... SELECT         | 1084       | 586           | 45.94       | 1.84
LOGICAL REPLICATION APPLY      | 3530       | 2982          | 15.52       | 1.18

Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) | Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- | ------------ | -----------
CREATE TABLE AS                | 60               | 12                  | 80           | 5
CREATE MATERIALIZED VIEW       | 60               | 12                  | 80           | 5
REFRESH MATERIALIZED VIEW      | 60               | 12                  | 80           | 5
ALTER TABLE (TABLE REWRITE)    | 123              | 31                  | 60           | 2.5
COPY FROM                      | 12               | 12                  | 0            | 1
INSERT INTO ... SELECT         | 120              | 24                  | 80           | 5
LOGICAL REPLICATION APPLY      | 61               | 12                  | 80.32        | 5

Test setup:
./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" > install.log && make -j 8 install > install.log 2>&1 &

wal_level=logical
max_wal_size = 256GB
checkpoint_timeout = 1h


Test system is EC2 instance of type c5.4xlarge:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            7
    BogoMIPS:            5999.99

Caches (sum of all):    
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    8 MiB (8 instances)
  L3:                    35.8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15

RAM:
  MemTotal:       32036536 kB

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: WIP Incremental JSON Parser
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Remove unnecessary code rom be_lo_put()