Multi Inserts in CREATE TABLE AS - revived patch
От | Bharath Rupireddy |
---|---|
Тема | Multi Inserts in CREATE TABLE AS - revived patch |
Дата | |
Msg-id | CALj2ACUr8Vnu3dMkiU47v-dh55tnY2Lr8m2xoSaRZeiCaNeVqQ@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Multi Inserts in CREATE TABLE AS - revived patch
Re: Multi Inserts in CREATE TABLE AS - revived patch |
Список | pgsql-hackers |
Hi,
I would like to propose an updated patch on multi/bulk inserts in CTAS [1] that tries to address the review comments that came up in [1]. One of the main review comments was to calculate/estimate the tuple size to decide on when to flush. I tried to solve this point with a new function GetTupleSize()(see the patch for implementation).
I did some testing with custom configuration[2].
Use case 1- 100mn tuples, 2 integer columns, exec time in sec:
HEAD: 131.507 when the select part is not parallel, 128.832 when the select part is parallel
Patch: 98.925 when the select part is not parallel, 52.901 when the select part is parallel
Use case 2- 10mn tuples, 4 integer and 6 text columns, exec time in sec:
HEAD: 76.801 when the select part is not parallel, 66.074 when the select part is parallel
Patch: 74.083 when the select part is not parallel, 57.739 when the select part is parallel
Thoughts?
If the approach followed in the patch looks okay, I can work on a separate patch for multi inserts in refresh materialized view cases.
I thank Simon Riggs for the offlist discussion.
PS: I chose to start a new thread as the previous thread [1] was closed in the CF. I hope that's not a problem.
[1] - https://www.postgresql.org/message-id/CAEET0ZHRWxbRUgwzUK_tOFDWx7VE2-P%3DxMBT6-N%2BgAa9WQ%3DxxA%40mail.gmail.com
[2] - The postgresql.conf used:
shared_buffers = 40GB
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
I would like to propose an updated patch on multi/bulk inserts in CTAS [1] that tries to address the review comments that came up in [1]. One of the main review comments was to calculate/estimate the tuple size to decide on when to flush. I tried to solve this point with a new function GetTupleSize()(see the patch for implementation).
I did some testing with custom configuration[2].
Use case 1- 100mn tuples, 2 integer columns, exec time in sec:
HEAD: 131.507 when the select part is not parallel, 128.832 when the select part is parallel
Patch: 98.925 when the select part is not parallel, 52.901 when the select part is parallel
Use case 2- 10mn tuples, 4 integer and 6 text columns, exec time in sec:
HEAD: 76.801 when the select part is not parallel, 66.074 when the select part is parallel
Patch: 74.083 when the select part is not parallel, 57.739 when the select part is parallel
If the approach followed in the patch looks okay, I can work on a separate patch for multi inserts in refresh materialized view cases.
I thank Simon Riggs for the offlist discussion.
PS: I chose to start a new thread as the previous thread [1] was closed in the CF. I hope that's not a problem.
[1] - https://www.postgresql.org/message-id/CAEET0ZHRWxbRUgwzUK_tOFDWx7VE2-P%3DxMBT6-N%2BgAa9WQ%3DxxA%40mail.gmail.com
[2] - The postgresql.conf used:
shared_buffers = 40GB
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: