Re: POC: postgres_fdw insert batching
From | Tomas Vondra
Subject | Re: POC: postgres_fdw insert batching
Date | |
Msg-id | d9a17f23-81cc-fca1-56e6-602bd2ae06b4@enterprisedb.com
In reply to | RE: POC: postgres_fdw insert batching ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses | RE: POC: postgres_fdw insert batching
List | pgsql-hackers
On 11/17/20 10:11 AM, tsunakawa.takay@fujitsu.com wrote:
> Hello,
>
> Modified the patch as I talked with Tomas-san. The performance
> results of loading one million records into a hash-partitioned table
> with 8 partitions are as follows:
>
> unpatched, local: 8.6 seconds
> unpatched, fdw: 113.7 seconds
> patched, fdw: 12.5 seconds (9x improvement)
>
> The test scripts are also attached. Run prepare.sql once to set up
> tables and source data. Run local_part.sql and fdw_part.sql to load
> source data into a partitioned table with local partitions and a
> partitioned table with foreign tables respectively.

Unfortunately, this does not compile for me, because nodeModifyTable
calls ExecGetTouchedPartitions, which is not defined anywhere. Not sure
what that's about, so I simply commented it out. That probably breaks
the partitioned cases, but it allowed me to do some review and testing.

As for the patch, I have a couple of comments:

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and it should live in a deparse.c function.

2) I think the GUC should be replaced with a server/table option,
similar to fetch_size.

The attached patch tries to address both of these points. Firstly, it
adds a new deparseBulkInsertSql function that builds a query for the
"full" batch, and then uses those two queries - when we get a full batch
we use the bulk query, otherwise we use the single-row query in a loop.
IMO this is cleaner than deparsing queries ad hoc in
execute_foreign_modify.

Of course, this might be worse when we don't have a full batch, e.g. for
a query that inserts only 50 rows with batch_size=100. If this case is
common, one option would be lowering batch_size accordingly. If we
really want to improve this case too, I suggest we pass more info than
just the position of the VALUES clause - that seems a bit too hackish.
Secondly, it adds the batch_size option to the server/foreign table, and
uses that. This is not complete, though. postgresPlanForeignModify
currently passes a hard-coded value; it needs to look up the correct
value for the server/table from RelOptInfo or something. And I suppose
the ModifyTable infrastructure will need to determine the value in order
to pass the correct number of slots to the FDW API.

There are a couple of other smaller changes. E.g. it undoes the changes
to finish_foreign_modify, and instead calls separate functions to
prepare the bulk statement. It also adds list_make5/list_make6 macros,
so as to not have to do strange stuff with the parameter lists.

And finally, this should probably add a bunch of regression tests.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments