Re: Parallel Inserts in CREATE TABLE AS
From: Bharath Rupireddy
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date:
Msg-id: CALj2ACUK58u-pQXmJjaF97b20v=ugppiwAiwZtgLFXn7SNSY8Q@mail.gmail.com
In reply to: RE: Parallel Inserts in CREATE TABLE AS ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses: Re: Parallel Inserts in CREATE TABLE AS
List: pgsql-hackers
On Mon, Jan 11, 2021 at 6:37 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
> > Attaching v21 patch set, which has following changes:
> > 1) 0001 - changed fpes->ins_cmd_type ==
> > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> > PARALLEL_INSERT_CMD_UNDEF
> > 2) 0002 - reworded the commit message.
> > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> > to ensure leader and worker insert the tuples in the same xact and replaced
> > memory usage output in numbers like 25kB to NkB to make the tests stable.
> > 4) 0004 - updated one of the test output to be in NkB and made the assertion
> > in SetParallelInsertState to be not under an if condition.
> >
> > There's one open point [1] on selective skipping of error "cannot insert
> > tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> > welcome.
> >
> > Please consider the v21 patch set for further review.
>
> Hi,
>
> I took a look into the new patch and have some comments.

Thanks.

> 1.
> + /*
> +  * Do not consider tuple cost in case of we intend to perform parallel
> +  * inserts by workers. We would have turned on the ignore flag in
> +  * apply_scanjoin_target_to_paths before generating Gather path for the
> +  * upper level SELECT part of the query.
> +  */
> + if ((root->parse->parallelInsCmdTupleCostOpt &
> +      PARALLEL_INSERT_SELECT_QUERY) &&
> +     (root->parse->parallelInsCmdTupleCostOpt &
> +      PARALLEL_INSERT_CAN_IGN_TUP_COST))
>
> Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here?
> IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when
> PARALLEL_INSERT_SELECT_QUERY is set.

+1. Changed.

> 2.
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                       void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *) ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
>
> +static void
> +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
> +                       void *ins_info)
> ...
> + info = (ParallelInsertCTASInfo *) ins_info;
> + intoclause_str = nodeToString(info->intoclause);
> + intoclause_len = strlen(intoclause_str) + 1;
> + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);
>
> I noticed the above code will call nodeToString and strlen twice, which seems unnecessary.
> Do you think it's better to store the result of nodeToString and strlen first and pass them when used?

I wanted to keep the API generic and not do nodeToString/strlen outside and
pass the results to the APIs. I don't think it adds too much function call
cost, since it runs only in the leader, and this way the code and API look
more readable. Thoughts?

> 3.
> + if (node->need_to_scan_locally || node->nworkers_launched == 0)
> + {
> +     EState *estate = node->ps.state;
> +     TupleTableSlot *outerTupleSlot;
> +
> +     for (;;)
> +     {
> +         /* Install our DSA area while executing the plan. */
> +         estate->es_query_dsa =
> +             node->pei ? node->pei->area : NULL;
> ...
> +         node->ps.state->es_processed++;
> +     }
>
> How about using the variable estate, i.e. 'estate->es_processed++;',
> instead of node->ps.state->es_processed++;?

+1. Changed.

Attaching v22 patch set with changes only in 0001 and 0002. Please consider
it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
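
For readers skimming the thread, a minimal sketch of the simplified tuple-cost
check agreed on in comment 1 might look like the following. The flag and field
names come from the quoted v21/v22 patches; the surrounding cost-estimation
code is assumed rather than shown in this thread:

    /*
     * Sketch only: PARALLEL_INSERT_CAN_IGN_TUP_COST is set only when
     * PARALLEL_INSERT_SELECT_QUERY is also set, so testing the single
     * flag is enough to decide whether to skip charging the parallel
     * tuple cost.
     */
    if (root->parse->parallelInsCmdTupleCostOpt &
        PARALLEL_INSERT_CAN_IGN_TUP_COST)
    {
        /* ignore the tuple transfer cost for the Gather path */
    }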
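
For comparison, the alternative raised in comment 2 (serializing the IntoClause
once in the caller) would amount to roughly the sketch below; the widened helper
signatures are hypothetical and not part of the posted patches:

    /*
     * Hypothetical caller-side variant: compute the serialized IntoClause
     * and its length once, then pass both down so that neither
     * ParallelInsCmdEstimate() nor SaveParallelInsCmdInfo() repeats the
     * nodeToString()/strlen() work.
     */
    intoclause_str = nodeToString(intoclause);
    intoclause_len = strlen(intoclause_str) + 1;
    ParallelInsCmdEstimate(pcxt, ins_cmd, intoclause_str, intoclause_len);
    SaveParallelInsCmdInfo(pcxt, ins_cmd, intoclause_str, intoclause_len);

The trade-off, as noted in the reply above, is that the helpers would no longer
take a command-kind-agnostic void *ins_info, which is the genericity and
readability argument for keeping the current API.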