RE: Parallel INSERT (INTO ... SELECT ...)
От | tsunakawa.takay@fujitsu.com |
---|---|
Тема | RE: Parallel INSERT (INTO ... SELECT ...) |
Дата | |
Msg-id | TYAPR01MB29902F37B88951CFA7B9AB51FE8D9@TYAPR01MB2990.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | RE: Parallel INSERT (INTO ... SELECT ...) ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Ответы |
RE: Parallel INSERT (INTO ... SELECT ...)
|
Список | pgsql-hackers |
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> > From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com> > > It did have performance gain, but I think it's not huge enough to ignore the > > extra's index cost. > > What do you think ? > > Yes... as you suspect, I'm afraid the benefit from parallel bitmap scan may not > compensate for the loss of the parallel insert operation. > > The loss is probably due to 1) more index page splits, 2) more buffer writes > (table and index), and 3) internal locks for things such as relation extension > and page content protection. To investigate 3), we should want something > like [1], which tells us the wait event statistics (wait count and time for each > wait event) per session or across the instance like Oracke, MySQL and EDB > provides. I want to continue this in the near future. What would the result look like if you turn off parallel_leader_participation? If the leader is freed from reading/writingthe table and index, the index page splits and internal lock contention may decrease enough to recover partof the loss. https://www.postgresql.org/docs/devel/parallel-plans.html "In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexesand builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperatingprocesses as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlyingindex scan is not." BTW, the following sentences seem to be revisited, because "the work to be done" is not the same for parallel INSERT as forserial INSERT - the order of rows stored, table and index sizes, and what else? https://www.postgresql.org/docs/devel/using-explain.html#USING-EXPLAIN-ANALYZE "It's worth noting that although the data-modifying node can take a considerable amount of run time (here, it's consumingthe lion's share of the time), the planner does not currently add anything to the cost estimates to account forthat work. That's because the work to be done is the same for every correct query plan, so it doesn't affect planningdecisions." Regards Takayuki Tsunakawa
В списке pgsql-hackers по дате отправления: