Re: [HACKERS] Cost model for parallel CREATE INDEX
From: Robert Haas
Subject: Re: [HACKERS] Cost model for parallel CREATE INDEX
Date:
Msg-id: CA+TgmoaTwOUtyOM9GLG4LpANY2qxzy9NZXRKbP0bB26jRgyW4w@mail.gmail.com
In reply to: Re: [HACKERS] Cost model for parallel CREATE INDEX (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] Cost model for parallel CREATE INDEX
List: pgsql-hackers
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> So, I agree with Robert that we should actually use heap size for the
>> main, initial determination of # of workers to use, but we still need
>> to estimate the size of the final index [1], to let the cost model cap
>> the initial determination when maintenance_work_mem is just too low.
>> (This cap will rarely be applied in practice, as I said.)
>>
>> [1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
>
> Having looked at it some more, this no longer seems worthwhile. In the
> next revision, I will add a backstop that limits the use of
> parallelism based on a lack of maintenance_work_mem in a simpler
> manner. Namely, the worker will have to be left with a
> maintenance_work_mem/nworkers share of no less than 32MB in order for
> parallel CREATE INDEX to proceed. There doesn't seem to be any great
> reason to bring the volume of data to be sorted into it.

+1.

> I expect the cost model to be significantly simplified in the next
> revision in other ways, too. There will be no new index storage
> parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
> will be called in a fairly straightforward way within
> plan_create_index_workers(), using heap blocks, as agreed to already.

+1.

> pg_restore will avoid parallelism (that will happen by setting
> "max_parallel_workers_maintenance = 0" when it runs), not because it
> cannot trust the cost model, but because it prefers to parallelize
> things its own way (with multiple restore jobs), and because execution
> speed may not be the top priority for pg_restore, unlike a live
> production system.

This part I'm not sure about. I think people care quite a lot about
pg_restore speed, because they are often down when they're running it.
And they may have oodles more CPUs that parallel restore can use without
help from parallel query. I would be inclined to leave pg_restore alone
and let the chips fall where they may.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
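[Editor's note: the 32MB backstop quoted above could be sketched roughly as follows. This is a hypothetical illustration, not code from the patch; the function name, parameter names, and the decision to fall back to a serial sort at zero workers are assumptions, and the real patch integrates with compute_parallel_worker() and plan_create_index_workers() rather than standing alone.]

```c
/*
 * Hypothetical sketch of the proposed backstop: scale back the planned
 * worker count until each worker's maintenance_work_mem share is at
 * least 32MB. Returning 0 here stands in for falling back to a serial
 * CREATE INDEX; the actual patch's interface may differ.
 */
#define MIN_PARALLEL_SORT_MEM_KB   (32 * 1024)   /* 32MB, in KB */

static int
cap_workers_by_mem(int nworkers, long maintenance_work_mem_kb)
{
    while (nworkers > 0 &&
           maintenance_work_mem_kb / nworkers < MIN_PARALLEL_SORT_MEM_KB)
        nworkers--;

    return nworkers;        /* 0 => do not use parallelism */
}
```

For example, with maintenance_work_mem at 64MB and four planned workers, each share would be only 16MB, so the count is scaled back to two workers (32MB apiece); with only 16MB available, even one worker cannot meet the floor and the sort stays serial.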