parallel pg_restore design issues
From: Andrew Dunstan
Subject: parallel pg_restore design issues
Msg-id: 48E957C4.8060008@dunslane.net
Replies: Re: parallel pg_restore design issues
List: pgsql-hackers
There are a couple of open questions for parallel pg_restore.

First, we need a way to decide the boundary between the serially run "pre-data" section and the remainder of the items in the TOC. Currently the code uses the first TABLEDATA item as the boundary. That's not terribly robust (what if there aren't any?). Also, people have wanted to steer clear of hardcoding much knowledge of archive member types into pg_restore, as a way of future-proofing it somewhat. I'm wondering if we should have pg_dump explicitly mark items as pre-data, data, or post-data. For legacy archives we could still check for either a TABLEDATA item or something known to sort after those (i.e. a BLOB, BLOB COMMENT, CONSTRAINT, INDEX, RULE, TRIGGER or FK CONSTRAINT item).

Another item we have already discussed is how to prevent concurrent processes from trying to take conflicting locks. Here we really can't rely on pg_dump to help us out, as lock requirements might change (a little bird has already whispered in my ear about reducing the strength of the locks taken for FK CONSTRAINT items). I haven't got a really good answer here.

Last, there is the question of what algorithm to use in choosing the next item to run. Currently I am using "next item in the queue whose dependencies have been met", with no queue reordering. Another possible algorithm would reorder the queue by elevating any item whose dependencies have been met. This will mean all the indexes for a table will tend to be grouped together, which might well be a good thing, and will tend to limit the tendency to do all the data loading at once. Both of these could be modified by explicitly limiting TABLEDATA items to a certain proportion (say, one quarter) of the processing slots available, if other items are available. I'm actually somewhat inclined to make provision for all of these possibilities via a command line option, with the first being the default.
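For the legacy-archive case described above, the boundary check could be as simple as matching the TOC entry's description string against the known data-or-later types. A minimal sketch (the enum and function names here are hypothetical, not existing pg_restore symbols):

```c
#include <string.h>

/* Hypothetical section markers; pg_dump does not currently emit these. */
typedef enum { SECTION_PRE_DATA, SECTION_DATA, SECTION_POST_DATA } TocSection;

/* Entry types known to sort with or after the data section in legacy
 * archives, per the list in the message above. */
static const char *const data_or_later[] = {
    "TABLE DATA", "BLOB", "BLOB COMMENT",
    "CONSTRAINT", "INDEX", "RULE", "TRIGGER", "FK CONSTRAINT",
    NULL
};

/* Return 1 if a legacy TOC entry with the given description marks the
 * start of (or falls after) the data section, else 0. */
int is_data_or_later(const char *desc)
{
    for (int i = 0; data_or_later[i] != NULL; i++)
        if (strcmp(desc, data_or_later[i]) == 0)
            return 1;
    return 0;
}
```

The first TOC entry for which this returns 1 would end the serially run pre-data section, so a missing TABLEDATA item no longer breaks the boundary detection.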
One size doesn't fit all, I suspect, and if it does we'll need lots of data before deciding what that size is. The extra logic won't really involve all that much code, and it will all be confined to a couple of functions. Thoughts? cheers andrew
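The default algorithm plus the proposed TABLEDATA cap could be sketched roughly as follows (the struct and field names are invented for illustration; the real dispatcher would work off the archive's TOC and dependency lists):

```c
#include <stddef.h>

/* Minimal stand-in for a TOC work item; fields are hypothetical. */
typedef struct {
    int deps_met;     /* have all dependencies been restored? */
    int is_tabledata; /* is this a TABLEDATA item? */
    int done;         /* already dispatched or finished? */
} WorkItem;

/* "Next item in the queue whose dependencies have been met", with no
 * reordering, but refusing a TABLEDATA item when data loads already
 * occupy max_data_slots worker slots (the "one quarter" cap discussed
 * above). Returns the queue index, or -1 if nothing is runnable. */
int next_item(const WorkItem *queue, int n,
              int data_running, int max_data_slots)
{
    for (int i = 0; i < n; i++) {
        if (queue[i].done || !queue[i].deps_met)
            continue;
        if (queue[i].is_tabledata && data_running >= max_data_slots)
            continue;   /* keep some slots free for non-data items */
        return i;
    }
    return -1;
}
```

The reordering variant would differ only in scanning for ready items and moving them to the head of the queue before this selection, which is why the extra logic stays confined to a couple of functions.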