Misdesigned command/status APIs for parallel dump/restore
От | Tom Lane |
---|---|
Тема | Misdesigned command/status APIs for parallel dump/restore |
Дата | |
Msg-id | 17340.1464465717@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Misdesigned command/status APIs for parallel dump/restore
|
Список | pgsql-hackers |
pg_dump's APIs for parallel dump/restore include an archive-format-specific MasterStartParallelItem function, which is evidently intended to allow format-specific data to be sent to the worker process during startup of a parallel sub-job. However, no such data can be sent in practice, because parsing of those commands is hardwired in parallel.c's WaitForCommands() function; nor do the APIs for the format-specific WorkerJobDump/WorkerJobRestore functions allow any format-specific data to be passed to them. So it's unsurprising that the two existing instantiations of MasterStartParallelItem are effectively duplicates. I am pretty strongly tempted to get rid of MasterStartParallelItem altogether and just hard-code what it does in DispatchJobForTocEntry. That'd also allow getting rid of the ugly usage of static buffers there. Alternatively, we could do what some of the comments suggest and make the format-specific WorkerJobDump/WorkerJobRestore functions responsible for parsing the command strings. But that seems like it would just be adding more duplicative code in support of flexibility that there's no use for. The arguments to DispatchJobForTocEntry are just a TocEntry and a T_Action, and that seems unlikely to change. The situation for MasterEndParallelItem isn't a whole lot better. At least the status strings are both created and parsed by format- specific functions --- but those functions are completely duplicative, performing no useful format-specific work, and there's no good reason to think that we'd need format-specific work in future. So I'm tempted to get rid of the MasterEndParallelItem API as well. Instead, have WorkerJobDump/WorkerJobRestore just return integer status codes, and perform all construction and parsing of the status messages in parallel.c. It's possible to imagine that future archive formats might have use for, say, passing an amount-of-data-written number from a worker back up to the master. But you could also implement that by relying on the filesystem (that is, a master or another worker could check file size to see how much had been written). So I'm pretty skeptical that we need extra flexibility here. A different line of thought would be to fix the worker-command-parsing situation by allowing the parsing to happen in format-specific code, but avoid duplicative coding by letting archive formats share a common implementation function if they had no need for any custom data. Thoughts? regards, tom lane
В списке pgsql-hackers по дате отправления: