Re: zstd compression for pg_dump
От | Justin Pryzby |
---|---|
Тема | Re: zstd compression for pg_dump |
Дата | |
Msg-id | ZCHSSv3DgxUa0f3L@telsasoft.com обсуждение исходный текст |
Ответ на | Re: zstd compression for pg_dump (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: zstd compression for pg_dump
|
Список | pgsql-hackers |
On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: > On 3/16/23 05:50, Justin Pryzby wrote: > > On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: > >> On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion <jchampion@timescale.com> wrote: > >>> I did some smoke testing against zstd's GitHub release on Windows. To > >>> build against it, I had to construct an import library, and put that > >>> and the DLL into the `lib` folder expected by the MSVC scripts... > >>> which makes me wonder if I've chosen a harder way than necessary? > >> > >> It looks like pg_dump's meson.build is missing dependencies on zstd > >> (meson couldn't find the headers in the subproject without them). > > > > I saw that this was added for LZ4, but I hadn't added it for zstd since > > I didn't run into an issue without it. Could you check that what I've > > added works for your case ? > > > >>> Parallel zstd dumps seem to work as expected, in that the resulting > >>> pg_restore output is identical to uncompressed dumps and nothing > >>> explodes. I haven't inspected the threading implementation for safety > >>> yet, as you mentioned. > >> > >> Hm. Best I can tell, the CloneArchive() machinery is supposed to be > >> handling safety for this, by isolating each thread's state. I don't feel > >> comfortable pronouncing this new addition safe or not, because I'm not > >> sure I understand what the comments in the format-specific _Clone() > >> callbacks are saying yet. > > > > My line of reasoning for unix is that pg_dump forks before any calls to > > zstd. Nothing zstd does ought to affect the pg_dump layer. But that > > doesn't apply to pg_dump under windows. This is an opened question. If > > there's no solid answer, I could disable/ignore the option (maybe only > > under windows). > > I may be missing something, but why would the patch affect this? Why > would it even affect safety of the parallel dump? And I don't see any > changes to the clone stuff ... zstd supports using threads during compression, with -Z zstd:workers=N. When unix forks, the child processes can't do anything to mess up the state of the parent processes. But windows pg_dump uses threads instead of forking, so it seems possible that the pg_dump -j threads that then spawn zstd threads could "leak threads" and break the main thread. I suspect there's no issue, but we still ought to verify that before declaring it safe. > > The function is first checking if it was passed a filename which already > > has a suffix. And if not, it searches through a list of suffixes, > > testing for an existing file with each suffix. The search with stat() > > doesn't happen if it has a suffix. I'm having trouble seeing how the > > hasSuffix() branch isn't dead code. Another opened question. > > AFAICS it's done this way because of this comment in pg_backup_directory > > * ... > * ".gz" suffix is added to the filenames. The TOC files are never > * compressed by pg_dump, however they are accepted with the .gz suffix > * too, in case the user has manually compressed them with 'gzip'. > > I haven't tried, but I believe that if you manually compress the > directory, it may hit this branch. That would make sense, but when I tried, it didn't work like that. The filenames were all uncompressed names. Maybe it worked differently in an older release. Or maybe it changed during development of the parallel-directory-dump patch and it's actually dead code. This is rebased over the updated compression API. It seems like I misunderstood something you said before, so now I put back "supports_compression()". -- Justin
Вложения
В списке pgsql-hackers по дате отправления: