Обсуждение: splitting src/bin/scripts/vacuumdb.c
Hello, While working on Antonin Houska's REPACK code[1][2], I had chance to give a look at vacuumdb.c with the intention of creating a separate program, pg_repackdb from it, which would share most of the code. This patch creates vacuuming.c/h with that intent. While there, I noticed that that code could be modernized a little bit, so I did that. I think the most important changes are: - objfilter is declared as an enum but the values are bit-or'ed and tested for individual bits. We've discussed this coding pattern in other contexts and stayed away from it, on the grounds that the values so generated aren't really true values of the enum. I changed it from an enum to a bits32 with a few #defines for the values we use, the way we do elsewhere. Also, instead of being passed as a separate argument to the various functions, I added it to the vacuumingOpts struct. - Two booleans, analyze_in_stages and analyze_only, are really determining what "mode" the program runs in -- it's either vacuum, or one of those two modes. (Antonin came up with the idea of using "modes", but his patch only adds a new REPACK mode on top of the existing code without any further changes). I think the code flow is a little neater by making this change; consider for instance this change: - /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ - if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && - !vacopts->analyze_only) - { + /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ + if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats) + { IMO the original is quite unclear. Overall I'm quite happy with this small patch now, so I intend to push shortly barring objections. I'm not adding a new commitfest entry, just adding this new thread to the existing entry for REPACK. I kept the patch numbering as used in the other thread. [1] https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql [2] https://commitfest.postgresql.org/patch/5117/ -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El sabio habla porque tiene algo que decir; el tonto, porque tiene que decir algo" (Platon).
Вложения
Álvaro Herrera <alvherre@kurilemu.de> wrote: > - Two booleans, analyze_in_stages and analyze_only, are really > determining what "mode" the program runs in -- it's either vacuum, or > one of those two modes. (Antonin came up with the idea of using > "modes", but his patch only adds a new REPACK mode on top of the > existing code without any further changes). I think the code flow is a > little neater by making this change; consider for instance this change: > > - /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ > - if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE && > - !vacopts->analyze_only) > - { > > + /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */ > + if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats) > + { > > IMO the original is quite unclear. I agree that redundant information makes things more difficult to think about. I just wonder if vacopts->mode != MODE_VACUUM should be used instead of (vacopts->mode == MODE_ANALYZE || vacopts->mode == MODE_ANALYZE_IN_STAGES) in some places. Nevertheless, I'm not proposing a global replacement: the verbose form may be easier to understand, depending on the context. Other than that, I checked differences between v21, v22 and v23. I've got no other comments worth posting. -- Antonin Houska Web: https://www.cybertec-postgresql.com
On 2025-Sep-26, Antonin Houska wrote: > I agree that redundant information makes things more difficult to > think about. I just wonder if > > vacopts->mode != MODE_VACUUM > > should be used instead of > > (vacopts->mode == MODE_ANALYZE || > vacopts->mode == MODE_ANALYZE_IN_STAGES) I don't think so, because it'd become (vacopts->mode != MODE_VACUUM && vacopts->mode != MODE_REPACK) after we introduce repack, which looks worse. I considered introducing a simple macro for this usage, something like #define ModeIsAnalyze(mode) (mode == MODE_ANALYZE || mode == MODE_ANALYZE_IN_STAGES) but thought it'd be overkill. > Other than that, I checked differences between v21, v22 and v23. I've got no > other comments worth posting. Thanks for looking! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
On Fri, Sep 26, 2025 at 01:55:50PM +0200, Álvaro Herrera wrote: > - objfilter is declared as an enum but the values are bit-or'ed and > tested for individual bits. We've discussed this coding pattern in > other contexts and stayed away from it, on the grounds that the values > so generated aren't really true values of the enum. I changed it from > an enum to a bits32 with a few #defines for the values we use, the way > we do elsewhere. Also, instead of being passed as a separate argument > to the various functions, I added it to the vacuumingOpts struct. > > - Two booleans, analyze_in_stages and analyze_only, are really > determining what "mode" the program runs in -- it's either vacuum, or > one of those two modes. (Antonin came up with the idea of using > "modes", but his patch only adds a new REPACK mode on top of the > existing code without any further changes). I think the code flow is a > little neater by making this change; consider for instance this change: Seems reasonable to me. > Overall I'm quite happy with this small patch now, so I intend to push > shortly barring objections. I'm not adding a new commitfest entry, just > adding this new thread to the existing entry for REPACK. I kept the > patch numbering as used in the other thread. Is there any way to separate the aforementioned changes into a separate patch from the refactoring changes? That would make it a lot easier to review. I think we have decent testing for vacuumdb, so I'm not too concerned if this isn't feasible. -- nathan
On 2025-Sep-26, Nathan Bossart wrote: > Seems reasonable to me. Thanks, I had pushed this before receiving your email. > On Fri, Sep 26, 2025 at 01:55:50PM +0200, Álvaro Herrera wrote: > > Overall I'm quite happy with this small patch now, so I intend to push > > shortly barring objections. I'm not adding a new commitfest entry, just > > adding this new thread to the existing entry for REPACK. I kept the > > patch numbering as used in the other thread. > > Is there any way to separate the aforementioned changes into a separate > patch from the refactoring changes? That would make it a lot easier to > review. I think we have decent testing for vacuumdb, so I'm not too > concerned if this isn't feasible. Hmm, I guess I should have posted it that way for ease of review, but didn't think of it. You can sort-of look at the patch as git show --color-moved=zebra and you'll see the lines that are moved unchanged marked with one color, and the ones that are edited in a different color. It's a bit of a weird Christmas tree at this point, but if you look at it hard enough you can tell the unchanged ones apart. I had looked at the coverage of this file earlier today, and I agree that we have pretty decent coverage -- although it's one of the two worst-covered files in that subdirectory, the lines that aren't covered are mostly a few error messages. It might be a good idea to add some tests for those anyway, to ensure we don't break anything later. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Y una voz del caos me habló y me dijo "Sonríe y sé feliz, podría ser peor". Y sonreí. Y fui feliz. Y fue peor.