Re: new heapcheck contrib module
От | Mark Dilger |
---|---|
Тема | Re: new heapcheck contrib module |
Дата | |
Msg-id | 411C72E5-23A6-485F-A101-FBF7CFA6D2D5@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: new heapcheck contrib module (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: new heapcheck contrib module
|
Список | pgsql-hackers |
> On Jan 14, 2021, at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jan 11, 2021 at 1:16 PM Mark Dilger > <mark.dilger@enterprisedb.com> wrote: >> Added in v32, along with adding pg_amcheck to @contrib_uselibpq, @contrib_uselibpgport, and @contrib_uselibpgcommon > > exit_utils.c fails to achieve the goal of making this code independent > of pg_dump, because of: > > #ifdef WIN32 > if (parallel_init_done && GetCurrentThreadId() != mainThreadId) > _endthreadex(code); > #endif > > parallel_init_done is a pg_dump-ism. Perhaps this chunk of code could > be a handler that gets registered using exit_nicely() rather than > hard-coded like this. Note that the function comments for > exit_nicely() are heavily implicated in this problem, since they also > apply to stuff that only happens in pg_dump and not other utilities. The 0001 patch has been restructured to not have this problem. > I'm skeptical about the idea of putting functions into string_utils.c > with names as generic as include_filter() and exclude_filter(). > Existing cases like fmtId() and fmtQualifiedId() are not great either, > but I think this is worse and that we should do some renaming. On a > related note, it's not clear to me why these should be classified as > string_utils while stuff like expand_schema_name_patterns() gets > classified as option_utils. These are neither generic > string-processing functions nor are they generic options-parsing > functions. They are functions for expanding shell-glob style patterns > for database object names. And they seem like they ought to be > together, because they seem to do closely-related things. I'm open to > an argument that this is wrongheaded on my part, but it looks weird to > me the way it is. The logic to filter which relations are checked is completely restructured and is kept in pg_amcheck.c > I'm pretty unimpressed by query_utils.c. The CurrentResultHandler > stuff looks grotty, and you don't seem to really use it anywhere. And > it seems woefully overambitious to me anyway: this doesn't apply to > every kind of "result" we've got hanging around, absolutely nothing > even close to that, even though a name like CurrentResultHandler > sounds very broad. It also means more global variables, which is a > thing of which the PostgreSQL codebase already has a deplorable > oversupply. quiet_handler() and noop_handler() aren't used anywhere > either, AFAICS. > > I wonder if it would be better to pass in callbacks rather than > relying on global variables. e.g.: > > typedef void (*fatal_error_callback)(const char *fmt,...) > pg_attribute_printf(1, 2) pg_attribute_noreturn(); > > Then you could have a few helper functions that take an argument of > type fatal_error_callback and throw the right fatal error for (a) > wrong PQresultStatus() and (b) result is not one row. Do you need any > other cases? exiting_handler() seems to think that the caller might > want to allow any number of tuples, or any positive number, or any > particular cout, but I'm not sure if all of those cases are really > needed. The error callback stuff has been refactored in this next patch set, and also now includes handlers for parallel slots, asthe src/bin/scripts/scripts_parallel.c stuff has been moved to fe_utils and made more general. As it was, there were hardcodedassumptions that are valid for reindexdb and vacuumdb, but not general enough for pg_amcheck to use. The refactoringin patches 0002 through 0005 make it more generally usable. Patch 0008 uses it in pg_amcheck. > This stuff is finnicky and hard to get right. You don't really want to > create a situation where the same code keeps getting duplicated, or > the behavior's just a little bit inconsistent everywhere, but it also > isn't great to build layers upon layers of abstraction around > something like ExecuteSqlQuery which is, in the end, a four-line > function. I don't think there's any problem with something like > pg_dump having it's own function to execute-a-query-or-die. Maybe that > function ends up doing something like > TheGenericFunctionToExecuteOrDie(my_die_fn, the_query), or maybe > pg_dump can just open-code it but have a my_die_fn to pass down to the > glob-expansion stuff, or, well, I don't know. There are some real improvements in this next patch set. The number of queries issued to the database to determine the databases to use is much reduced. I had been following thepattern in pg_dump, but abandoned that for something new. The parallel slots stuff is now used for parallelism, much like what is done in vacuumdb and reindexdb. The pg_amcheck application can now be run over one database, multiple specified databases, or all databases. Relations, schemas, and databases can be included and excluded by pattern, like "(db1|db2|db3).myschema.(mytable|myindex)". The real-world use-cases for this that I have in mind are things like: pg_amcheck --jobs=12 --all \ --exclude-relation="db7.schema.known_corrupt_table" \ --exclude-relation="db*.schema.known_big_table" and pg_amcheck --jobs=20 \ --include-relation="*.compliance.audited" I might be missing something, but I think the interface is a superset of the interface from reindexdb and vacuumdb. Noneof the new interface stuff (patterns, allowing multiple databases to be given on the command line, etc) is required. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
- v33-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch
- v33-0002-Introducing-PGresultHandler-abstraction.patch
- v33-0003-Preparing-for-move-of-parallel-slot-infrastructu.patch
- v33-0004-Moving-and-renaming-scripts_parallel.patch
- v33-0005-Parameterizing-parallel-slot-result-handling.patch
- v33-0006-Moving-handle_help_version_opts.patch
- v33-0007-Refactoring-processSQLNamePattern.patch
- v33-0008-Adding-contrib-module-pg_amcheck.patch
В списке pgsql-hackers по дате отправления: