Re: Statistics Import and Export

From: Corey Huinker
Subject: Re: Statistics Import and Export
Date:
Msg-id: CADkLM=eJXPQD=pWwwm+4Y48haiMmznB8dCECNMrwr59niY8ESw@mail.gmail.com
In reply to: Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers


On Thu, Mar 21, 2024 at 2:29 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2024-03-19 at 05:16 -0400, Corey Huinker wrote:
> > v11 attached.
>
> Thank you.
>
> Comments on 0001:
>
> This test:
>
>    +SELECT
>    +    format('SELECT pg_catalog.pg_set_attribute_stats( '
>    ...
>
> seems misplaced. It's generating SQL that can be used to restore or
> copy the stats -- that seems like the job of pg_dump, and shouldn't be
> tested within the plain SQL regression tests.

Fair enough.
 

> And can the other tests use pg_stats rather than pg_statistic?

They can, but part of what I wanted to show was that the values that aren't directly passed in as parameters (staopN, stacollN) get set to the correct values. Those values aren't guaranteed to match across databases, hence testing them in the regression test rather than in a TAP test. I'd still like to be able to test that.
 

> The function signature for pg_set_attribute_stats could be more
> friendly -- how about a few required parameters, after which the
> function only sets the stats that are provided, and the others are
> either left at their existing values or given reasonable defaults?

That would be problematic.

1. We'd have to compare the stats provided against the stats that are already there, make that list in-memory, and then re-order what remains
2. There would be no way to un-set statistics of a given stakind, unless we added an "actually set it null" boolean for each parameter that can be null. 
3. I tried that with the JSON formats, it made the code even messier than it already was.

> Make sure all error paths ReleaseSysCache().

+1
 

> Why are you calling checkCanModifyRelation() twice?

Once for the relation itself, and once for pg_statistic.
 
> I'm confused about when the function should return false and when it
> should throw an error. I'm inclined to think the return type should be
> void and all failures should be reported as ERROR.

I go back and forth on that. I can see making it void and raising an error for everything that currently returns false. But if we did that, then a statement with one pg_set_relation_stats and N pg_set_attribute_stats calls (which we lump together in one command for the locking benefits and atomic transaction) would fail entirely if one of the set-attribute calls named a column that we had dropped. It's up for debate whether that's the right behavior or not.

> replaces[] is initialized to {true}, which means only the first element
> is initialized to true. Try following the pattern in AlterDatabase (or
> similar), which reads the catalog tuple first, then updates a few fields
> selectively, setting the corresponding element of replaces[] along the
> way.

+1.
 

> The test also sets most_common_freqs in ascending order, which
> is weird.

I pulled most of the hardcoded values from pg_stats itself. The sample set is trivially small, and the values inserted were roughly in order, so maybe that's why.
 
> Relatedly, I got worried recently about the idea of plain users
> updating statistics. In theory, that should be fine, and the planner
> should be robust to whatever pg_statistic contains; but in practice
> there's some risk of mischief there until everyone understands that the
> contents of pg_stats should not be trusted. Fortunately I didn't find
> any planner crashes or even errors after a brief test.
>
> Maybe we could have the functions restricted to a role or roles:
>
> 1. pg_write_all_stats (can modify stats on ANY table)
> 2. pg_write_own_stats (can modify stats on tables owned by user)

I'm iffy on the need for the first one; I list it first purely to show how I derived the name for the second.
 
> One thing we can do is some extra validation for consistency, like
> checking that the arrays are properly sorted, checking for negative
> numbers in the wrong place, fractions larger than 1.0, etc.

+1. All suggestions of validation checks welcome.
