Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Statistics Import and Export
Дата
Msg-id CAExHW5sCAGbeD66vqR1kgVvEbOnmGiLGjSYYyqgAetposViJtw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Ответы Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Re: Statistics Import and Export  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
Hi Corey,


On Sat, Mar 23, 2024 at 7:21 AM Corey Huinker <corey.huinker@gmail.com> wrote:
v12 attached.

0001 - 


Some random comments

+SELECT
+    format('SELECT pg_catalog.pg_set_attribute_stats( '
+            || 'relation => %L::regclass::oid, attname => %L::name, '
+            || 'inherited => %L::boolean, null_frac => %L::real, '
+            || 'avg_width => %L::integer, n_distinct => %L::real, '
+            || 'most_common_vals => %L::text, '
+            || 'most_common_freqs => %L::real[], '
+            || 'histogram_bounds => %L::text, '
+            || 'correlation => %L::real, '
+            || 'most_common_elems => %L::text, '
+            || 'most_common_elem_freqs => %L::real[], '
+            || 'elem_count_histogram => %L::real[], '
+            || 'range_length_histogram => %L::text, '
+            || 'range_empty_frac => %L::real, '
+            || 'range_bounds_histogram => %L::text) ',
+        'stats_export_import.' || s.tablename || '_clone', s.attname,
+        s.inherited, s.null_frac,
+        s.avg_width, s.n_distinct,
+        s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
+        s.correlation, s.most_common_elems, s.most_common_elem_freqs,
+        s.elem_count_histogram, s.range_length_histogram,
+        s.range_empty_frac, s.range_bounds_histogram)
+FROM pg_catalog.pg_stats AS s
+WHERE s.schemaname = 'stats_export_import'
+AND s.tablename IN ('test', 'is_odd')
+\gexec

Why do we need to construct the command and execute? Can we instead execute the function directly? That would also avoid ECHO magic.

+   <table id="functions-admin-statsimport">
+    <title>Database Object Statistics Import Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>

COMMENT: The functions throw many validation errors. Do we want to list the acceptable/unacceptable input values in the documentation corresponding to those? I don't expect one line per argument validation. Something like "these, these and these arguments can not be NULL" or "both arguments in each of the pairs x and y, a and b, and c and d should be non-NULL or NULL respectively".
 
 
The functions pg_set_relation_stats() and pg_set_attribute_stats() now return void. There just weren't enough conditions where a condition was considered recoverable to justify having it. This may mean that combining multiple pg_set_attribute_stats calls into one compound statement may no longer be desirable, but that's just one of the places where I'd like feedback on how pg_dump/pg_restore use these functions.


0002 -

This patch concerns invoking the functions in 0001 via pg_restore/pg_upgrade. Little has changed here. Dumping statistics is currently the default for pg_dump/pg_restore/pg_upgrade, and can be switched off with the switch --no-statistics. Some have expressed concern about whether stats dumping should be the default. I have a slight preference for making it the default, for the following reasons:


+ /* Statistics are dependent on the definition, not the data */
+ /* Views don't have stats */
+ if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) &&
+ (tbinfo->relkind == RELKIND_VIEW))
+ dumpRelationStats(fout, &tbinfo->dobj, reltypename,
+  tbinfo->dobj.dumpId);
+

Statistics are about data. Whenever pg_dump dumps some filtered data, the
statistics collected for the whole table are uselss. We should avoide dumping
statistics in such a case. E.g. when only schema is dumped what good is
statistics? Similarly the statistics on a partitioned table may not be useful
if some its partitions are not dumped. Said that dumping statistics on foreign
table makes sense since they do not contain data but the statistics still makes sense.
 

Key areas where I'm seeking feedback:

- What level of errors in a restore will a user tolerate, and what should be done to the error messages to indicate that the data itself is fine, but a manual operation to update stats on that particular table is now warranted?
- To what degree could pg_restore/pg_upgrade take that recovery action automatically?
- Should the individual attribute/class set function calls be grouped by relation, so that they all succeed/fail together, or should they be called separately, each able to succeed or fail on their own?
- Any other concerns about how to best use these new functions.



Whether or not I pass --no-statistics, there is no difference in the dump output. Am I missing something?
$ pg_dump -d postgres > /tmp/dump_no_arguments.out
$ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
$ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out
$

IIUC, pg_dump includes statistics by default. That means all our pg_dump related tests will have statistics output by default. That's good since the functionality will always be tested. 1. We need additional tests to ensure that the statistics is installed after restore. 2. Some of those tests compare dumps before and after restore. In case the statistics is changed because of auto-analyze happening post-restore, these tests will fail.

I believe, in order to import statistics through IMPORT FOREIGN SCHEMA, postgresImportForeignSchema() will need to add SELECT commands invoking pg_set_relation_stats() on each imported table and pg_set_attribute_stats() on each of its attribute. Am I right? Do we want to make that happen in the first cut of the feature? How do you expect these functions to be used to update statistics of foreign tables?

--
Best Wishes,
Ashutosh Bapat

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning