Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Statistics Import and Export
Дата
Msg-id CADkLM=cvf7KEs-s-iTxheDuOBo3kTU0-RpjuQz5VpyF+K9MtLQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers


On Fri, Dec 15, 2023 at 3:36 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 13/12/2023 17:26, Corey Huinker wrote:> 4. I don't yet have a
complete vision for how these tools will be used
> by pg_upgrade and pg_dump/restore, the places where these will provide
> the biggest win for users.

Some issues here with docs:

func.sgml:28465: parser error : Opening and ending tag mismatch: sect1
line 26479 and sect2
   </sect2>
           ^

Apologies, will fix.
 

Also, as I remember, we already had some attempts to invent dump/restore
statistics [1,2]. They were stopped with the problem of type
verification. What if the definition of the type has changed between the
dump and restore? As I see in the code, Importing statistics you just
check the column name and don't see into the type.

We look up the imported statistics via column name, that is correct.

However, the values in stavalues and mcv and such are stored purely as text, so they must be casted using the input functions for that particular datatype. If that column definition changed, or the underlying input function changed, the stats import of that particular table would fail. It should be noted, however, that those same input functions were used to bring the data into the table via restore, so it would have already failed on that step. Either way, the structure of the table has effectively changed, so failure to import those statistics would be a good thing.
 

[1] Backup and recovery of pg_statistic
https://www.postgresql.org/message-id/flat/724322880.K8vzik8zPz%40abook

That proposal sought to serialize enough information on the old server such that rows could be directly inserted into pg_statistic on the new server. As was pointed out at the time, version N of a server cannot know what the format of pg_statistic will be in version N+1. 

This patch avoids that problem by inspecting the structure of the object to be faux-analyzed, and using that to determine what parts of the JSON to fetch, and what datatype to cast values to in cases like mcv and stavaluesN. The exported JSON has no oids in it whatseover, all elements subject to casting on import have already been cast to text, and the record returned has the server version number of the producing system, and the import function can use that to determine how it interprets the data it finds.
 

[2] Re: Ideas about a better API for postgres_fdw remote estimates
https://www.postgresql.org/message-id/7a40707d-1758-85a2-7bb1-6e5775518e64%40postgrespro.ru


This one seems to be pulling oids from the remote server, and we can't guarantee their stability across systems, especially for objects and operators from extensions. I tried to go the route of extracting the full text name of an operator, but discovered that the qualified names, in addition to being unsightly, were irrelevant because we can't insert stats that disagree about type with the attribute/expression. So it didn't matter what type the remote system thought it had, the local system was going to coerce it into the expected data type or ereport() trying.

I think there is hope for having do_analyze() run a remote query fetching the remote table's exported stats and then storing them locally, possibly after some modification, and that would save us from having to sample a remote table.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Clang optimiser vs preproc.c
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Clang optimiser vs preproc.c