Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: Statistics Import and Export
Дата
Msg-id ZYxCvVTexwVww8Gs@momjian.us
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Wed, Dec 27, 2023 at 01:08:47PM +0100, Tomas Vondra wrote:
> On 12/26/23 20:19, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> >> I think we need a robust API to handle two cases:
> > 
> >> *  changes in how we store statistics
> >> *  changes in how how data type values are represented in the statistics
> > 
> >> We have had such changes in the past, and I think these two issues are
> >> what have prevented import/export of statistics up to this point.
> >> Developing an API that doesn't cleanly handle these will cause long-term
> >> pain.
> > 
> > Agreed.
> > 
> 
> I agree the format is important - we don't want to end up with a format
> that's cumbersome or inconvenient to use. But I don't think the proposed
> format is somewhat bad in those respects - it mostly reflects how we
> store statistics and if I was designing a format for humans, it might
> look a bit differently. But that's not the goal here, IMHO.
> 
> I don't quite understand the two cases above. Why should this affect how
> we store statistics? Surely, making the statistics easy to use for the
> optimizer is much more important than occasional export/import.

The two items above were to focus on getting a solution that can easily
handle future statistics storage changes.  I figured we would want to
manipulate the data as a table internally so I am confused why we would
export JSON instead of a COPY format.  I didn't think we were changing
how we internall store or use the statistics.

> >> In summary, I think we need an SQL-level command for this.
> > 
> > I think a SQL command is an actively bad idea.  It'll just add development
> > and maintenance overhead that we don't need.  When I worked on this topic
> > years ago at Salesforce, I had things set up with simple functions, which
> > pg_dump would invoke by writing more or less
> > 
> >     SELECT pg_catalog.load_statistics(....);
> > 
> > This has a number of advantages, not least of which is that an extension
> > could plausibly add compatible functions to older versions.  The trick,
> > as you say, is to figure out what the argument lists ought to be.
> > Unfortunately I recall few details of what I wrote for Salesforce,
> > but I think I had it broken down in a way where there was a separate
> > function call occurring for each pg_statistic "slot", thus roughly
> > 
> > load_statistics(table regclass, attname text, stakind int, stavalue ...);
> > 
> > I might have had a separate load_statistics_xxx function for each
> > stakind, which would ease the issue of deciding what the datatype
> > of "stavalue" is.  As mentioned already, we'd also need some sort of
> > version identifier, and we'd expect the load_statistics() functions
> > to be able to transform the data if the old version used a different
> > representation.  I agree with the idea that an explicit representation
> > of the source table attribute's type would be wise, too.
> > 
> 
> Yeah, this is pretty much what I meant by "functional" interface. But if
> I said maybe the format implemented by the patch is maybe too close to
> how we store the statistics, then this has exactly the same issue. And
> it has other issues too, I think - it breaks down the stats into
> multiple function calls, so ensuring the sanity/correctness of whole
> sets of statistics gets much harder, I think.

I was suggesting an SQL command because this feature is going to need a
lot of options and do a lot of different things, I am afraid, and a
single function might be too complex to manage.

> I'm not sure about the extension idea. Yes, we could have an extension
> providing such functions, but do we have any precedent of making
> pg_upgrade dependent on an external extension? I'd much rather have
> something built-in that just works, especially if we intend to make it
> the default behavior (which I think should be our aim here).

Uh, an extension seems nice to allow people in back branches to install
it, but not for normal usage.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_upgrade failing for 200+ million Large Objects
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: trying again to get incremental backup