Re: [PROPOSAL] VACUUM Progress Checker.
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: [PROPOSAL] VACUUM Progress Checker. |
Дата | |
Msg-id | 20151210.163005.29167515.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: [PROPOSAL] VACUUM Progress Checker. (Michael Paquier <michael.paquier@gmail.com>) |
Список | pgsql-hackers |
Hello, At Thu, 10 Dec 2015 15:28:14 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqRNw=w4mt-W+gtq0ED0KTR=B8Qgu6D+4BN3XmzFRuAgXQ@mail.gmail.com> > On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >> Hm, I guess progress messages would not change that much over the course > >> of a long-running command. How about we pass only the array(s) that we > >> change and NULL for others: > >> > >> With the following prototype for pgstat_report_progress: > >> > >> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param, > >> bool *message_param[], int num_message_param); > >> > >> If we just wanted to change, say scanned_heap_pages, then we report that > >> using: > >> > >> uint32_param[1] = scanned_heap_pages; > >> pgstat_report_progress(uint32_param, 3, NULL, 0); > >> > >> Also, pgstat_report_progress() should check each of its parameters for > >> NULL before iterating over to copy. So in most reports over the course of > >> a command, the message_param would be NULL and hence not copied. > > > > It's going to be *really* important that this facility provides a > > lightweight way of updating progress, so I think this whole API is > > badly designed. VACUUM, for example, is going to want a way to update > > the individual counter for the number of pages it's scanned after > > every page. It should not have to pass all of the other information > > that is part of a complete report. It should just be able to say > > pgstat_report_progress_update_counter(1, pages_scanned) or something > > of this sort. Don't marshal all of the data and then make > > pgstat_report_progress figure out what's changed. Provide a series of > > narrow APIs where the caller tells you exactly what they want to > > update, and you update only that. It's fine to have a > > pgstat_report_progress() function to update everything at once, for > > the use at the beginning of a command, but the incremental updates > > within the command should do something lighter-weight. > > [first time looking really at the patch and catching up with this thread] > > Agreed. As far as I can guess from the code, those should be as > followed to bloat pgstat message queue a minimum: > > + pgstat_report_activity_flag(ACTIVITY_IS_VACUUM); > /* > * Loop to process each selected relation. > */ > Defining a new routine for this purpose is a bit surprising. Cannot we > just use pgstat_report_activity with a new BackendState STATE_INVACUUM > or similar if we pursue the progress tracking approach? The author might want to know vacuum status *after* it has been finished. But it is reset just after the end of a vacuum. One concern is that BackendState adds new value for pg_stat_activiry.state and it might confuse someone using it but I don't see other issue on it. > A couple of comments: > - The relation OID should be reported and not its name. In case of a > relation rename that would not be cool for tracking, and most users > are surely going to join with other system tables using it. +1 > - The progress tracking facility adds a whole level of complexity for > very little gain, and IMO this should *not* be part of PgBackendStatus > because in most cases its data finishes wasted. We don't expect > backends to run frequently such progress reports, do we? I strongly thought the same thing but I haven't came up with better place for it to be stored. but, > My opinion on > the matter if that we should define a different collector data for > vacuum, with something like PgStat_StatVacuumEntry, then have on top > of it a couple of routines dedicated at feeding up data with it when > some work is done on a vacuum job. +many. But I can't guess the appropriate number of the entry of it, or suitable replacing policy on excesive number of vacuums. Although sane users won't run vacuum on so many backends. > In short, it seems to me that this patch needs a rework, and should be > returned with feedback. Other opinions? This is important feature for DBAs so I agree with you. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
В списке pgsql-hackers по дате отправления: