Обсуждение: Re: POC: track vacuum/analyze cumulative time per relation
Hi, On Thu, Jan 02, 2025 at 12:24:06PM -0600, Sami Imseih wrote: > Having the total (auto)vacuum elapsed time > along side the existing (auto)vaccum_count > allows a user to track the average time an > operating overtime and to find vacuum tuning > opportunities. > > The same can also be said for (auto)analyze. I think that makes sense to expose those metrics through SQL as you're proposing here. The more we expose through SQL the better I think. > attached a patch ( without doc changes) > that adds 4 new columns: Thanks! A few random comments: === 1 + endtime = GetCurrentTimestamp(); pgstat_report_vacuum(RelationGetRelid(rel), rel->rd_rel->relisshared, Max(vacrel->new_live_tuples, 0), vacrel->recently_dead_tuples + - vacrel->missed_dead_tuples); + vacrel->missed_dead_tuples, + starttime, + endtime); pgstat_progress_end_command(); if (instrument) { - TimestampTz endtime = GetCurrentTimestamp(); What about keeping the endtime assignment after the pgstat_progress_end_command() call? I think that it makes more sense that way. === 2 pgstat_report_vacuum(RelationGetRelid(rel), rel->rd_rel->relisshared, Max(vacrel->new_live_tuples, 0), vacrel->recently_dead_tuples + - vacrel->missed_dead_tuples); + vacrel->missed_dead_tuples, + starttime, + endtime); What about doing the elapsedtime computation prior the this call and passed it as a parameter (then remove the starttime one and keep the endtime as it is needed)? === 3 pgstat_report_analyze(onerel, totalrows, totaldeadrows, - (va_cols == NIL)); + (va_cols == NIL), starttime, endtime); Same as 2 for pgstat_report_analyze() except that the endtime could be removed too. === 4 +/* pg_stat_get_vacuum_time */ +PG_STAT_GET_RELENTRY_INT64(total_vacuum_time) + +/* pg_stat_get_autovacuum_time */ +PG_STAT_GET_RELENTRY_INT64(total_autovacuum_time) + +/* pg_stat_get_analyze_time */ +PG_STAT_GET_RELENTRY_INT64(total_analyze_time) + +/* pg_stat_get_autoanalyze_time */ +PG_STAT_GET_RELENTRY_INT64(total_autoanalyze_time) I wonder if it wouldn't be better to use FLOAT8 here (to match things like pg_stat_get_checkpointer_write_time(), pg_stat_get_checkpointer_sync_time() among others). === 5 + + PgStat_Counter total_vacuum_time; /* user initiated vacuum */ + PgStat_Counter total_autovacuum_time; /* autovacuum initiated */ + PgStat_Counter total_analyze_time; /* user initiated vacuum */ + PgStat_Counter total_autoanalyze_time; /* autovacuum initiated */ Those comments look weird to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Jan 10, 2025 at 04:26:05PM -0600, Sami Imseih wrote: > { > /* time is already in msec, just convert to double for presentation */ > PG_RETURN_FLOAT8((double) > pgstat_fetch_stat_checkpointer()->write_time); > } > > > + > > + PgStat_Counter total_vacuum_time; /* user initiated vacuum */ > > + PgStat_Counter total_autovacuum_time; /* autovacuum initiated */ > > + PgStat_Counter total_analyze_time; /* user initiated vacuum */ > > + PgStat_Counter total_autoanalyze_time; /* autovacuum initiated */ > > > > Those comments look weird to me. > > > > Ok, Will remove these. > > I also updated the comments for the instrument code path to reflect the > fact starttime is now set for all cases.Also, added documentation. > > See the attached v2. Thanks for the patch update! A few random comments: === 1 +/* pg_stat_get_vacuum_time */ +PG_STAT_GET_RELENTRY_FLOAT8(total_vacuum_time) + +/* pg_stat_get_autovacuum_time */ +PG_STAT_GET_RELENTRY_FLOAT8(total_autovacuum_time) + +/* pg_stat_get_analyze_time */ +PG_STAT_GET_RELENTRY_FLOAT8(total_analyze_time) + +/* pg_stat_get_autoanalyze_time */ +PG_STAT_GET_RELENTRY_FLOAT8(total_autoanalyze_time) + The comments do not reflect the function names ("total" is missing to give pg_stat_get_total_vacuum_time() and such). === 2 +#define PG_STAT_GET_RELENTRY_FLOAT8(stat) +Datum +CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + int64 result; + PgStat_StatTabEntry *tabentry; + + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) + result = 0; + else + result = (float8) (tabentry->stat); + + PG_RETURN_FLOAT8(result); +} + /* pg_stat_get_analyze_count */ PG_STAT_GET_RELENTRY_INT64(analyze_count I think it's better to define the macro just before its first usage (meaning just after pg_stat_get_vacuum_count()): that would be consistent with the places the other macros are defined. === 3 + int64 result; s/int64/double/? === 4 + Total time this table has spent in manual vacuum + </para></entry> Mention the unit? === 5 + /* + * When verbose or autovacuum logging is used, initialize a resource usage + * snapshot and optionally track I/O timing. + */ if (instrument) { Out of curiosity, why this extra comment? To be somehow consistent with do_analyze_rel()? === 6 @@ -343,6 +349,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM, RelationGetRelid(rel)); + starttime = GetCurrentTimestamp(); I wonder if it wouldn't make more sense to put the GetCurrentTimestamp() call before the pgstat_progress_start_command() call. That would be aligned with the "endtime" being after the pgstat_progress_end_command() and where it was before the patch. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Jan 14, 2025 at 05:01:52PM -0600, Sami Imseih wrote: > Please see the attached v3. Thanks! A few comments: === 1 + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>total_vacuum_time</structfield> <type>bigint</type> + </para> Those new fields should be documented as "double precision". === 2 +#define PG_STAT_GET_RELENTRY_FLOAT8(stat) \ +Datum \ +CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS) \ +{ \ + Oid relid = PG_GETARG_OID(0); \ + float8 result; \ + PgStat_StatTabEntry *tabentry; \ + \ + if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL) \ + result = 0; \ + else \ + result = (float8) (tabentry->stat); \ + \ + PG_RETURN_FLOAT8(result); \ +} I did propose "double" up-thread to be consistent with the code around. That would mean to also cast to "double". That's just for consistency purpose. What do you think? Appart from the above that LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Jan 15, 2025 at 11:09:00AM -0600, Sami Imseih wrote: > > Appart from the above that LGTM. > > thanks! I took care of your comments. Thanks! > I was not sure > about the need to cast "double" but as you mention this > is consistent with other parts of pgstatfuncs.c Yup. > v4 attached. One comment: + PG_RETURN_FLOAT8(result); \ The "\" indentation looks wrong for this line (was ok in v3). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> I was referring to the order of the fields in the structure itself, > but that's no big deal one way or the other. I understand your point now. I will group them with the related counters in the next rev and will use > This should be one comment for the whole block, or this should use the > singular for each comment. I will use a singular "/* time in milliseconds */" comment for each new field. This existing write_time field is plural, but I will leave that one alone for now. PgStat_Counter write_time; /* times in milliseconds */ > On HEAD in the ANALYZE path, the endtime is calculated after > index_vacuum_cleanup(). Your patch calculates it before > index_vacuum_cleanup(), which would result in an incorrect calculation > if the index cleanup takes a long time with less logs generated, no? > Sorry for not noticing that earlier on the thread.. Perhaps it would > be just better to pass the start time to pgstat_report_vacuum() and > pgstat_report_analyze() then let their internals do the elapsed time > calculations. Consistency of the arguments for both functions is > something worth having, IMO, even if it means a bit more > GetCurrentTimestamp() in this case. So currently, the report of the last_(autoanalyze|analyze)_time is set before the index_vacuum_cleanup, but for logging purposes the elapsed time is calculated afterwards. Most users will not notice this, but I think that is wrong as well. I think we should calculate endtime and elapsedtime and call pgstat_report_analyze after the index_vacuum_cleanup; and before vac_close_indexes. This is more accurate and will avoid incurring the extra GetCurrentTimestamp() call. What do you think? Regards, Sami